At work, I've been maintaining a perl script that needs to run a number of steps as part of a release workflow.
Initially, that script was very simple, but over time it has grown to do a number of things. And then some of those things did not need to be run all the time. And then we wanted to do this one exceptional thing for this one case. And so on; eventually the script became a big mess of configuration options and unreadable flow, and so I decided that I wanted it to be more configurable. I sat down and spent some time on this, and eventually came up with what I now realize is a domain-specific language (DSL) in JSON, implemented by creating objects in Moose, extensible by writing more object classes.
Let me explain how it works.
In order to explain, however, I need to explain some perl and Moose basics first. If you already know all that, you can safely skip ahead past the "Preliminaries" section that's next.
Preliminaries
Moose object creation, references.
In Moose, creating a class is done something like this:
package Foo;
use v5.40;
use Moose;
has 'attribute' => (
is => 'ro',
isa => 'Str',
required => 1
);
sub say_something {
my $self = shift;
say "Hello there, our attribute is " . $self->attribute;
}
The above is a class that has a single attribute called attribute.
To create an object, you use the Moose constructor on the class, and
pass it the attributes you want:
use v5.40;
use Foo;
my $foo = Foo->new(attribute => "foo");
$foo->say_something;
(output: Hello there, our attribute is foo)
This creates a new object with the attribute attribute set to foo.
The attribute accessor is a method generated by Moose. For a read-write ("rw") attribute it functions both as a getter and a setter; in this particular case we made the attribute "ro", meaning read-only, so while it can be set at object creation time, it cannot be changed afterwards. So yay, an object.
And it has methods, things that we set ourselves. Basic OO, all that.
One of the peculiarities of perl is its concept of "lists". Not to be confused with the lists of python -- a concept that is called "arrays" in perl and is somewhat different -- in perl, lists are enumerations of values. They can be used as initializers for arrays or hashes, and they are used as arguments to subroutines. Lists cannot be nested; whenever a hash or array is passed in a list, the list is "flattened", that is, it becomes one big list.
This means that the below script is functionally equivalent to the above script that uses our "Foo" object:
use v5.40;
use Foo;
my %args;
$args{attribute} = "foo";
my $foo = Foo->new(%args);
$foo->say_something;
(output: Hello there, our attribute is foo)
This creates a hash %args wherein we set the attributes that we want to pass to our constructor. We set one attribute in %args, the one called attribute, and then use %args and rely on list flattening to create the object with the same attribute set (list flattening turns a hash into a list of key-value pairs).
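To make list flattening a bit more tangible, here is a minimal sketch that uses only core perl (no Moose). The names @inner, @outer, @flat and count_args are made up for this illustration; it shows an array and a hash both dissolving into the surrounding list.

```perl
use strict;
use warnings;
use feature 'say';

my @inner = (2, 3);
my @outer = (1, @inner, 4);    # @inner is flattened into the list: (1, 2, 3, 4)
say scalar @outer;             # 4

my %args = (attribute => "foo");
my @flat = %args;              # a hash flattens into key/value pairs
say "@flat";                   # attribute foo

# Subroutine arguments are a list too, so both containers are flattened here.
sub count_args { return scalar @_ }
say count_args(%args, @inner); # 4
```

This is exactly why Foo->new(%args) and Foo->new(attribute => "foo") look the same to the constructor.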
Perl also has a concept of "references". These are scalar values that point to other values; the other value can be a hash, an array, or another scalar. There is syntax to create a non-scalar value at assignment time, called anonymous references, which is useful when one wants to remember non-scoped values. By default, references are not flattened, and this is what allows you to create multidimensional values in perl; however, it is possible to request list flattening by dereferencing the reference. The below example, again functionally equivalent to the previous two examples, demonstrates this:
use v5.40;
use Foo;
my $args = {};
$args->{attribute} = "foo";
my $foo = Foo->new(%$args);
$foo->say_something;
(output: Hello there, our attribute is foo)
This creates a scalar $args, which is a reference to an anonymous hash. Then, we set the key attribute of that anonymous hash to foo (note the use of the arrow operator here, which is used to indicate that we want to dereference a reference to a hash), and create the object using that reference, requesting hash dereferencing and flattening by using a double sigil, %$.
As a side note, objects in perl are references too, hence the fact that we have to use the dereferencing arrow to access the attributes and methods of Moose objects.
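The difference between a reference (not flattened) and a dereferenced reference (flattened) can be sketched with core perl alone. The variable names below are only for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my $args = { attribute => "foo" };  # reference to an anonymous hash
my $list = [ 1, 2, 3 ];             # reference to an anonymous array

my @nested = ($list, $list);        # references stay intact: two elements
my @flat   = (@$list, @$list);      # dereferencing flattens: six elements

say scalar @nested;                 # 2
say scalar @flat;                   # 6
say $args->{attribute};             # foo
say ref $args;                      # HASH
say ref $list;                      # ARRAY
```

The @nested array is what a "multidimensional" value looks like in perl: an array whose elements are references to other arrays.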
Moose attributes don't have to be strings or even simple scalars. They can also be references to hashes or arrays, or even other objects:
package Bar;
use v5.40;
use Moose;
extends 'Foo';
has 'hash_attribute' => (
is => 'ro',
isa => 'HashRef[Str]',
predicate => 'has_hash_attribute',
);
has 'object_attribute' => (
is => 'ro',
isa => 'Foo',
predicate => 'has_object_attribute',
);
sub say_something {
my $self = shift;
if($self->has_object_attribute) {
$self->object_attribute->say_something;
}
$self->SUPER::say_something unless $self->has_hash_attribute;
say "We have a hash attribute!" if $self->has_hash_attribute;
}
This creates a subclass of Foo called Bar that has a hash attribute called hash_attribute, and an object attribute called object_attribute. Both of them are references; one to a hash, the other to an object. The hash ref is further limited in that it requires that each value in the hash must be a string (this is optional but can occasionally be useful), and the object ref in that it must refer to an object of the class Foo, or any of its subclasses.
The predicates used here are extra subroutines that Moose provides if you ask for them, and which allow you to see if an object's attribute has a value or not.
The example script would use an object like this:
use v5.40;
use Bar;
my $foo = Foo->new(attribute => "foo");
my $bar = Bar->new(object_attribute => $foo, attribute => "bar");
$bar->say_something;
output:
Hello there, our attribute is foo
Hello there, our attribute is bar
The first line comes from the Foo object held in object_attribute; the second comes from the inherited Foo method, which is called because no hash attribute was set.
This example also shows object inheritance, and methods implemented in child classes.
Okay, that's it for perl and Moose basics. On to...
Moose Coercion
Moose has a concept of "value coercion". Value coercion allows you to tell Moose that if it sees one thing but expects another, it should convert it using a passed subroutine before assigning the value.
That sounds a bit dense without an example, so let me show you how it works. Reimagining the Bar package, we could use coercion to eliminate one object creation step from the creation of a Bar object:
package Bar;
use v5.40;
use Moose;
use Moose::Util::TypeConstraints;
extends "Foo";
coerce "Foo",
from "HashRef",
via { Foo->new(%$_) };
has 'hash_attribute' => (
is => 'ro',
isa => 'HashRef',
predicate => 'has_hash_attribute',
);
has 'object_attribute' => (
is => 'ro',
isa => 'Foo',
coerce => 1,
predicate => 'has_object_attribute',
);
sub say_something {
my $self = shift;
if($self->has_object_attribute) {
$self->object_attribute->say_something;
}
$self->SUPER::say_something unless $self->has_hash_attribute;
say "We have a hash attribute!" if $self->has_hash_attribute;
}
Okay, let's unpack that a bit.
First, we add the Moose::Util::TypeConstraints module to our package. This is required to declare coercions.
Then, we declare a coercion to tell Moose how to convert a HashRef to a Foo object: by using the Foo constructor on a flattened list created from the hashref that it is given.
Then, we update the definition of the object_attribute to say that it should use coercions. This is not the default, because going through the list of coercions to find the right one has a performance penalty, so if the coercion is not requested then we do not do it.
This allows us to simplify declarations. With the updated Bar class, we can simplify our example script to this:
use v5.40;
use Bar;
my $bar = Bar->new(attribute => "bar", object_attribute => { attribute => "foo" });
$bar->say_something;
output:
Hello there, our attribute is foo
Hello there, our attribute is bar
Here, the coercion kicks in because the value passed for object_attribute, which is supposed to be an object of class Foo, is instead a hash ref. Without the coercion, this would produce an error message saying that the type of the object_attribute attribute is not a Foo object. With the coercion, however, the value that we pass to object_attribute is passed to a Foo constructor using list flattening, and then the resulting Foo object is assigned to the object_attribute attribute.
Coercion works for more complicated things, too; for instance, you can use coercion to coerce an array of hashes into an array of objects, by creating a subtype first:
package MyCoercions;
use v5.40;
use Moose;
use Moose::Util::TypeConstraints;
use Foo;
subtype "ArrayOfFoo", as "ArrayRef[Foo]";
subtype "ArrayOfHashes", as "ArrayRef[HashRef]";
coerce "ArrayOfFoo", from "ArrayOfHashes", via { [ map { Foo->create(%$_) } @{$_} ] };
Ick. That's a bit more complex.
What happens here is that we use the map function to iterate over a list of values.
The given list of values is @{$_}, which is perl for "dereference the default value as an array reference, and flatten the list of values in that array reference".
So the ArrayRef of HashRefs is dereferenced and flattened, and each HashRef in the ArrayRef is passed to the map function.
The map function then takes each hash ref in turn and passes it to the block of code that it is also given. In this case, that block is { Foo->create(%$_) }. In other words, we invoke the create factory method (which we will add to Foo in the section on dynamic module loading, below) with the flattened hashref as an argument. This returns an object of the correct implementation (assuming our hash ref has a type attribute set), with all attributes of that object set to the correct value. That object is then the result of the block: perl uses the value of the last expression evaluated in a block as its result, so no explicit return is needed.
The map function then returns a list of all the created objects, which we capture in an anonymous array ref (the [] square brackets), i.e., an ArrayRef of Foo objects, passing the Moose requirement of ArrayRef[Foo].
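The map-inside-square-brackets idiom can be tried out in isolation. The sketch below is not the coercion itself: it uses a tiny hand-rolled class called Thing (a hypothetical stand-in for Foo, so that the example runs without Moose) and applies the same expression to an array ref of hash refs.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for a Moose class: a constructor plus one accessor.
package Thing {
    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }
    sub attribute { return $_[0]{attribute} }
}

my $hashes = [ { attribute => "foo" }, { attribute => "bar" } ];

# Dereference the array ref, build one object per hash ref,
# and capture the resulting list in a new anonymous array ref.
my $objects = [ map { Thing->new(%$_) } @{$hashes} ];

say ref $objects;                               # ARRAY
say ref $objects->[0];                          # Thing
say join ",", map { $_->attribute } @$objects;  # foo,bar
```

Inside a coercion, $hashes would be $_, which is where the @{$_} in the coercion comes from.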
Usually, I tend to put my coercions in a special-purpose package. Although it is not strictly required by Moose, I find that it is useful to do this, because Moose does not allow a coercion to be defined if a coercion for the same type had already been done in a different package. And while it is theoretically possible to make sure you only ever declare a coercion once in your entire codebase, I find that doing so is easier to remember if you put all your coercions in a specific package.
Okay, now you understand Moose object coercion! On to...
Dynamic module loading
Perl allows loading modules at runtime. In the simplest case, you just use require inside a stringy eval:
my $module = "Foo";
eval "require $module";
This loads "Foo" at runtime. Obviously, the $module string could be a computed value; it does not have to be hardcoded.
There are some obvious downsides to doing things this way, mostly in the fact that a computed value can basically be anything, and so without proper checks this can quickly become an arbitrary code execution vulnerability. As such, there are a number of distributions on CPAN to help you with the low-level stuff of figuring out what the possible modules are, and how to load them.
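To illustrate what such "proper checks" might look like if you were to do them by hand, here is a minimal sketch. It is an assumption-laden example, not what my script does: the load_module helper and the allowlist are made up for the illustration, and the allowlist contains two core modules only so that the example runs anywhere. It validates the name, then calls require on a file path rather than evaluating a string as code.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical allowlist of modules that may be loaded at runtime.
my %allowed = map { $_ => 1 } qw(List::Util Scalar::Util);

sub load_module {
    my ($module) = @_;
    # Only accept something that looks like a package name ...
    die "invalid module name: $module\n"
        unless $module =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    # ... and that we explicitly agreed to load.
    die "module not allowed: $module\n" unless $allowed{$module};
    # Turn Foo::Bar into Foo/Bar.pm; require with a path does not eval code.
    (my $file = "$module.pm") =~ s{::}{/}g;
    require $file;
    return $module;
}

my $loaded = load_module("List::Util");
say "loaded $loaded" if $loaded->can("sum");   # loaded List::Util

# A hostile "module name" is rejected before anything is loaded.
my $ok = eval { load_module("Foo; system('echo oops')"); 1 };
say $ok ? "accepted" : "rejected";             # rejected
```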
For the purposes of my script, I used Module::Pluggable. Its API is fairly simple and straightforward:
package Foo;
use v5.40;
use Moose;
use Module::Pluggable require => 1, search_path => ['Foo'];
has 'attribute' => (
is => 'ro',
isa => 'Str',
);
has 'type' => (
is => 'ro',
isa => 'Str',
required => 1,
);
sub handles_type {
return 0;
}
sub create {
my $class = shift;
my %data = @_;
foreach my $impl($class->plugins) {
if($impl->can("handles_type") && $impl->handles_type($data{type})) {
return $impl->new(%data);
}
}
die "could not find a plugin for type " . $data{type};
}
sub say_something {
my $self = shift;
say "Hello there, I am a " . $self->type;
}
The new concept here is the plugins class method, which is added by Module::Pluggable, and which searches perl's library paths for all modules that are in a given namespace. The namespace is configurable through the search_path option; by default it is the name of our module followed by ::Plugin (here, that would be Foo::Plugin), so we explicitly set it to the name of our module instead. So in the above example, if there were a package "Foo::Bar" which
- has a subroutine handles_type
- that returns a truthy value when passed the value of the type key in a hash that is passed to the create subroutine,
then the create subroutine creates a new object of that package, with the passed key/value pairs used as attribute initializers.
Let's implement a Foo::Bar package:
package Foo::Bar;
use v5.40;
use Moose;
extends 'Foo';
has 'type' => (
is => 'ro',
isa => 'Str',
required => 1,
);
has 'serves_drinks' => (
is => 'ro',
isa => 'Bool',
default => 0,
);
sub handles_type {
my $class = shift;
my $type = shift;
return $type eq "bar";
}
sub say_something {
my $self = shift;
$self->SUPER::say_something;
say "I serve drinks!" if $self->serves_drinks;
}
We can now indirectly use the Foo::Bar package in our script:
use v5.40;
use Foo;
my $obj = Foo->create(type => "bar", serves_drinks => 1);
$obj->say_something;
output:
Hello there, I am a bar
I serve drinks!
Okay, now you understand all the bits and pieces that are needed to understand how I created the DSL engine. On to...
Putting it all together
We're actually quite close already. The create factory method in the last version of our Foo package allows us to decide at run time which module to instantiate an object of, and to load that module at run time. We can use coercion and list flattening to turn a reference to a hash into an object of the correct type.
We haven't looked yet at how to turn a JSON data structure into a hash, but that bit is actually ridiculously trivial:
use JSON::MaybeXS;
my $data = decode_json($json_string);
Tada, now $data is a reference to a deserialized version of the JSON string: if the JSON string contained an object, $data is a hashref; if the JSON string contained an array, $data is an arrayref, etc.
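A quick way to see this for yourself is the sketch below. Note one swap: it uses JSON::PP, which ships with perl, instead of JSON::MaybeXS, so that it runs without installing anything; both export a decode_json function that is used the same way here. The JSON strings are made up for the illustration.

```perl
use strict;
use warnings;
use feature 'say';
use JSON::PP;   # core module; stands in for JSON::MaybeXS in this sketch

my $object = decode_json('{"type": "bar", "serves_drinks": true}');
my $array  = decode_json('[1, 2, 3]');

say ref $object;                               # HASH
say ref $array;                                # ARRAY
say $object->{type};                           # bar
say $object->{serves_drinks} ? "yes" : "no";   # yes
say scalar @$array;                            # 3
```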
So, in other words, to create an extensible JSON-based DSL that is implemented by Moose objects, all we need to do is create a system that
- takes hash refs to set arguments
- has factory methods to create objects, which
  - use Module::Pluggable to find the available object classes, and
  - use the type attribute to figure out which object class to use to create the object
- uses coercion to convert hash refs into objects using these factory methods
In practice, we could have a JSON file with the following structure:
{
"description": "do stuff",
"actions": [
{
"type": "bar",
"serves_drinks": true
},
{
"type": "bar",
"serves_drinks": false
}
]
}
... and then we could have a Moose object definition like this:
package MyDSL;
use v5.40;
use Moose;
use MyCoercions;
has "description" => (
is => 'ro',
isa => 'Str',
);
has 'actions' => (
is => 'ro',
isa => 'ArrayOfFoo',
coerce => 1,
required => 1,
);
sub say_something {
my $self = shift;
say "Hello there, I am described as " . $self->description . " and I am performing my actions: ";
foreach my $action(@{$self->actions}) {
$action->say_something;
}
}
Now, we can write a script that loads this JSON file and creates a new object using the flattened arguments:
use v5.40;
use MyDSL;
use JSON::MaybeXS;
my $input_file_name = shift;
my $args = do {
local $/ = undef;
open my $input_fh, "<", $input_file_name or die "could not open file";
<$input_fh>;
};
$args = decode_json($args);
my $dsl = MyDSL->new(%$args);
$dsl->say_something;
Output:
Hello there, I am described as do stuff and I am performing my actions:
Hello there, I am a bar
I serve drinks!
Hello there, I am a bar
In some more detail, this will:
- Read the JSON file and deserialize it;
- Pass the object keys in the JSON file as arguments to a constructor of the MyDSL class;
- The MyDSL class then uses those arguments to set its attributes, using Moose coercion to convert the "actions" array of hashes into an array of Foo::Bar objects.
- Perform the say_something method on the MyDSL object
Once this is written, extending the scheme to also support a "quux" type simply requires writing a Foo::Quux class, making sure it has a method handles_type that returns a truthy value when called with quux as the argument, and installing it into the perl library path. This is rather easy to do.
It can even be extended deeper, too; if the quux type requires a list of arguments rather than just a single argument, it could itself also have an array attribute with relevant coercions. These coercions could then be used to convert the list of arguments into an array of objects of the correct type, using the same scheme as above.
The actual DSL is of course somewhat more complex, and also actually does something useful, in contrast to the DSL that we define here which just says things.
Creating an object that actually performs some action when required is left as an exercise to the reader.