Writing an extensible JSON-based DSL with Moose

At work, I've been maintaining a perl script that needs to run a number of steps as part of a release workflow.

Initially, that script was very simple, but over time it has grown to do a number of things. And then some of those things did not need to be run all the time. And then we wanted to do this one exceptional thing for this one case. And so on; eventually the script became a big mess of configuration options and unreadable flow, and so I decided that I wanted it to be more configurable. I sat down and spent some time on this, and eventually came up with what I now realize is a domain-specific language (DSL) in JSON, implemented by creating objects in Moose, extensible by writing more object classes.

Let me explain how it works.

In order to explain, however, I need to explain some perl and Moose basics first. If you already know all that, you can safely skip ahead past the "Preliminaries" section that's next.

Preliminaries

Moose object creation, references.

In Moose, creating a class is done something like this:

package Foo;

use v5.40;
use Moose;

has 'attribute' => (
    is  => 'ro',
    isa => 'Str',
    required => 1
);

sub say_something {
    my $self = shift;
    say "Hello there, our attribute is " . $self->attribute;
}

The above is a class that has a single attribute called attribute. To create an object, you use the Moose constructor on the class, and pass it the attributes you want:

use v5.40;
use Foo;

my $foo = Foo->new(attribute => "foo");

$foo->say_something;

(output: Hello there, our attribute is foo)

This creates a new object with the attribute attribute set to bar. The attribute accessor is a method generated by Moose, which functions both as a getter and a setter (though in this particular case we made the attribute "ro", meaning read-only, so while it can be set at object creation time it cannot be changed by the setter anymore). So yay, an object.

And it has methods, things that we set ourselves. Basic OO, all that.

One of the peculiarities of perl is its concept of "lists". Not to be confused with the lists of python -- a concept that is called "arrays" in perl and is somewhat different -- in perl, lists are enumerations of values. They can be used as initializers for arrays or hashes, and they are used as arguments to subroutines. Lists cannot be nested; whenever a hash or array is passed in a list, the list is "flattened", that is, it becomes one big list.

This means that the below script is functionally equivalent to the above script that uses our "Foo" object:

use v5.40;
use Foo;

my %args;

$args{attribute} = "foo";

my $foo = Foo->new(%args);

$foo->say_something;

(output: Hello there, our attribute is foo)

This creates a hash %args wherein we set the attributes that we want to pass to our constructor. We set one attribute in %args, the one called attribute, and then use %args and rely on list flattening to create the object with the same attribute set (list flattening turns a hash into a list of key-value pairs).

Perl also has a concept of "references". These are scalar values that point to other values; the other value can be a hash, a list, or another scalar. There is syntax to create a non-scalar value at assignment time, called anonymous references, which is useful when one wants to remember non-scoped values. By default, references are not flattened, and this is what allows you to create multidimensional values in perl; however, it is possible to request list flattening by dereferencing the reference. The below example, again functionally equivalent to the previous two examples, demonstrates this:

use v5.40;
use Foo;

my $args = {};

$args->{attribute} = "foo";

my $foo = Foo->new(%$args);

$foo->say_something;

(output: Hello there, our attribute is foo)

This creates a scalar $args, which is a reference to an anonymous hash. Then, we set the key attribute of that anonymous hash to bar (note the use arrow operator here, which is used to indicate that we want to dereference a reference to a hash), and create the object using that reference, requesting hash dereferencing and flattening by using a double sigil, %$.

As a side note, objects in perl are references too, hence the fact that we have to use the dereferencing arrow to access the attributes and methods of Moose objects.

Moose attributes don't have to be strings or even simple scalars. They can also be references to hashes or arrays, or even other objects:

package Bar;

use v5.40;
use Moose;

extends 'Foo';

has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef[Str]',
    predicate => 'has_hash_attribute',
);

has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    predicate => 'has_object_attribute',
);

sub say_something {
    my $self = shift;

    if($self->has_object_attribute) {
        $self->object_attribute->say_something;
    }

    $self->SUPER::say_something unless $self->has_hash_attribute;

    say "We have a hash attribute!"
}

This creates a subclass of Foo called Bar that has a hash attribute called hash_attribute, and an object attribute called object_attribute. Both of them are references; one to a hash, the other to an object. The hash ref is further limited in that it requires that each value in the hash must be a string (this is optional but can occasionally be useful), and the object ref in that it must refer to an object of the class Foo, or any of its subclasses.

The predicates used here are extra subroutines that Moose provides if you ask for them, and which allow you to see if an object's attribute has a value or not.

The example script would use an object like this:

use v5.40;
use Bar;

my $foo = Foo->new(attribute => "foo");

my $bar = Bar->new(object_attribute => $foo, attribute => "bar");

$bar->say_something;

(output: Hello there, our attribute is foo)

This example also shows object inheritance, and methods implemented in child classes.

Okay, that's it for perl and Moose basics. On to...

Moose Coercion

Moose has a concept of "value coercion". Value coercion allows you to tell Moose that if it sees one thing but expects another, it should convert is using a passed subroutine before assigning the value.

That sounds a bit dense without example, so let me show you how it works. Reimaginging the Bar package, we could use coercion to eliminate one object creation step from the creation of a Bar object:

package "Bar";

use v5.40;

use Moose;
use Moose::Util::TypeConstraints;

extends "Foo";

coerce "Foo",
    from "HashRef",
    via { Foo->new(%$_) };

has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef',
    predicate => 'has_hash_attribute',
);

has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    coerce => 1,
    predicate => 'has_object_attribute',
);

sub say_something {
    my $self = shift;

    if($self->has_object_attribute) {
        $self->object_attribute->say_something;
    }

    $self->SUPER::say_something unless $self->has_hash_attribute;

    say "We have a hash attribute!"
}

Okay, let's unpack that a bit.

First, we add the Moose::Util::TypeConstraints module to our package. This is required to declare coercions.

Then, we declare a coercion to tell Moose how to convert a HashRef to a Foo object: by using the Foo constructor on a flattened list created from the hashref that it is given.

Then, we update the definition of the object_attribute to say that it should use coercions. This is not the default, because going through the list of coercions to find the right one has a performance penalty, so if the coercion is not requested then we do not do it.

This allows us to simplify declarations. With the updated Bar class, we can simplify our example script to this:

use v5.40;

use Bar;

my $bar = Bar->new(attribute => "bar", object_attribute => { attribute => "foo" });

$bar->say_something

(output: Hello there, our attribute is foo)

Here, the coercion kicks in because the value object_attribute, which is supposed to be an object of class Foo, is instead a hash ref. Without the coercion, this would produce an error message saying that the type of the object_attribute attribute is not a Foo object. With the coercion, however, the value that we pass to object_attribute is passed to a Foo constructor using list flattening, and then the resulting Foo object is assigned to the object_attribute attribute.

Coercion works for more complicated things, too; for instance, you can use coercion to coerce an array of hashes into an array of objects, by creating a subtype first:

package MyCoercions;
use v5.40;

use Moose;
use Moose::Util::TypeConstraints;

use Foo;

subtype "ArrayOfFoo", as "ArrayRef[Foo]";
subtype "ArrayOfHashes", as "ArrayRef[HashRef]";

coerce "ArrayOfFoo", from "ArrayOfHashes", via { [ map { Foo->create(%$_) } @{$_} ] };

Ick. That's a bit more complex.

What happens here is that we use the map function to iterate over a list of values.

The given list of values is @{$_}, which is perl for "dereference the default value as an array reference, and flatten the list of values in that array reference".

So the ArrayRef of HashRefs is dereferenced and flattened, and each HashRef in the ArrayRef is passed to the map function.

The map function then takes each hash ref in turn and passes it to the block of code that it is also given. In this case, that block is { Foo->create(%$_) }. In other words, we invoke the create factory method with the flattened hashref as an argument. This returns an object of the correct implementation (assuming our hash ref has a type attribute set), and with all attributes of their object set to the correct value. That value is then returned from the block (this could be made more explicit with a return call, but that is optional, perl defaults a return value to the rvalue of the last expression in a block).

The map function then returns a list of all the created objects, which we capture in an anonymous array ref (the [] square brackets), i.e., an ArrayRef of Foo object, passing the Moose requirement of ArrayRef[Foo].

Usually, I tend to put my coercions in a special-purpose package. Although it is not strictly required by Moose, I find that it is useful to do this, because Moose does not allow a coercion to be defined if a coercion for the same type had already been done in a different package. And while it is theoretically possible to make sure you only ever declare a coercion once in your entire codebase, I find that doing so is easier to remember if you put all your coercions in a specific package.

Okay, now you understand Moose object coercion! On to...

Dynamic module loading

Perl allows loading modules at runtime. In the most simple case, you just use require inside a stringy eval:

my $module = "Foo";
eval "require $module";

This loads "Foo" at runtime. Obviously, the $module string could be a computed value, it does not have to be hardcoded.

There are some obvious downsides to doing things this way, mostly in the fact that a computed value can basically be anything and so without proper checks this can quickly become an arbitrary code vulnerability. As such, there are a number of distributions on CPAN to help you with the low-level stuff of figuring out what the possible modules are, and how to load them.

For the purposes of my script, I used Module::Pluggable. Its API is fairly simple and straightforward:

package Foo;

use v5.40;
use Moose;

use Module::Pluggable require => 1;

has 'attribute' => (
    is => 'ro',
    isa => 'Str',
);

has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

sub handles_type {
    return 0;
}

sub create {
    my $class = shift;
    my %data = @_;

    foreach my $impl($class->plugins) {
        if($impl->can("handles_type") && $impl->handles_type($data{type})) {
            return $impl->new(%data);
        }
    }
    die "could not find a plugin for type " . $data{type};
}

sub say_something {
    my $self = shift;
    say "Hello there, I am a " . $self->type;
}

The new concept here is the plugins class method, which is added by Module::Pluggable, and which searches perl's library paths for all modules that are in our namespace. The namespace is configurable, but by default it is the name of our module; so in the above example, if there were a package "Foo::Bar" which

has a subroutine handles_type
that returns a truthy value when passed the value of the type key in a hash that is passed to the create subroutine,
then the create subroutine creates a new object with the passed key/value pairs used as attribute initializers.

Let's implement a Foo::Bar package:

package Foo::Bar;

use v5.40;
use Moose;

extends 'Foo';

has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);

has 'serves_drinks' => (
    is => 'ro',
    isa => 'Bool',
    default => 0,
);

sub handles_type {
    my $class = shift;
    my $type = shift;

    return $type eq "bar";
}

sub say_something {
    my $self = shift;
    $self->SUPER::say_something;
    say "I serve drinks!" if $self->serves_drinks;
}

We can now indirectly use the Foo::Bar package in our script:

use v5.40;
use Foo;

my $obj = Foo->create(type => bar, serves_drinks => 1);

$obj->say_something;

output:

Hello there, I am a bar.
I serve drinks!

Okay, now you understand all the bits and pieces that are needed to understand how I created the DSL engine. On to...

Putting it all together

We're actually quite close already. The create factory method in the last version of our Foo package allows us to decide at run time which module to instantiate an object of, and to load that module at run time. We can use coercion and list flattening to turn a reference to a hash into an object of the correct type.

We haven't looked yet at how to turn a JSON data structure into a hash, but that bit is actually ridiculously trivial:

use JSON::MaybeXS;

my $data = decode_json($json_string);

Tada, now $data is a reference to a deserialized version of the JSON string: if the JSON string contained an object, $data is a hashref; if the JSON string contained an array, $data is an arrayref, etc.

So, in other words, to create an extensible JSON-based DSL that is implemented by Moose objects, all we need to do is create a system that

takes hash refs to set arguments
has factory methods to create objects, which
- uses Module::Pluggable to find the available object classes, and
- uses the type attribute to figure out which object class to use to create the object
uses coercion to convert hash refs into objects using these factory methods

In practice, we could have a JSON file with the following structure:

{
    "description": "do stuff",
    "actions": [
        {
            "type": "bar",
            "serves_drinks": true,
        },
        {
            "type": "bar",
            "serves_drinks": false,
        }
    ]
}

... and then we could have a Moose object definition like this:

package MyDSL;

use v5.40;
use Moose;

use MyCoercions;

has "description" => (
    is => 'ro',
    isa => 'Str',
);

has 'actions' => (
    is => 'ro',
    isa => 'ArrayOfFoo'
    coerce => 1,
    required => 1,
);

sub say_something {
    say "Hello there, I am described as " . $self->description . " and I am performing my actions: ";

    foreach my $action(@{$self->actions}) {
        $action->say_something;
    }
}

Now, we can write a script that loads this JSON file and create a new object using the flattened arguments:

use v5.40;
use MyDSL;
use JSON::MaybeXS;

my $input_file_name = shift;

my $args = do {
    local $/ = undef;

    open my $input_fh, "<", $input_file_name or die "could not open file";
    <$input_fh>;
};

$args = decode_json($args);

my $dsl = MyDSL->new(%$args);

$dsl->say_something

Output:

Hello there, I am described as do stuff and I am performing my actions:
Hello there, I am a bar
I am serving drinks!
Hello there, I am a bar

In some more detail, this will:

Read the JSON file and deserialize it;
Pass the object keys in the JSON file as arguments to a constructor of the MyDSL class;
The MyDSL class then uses those arguments to set its attributes, using Moose coercion to convert the "actions" array of hashes into an array of Foo::Bar objects.
Perform the say_something method on the MyDSL object

Once this is written, extending the scheme to also support a "quux" type simply requires writing a Foo::Quux class, making sure it has a method handles_type that returns a truthy value when called with quux as the argument, and installing it into the perl library path. This is rather easy to do.

It can even be extended deeper, too; if the quux type requires a list of arguments rather than just a single argument, it could itself also have an array attribute with relevant coercions. These coercions could then be used to convert the list of arguments into an array of objects of the correct type, using the same schema as above.

The actual DSL is of course somewhat more complex, and also actually does something useful, in contrast to the DSL that we define here which just says things.

Creating an object that actually performs some action when required is left as an exercise to the reader.