I have a rather sizeable DVD collection. The database that I created of them a few years back, after a few episodes where I accidentally bought the same movie more than once, claims there are over 300 movies in the cabinet. Additionally, I own a number of TV shows on DVD, which, if you count individual discs, will probably end up being about the same number.

A few years ago, I decided that I was tired of walking to the DVD cabinet, taking out a disc, and placing it in the reader. Instead, I wanted to digitize them and use kodi to watch a movie whenever I felt like it. So I made some calculations, and came up with a system with enough storage (on ZFS, of course) to store all the DVDs without needing to re-encode them.

I got started on ripping most of the DVDs using dvdbackup, but it quickly became apparent that I'd made a miscalculation; where I thought that most of the DVDs would be 4.7G ones, it turns out that most commercial DVDs are actually of the 9G type. Come to think of it, that does make a lot of sense. Additionally, now that I had a home server with some significant redundant storage, I found that I had some additional uses for it. The storage that I had, vast though it may have been, wouldn't suffice.

So, I gave this some more thought, but then life interfered and nothing happened for a few years.

Recently, however, I've picked it up again, changing my workflow. I started using handbrake to re-encode the DVDs so they wouldn't take up quite so much space; having chosen VP9 as my preferred codec, I end up storing the DVDs as about 1 to 2 G per main feature, rather than the 8 to 9 that it used to be -- a significant gain. However, my first workflow wasn't very efficient; I would run the handbrake GUI from my laptop on ssh -X sessions to multiple machines, encoding the videos directly from DVD that way. That worked, but it meant I couldn't shut down my laptop to take it to work without interrupting the encoding that was in progress; also, it meant that if a DVD finished encoding in the middle of the night, I wouldn't be there to replace it, so the system would be sitting idle for several hours. Clearly some form of improvement was necessary if I was going to do this in any reasonable amount of time.

So after fooling around a bit, I came up with the following:

  • First, I use dvdbackup -M -r a to read the DVD without re-encoding anything. This can be done at the speed of the optical medium, and is therefore much more efficient than running handbrake directly from the DVD. The -M option tells dvdbackup to read everything from the DVD (to make a mirror of it, in effect). The -r a option tells dvdbackup to abort if it encounters a read error; I found that DVDs sometimes can be read successfully if I eject the drive and immediately reinsert it, or if I give the disc another clean, or even just try again in a different DVD reader. Sometimes the disc is just damaged, and then using dvdbackup's default mode of skipping the unreadable blocks makes sense, but not in a first attempt.
  • Then, I run a small perl script that I wrote. It basically does two things:

    1. Run HandBrakeCLI -i <dvdbackup output> --previews 1 -t 0, parse its stderr output, and figure out what the first and the last titles on the DVD are.
    2. Run qsub -N <movie name> -v FILM=<dvdbackup output> -t <first title>-<last title> convert-film
  • The convert-film script is a bash script, which (in its first version) did this:

    mkdir -p "$OUTPUTDIR/$FILM/tmp"
    HandBrakeCLI -x "threads=1" --no-dvdnav -i "$INPUTDIR/$FILM" -e vp9 -E copy -T -t $SGE_TASK_ID --all-audio --all-subtitles -o "$OUTPUTDIR/$FILM/tmp/T${SGE_TASK_ID}.mkv"

    Essentially, that converts a single title to a VP9-encoded matroska file, with all the subtitles and audio streams intact, and forcing it to use only one thread -- having it use multiple threads is useful if you care about a single DVD converting as fast as possible, but I don't, and having four DVDs on a four-core system all convert at 100% CPU seems more efficient than having two convert at about 180% each. I did consider using HandBrakeCLI's options to only extract the "interesting" audio and subtitle tracks, but I prefer to not have dubbed audio (to have subtitled audio instead); since some of my DVDs are originally in non-English languages, doing so gets rather complex. The audio and subtitle tracks don't take up that much space, so I decided not to bother with that in the end.

The use of qsub, which submits the script into gridengine, allows me to hook up several encoder nodes (read: the server plus a few old laptops) to the same queue.
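As a rough illustration of step 1 above, here is a minimal shell sketch (not the perl script itself). It assumes that the HandBrakeCLI scan output lists each title on a line of the form "+ title N:"; both that format and the function name title_range are assumptions made for this example.

```shell
#!/bin/sh
# Read a HandBrakeCLI scan log on standard input and print
# "<first>-<last>", the range of title numbers found in it.
# Assumption: titles appear as lines like "+ title 3:".
title_range() {
    sed -n 's/^[[:space:]]*+ title \([0-9][0-9]*\):.*$/\1/p' |
        sort -n |
        awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print first "-" last }'
}

# Intended use (commented out, since it needs a ripped DVD and gridengine):
#   range=$(HandBrakeCLI -i "$INPUTDIR/$FILM" --previews 1 -t 0 2>&1 | title_range)
#   qsub -N "$FILM" -v FILM="$FILM" -t "$range" convert-film
```

The result is exactly the kind of string that qsub -t expects for an array job, so each title ends up as one task with its own $SGE_TASK_ID.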

That went pretty well, until I wanted to figure out how far along something was going. HandBrakeCLI provides progress information on stderr, and I can just do a tail -f of the stderr output logs, but that really works well only for one DVD at a time, not if you're trying to follow along with about a dozen of them.

So I made a database, and wrote another perl script. This latter will parse the stderr output of HandBrakeCLI, fish out the progress information, and put the completion percentage as well as the ETA time into a database. Then it became interesting:

  IF (TG_OP = 'INSERT') OR (TG_OP = 'UPDATE' AND (NEW.progress != OLD.progress) OR NEW.finished = TRUE) THEN
    PERFORM pg_notify('transjob', row_to_json(NEW)::varchar);
$$ LANGUAGE plpgsql;
CREATE TRIGGER transjob_tcn_trigger

This uses PostgreSQL's asynchronous notification feature to send out a notification whenever an interesting change has happened to the table.

#!/usr/bin/perl -w

use strict;
use warnings;

use Mojolicious::Lite;
use Mojo::Pg;


helper dbh => sub { state $pg = Mojo::Pg->new->dsn("dbi:Pg:dbname=transcode"); };

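websocket '/updates' => sub {
    my $c = shift;
    # Forward every "transjob" notification payload to the WebSocket client
    my $cb = $c->dbh->pubsub->listen(transjob => sub { $c->send(pop) });
    # Stop listening once the client disconnects
    $c->on(finish => sub { shift->dbh->pubsub->unlisten(transjob => $cb) });
};

app->start;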

This uses the Mojolicious framework and Mojo::Pg to send out the payload of the "transjob" notification (which we created with the FOR EACH ROW trigger inside PostgreSQL earlier, and which contains the JSON version of the table row) over a WebSocket. Then it's just a small matter of programming to write some javascript which dynamically updates the webpage whenever that happens, and Tadaa! I have an online overview of the videos that are transcoding, and how far along they are.

That only requires me to keep the queue non-empty, which I can easily do by running dvdbackup a few times in parallel every so often. That's a nice Saturday afternoon project...
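As a footnote to the progress tracking described above: the parsing half could be sketched in shell roughly as follows. This is not the perl script itself. It assumes progress lines of the form "Encoding: task 1 of 1, 45.23 % (12.34 fps, avg 11.90 fps, ETA 00h12m34s)", with updates separated by carriage returns; both details are assumptions, as is the function name parse_progress.

```shell
#!/bin/sh
# Read HandBrakeCLI progress output on standard input and print one
# "<percentage> <ETA>" pair per progress update.
# Assumption: updates are separated by carriage returns, and only
# updates that already carry an ETA are of interest.
parse_progress() {
    tr '\r' '\n' |
        sed -n -E 's/^Encoding: task [0-9]+ of [0-9]+, ([0-9.]+) %.*ETA ([0-9]+h[0-9]+m[0-9]+s).*$/\1 \2/p'
}
```

Each pair printed this way would then be written to the database row for that job, which is what sets off the notification trigger.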

Posted mid-morning Tuesday, May 15th, 2018

There are many tools to implement this, and yeah, this is not the fastest. But the advantage is that you don't need extra tools beyond "bash" and "ping"...

for i in $(seq 1 254); do
  if ping -W 1 -c 1 192.168.0.$i; then
    HOST[$i]=1
  fi
done
echo ${!HOST[@]}

will give you the host addresses for the machines that are live on a given network...

Posted at noon on Sunday, April 22nd, 2018

Day four was a day of pancakes and stew. Oh, and some video work, too.


Did more documentation review. She finished SReview documentation, got started on the documentation of the examples of our ansible repository.


Finished splitting out the ansible configuration from the ansible code repository. The code repository now includes an example configuration that is well documented for getting started, whereas our production configuration lives in a separate repository.


Spent much time on the debconf website, mostly working on a new upstream release of wafer.

He also helped review Kyle's documentation, and spent some time together with Tzafrir debugging our ansible test setup.


Worked on documentation, and did a test run of the ansible repository. Found and fixed issues that cropped up during that.


Spent much time trying to figure out why SReview was not doing what he was expecting it to do. Side note: I hate video codecs. Things are working now, though, and most of the fixes were implemented in a way that makes it reusable for other conferences.

There's one more day coming up today. Hopefully I won't forget to blog about it tonight.

Posted late Friday morning, February 2nd, 2018

This should really have been the "day two" post, but I forgot to do that yesterday, and now it's the end of day three already, so let's just do the two together for now.


Has been hacking on the opsis so we can get audio through it, but so far without much success. In addition, he's been working a bit more on documentation, as well as splitting up some data that's currently in our ansible repository into a separate one so that other people can use our ansible configuration more easily, without having to fork too much.


Did some tests on the ansible setup, and did some documentation work, and worked on a kodi plugin for parsing the metadata that we've generated.


Did some work on the DebConf website. This wasn't meant to be much, but yak shaving sucks. Additionally, he's been doing some work on the youtube uploader as well.


Did more work reviewing our documentation, and has been working on rewording some of the more awkward bits.


Spent much time on improving the SReview installation for FOSDEM. While at it, fixed a number of bugs in some of the newer code that were exposed by full tests of the FOSDEM installation. Additionally, added code to SReview to generate metadata files that can be handed to Stefano's youtube uploader.


Although he had less time yesterday than he did on monday (and apparently no time today) to sprint remotely, Pollo still managed to add a basic CI infrastructure to lint our ansible playbooks.

Posted Wednesday evening, January 31st, 2018

I'm at the Linux Belgium training center, where this last week before FOSDEM the DebConf video team is holding a sprint. The nice folks of Linux Belgium made us feel pretty welcome:

[Image: Linux Belgium message]

Yesterday was the first day of that sprint. I had planned to blog about things then, but I forgot, so here goes (first thing this morning).

Nattie and Tzafrir

Nattie and Tzafrir have been spending much of their time proofreading our documentation, and giving us feedback to improve its readability and accuracy.


Spent some time working on his youtube uploader. He didn't finish it to a committable state yet, but more is to be expected today.

He also worked on landing a gstreamer pipeline change that was suggested at LCA last week (which he also visited), and did some work on setting up the debconf18 dev website.

Finally, he fixed the irker config on salsa so that it would actually work and send commit messages to IRC after a push.


Wrote a lot of extra documentation on the opsis that we use and various other subjects, and also fixed some of the templates of the documentation, so that things would look better and link correctly.


I spent much of my time working on the FOSDEM SReview instance, which will be used next weekend; that also allowed me to improve the code quality of some of the newer stuff that I wrote over the past few months. In between things, being the local guy here, I also drove around getting a bit of stuff that we needed.


Pollo isn't here, but he's sprinting remotely from home. He spent some time setting up gitlab-ci so that it would build our documentation after pushing to salsa.

Posted mid-morning Tuesday, January 30th, 2018

Somebody recently pointed me towards a blog post by a small business owner who proclaimed to the world that using Devuan (and not Debian) is better, because it's cheaper.


Looking at creating Devuan, which means splitting of Debian, economically, you caused approximately infinite cost.

Well, no. I'm immensely grateful to the Devuan developers, because when they announced their fork, all the complaints about systemd on the debian-devel mailinglist ceased to exist. Rather than a cost, that was an immensely gratifying experience, and it made sure that I started reading the debian-devel mailinglist again, which I had stopped for a while before that. Meanwhile, life in Debian went on as it always has.

Debian values choice. Fedora may not be about choice, but Debian is. If there are two ways of doing something, Debian will include all four. If you want to run a Linux system, and you're not sure whether to use systemd, upstart, or something else, then Debian is for you! (well, except if you want to use upstart, which is in jessie but not in stretch). Debian defaults to using systemd, but it doesn't enforce it; and while it may require a bit of manual handholding to make sure that systemd never ever ever ends up on your system, this is essentially not difficult.

you@your-machine:~$ apt install equivs; equivs-control your-sanity; $EDITOR your-sanity

Now make sure that what you get looks something like this (ignoring comments):

Section: misc
Priority: standard
Standards-Version: <whatever was there>

Package: your-sanity
Essential: yes
Conflicts: systemd-sysv
Description: Make sure this system does not install what I don't want
 The packages in the Conflicts: header cannot be installed without
 very difficult steps, and apt will never offer to install them.

Install it on every system where you don't want to run systemd. You're done, you'll never run systemd. Well, except if someone types the literal phrase "Yes, do as I say!", including punctuation and everything, when asked to do so. If you do that, well, you get to keep both pieces. Also, did you see my pun there? Yes, it's a bit silly, I admit it.

But before you take that step, consider this.

Four years ago, I was an outspoken opponent of systemd. It was a bad idea, I thought. It is not portable. It will cause the death of Debian GNU/kFreeBSD, and a few other things. It is difficult to understand and debug. It comes with a truckload of other things that want to replace the universe. Most of all, their developers had a pretty bad reputation of being, pardon my French, arrogant assholes.

Then, the systemd maintainers filed bug 796633, asking me to provide a systemd unit for nbd-client, since it provided an rcS init script (which is really a very special case), and the compatibility support for that in systemd was complicated and support for it would be removed from the systemd side. Additionally, providing a systemd template unit would make the systemd nbd experience much better, without dropping support for other init systems (those cases can still use the init script). In order to develop that, I needed a system to test things on. Since I usually test things on my laptop, I installed systemd on my laptop. The intent was to remove it afterwards. However, for various reasons, that never happened, and I still run systemd as my pid1. Here's why:

  • Systemd is much faster. Where my laptop previously took 30 to 45 seconds to boot using sysvinit, it takes less than five. In fact, it took longer for it to do the POST than it took for the system to boot from the time the kernel was loaded. I changed the grub timeout from the default of five seconds to something more reasonable, because I found that five seconds was just ridiculously long if it takes about half that for the rest of the system to boot to a login prompt afterwards.
  • Systemd is much more reliable. That is, it will fail more often, but it will reliably fail. When it fails, it will tell you why it failed, so you can figure out what went wrong and fix it, making sure the system never fails again in the same fashion. The unfortunate fact of the matter is that there were many bugs in our init scripts, but they were never discovered and therefore lingered. For instance, you would not know about this race condition between two init scripts, because sysvinit is so dog slow that 99 times out of 100 it would not trigger, and therefore you don't see it. The one time you do see it, something didn't come up, but sysvinit doesn't log about such errors (it expects the init script to do so), so all you can do is go "damn, wtf happened?!?" and manually start things, allowing the bug to remain. These race conditions were much more likely to trigger with systemd, which caused it a lot of grief originally; but really, you should be thankful, because now that all these race conditions have been discovered by way of an init system that is much more verbose about such problems, they have also been fixed, and your sysvinit system is more reliable, too, as a result. There are other similar issues (dependency loops, to name one) that systemd helped fix.
  • Systemd is different, and that requires some re-schooling. When I first moved my laptop to systemd, I remember running into some kind of issue that I couldn't figure out how to fix. No, I don't remember the specifics of that issue, but they don't really matter. The point is this: at first, I thought "this is horrible, you can't debug it, how can you use such a system". And while it's true that undebuggable systems are not very useful, the systemd maintainers know this too, and therefore systemd is debuggable. It's just that you don't debug it by throwing some imperative init script code through a debugger (or, worse, something like sh -x), because there is no imperative init script code to throw through such a debugger, and therefore that makes little sense. Instead, there is a wealth of different tools to inspect the systemd state, and a lot of documentation on what the different things mean. It takes a while to internalize all that; and if you're not convinced that systemd is a good thing then it may mean some cursing while you're fighting your way through. But in the end, systemd is not more difficult to debug than simple init scripts -- in fact, it sometimes may be easier, because the system is easier to reason about.
  • While systemd comes with a truckload of extra daemons (systemd-networkd, systemd-resolved, systemd-hostnamed, etc etc etc), the systemd in their name does not imply that they are required by systemd. In fact, it's the other way around: you are required to run systemd if you want to run systemd-networkd (etc), because systemd-networkd (etc) make extensive use of the systemd infrastructure and public APIs; but nothing inside systemd requires that systemd-networkd (etc) are running. In fact, on my personal laptop, beyond systemd and udev themselves, I'm not using anything that gets built from the systemd source.

I'm not saying these reasons are universally true, and I'm not saying that you'll like systemd as much as I have. I am saying, however, that you should give it an honest attempt before you say "I'm not going to run systemd, ever," because you might be surprised by the huge gap of difference between what you expected and what you got. I know I was.

So, given all that, do I think that Devuan is a good idea? It is if you want flamewars. It gives those people who want to vilify systemd a place to do that without bothering Debian with their opinion. But beyond that, if you want to run Debian and you don't want to run systemd, you can! Just make sure you choose the right options, and you're done.

All that makes me wonder why today, almost half a year after the initial release of Debian 9.0 "Stretch", Devuan Ascii still hasn't released, and why it took them over two years to release their Devuan Jessie based on Debian Jessie. But maybe that's just me.

Posted Monday afternoon, December 11th, 2017

For future reference (to myself, for the most part):

ffmpeg -i foo.webm -i foo.en.vtt -i <Dutch .vtt file> -map 0:v -map 0:a \
  -map 1:s -map 2:s -metadata:s:a language=eng -metadata:s:s:0   \
  language=eng -metadata:s:s:1 language=nld -c copy -y           \
  <output file>

... is one way to create a single .webm file from one .webm input file and multiple .vtt files. A little bit of explanation:

  • The -i arguments pass input files. You can have multiple input files for one output file. They are numbered internally (this is necessary for the -map and -metadata options later), starting from 0.
  • The -map options take a "mapping". With them, you specify which input streams should go where in the output stream. By default, if you have multiple streams of the same type, ffmpeg will only pick one (the "best" one, whatever that is). The mappings we specify are:

    • -map 0:v: this means to take the video stream from the first file (this is the default if you do not specify any mapping at all; but if you do specify a mapping, you need to be complete)
    • -map 0:a: take the audio stream from the first file as well (same as with the video).
    • -map 1:s: take the subtitle stream from the second (i.e., indexed 1) file.
    • -map 2:s: take the subtitle stream from the third (i.e., indexed 2) file.
  • The -metadata options set metadata on the output file. Here, we pass:

    • -metadata:s:a language=eng, to add a 's'tream metadata item on the 'a'udio stream, with name language and content eng. The language metadata in ffmpeg is special, in that it gets automatically translated to the correct way of specifying the language in the target container format.
    • -metadata:s:s:0 language=eng, to add a 's'tream metadata item on the first (indexed 0) 's'ubtitle stream in the output file. This too has English set as the language.
    • -metadata:s:s:1 language=nld, to add a 's'tream metadata item on the second (indexed 1) 's'ubtitle stream in the output file. This has Dutch set as the language.
  • The -c copy option tells ffmpeg to not transcode the input video data, but just to rewrite the container. This works because all input files (WebM video plus VTT subtitles) are valid for WebM. If you do not have an input subtitle format that is valid for WebM, you can instead limit the copy modifier to the video and audio only, allowing ffmpeg to transcode the subtitles. This is done by way of -c:v copy -c:a copy.
  • Finally, we pass -y to specify that any pre-existing output file should be overwritten, and the name of the output file.
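
To make the numbering of inputs, mappings and metadata concrete, here is a small shell sketch that prints such a command line for any number of subtitle files. The function name webm_with_subs and its lang=file argument convention are made up for this example, and so is the foo.nl.vtt file name in the usage line. Like the command above, it hardcodes eng for the audio stream, and it does not cope with file names containing spaces.

```shell
#!/bin/sh
# Print an ffmpeg command line that muxes one WebM file with any number
# of WebVTT subtitle files.
# Usage: webm_with_subs <video> <output> <lang>=<file> [<lang>=<file> ...]
# The <lang>=<file> form is this sketch's own convention, not ffmpeg's.
webm_with_subs() {
    video=$1
    output=$2
    shift 2
    inputs="-i $video"                  # input 0: the video file
    maps="-map 0:v -map 0:a"            # video and audio from input 0
    meta="-metadata:s:a language=eng"   # audio language, as in the example
    n=0
    for sub in "$@"; do
        lang=${sub%%=*}
        file=${sub#*=}
        inputs="$inputs -i $file"
        # The n-th subtitle file is input n+1 and output subtitle stream n
        maps="$maps -map $((n + 1)):s"
        meta="$meta -metadata:s:s:$n language=$lang"
        n=$((n + 1))
    done
    echo "ffmpeg $inputs $maps $meta -c copy -y $output"
}

# Example: webm_with_subs foo.webm out.webm eng=foo.en.vtt nld=foo.nl.vtt
```

The function only prints the command, so it can be inspected before it is run.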
Posted at lunch time on Thursday, December 7th, 2017

This morning I uploaded version 0.1 of SReview, my video review and transcoding system, to Debian experimental. There's still some work to be done before it'll be perfectly easy to use by anyone, but I do think I've reached the point where it should have basic usability.

Quick HOWTO for how to use it:

  • Enable Debian experimental
  • Install the packages sreview-master, sreview-encoder, sreview-detect, and sreview-web. It's possible to install the four packages on different machines, but let's not go into too much detail there, yet.
  • The installation will create an sreview user and database, and will start the sreview-web service on port 8080, listening only to localhost. The sreview-web package also ships with an apache configuration snippet that shows how to proxy it from the interwebs if you want to.
  • Run sreview-config --action=dump. This will show you the current configuration of sreview. If you want to change something, either change it in /etc/sreview/, or just run sreview-config --set=variable=value --action=update.
  • Run sreview-user -d --action=create -u <your email>. This will create an administrator user in the sreview database.
  • Open a webbrowser, browse to http://localhost:8080/, and test whether you can log on.
  • Write a script to insert the schedule of your event into the SReview database. Look at the debconf and fosdem scripts for inspiration if you need it. Yeah, that's something I still need to genericize, but I'm not quite sure yet how to do that.
  • Either configure gridengine so that it will have the required queues and resources for SReview, or disable the qsub commands in the SReview state_actions configuration parameter (e.g., by way of sreview-config --action=update --set=state_actions=... or by editing /etc/sreview/
  • If you need notification, modify the state_actions entry for notification so that it sends out a notification (e.g., through an IRC bot or an email address, or something along those lines). Alternatively, enable the "anonreviews" option, so that the overview page has links to your talk.
  • Review the inputglob and parse_re configuration parameters of SReview. The first should contain a filesystem glob that will find your raw assets; the second should parse the filename into room, year, month, day, hour, minute, and second components. Look at the defaults of those options for examples (or just use those, and store your files as /srv/sreview/incoming/<room>/<year>-<month>-<day>/<hour>:<minute>:<second>.*).
  • Provide an SVG file for opening credits, and point to it from the preroll_template configuration option.
  • Provide an SVG or PNG file for closing credits, and point to it from the postroll_template or postroll configuration option, respectively.
  • Start recording, and watch SReview do its magic :-)
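
To illustrate the file naming layout mentioned in the inputglob and parse_re step, here is a shell sketch that splits such a path into its components. It is only an illustration of the layout: parse_re itself is a configuration value in SReview, and neither this sed expression nor the function name parse_recording comes from SReview's defaults.

```shell
#!/bin/sh
# Split a recording path of the form
#   /srv/sreview/incoming/<room>/<year>-<month>-<day>/<hour>:<minute>:<second>.<ext>
# into room, date and time. Prints nothing if the path does not match.
parse_recording() {
    printf '%s\n' "$1" | sed -n -E \
        's|^/srv/sreview/incoming/([^/]+)/([0-9]{4})-([0-9]{2})-([0-9]{2})/([0-9]{2}):([0-9]{2}):([0-9]{2})\..*$|room=\1 date=\2-\3-\4 time=\5:\6:\7|p'
}

# Example: parse_recording /srv/sreview/incoming/room1/2017-11-10/12:30:00.mkv
```

Running a few real file names through something like this before the event is a cheap way to check that the recordings are stored the way the configuration expects.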

There's still some bits of the above list that I want to make easier to do, and there's still some things that shouldn't be strictly necessary, but all in all, I think SReview has now reached a certain level of maturity that means I felt confident doing its first upload to Debian.

Did you try it out? Let me know what you think!

Posted at lunch time on Friday, November 10th, 2017

At work, I help maintain a smartcard middleware that is provided to Belgian citizens who want to use their electronic ID card to, e.g., log on to government websites. This middleware is a piece of software that hooks into various browsers and adds a way to access the smartcard in question, through whatever APIs the operating system and the browser in question provide for that purpose. The details of how that is done differ between each browser (and in the case of Google Chrome, for the same browser between different operating systems); but for Firefox (and Google Chrome on free operating systems), this is done by way of a PKCS#11 module.

For Firefox 57, mozilla decided to overhaul much of their browser. The changes are large and massive, and in some ways revolutionary. It's no surprise, therefore, that some of the changes break compatibility with older things.

One of the areas in which breaking changes were made is in the area of extensions to the browser. Previously, Firefox had various APIs available for extensions; right now, all APIs apart from the WebExtensions API are considered "legacy" and support for them will be removed from Firefox 57 going forward.

Since installing a PKCS#11 module manually is a bit complicated, and since the legacy APIs provided a way to do so automatically provided the user would first install an add-on (or provided the installer of the PKCS#11 module sideloads it), most parties who provide a PKCS#11 module for use with Firefox will provide an add-on to automatically install it. Since the alternative involves entering the right values in a dialog box that's hidden away somewhere deep in the preferences screen, the add-on option is much more user friendly.

I'm sure you can imagine my dismay when I found out that there was no WebExtensions API to provide the same functionality. So, after asking around a bit, I filed bug 1357391 to get a discussion started. While it took some convincing initially to get people to understand the reasons for wanting such an API, eventually the bug was assigned the "P5" priority -- essentially, a "we understand the need and won't block it, but we don't have the time to implement it. Patches welcome, though" statement.

Since having an add-on was something that work really wanted, and since I had the time, I got the go-ahead from management to look into implementing the required code myself. I made it obvious rather quickly that my background in Firefox was fairly limited, though, and so was assigned a mentor to help me through the process.

Having been a Debian Developer for the past fifteen years, I do understand how to develop free software. Yet, the experience was different enough that I still learned some new things about free software development, which was somewhat unexpected.

Unfortunately, the process took much longer than I had hoped, which meant that the patch was not ready by the time Firefox 57 was branched off mozilla's "central" repository. The result of that is that while my patch has been merged into what will eventually become Firefox 58, it looks strongly as though it won't make it into Firefox 57. That's going to cause some severe headaches, which I'm not looking forward to; and while I can certainly understand the reasons for not wanting to grant the exception for the merge into 57, I can't help but feel like this is a missed opportunity.

Anyway, writing code for the massive Open Source project that mozilla is has been a load of fun, and in the process I've learned a lot -- not only about Open Source development in general, but also about this weird little thing that Javascript is. That might actually be useful for this other project that I've got running here.

In closing, I'd like to thank Tomislav 'zombie' Jovanovic for mentoring me during the whole process, without whom it would have been doubtful if I would even have been ready by now. Apologies for any procedural mistakes I've made, and good luck in your future endeavours! :-)

Posted Thursday afternoon, October 5th, 2017

As I've blogged before, I've been on and off working on SReview, a video review system. Development over the past year has been mostly driven by the need to have something up and running for first FOSDEM 2017, and then DebConf17, and so I've cut corners left and right which made the system, while functional, not quite entirely perfect everywhere. For instance, the backend scripts were done in ad-hoc perl, each reinventing their own wheel. Doing so made it easier for me to experiment with things and figure out where I want them to go, without immediately creating a lot of baggage that is not necessarily something I want to be stuck to. This flexibility has already paid off, in that I've redone the state machine between FOSDEM and DebConf17—and all it needed was to update a few SQL statements here and there. Well, and add a few of them, too.

It was always the intent to replace most of the ad-hoc perl with something better, however, once the time was ripe. One place where historical baggage is not so much of a problem, and where in fact abstracting away the complexity would now be an asset, is in the area of FFmpeg command lines. Currently, these are built by simple string expansion. For example, we do something like this (shortened for brevity):

system("ffmpeg -y -i $outputdir/$slug.ts -pass 1 -passlogfile ...");

inside an environment where the $outputdir and $slug variables are set in a perl environment. That works, but it has some downsides; e.g., adding or removing options based on which codecs we're using is not so easy. It would be much more flexible if the command lines were generated dynamically based on requested output bandwidth and codecs, rather than that they be hardcoded in the file. Case in point: currently there are multiple versions of some of the backend scripts, that only differ in details—mostly the chosen codec on the ffmpeg command line. Obviously this is suboptimal.

Instead, we want a way where video file formats can be autodetected, so that I can just say "create a file that uses encoder etc settings of this other file here". In addition, we also want a way where we can say "create a file that uses encoder etc settings of this other file here, except for these one or two options that I want to fine-tune manually". When I first thought about doing that about a year ago, that seemed complicated and not worth it—or at least not to that extent.

Enter Moose.

The Moose OO system for Perl 5 is an interesting way to do object orientation in Perl. I knew Perl supports OO, and I had heard about Moose, but never had looked into it, mostly because the standard perl OO features were "good enough". Until now.

Moose has a concept of adding 'attributes' to objects. Attributes can be set at object construction time, or can be accessed later on by way of getter/setter functions, or even simply functions named after the attribute itself (the default). For more complicated attributes, where the value may not be known until some time after the object has been created, Moose borrows the concept of "lazy" variables from Perl 6:

package Object;

use Moose;

has 'time' => (
    is => 'rw',
    builder => 'read_time',
    lazy => 1,
);

sub read_time {
    return localtime();
}
The above object has an attribute 'time', which will not have a value initially. However, upon first read, the read_time function (which calls 'localtime()') will be called and its result cached; that read and all further reads of the attribute will then return the cached result. In addition, since the attribute is read/write, the time can also be written to. In that case, any cached value that may exist will be overwritten, and if no cached value exists yet, the read_time function will never be called. (It is also possible to clear values if need be, so that the function would be called again.)
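To see the caching and clearing in action, here is a small self-contained sketch. The Clock package, the $builds counter, and the clear_time and has_time method names are invented for the demonstration; 'clearer' and 'predicate' are regular Moose attribute options.

```perl
package Clock;

use Moose;

our $builds = 0;    # counts how often the builder ran (demo only)

has 'time' => (
    is        => 'rw',
    builder   => 'read_time',
    lazy      => 1,
    clearer   => 'clear_time',    # forgets the cached value
    predicate => 'has_time',      # true once a value is stored
);

sub read_time {
    $builds++;
    return scalar localtime();
}

package main;

my $clock = Clock->new;
print "has a value before first read: ", ($clock->has_time ? "yes" : "no"), "\n";

my $first  = $clock->time;    # the builder runs here
my $second = $clock->time;    # served from the cache
print "same value twice: ", ($first eq $second ? "yes" : "no"), "\n";
print "builder ran $Clock::builds time(s)\n";

$clock->clear_time;           # drop the cached value
$clock->time;                 # the builder runs again
print "builder ran $Clock::builds time(s)\n";
```

The counter goes from 1 to 2 only after the clearer has been called, which is the behaviour described above.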

We use this with the following pattern:

package SReview::Video;

use Moose;
use JSON::PP; # provides decode_json

has 'url' => (
    is => 'rw',
);

has 'video_codec' => (
    is => 'rw',
    builder => '_probe_videocodec',
    lazy => 1,
);

has 'videodata' => (
    is => 'bare',
    reader => '_get_videodata',
    builder => '_probe_videodata',
    lazy => 1,
);

has 'probedata' => (
    is => 'bare',
    reader => '_get_probedata',
    builder => '_probe',
    lazy => 1,
);

sub _probe_videocodec {
    my $self = shift;
    return $self->_get_videodata->{codec_name};
}

sub _probe_videodata {
    my $self = shift;
    if(!exists($self->_get_probedata->{streams})) {
        return {};
    }
    # Return the first video stream found
    foreach my $stream(@{$self->_get_probedata->{streams}}) {
        if($stream->{codec_type} eq "video") {
            return $stream;
        }
    }
    return {};
}

sub _probe {
    my $self = shift;

    # List-form open runs ffprobe without a shell, so odd file names are safe
    open(my $fh, '-|', 'ffprobe', '-print_format', 'json', '-show_format', '-show_streams', $self->url)
        or die "cannot run ffprobe: $!";
    my $json = "";
    while(<$fh>) {
        $json .= $_;
    }
    close $fh;
    return decode_json($json);
}

The videodata and probedata attributes are for internal use only, and are therefore of the 'bare' type; that is, no accessors are generated for them by default, so they can be neither read nor written to. However, we do add 'reader' functions that can be used from inside the object, so that the object itself can access them. These reader functions are generated, so they're not part of the object source. The probedata attribute's builder simply calls ffprobe with the right command-line arguments to retrieve data in JSON format, and then decodes that JSON output.

Since the JSON data contains an array with (at least) two streams, one for video and one for audio, and since the ordering of those streams depends on the file and is therefore not guaranteed, we have to loop over them. Since doing so in each and every attribute of the file we might be interested in would be tedious, we add a videodata attribute that just returns the data for the first found video stream (the actual source also contains a similar one for audio streams).

So, if you create an SReview::Video object and you pass it a filename in the url attribute, and then immediately run print $object->video_codec, then the object will

  • call ffprobe, and cache the (decoded) output for further use
  • from that, extract the video stream data, and cache that for further use
  • from that, extract the name of the used codec, cache it, and then return that name to the caller.

However, if the caller first calls $object->video_codec('h264'), then the ffprobe and most of the caching will be skipped, and instead the h264 value will be returned as video codec name.
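The difference between the two call orders can be checked without ffprobe by swapping the probe for a stub. The sketch below is a cut-down stand-in for the real class: the Demo::Video package, the canned stream data, and the $probes counter are assumptions made for the demonstration only.

```perl
package Demo::Video;

use Moose;

our $probes = 0;    # counts calls to the fake probe (demo only)

has 'url' => (is => 'rw');

has 'video_codec' => (
    is      => 'rw',
    builder => '_probe_videocodec',
    lazy    => 1,
);

has 'probedata' => (
    is      => 'bare',
    reader  => '_get_probedata',
    builder => '_probe',
    lazy    => 1,
);

sub _probe_videocodec {
    my $self = shift;
    foreach my $stream (@{ $self->_get_probedata->{streams} }) {
        return $stream->{codec_name} if $stream->{codec_type} eq 'video';
    }
    return;
}

# Stand-in for the real ffprobe call: returns canned data.
sub _probe {
    $probes++;
    return { streams => [
        { codec_type => 'audio', codec_name => 'aac' },
        { codec_type => 'video', codec_name => 'mpeg2video' },
    ] };
}

package main;

my $probed = Demo::Video->new(url => 'file.ts');
print $probed->video_codec, "\n";    # triggers the (fake) probe
print $probed->video_codec, "\n";    # cached, no second probe

my $forced = Demo::Video->new(url => 'file.ts');
$forced->video_codec('h264');        # set before the first read
print $forced->video_codec, "\n";    # no probe at all for this object

print "probes: $Demo::Video::probes\n";
```

Only the first object ever reaches the probe, and it does so exactly once.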

Okay, so with a reasonably small amount of code, we now have a bunch of attributes that have defaults based on actual files but can be overwritten when necessary. Useful, right? Well, you might also want to care about the fact that sometimes you want to generate a video file that uses the same codec settings of this other file here. That's easy. First, we add another attribute:

has 'reference' => (
    is => 'ro',
    isa => 'SReview::Video',
    predicate => 'has_reference',
);

which we then use in the _probe method like so:

sub _probe {
    my $self = shift;

    if($self->has_reference) {
        return $self->reference->_get_probedata;
    }
    # original code remains here
}

With that, we can create an object like so:

my $video = SReview::Video->new(url => 'file.ts');
my $generated = SReview::Video->new(url => 'file2.ts', reference => $video);

Now if we ask the $generated object for the value of its video_codec setting without having set it ourselves first, it will take the probed data from the $video object and derive the value from that.
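Put together, the "same settings as this other file, except for one or two options" idea can be sketched as follows. This is again a simplified stand-in rather than the real SReview::Video: the Demo::RefVideo package and its fake probe, which guesses the codec from the file extension, are assumptions for the sake of a runnable example.

```perl
package Demo::RefVideo;

use Moose;

has 'url' => (is => 'rw');

has 'reference' => (
    is        => 'ro',
    isa       => 'Demo::RefVideo',
    predicate => 'has_reference',
);

has 'video_codec' => (
    is      => 'rw',
    builder => '_probe_videocodec',
    lazy    => 1,
);

has 'probedata' => (
    is      => 'bare',
    reader  => '_get_probedata',
    builder => '_probe',
    lazy    => 1,
);

sub _probe_videocodec {
    my $self = shift;
    return $self->_get_probedata->{codec_name};
}

sub _probe {
    my $self = shift;
    if ($self->has_reference) {
        return $self->reference->_get_probedata;
    }
    # Stand-in for ffprobe: pretend every .ts file is MPEG-2, anything else VP9.
    return { codec_name => ($self->url =~ /\.ts$/ ? 'mpeg2video' : 'vp9') };
}

package main;

my $video     = Demo::RefVideo->new(url => 'file.ts');
my $generated = Demo::RefVideo->new(url => 'file2.webm', reference => $video);
my $override  = Demo::RefVideo->new(
    url         => 'file3.webm',
    reference   => $video,
    video_codec => 'vp9',    # the one option fine-tuned by hand
);

print $generated->video_codec, "\n";    # taken from $video's probe data
print $override->video_codec, "\n";     # explicit value wins
```

Here $generated reports the codec of the reference file, while $override keeps the value passed to the constructor and never consults the reference for it.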

That only misses generating the ffmpeg command line, but that's all fairly straightforward and therefore left as an exercise to the reader. Or you can cheat, and look it up.

Posted Saturday evening, September 2nd, 2017