Transcoding video from one format to another seems to be a bit of a black art. There are many tools that can do this kind of thing, but one issue most of them seem to share is that they're not very well documented.
I ran into this a few years ago, when I was first doing video work for FOSDEM and did not yet have proper tools for the review and transcoding workflow.
At the time, I just used mplayer to look at the .dv files, and wrote a text file with a simple structure to remember exactly what to do with them. That file was then fed to a perl script which wrote out a shell script that would use the avconv command to combine and extract the "interesting" data from the source DV files into a single DV file per talk, and which would then call a shell script that used gst-launch and sox to do a multi-pass transcode of those intermediate DV files into a WebM file.
While all that worked properly, it was a rather ugly hack that I never cleaned up, and therefore never really documented properly either. Recently, however, someone asked me to do so anyway, so here goes. Before you complain about how this ate the videos of your firstborn child, however, note the above.
The perl script spends a fair amount of code on reading the text file and parsing it into an array of hashes. I'm not going to reproduce that, since the actual format of the file isn't all that important anyway. However, here are the interesting bits:
foreach my $pfile(keys %parts) {
    my @files = @{$parts{$pfile}};
    # print a banner so the generated shell script is easy to navigate
    say "#" x (length($pfile) + 4);
    say "# " . $pfile . " #";
    say "#" x (length($pfile) + 4);
    foreach my $file(@files) {
        my $start = "";
        my $stop = "";
        if(defined($file->{start})) {
            $start = "-ss " . $file->{start};
        }
        if(defined($file->{stop})) {
            $stop = "-t " . $file->{stop};
        }
        if(defined($file->{start}) && defined($file->{stop})) {
            # avconv's -t option takes a duration, not an end time, so
            # subtract start from stop: seconds first, then minutes,
            # then hours, borrowing as needed
            my @itime = split /:/, $file->{start};
            my @otime = split /:/, $file->{stop};
            $otime[2] -= $itime[2];
            if($otime[2] < 0) {
                $otime[1] -= 1;
                $otime[2] += 60;
            }
            $otime[1] -= $itime[1];
            if($otime[1] < 0) {
                $otime[0] -= 1;
                $otime[1] += 60;
            }
            $otime[0] -= $itime[0];
            $stop = "-t " . $otime[0] . ":" . $otime[1] . ":" . $otime[2];
        }
        if(defined($file->{start}) || defined($file->{stop})) {
            # trim the interesting part out of the source file
            say "ln " . $file->{name} . ".dv part-pre.dv";
            say "avconv -i part-pre.dv $start $stop -y -acodec copy -vcodec copy part.dv";
            say "rm -f part-pre.dv";
        } else {
            # no trimming needed; use the source file as-is
            say "ln " . $file->{name} . ".dv part.dv";
        }
        # DV frames can simply be concatenated, so append to the per-talk file
        say "cat part.dv >> /tmp/" . $pfile . ".dv";
        say "rm -f part.dv";
    }
    say "dv2webm /tmp/" . $pfile . ".dv";
    say "rm -f /tmp/" . $pfile . ".dv";
    say "scp /tmp/" . $pfile . ".webm video.fosdem.org:$uploadpath || true";
    say "mv /tmp/" . $pfile . ".webm .";
}
That script uses avconv to read one or more .dv files and transcode them into a single .dv file with all the start- or end-junk removed. It uses /tmp rather than the working directory, since the working directory was somewhere on the network, and if you're going to write several gigabytes of data to an intermediate file, it's usually a good idea to write them to a local filesystem rather than to a networked one.
Pretty boring.
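To make that a bit more concrete, here is roughly what the generated commands look like for one talk. The talk name and the source file names are made up for the sake of the example, and the upload path is just a placeholder:
##########
# mytalk #
##########
# first source file: only keep the part starting at 00:05:12
ln cam1-0042.dv part-pre.dv
avconv -i part-pre.dv -ss 00:05:12 -y -acodec copy -vcodec copy part.dv
rm -f part-pre.dv
cat part.dv >> /tmp/mytalk.dv
rm -f part.dv
# second source file: used in its entirety
ln cam1-0043.dv part.dv
cat part.dv >> /tmp/mytalk.dv
rm -f part.dv
# transcode the concatenated DV file, then upload and keep the result
dv2webm /tmp/mytalk.dv
rm -f /tmp/mytalk.dv
scp /tmp/mytalk.webm video.fosdem.org:/some/path || true
mv /tmp/mytalk.webm .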
It finally calls dv2webm on the resulting .dv file. That script looks like this:
#!/bin/bash
set -e
newfile=$(basename "$1" .dv).webm
wavfile=$(basename "$1" .dv).wav
wavfile=$(readlink -f "$wavfile")
normalfile=$(basename "$1" .dv)-normal.wav
normalfile=$(readlink -f "$normalfile")
oldfile=$(readlink -f "$1")
# the echo -e lines set the xterm title, so you can see at a glance which stage is running
echo -e "\033]0;Pass 1: $newfile\007"
# pass 1: encode the video to gather VP8 statistics, and tee the audio out to a wav file
gst-launch-0.10 webmmux name=mux ! fakesink \
  uridecodebin uri="file://$oldfile" name=demux \
  demux. ! ffmpegcolorspace ! deinterlace ! vp8enc multipass-cache-file=/tmp/vp8-multipass multipass-mode=1 threads=2 ! queue ! mux.video_0 \
  demux. ! progressreport ! audioconvert ! audiorate ! tee name=t ! queue ! vorbisenc ! queue ! mux.audio_0 \
  t. ! queue ! wavenc ! filesink location="$wavfile"
echo -e "\033]0;Audio normalize: $newfile\007"
# normalise the audio levels
sox --norm "$wavfile" "$normalfile"
echo -e "\033]0;Pass 2: $newfile\007"
# pass 2: reuse the statistics and mux in the normalised audio
gst-launch-0.10 webmmux name=mux ! filesink location="$newfile" \
  uridecodebin uri="file://$oldfile" name=video \
  uridecodebin uri="file://$normalfile" name=audio \
  video. ! ffmpegcolorspace ! deinterlace ! vp8enc multipass-cache-file=/tmp/vp8-multipass multipass-mode=2 threads=2 ! queue ! mux.video_0 \
  audio. ! progressreport ! audioconvert ! audiorate ! vorbisenc ! queue ! mux.audio_0
rm "$wavfile" "$normalfile"
... and is a bit more involved.
Multi-pass encoding of video means that we ask the encoder to first encode the file but store some statistics into a temporary file (/tmp/vp8-multipass, in our script), which the second pass can then reuse to optimize the transcoding. Since DV uses different ways of encoding things than VP8 does, we also need to do a color space conversion (ffmpegcolorspace) and deinterlacing (deinterlace), but beyond that the video line in the first gstreamer pipeline isn't very complicated.
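To see just the multi-pass mechanism in isolation, a stripped-down, video-only version of the two passes would look something like this (an untested sketch: talk.dv is a made-up file name, and the audio is simply thrown away here, unlike in the real script above):
# pass 1: gather statistics into /tmp/vp8-multipass; the encoded video itself is discarded
gst-launch-0.10 uridecodebin uri=file:///tmp/talk.dv name=demux \
  demux. ! queue ! ffmpegcolorspace ! deinterlace ! vp8enc multipass-cache-file=/tmp/vp8-multipass multipass-mode=1 ! fakesink \
  demux. ! queue ! fakesink
# pass 2: the same video chain, now reusing the statistics and writing a (video-only) WebM file
gst-launch-0.10 webmmux name=mux ! filesink location=/tmp/talk.webm \
  uridecodebin uri=file:///tmp/talk.dv name=demux \
  demux. ! queue ! ffmpegcolorspace ! deinterlace ! vp8enc multipass-cache-file=/tmp/vp8-multipass multipass-mode=2 ! queue ! mux.video_0 \
  demux. ! queue ! fakesink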
Since we're going over the file anyway and we need the audio data for sox, we add a tee plugin at an appropriate place in the audio line in the first gstreamer pipeline, so that we can later on pick up that same audio data and write it to a wav file containing linear PCM data. Beyond the tee, we go on and do a vorbis encoding, as is needed for the WebM format. This is not actually required for a first pass, but ah well. There are some more conversion plugins in the pipeline (specifically, audioconvert and audiorate), but those are not very important.
We next run sox --norm on the .wav file, which does a fully automated audio normalisation on the input. Audio normalisation is the process of adjusting volume levels so that the audio is not too loud, but also not too quiet. Sox has pretty good support for this; the default settings of its --norm parameter make it adjust the volume levels so that the highest peak will just about reach the highest value that the output format can express. As such, you have no clipping anywhere in the file, but also have an audio level that is actually useful.
Next, we run a second-pass encoding on the input file. This second pass uses the statistics gathered in the first pass to decide where to put its I- and P-frames so that they are placed at the optimal positions. In addition, rather than reading the audio from the original file, we now read the audio from the .wav file containing the normalized audio which we produced with sox, ensuring the audio in the final file is at a comfortable volume.
Finally, we remove the intermediate audio files we created; the shell script which was generated by perl also contained an rm command for the intermediate .dv file.
Some of this is pretty horrid, and I never managed to clean it up enough so it would be pretty (and now is not really the time). However, it Just Works(tm), and I am happy to report that it continues to work with gstreamer 1.0, provided you replace the ffmpegcolorspace by an equally simple videoconvert, which performs what ffmpegcolorspace used to perform in gstreamer 0.10.
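As a minimal sketch of what that change looks like in practice, the first pass of dv2webm would then start out along these lines (untested here, and assuming the vp8enc properties kept the same names in gstreamer 1.0; gst-inspect-1.0 vp8enc will tell you for sure):
gst-launch-1.0 webmmux name=mux ! fakesink \
  uridecodebin uri="file://$oldfile" name=demux \
  demux. ! videoconvert ! deinterlace ! vp8enc multipass-cache-file=/tmp/vp8-multipass multipass-mode=1 threads=2 ! queue ! mux.video_0 \
  demux. ! progressreport ! audioconvert ! audiorate ! tee name=t ! queue ! vorbisenc ! queue ! mux.audio_0 \
  t. ! queue ! wavenc ! filesink location="$wavfile"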