For future reference (to myself, for the most part):

ffmpeg -i foo.webm -i foo.en.vtt -i -map 0:v -map 0:a \
  -map 1:s -map 2:s -metadata:s:a language=eng -metadata:s:s:0   \
  language=eng -metadata:s:s:1 language=nld -c copy -y           \

... is one way to create a single .webm file from one .webm input file and multiple .vtt files. A little bit of explanation:

  • The -i arguments pass input files. You can have multiple input files for one output file. They are numbered internally (this is necessary for the -map and -metadata options later), starting from 0.
  • The -map options take a "mapping". With them, you specify which input streams should go where in the output stream. By default, if you have multiple streams of the same type, ffmpeg will only pick one (the "best" one, whatever that is). The mappings we specify are:

    • -map 0:v: this means to take the video stream from the first file (this is the default if you do not specify any mapping at all; but if you do specify a mapping, you need to be complete)
    • -map 0:a: take the audio stream from the first file as well (same as with the video).
    • -map 1:s: take the subtitle stream from the second (i.e., indexed 1) file.
    • -map 2:s: take the subtitle stream from the third (i.e., indexed 2) file.
  • The -metadata options set metadata on the output file. Here, we pass:

    • -metadata:s:a language=eng, to add a 's'tream metadata item on the 'a'udio stream, with name language and content eng. The language metadata in ffmpeg is special, in that it gets automatically translated to the correct way of specifying the language in the target container format.
    • -metadata:s:s:0 language=eng, to add a 's'tream metadata item on the first (indexed 0) 's'ubtitle stream in the output file. This too has the english language set
    • `-metadata:s:s:1 language=nld', to add a 's'tream metadata item on the second (indexed 1) 's'ubtitle stream in the output file. This has dutch set as the language.
  • The -c copy option tells ffmpeg to not transcode the input video data, but just to rewrite the container. This works because all input files (WebM video plus VTT subtitles) are valid for WebM. If you do not have an input subtitle format that is valid for WebM, you can instead limit the copy modifier to the video and audio only, allowing ffmpeg to transcode the subtitles. This is done by way of -c:v copy -c:a copy.
  • Finally, we pass -y to specify that any pre-existing output file should be overwritten, and the name of the output file.