A while ago, we, the Chaos Computer Club Kassel, flipdot e.V., organized a public event called the hackumenta. Our talks were recorded, and we wanted to publish them on media.ccc.de. Speaker and slides were recorded separately and needed to be synced, merged, and cut, so I created a little ffmpeg toolchain.
This is the result:
The final toolchain is available at github.com/flipdot/0xA-voc-toolchain (subject to change).
It works by consecutively executing the numbered bash scripts. But first, set
WORKING_DIR in settings.env.
Install all required packages:
$ sudo apt install ffmpeg sox audacity mpv libxi-dev mesa-common-dev
$ nvm use 8
$ npm install -g https://github.com/transitive-bullshit/ffmpeg-concat.git#feature/optimize-frames
$ git clone https://github.com/flipdot/0xA-voc-toolchain.git
$ cd 0xA-voc-toolchain
$ vim settings.env
$ ./1_preprocess_raw_files.sh
$ ./2_extract_audio.sh
$ ./3_align_inputs.sh
$ ./4_create_time_marks.sh
$ ./5_combine_videos.sh
$ ./6_add_intro_outro.sh
Each script will iterate through all directories inside WORKING_DIR and execute commands inside them.
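That shared iteration pattern can be sketched roughly like this (a simplified sketch; the helper name process_all is made up for illustration):

```shell
# Sketch of the loop each numbered script runs: visit every talk directory
# under WORKING_DIR and do its work there. "process_all" is a made-up name.
process_all() {
    local working_dir="$1"
    for talk_dir in "$working_dir"/*/; do
        [ -d "$talk_dir" ] || continue           # skip if there are no subdirectories
        echo "processing $(basename "$talk_dir")"
        # the real scripts run their ffmpeg/sox commands on files in "$talk_dir"
    done
}
```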
1. Preprocess raw files
Let's begin by taking a look at our raw video material:
INPUT_DIR
├── 01_Opening
│   ├── cam.MP4
│   ├── intro.mp4
│   ├── screen_raw.avi
│   └── title.txt
├── 02_CUDA_Basics
│   ├── cam_01.MP4
│   ├── cam_02.MP4
│   ├── intro.mp4
│   ├── screen_raw.avi
│   └── title.txt
├── background.mp4
└── outro.mp4
Due to our recording setup, we ended up with multiple cam_xx.MP4 files, depending on the length of the talk. ffmpeg -f concat, together with a loop for the -i parameter, allows us to combine all cam_xx.MP4 files into a single cam.mp4. Also, the screen recordings were uncompressed. ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -c:v h264_vaapi allows us to efficiently re-encode the recordings as H.264.
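Putting those pieces together, step 1 might look roughly like this. This is a sketch: the helper make_concat_list and the demo_talk directory are made up, the ffmpeg invocations are shown as comments, and with VAAPI the frames additionally need a format=nv12,hwupload filter to reach the GPU; the script's exact flags may differ.

```shell
# Build the list file ffmpeg's concat demuxer expects: one "file '...'" line
# per part. make_concat_list is a made-up helper for illustration.
make_concat_list() {                    # $1 = talk directory
    : > "$1/concat.txt"
    for part in "$1"/cam_*.MP4; do
        echo "file '$(basename "$part")'" >> "$1/concat.txt"
    done
}

# Stand-in layout, as in 02_CUDA_Basics:
mkdir -p demo_talk && touch demo_talk/cam_01.MP4 demo_talk/cam_02.MP4
make_concat_list demo_talk

# Join the parts without re-encoding (they share one codec) ...
# ffmpeg -f concat -i concat.txt -c copy cam.mp4
# ... and re-encode the uncompressed screen grab as H.264 via VAAPI:
# ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 \
#        -i screen_raw.avi -vf 'format=nv12,hwupload' -c:v h264_vaapi screen.mp4
```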
Let's run the above commands by executing ./1_preprocess_raw_files.sh. It will produce these files:
WORKING_DIR
├── 01_Opening
│   ├── cam.mp4
│   ├── intro.mp4
│   └── screen.mp4
├── 02_CUDA_Basics
│   ├── cam.mp4
│   ├── intro.mp4
│   └── screen.mp4
├── background.mp4
└── outro.mp4
2. Extract and normalize audio
The cam and screen recordings are not in sync and need manual adjustment. To do this adjustment, we first need an audio file for each cam and screen recording. By using ffmpeg -vn, we can remove the video track from each recording. To improve the volume of the recordings, we utilize sox. It normalizes every audio file: sox --norm $INPUT $OUTPUT.
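Combined, the extraction and normalization for one recording could be sketched like this (the helper name and the intermediate _raw.wav file are my assumptions, not the script's exact shape):

```shell
# Sketch: strip the video track (-vn) to get a wav, then normalize it with
# sox. extract_and_normalize is a made-up helper for illustration.
extract_and_normalize() {               # $1 = input video, $2 = output wav
    ffmpeg -i "$1" -vn "${2%.wav}_raw.wav"      # -vn drops the video stream
    sox --norm "${2%.wav}_raw.wav" "$2"         # peak-normalize the audio
}

# usage, inside one talk directory:
# extract_and_normalize cam.mp4 cam_normalized.wav
# extract_and_normalize screen.mp4 screen_normalized.wav
```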
After running ./2_extract_audio.sh, we have two new files for each video in our working tree:
WORKING_DIR
├── 01_Opening
│   ├── cam_normalized.wav
│   ├── screen_normalized.wav
│   └── ...
└── 02_CUDA_Basics
    ├── cam_normalized.wav
    ├── screen_normalized.wav
    └── ...
3. Align cam and screen recording
The manual alignment of the cam and screen recordings is assisted by this script. It will start audacity for every talk, opening both wav-files in a single audacity project:
In the upper track you can see the waveform of the cam recording, in the lower track the waveform of the screen recording. Select the “Time Shift Tool” by pressing F5 or clicking on this button:
Now, move the lower track with your cursor until it is aligned with the upper track. Pay attention to strong peaks in the waveform; they make the alignment easier. Playing both files and listening to the audio helps confirm a correct adjustment.
After aligning the tracks, you need to determine the offset. To do so, pick the “Selection Tool” by pressing F1, and click on the left edge of the second track. In the lower section of audacity, the offset is printed below the “Start and End of Selection” label.
Keep this timestamp in mind and close audacity. In your terminal, the script now prompts you for this timestamp. Enter it.
The timestamp you entered will be validated by a regular expression and saved to a text file:
WORKING_DIR
├── 01_Opening
│   ├── screen_offset.txt
│   └── ...
└── 02_CUDA_Basics
    ├── screen_offset.txt
    └── ...
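The prompt-and-validate step could be sketched like this. The regex is a guess at what audacity-style timestamps look like; the script's actual pattern may be stricter or looser.

```shell
# Accept offsets like "1:23.456" or "0:05" (minutes:seconds, optional
# fraction). The exact pattern is an assumption, not the script's regex.
valid_timestamp() {
    [[ "$1" =~ ^[0-9]+:[0-5][0-9](\.[0-9]+)?$ ]]
}

# prompt until the input matches, then persist it for the later steps:
# read -rp "screen offset: " offset
# valid_timestamp "$offset" && echo "$offset" > screen_offset.txt
```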
4. Search for start and end of the talk
A bit more manual work is required. Between the start of the recording and the beginning of the talk are a couple of minutes that need to be removed. This script opens mpv with a timestamp in the OSD and asks you to search for the start of the talk.
These are useful hotkeys for mpv:
- , – One frame backward
- . – One frame forward
- [ – Decrease playback speed
- ] – Increase playback speed
After you have found the timestamp, close mpv. Once again, you will be asked to enter that timestamp. After you enter it, mpv will open again - but this time it will start playing from the timestamp you entered. This allows you to make sure that the timestamp is set correctly and no words are cut off.
The same procedure will be repeated for the end of the talk to get both timestamps. After you are done with all talks, your working directory will look like this:
WORKING_DIR
├── 01_Opening
│   ├── talk_end.txt
│   ├── talk_start.txt
│   └── ...
└── 02_CUDA_Basics
    ├── talk_end.txt
    ├── talk_start.txt
    └── ...
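The two mpv invocations could be wrapped in a small helper like this. The flag choice is mine, not necessarily the script's: --osd-level=3 keeps the timestamp permanently on screen, --osd-fractions adds fractional seconds, and --start replays from the entered mark.

```shell
# Open mpv with a permanent on-screen timestamp; optionally start at a mark.
# review_talk is a made-up name for illustration.
review_talk() {                     # $1 = video, $2 = optional start time
    mpv --osd-level=3 --osd-fractions ${2:+--start="$2"} "$1"
}

# first pass: search for the start of the talk
# review_talk cam.mp4
# second pass: verify the timestamp you entered
# review_talk cam.mp4 "0:04:32.5"
```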
5. Create a splitscreen of both videos
With the time marks inside our text files, we can now generate a splitscreen with both the cam and the screen recording, synchronized. We will also add a background animation, background.mp4, which was copied earlier from INPUT_DIR. In our case, we had one more issue with our recording setup: the audio embedded in the screen recording is not in sync with the video of the screen recording. It has a constant delay of about 500 ms. We can fix that by using ffmpeg -itsoffset $SCREEN_AUDIO_OFFSET. The offset variable is defined inside settings.env. Run ./5_combine_videos.sh and lean back. Your working directory will contain a new file:
WORKING_DIR
├── 01_Opening
│   ├── combined.mp4
│   └── ...
└── 02_CUDA_Basics
    ├── combined.mp4
    └── ...
And it will look like this:
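As for the command itself, the combining step is roughly of this shape. This is a sketch, not the script's actual filter graph: the scale values, the plain hstack layout (the real graph also composites background.mp4), and reading screen.mp4 a second time so -itsoffset shifts only its audio are all my assumptions.

```shell
# Sketch of one talk's splitscreen. combine_talk is a made-up helper; the
# third screen.mp4 input exists only to supply the delayed audio track.
combine_talk() {            # $1 = start, $2 = end, $3 = screen audio offset
    ffmpeg -ss "$1" -to "$2" -i cam.mp4 \
           -ss "$1" -to "$2" -i screen.mp4 \
           -ss "$1" -to "$2" -itsoffset "$3" -i screen.mp4 \
           -filter_complex '[0:v]scale=-2:720[l];[1:v]scale=-2:720[r];[l][r]hstack[v]' \
           -map '[v]' -map 2:a combined.mp4
}

# combine_talk "$(cat talk_start.txt)" "$(cat talk_end.txt)" "$(cat screen_offset.txt)"
```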
6. Add intro and outro
To finalize the videos, we want to concatenate an intro and an outro to each file. FFmpeg filter graphs are very powerful, but it is pretty difficult to create simple transitions with them. Therefore, we will be using ffmpeg-concat for this task. We need to use the feature/optimize-frames branch – otherwise, tons of gigabytes will be consumed in our /tmp directory. Check out this pull request – once it is merged, you can use the master version. ffmpeg-concat also does not keep the audio of the inputs (issue #4). We work around this by running ffmpeg yet another time after concatenation, copying back our audio stream. Fortunately, this does not require decoding and re-encoding of the video. We will also add an intro.wav, so we will get a fancy intro sound. This requires re-encoding the audio, but compared to video encoding this is quite fast.
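The post-concatenation fix-up might look like this. It is a sketch: the helper and file names, the INTRO_MS variable, and the adelay/amix graph are my assumptions – only the idea of copying the video stream back untouched (-c:v copy) and re-encoding just the audio comes from the text above.

```shell
# Sketch of the audio workaround: take the silent ffmpeg-concat output,
# delay the talk audio past the intro, mix in intro.wav, and stream-copy
# the video. restore_audio is a made-up helper; INTRO_MS = intro length in ms.
restore_audio() {                 # $1 = concat output, $2 = final output
    ffmpeg -i "$1" -i combined.mp4 -i intro.wav \
           -filter_complex "[1:a]adelay=${INTRO_MS}|${INTRO_MS}[talk];[2:a][talk]amix=inputs=2[a]" \
           -map 0:v -map '[a]' -c:v copy -c:a aac "$2"
}

# restore_audio concatenated_noaudio.mp4 output.mp4
```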
And there you have it – a final output.mp4 consisting of the intro, the talk with cam and screen recording, and the outro.
The talks are available at media.ccc.de/c/hackumenta19.