The Covid-19 lockdown period has seen an explosion of creativity from musicians sitting at home and wondering what to do with all of this spare time. Multi-track video has quickly become the dominant form, from those featuring a single performer to huge, international team efforts. Sophie and I have made or taken part in quite a few, including the overture of Nicole Lizée's Immersive Mozart Mystery Opera No One's Safe, a year after we took part in workshopping and performing it at Banff Centre.
I've learned a lot about how to do this properly: most importantly, you don't want to be faffing around lining things up by ear, or even visually in the case of the waveforms produced by some instruments! It's also not ideal to be trying to decide whether something appears out of sync because it actually is, or just because it wasn't played perfectly in time. You need a fixed, precise reference. Some of your participants will be recording/filming on a phone - mixing phone audio is a topic in itself! - while others may be filming on a phone and recording (hopefully better) audio separately, using anything from a handheld recorder to a full DAW setup. So you're conceivably going to be synchronising all the individual videos within your video editing software, all the audio files (or audio extracted from videos) in your DAW, and then ultimately syncing your audio mix with the video edit. This can very easily result in a great big mushy mess.
Assuming you have created some sort of click or guide track, or that you are recording your single-performer video to a click, there are two main ways to get video and audio files that can be synced easily, and one is clearly better than the other.
Method 1: Visual and Sonic 'Slate'
If a performer is recording audio and video separately, you need some way to line the two up. The simplest, old-fashioned way is to have them do something that produces both a sharp transient in audio and a corresponding visual moment - the equivalent of the clapperboard used in film production. Clapping works, as does tapping their music stand or other nearby item with a pencil. Ideally, this transient will be visible in both the audio recording and the camera audio; if not, the audio recording can at least be lined up visually with the moment the two items connect.
But this on its own does nothing to help you line up contributions from several performers. For this, the percussive event(s) must occur at precise known points against the backing track. For my first few efforts, I supplied a backing track with several bars of click as a count-in - let's say 3 bars. I'd ask the performers to clap or tap as precisely as they could in time with the click during the second bar: the thinking being that they have one bar to get used to the tempo, one bar to synchronise with it and one more to compose themselves and look ready for the performance itself.
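To know exactly where those claps ought to land on the timeline, you can work out the click times from the tempo. Here is a minimal sketch in Python, assuming a steady tempo, 4/4 time and a backing track whose first click sits at time zero; the function names and the 120 bpm figure are mine, purely for illustration.

```python
def bar_click_times(bpm, bar_number, beats_per_bar=4):
    """Return the time in seconds of each click in one bar of the count-in.

    bar_number is 1-based; the first click of bar 1 is assumed to be at 0.0 s.
    """
    seconds_per_beat = 60.0 / bpm
    first_beat = (bar_number - 1) * beats_per_bar
    return [(first_beat + i) * seconds_per_beat for i in range(beats_per_bar)]


def performance_start(bpm, count_in_bars=3, beats_per_bar=4):
    """Return the time in seconds at which the count-in ends."""
    return count_in_bars * beats_per_bar * 60.0 / bpm


# Claps in the second bar of a 3-bar count-in at 120 bpm
print(bar_click_times(120, 2))   # [2.0, 2.5, 3.0, 3.5]
print(performance_start(120))    # 6.0
```

These are the positions you would expect the clap transients to sit at in each contribution, if the performer's timing were perfect.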
Yeah, it's not a great approach. It relies on the performer's time being immediately precise, which is made even less likely if you've put them on the spot by making a fuss about the importance of getting this right. Also, it doesn't necessarily correspond with the way individual musicians usually do their thing as different muscles may be involved - this is especially true with singers.
Method 2: Spill is your friend!
The simplest way to get perfect sync between all these elements is to supply a backing track with a click count-in, and to ask each performer to make sure the beginning of this spills into the room when they record. Unless the performer is using a DAW, they will probably be playing the guide track on one device while recording/filming on another. It's obviously crucial that they do use headphones rather than speakers for the performance itself, but for syncing up the beginning, SPILL IS YOUR FRIEND.
Set up a playback device which will play out of speakers unless headphones are connected.
Start recording on the phone/camera and any separate audio recorder.
Start playback of the backing track.
After the first bar of click, connect the headphones and perform.
The beauty of this approach is that both the phone/camera audio track and any separate audio will have very clearly identifiable click 'spikes' corresponding with the click in the file you sent out, which can be lined up in your DAW and/or video editor. Easy!
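If you would rather measure the offset than drag clips around, the same idea can be sketched in a few lines of Python. This is only an illustration, assuming you already have each file's audio as a list of samples normalised to the range -1 to 1 at the same sample rate, and that the first click is the first thing loud enough to cross the threshold. The function names and the threshold value are my own assumptions; a real recording with room noise may well need a higher threshold or a more robust method such as cross-correlation.

```python
def first_spike_index(samples, threshold=0.5):
    """Return the index of the first sample at or above the threshold, or None."""
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            return i
    return None


def sync_offset_seconds(reference, recording, sample_rate, threshold=0.5):
    """Return how much later the first click occurs in recording than in reference.

    A positive result means the recording must be moved earlier by that amount.
    """
    ref_index = first_spike_index(reference, threshold)
    rec_index = first_spike_index(recording, threshold)
    if ref_index is None or rec_index is None:
        raise ValueError("no click spike found above the threshold")
    return (rec_index - ref_index) / sample_rate


# Toy data: a click at sample 100 in the guide, at sample 350 in the phone audio
sample_rate = 1000
guide = [0.0] * 100 + [1.0] + [0.0] * 899
phone = [0.0] * 350 + [0.8] + [0.0] * 649
print(sync_offset_seconds(guide, phone, sample_rate))  # 0.25
```

In this toy case the phone audio needs to slide a quarter of a second earlier to sit on top of the guide.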
Pro Method 2b: Rig up a talkback speaker
If you're using a DAW to capture audio while filming using a camera or phone, and your interface has a spare output, connect a talkback speaker to its own output. A guitar 'micro amp' works perfectly for this purpose.
Create a single bar of click in the first bar of count-in, on its own track; this can be dubbed from the main DAW click, or created using a percussive sound on a MIDI track. Set this track to output to the talkback speaker, nice and loud, and place the speaker close to the camera but out of shot. Every video file you record will contain these click spikes. These can all be lined up in the video editing software. The final DAW mix should contain these clicks too - then simply import this into the video edit, line up the clicks, trim off the beginning and you're done.
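If your DAW makes it awkward to dub the click, a single bar of click can also be generated as an audio file and dropped onto its own track. The following is a rough sketch using only Python's standard library; the tempo, click length, pitch and level are all assumptions of mine and should be set to match your project. Each click starts at full level and decays quickly, so it leaves a sharp, easily spotted spike.

```python
import math
import struct
import wave


def click_bar_samples(bpm, beats_per_bar=4, sample_rate=44100,
                      click_ms=20, freq_hz=1000.0, level=0.9):
    """Return one bar of 16-bit mono click samples as a list of ints."""
    seconds_per_beat = 60.0 / bpm
    total = int(round(beats_per_bar * seconds_per_beat * sample_rate))
    click_len = int(sample_rate * click_ms / 1000)
    samples = [0] * total
    for beat in range(beats_per_bar):
        start = int(round(beat * seconds_per_beat * sample_rate))
        for n in range(click_len):
            if start + n >= total:
                break
            # Cosine burst with a linear decay: loudest on its very first sample
            decay = 1.0 - n / click_len
            value = level * decay * math.cos(2 * math.pi * freq_hz * n / sample_rate)
            samples[start + n] = int(value * 32767)
    return samples


def write_wav(path, samples, sample_rate=44100):
    """Write the samples to a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling write_wav("click_bar.wav", click_bar_samples(120)) would give a two-second file with four clicks at 120 bpm (the filename is just an example), ready to be routed to the talkback speaker output.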
This is actually a great way to guarantee sync between any video and audio recorded in a DAW - even single shots.