Applications such as surveillance and human behaviour analysis require high-bandwidth recording from multiple cameras, as well as from other sensors. In turn, sensor fusion has increased the required accuracy of synchronisation between sensors. Using commercial off-the-shelf components may compromise quality and accuracy due to several challenges: handling the combined data rate from multiple sensors; unknown offset and rate discrepancies between independent hardware clocks; the absence of trigger inputs or outputs in the hardware; and the different methods used for time-stamping the recorded data. To achieve accurate synchronisation, we centralise the synchronisation task by recording all trigger or timestamp signals with a multi-channel audio interface. For sensors that do not have an external trigger signal, we let the computer that captures the sensor data periodically generate timestamp signals from its serial port output. These signals can also be used as a common time base to synchronise multiple asynchronous audio interfaces. Furthermore, we show that a consumer PC can currently capture 8-bit video data at a spatial resolution of 1024 × 1024 pixels and a temporal resolution of 59.1 Hz, from at least 14 cameras, together with 8 channels of 24-bit audio at 96 kHz. We thus improve the quality/cost ratio of multi-sensor data capture systems.
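To make the figures in the abstract concrete, the following minimal Python sketch computes the raw (uncompressed) data rate implied by the stated capture configuration, and illustrates one way the "offset and rate discrepancies" between two clocks could be estimated from timestamp signals recorded on a common audio time base. The function names are hypothetical, and the linear clock model with a least-squares fit is an assumption made for illustration; the abstract does not say which estimation method the paper uses.

```python
def video_bytes_per_second(width, height, bytes_per_pixel, fps, cameras):
    """Raw video data rate in bytes/s, ignoring container or protocol overhead."""
    return width * height * bytes_per_pixel * fps * cameras


def audio_bytes_per_second(channels, bits_per_sample, sample_rate_hz):
    """Raw PCM audio data rate in bytes/s."""
    return channels * (bits_per_sample // 8) * sample_rate_hz


def fit_clock(sensor_times, audio_times):
    """Least-squares fit of audio_time = rate * sensor_time + offset.

    Assumes (hypothetically) that each sensor-side timestamp has been paired
    with the time at which its signal appears in the audio recording, and
    that the two clocks differ only by a constant rate and offset.
    """
    n = len(sensor_times)
    mean_s = sum(sensor_times) / n
    mean_a = sum(audio_times) / n
    var_s = sum((s - mean_s) ** 2 for s in sensor_times)
    cov = sum((s - mean_s) * (a - mean_a)
              for s, a in zip(sensor_times, audio_times))
    rate = cov / var_s
    offset = mean_a - rate * mean_s
    return rate, offset


if __name__ == "__main__":
    # Configuration stated in the abstract.
    video = video_bytes_per_second(1024, 1024, 1, 59.1, 14)
    audio = audio_bytes_per_second(8, 24, 96_000)
    print(f"video: {video / 1e6:.1f} MB/s, audio: {audio / 1e6:.3f} MB/s")

    # Synthetic example: a sensor clock running 20 ppm slow, 0.35 s behind.
    sensor = [float(t) for t in range(0, 600, 10)]
    recorded = [1.00002 * t + 0.35 for t in sensor]
    rate, offset = fit_clock(sensor, recorded)
    print(f"rate = {rate:.6f}, offset = {offset:.3f} s")
```

With the stated configuration, the video stream alone amounts to roughly 868 MB/s of raw data and the audio to about 2.3 MB/s, which shows why the combined data rate is listed among the main challenges. Once a rate and offset have been fitted, any sensor timestamp can be mapped onto the audio time base as `rate * t + offset`.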
- HMI-MI: MULTIMODAL INTERACTIONS
- EC Grant Agreement nr.: FP7/211486
- EC Grant Agreement nr.: ERC/203143
- Audio recording
- Multisensor systems
- Video recording
Lee, T. (Ed.), Lichtenauer, J., Soatto, S. (Ed.), Shen, J., Valstar, M., & Pantic, M. (2011). Cost-effective solution to synchronised audio-visual data capture using multiple sensors. Image and Vision Computing, 29(10), 666-680. https://doi.org/10.1016/j.imavis.2011.07.004