A group of researchers at MIT, Microsoft and Adobe has developed an algorithm that can reconstruct audio signals by analyzing the vibrations of objects captured on video.
In particular, the researchers were able to recover sound from the vibrations of a bag of chips, working from video shot through soundproof glass from a distance of 15 feet. In other experiments, they recovered audio signals from video of aluminum foil, the surface of a glass of water, and the leaves of a plant.
One of the researchers, Abe Davis, said: "When sound hits an object, it causes the object to vibrate. The motion of this vibration creates a very subtle visual signal that is usually invisible to the naked eye. People do not even realize that this information was there."
Recovering audio from video requires that the number of video frames captured per second be higher than the frequency of the audio signal. In their experiments, the researchers used a high-speed camera that captured 2,000 to 6,000 frames per second. This is much faster than the 60 frames per second available on some smartphones, but well below the frame rates of commercial high-speed cameras, which reach 100,000 frames per second.
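The frame-rate requirement can be illustrated with a little arithmetic. The sketch below is an assumption layered on the article, not the researchers' method: it applies the standard Nyquist criterion from sampling theory, which is stricter than the wording above and says that uniform sampling can only represent frequencies below half the sampling rate. The function name `nyquist_limit_hz` and the labels are made up for illustration.

```python
def nyquist_limit_hz(frames_per_second: float) -> float:
    """Upper bound on the frequency representable without aliasing
    when sampling uniformly at the given frame rate (Nyquist criterion).
    Assumption: each video frame is treated as one uniform sample."""
    return frames_per_second / 2.0


# Frame rates mentioned in the article; the labels are illustrative.
frame_rates = {
    "smartphone": 60,
    "high-speed camera (low end)": 2_000,
    "high-speed camera (high end)": 6_000,
    "commercial high-speed camera": 100_000,
}

for label, fps in frame_rates.items():
    limit = nyquist_limit_hz(fps)
    print(f"{label}: {fps} fps, frequencies below {limit:.0f} Hz")
```

Under this assumption, a 60 frames-per-second video treated as one sample per frame would be limited to frequencies below 30 Hz, which is why ordinary footage needs an additional trick to recover anything at speech frequencies.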
In other experiments, they used an ordinary digital camera. The particular design of the sensor in most cameras allowed them to recover information about high-frequency vibrations even from video recorded at a standard 60 frames per second. Although this audio reconstruction was not as good as the one obtained with the high-speed camera, it was still good enough to determine the sex of the speaker in the room.