High Speed Vibration Analysis

Synthesizing audio from 7200 fps video with the Ubicept solution

We’ve had an exciting week after being mentioned in TechCrunch! If you’re one of the many folks who discovered us through the article and reached out: thank you! We’re working as quickly as possible to get back to everyone.

One interesting aspect of getting media coverage is seeing the reactions of strangers on the internet. Recently, we stumbled upon a social media discussion that compared our solution to the recently released Raspberry Pi Global Shutter Camera:

The Raspberry Pi Global Shutter Camera is a specialised 1.6-megapixel camera that is able to capture rapid motion without introducing artefacts typical of rolling shutter cameras. It is ideally suited to fast motion photography and to machine vision applications, where even small amounts of distortion can seriously degrade inference performance.

First of all, this is great! Many of our team members worked on computer vision (both in academia and industry) prior to joining Ubicept, so we very much appreciate the benefits of global shutter cameras.

With that said, having a global shutter doesn’t always translate to high-speed performance. All it guarantees is that motion blur isn’t distorted. That alone is quite helpful for machine vision applications, but sometimes, it isn’t enough.

Our solution works quite differently. As mentioned by our co-founders in the TechCrunch article, we don’t work with conventional “frames” and instead use single-photon avalanche diode (SPAD) arrays to record the arrival times of individual photons. For example, our camera might record that a photon landed on pixel 117,421 at time 0.6129887 seconds. You could interpret this as a frame, but this is how it would look:

A “frame” from a SPAD array. Don’t worry, your display doesn’t have a hot pixel! That’s just a dot representing where a photon landed.
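To make the data model concrete, here's a minimal sketch of what a photon-event stream like this might look like and how it can be accumulated into a frame after the fact. The pixel indices, timestamps, and 512×512 resolution are illustrative assumptions, not real Ubicept data or APIs:

```python
import numpy as np

# Hypothetical SPAD event stream: each detection is (pixel_index, timestamp_s).
# The values here are made up for illustration.
events = np.array(
    [(117421, 0.6129887), (117421, 0.6130012), (98304, 0.6130101)],
    dtype=[("pixel", np.int64), ("t", np.float64)],
)

WIDTH, HEIGHT = 512, 512  # assumed sensor resolution

def bin_frame(events, t_start, t_end):
    """Accumulate photon detections in [t_start, t_end) into a 2-D frame."""
    mask = (events["t"] >= t_start) & (events["t"] < t_end)
    frame = np.zeros(WIDTH * HEIGHT, dtype=np.uint32)
    np.add.at(frame, events["pixel"][mask], 1)  # count photons per pixel
    return frame.reshape(HEIGHT, WIDTH)

frame = bin_frame(events, 0.6129, 0.6131)
print(frame.sum())  # total photons captured in this exposure window
```

The key point is that the "exposure" is just a query over timestamps, chosen in software rather than fixed by the sensor.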

The “Ubicept magic” is about making sense of billions of data points like this and generating results which are optimized for specific applications. Because of how raw this data is—much more so than raw files from conventional cameras—we can essentially implement the capabilities of a low-light camera, a rolling-shutter camera, a global-shutter camera, a wide-dynamic-range camera, an event camera, and so on—all in software!

Case in point: one of our team members saw the Raspberry Pi demo video and wanted to know how our technology performed in the same situation:

He first tried capturing a similar demo with the 240 fps “slo-mo” mode on his current-generation smartphone—this was taken indoors with typical room lighting on an overcast afternoon:

Our initial results were similar, but only because the evaluation kit was running our “automotive mode,” which was tuned for low light and wide dynamic range. Yes, we’ve shown high frame rates before, but we needed much more to capture the vibrations of guitar strings.

So, one of our engineers adjusted some settings and sent over an “ultra high speed camera mode” software update. And here are the results:

That’s the equivalent of a 7200 fps global-shutter camera! The spatial and temporal resolution here is actually sufficient to synthesize telephone-quality audio of the individual strings:

We’re not trying to argue that our solution should replace a proper microphone, of course, but this demo highlights how it can be used for highly targeted long-distance vibration analysis—or many other specialized perception tasks—with a simple software update.

So, if you're facing a computer vision problem that can't be solved using conventional cameras, don't hesitate to reach out to us about our evaluation kits or onsite consultations. We can help you optimize our solution for your specific needs, whatever they may be!


Note for engineers and researchers: we are aware of sophisticated techniques that exploit rolling shutters and subpixel information to extract audio information from video. Our intention here was not to show a better way of doing this. In fact, we chose the most naive approach imaginable, which basically amounted to reinterpreting small crops of the video file (which we showed in our demo videos above) as audio files.
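For the curious, the naive approach can be sketched in a few lines: treat the mean brightness of a small crop, frame by frame, as an audio waveform sampled at the video's frame rate. The footage below is simulated (an 8×8 crop whose brightness oscillates at 110 Hz, roughly a guitar's open A string); the real demo applied the same idea to actual video:

```python
import numpy as np

FPS = 7200                      # effective frame rate from the demo
n_frames = FPS                  # one second of "video"
t = np.arange(n_frames) / FPS

# Synthetic stand-in for real footage: an 8x8 crop whose brightness
# oscillates at 110 Hz, mimicking a vibrating string.
frames = 128 + 64 * np.sin(2 * np.pi * 110 * t)[:, None, None] * np.ones((1, 8, 8))

# The naive reinterpretation: one audio sample per frame = mean crop brightness.
signal = frames.mean(axis=(1, 2))
audio = (signal - signal.mean()) / np.abs(signal - signal.mean()).max()

# Check which frequency dominates the recovered waveform.
spectrum = np.abs(np.fft.rfft(audio))
freqs = np.fft.rfftfreq(n_frames, d=1.0 / FPS)
print(freqs[spectrum.argmax()])  # dominant frequency in Hz
```

A 7200 fps capture gives a Nyquist limit of 3600 Hz, which is why "telephone-quality" is a fair description: traditional telephony passes roughly 300–3400 Hz.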

Would a more sophisticated algorithm produce better results? Absolutely! But our goal at Ubicept isn’t to be experts at every specialized computer vision task. Instead, we want to empower those experts to achieve levels of perception that are difficult or impossible with conventional camera systems.
