Digital security is a constantly evolving arms race between fraudsters and security technology providers. In this race, fraudsters have now acquired a new weapon, artificial intelligence (AI), which poses an unprecedented challenge to solution providers, businesses, and consumers. Several technology providers, including Pindrop, have claimed to detect audio deepfakes consistently. NPR, a leading independent news organization, put these claims to the test: it recently ran an experiment under its special series “Untangling Disinformation” to assess whether current technology solutions can detect AI-generated audio deepfakes on a consistent basis.
While various providers participated in the experiment, Pindrop® Pulse emerged as the clear leader, boasting a 96.4% accuracy rate in identifying AI-generated audio1. The NPR study included 84 clips of five to eight seconds each. About half of them were cloned voices of NPR reporters and the rest were snippets of real radio stories from those same reporters.
Pindrop Pulse liveness detection technology correctly classified 81 of the 84 audio samples, translating to a 96.4% accuracy rate. In addition, Pindrop Pulse flagged 100% of the deepfake samples as synthetic. While other providers were also evaluated in the study, Pindrop emerged as the leader by demonstrating that its technology can reliably and accurately detect both deepfake and genuine audio.
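For readers who want to see how the headline figure follows from the raw counts, the short sketch below reproduces the arithmetic. The exact 42/42 split between cloned and real clips is our own assumption, since NPR describes the set only as “about half” cloned voices.

```python
# Back-of-the-envelope check of the reported numbers.
# Assumption: the "about half" split is taken as exactly 42 deepfake
# and 42 genuine clips; NPR did not publish the exact breakdown.

total_clips = 84
deepfake_clips = 42                      # assumed split
genuine_clips = total_clips - deepfake_clips

deepfakes_caught = deepfake_clips        # 100% of deepfakes were flagged
genuine_misclassified = 3                # the three misses were genuine voices

correct = deepfakes_caught + (genuine_clips - genuine_misclassified)
accuracy = correct / total_clips

print(f"Correct classifications: {correct}/{total_clips}")                  # 81/84
print(f"Overall accuracy: {accuracy:.1%}")                                   # 96.4%
print(f"Deepfake detection rate: {deepfakes_caught / deepfake_clips:.0%}")   # 100%
```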
A few additional notes on these results:
- The voice samples evaluated in the study were relatively short utterances, averaging 6.24 seconds. With slightly longer audio samples, the accuracy would increase even further2.
- Pindrop Pulse was not previously trained on the PlayHT voice cloning software that was used to generate the audio deepfakes in this study. This is the zero-day attack, or “unseen model,” scenario that we highlighted in a previous study, and it showcases Pindrop® Pulse’s unmatched accuracy, one of the main tenets of our technology. On known voice cloning systems, our accuracy is 99%3. In fact, Pulse is constantly evolving and is being trained on new deepfake models, which ensures that its detection accuracy continues to increase4.
- The audio samples used in this study are very difficult for humans to detect5, yet Pindrop Pulse still classified them with 96.4% accuracy.
- Pindrop Pulse is a liveness detection solution that identifies whether an audio sample was created by a real human voice or a synthetic one. If liveness detection is combined with additional factors such as voice analysis, behavior pattern analysis, device profiles, and carrier metadata, the deepfake detection rate would be even higher2 (see the sketch after this list).
- The three audio samples that Pindrop Pulse missed do not present a security threat, since they were genuine voices. In typical authentication applications, individuals would have a second chance to authenticate using other factors.
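To make the multi-factor point from the list above concrete, here is a minimal, hypothetical sketch of how a liveness score could be fused with other signals into a single risk decision. The factor names, weights, and threshold are our own illustrative choices, not Pindrop’s actual scoring model or API.

```python
# Hypothetical multi-factor risk scoring sketch; all names, weights, and
# thresholds are illustrative and do not describe Pindrop's product.
from dataclasses import dataclass

@dataclass
class CallSignals:
    liveness_score: float   # 0 = synthetic, 1 = live human (from liveness detection)
    voice_match: float      # speaker-verification similarity, 0..1
    behavior_score: float   # consistency with typical caller behavior, 0..1
    device_score: float     # device profile consistency, 0..1
    carrier_score: float    # carrier metadata consistency, 0..1

def composite_risk(s: CallSignals) -> float:
    """Weighted combination of factors; higher means more likely fraudulent."""
    weights = {"liveness": 0.40, "voice": 0.25, "behavior": 0.15,
               "device": 0.10, "carrier": 0.10}
    # Convert each "trust" score into a risk contribution (1 - score).
    return (weights["liveness"] * (1 - s.liveness_score)
            + weights["voice"] * (1 - s.voice_match)
            + weights["behavior"] * (1 - s.behavior_score)
            + weights["device"] * (1 - s.device_score)
            + weights["carrier"] * (1 - s.carrier_score))

# Example: a call with a very low liveness score is flagged for step-up checks.
call = CallSignals(liveness_score=0.12, voice_match=0.55,
                   behavior_score=0.40, device_score=0.70, carrier_score=0.80)
if composite_risk(call) > 0.5:
    print("Flag call for step-up authentication")
```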
The study also put a spotlight on several tenets that security technology providers should follow to improve their deepfake detection accuracy: training artificial intelligence models with datasets of both real and fake audio, making their systems resilient to background noise and audio degradations, and training their detectors on every new AI audio generator that appears on the market.
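As one concrete illustration of the “resilient to background noise and audio degradations” tenet, a common technique is to augment real and fake training audio with background noise at controlled signal-to-noise ratios. The sketch below shows that generic augmentation step in NumPy; it is a standard-technique illustration, not a description of Pindrop’s training pipeline.

```python
# Illustrative data-augmentation step for noise robustness: mix background
# noise into a training clip at a target signal-to-noise ratio (SNR).
# Generic technique sketch, not Pindrop's actual training pipeline.
import numpy as np

def add_noise_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return the clean waveform mixed with noise scaled to the given SNR (dB)."""
    noise = np.resize(noise, clean.shape)              # loop/trim noise to clip length
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled_noise

# Example: a 6-second clip at 16 kHz, degraded with noise at 10 dB SNR.
rng = np.random.default_rng(0)
clip = rng.standard_normal(16_000 * 6)    # stand-in for a real or cloned voice sample
noise = rng.standard_normal(16_000 * 2)   # stand-in for recorded background noise
augmented = add_noise_at_snr(clip, noise, snr_db=10.0)
```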
Pindrop® Pulse is built on these core tenets6, and we are committed to keeping our solutions ahead in the race to stop audio deepfakes and fraud. Pindrop provides peace of mind for businesses in an era of uncertainty. We’re grateful for the trust and support of our team, customers, and partners, which propel us forward in security innovation.
1. https://www.npr.org/2024/04/05/1241446778/deepfake-audio-detection
2. Pindrop Labs research on deepfake detection accuracy
3. https://www.pindrop.com/blog/unmatched-performance-pindrops-liveness-detection-and-the-waterloo-study
4. https://www.pindrop.com/products/liveness-detection
5. https://synthical.com/article/c51439ac-a6ad-4b8d-82ed-13cf98040c7e
6. https://www.pindrop.com/products/pindrop-pulse