Nearly six months ago, we launched our Pindrop® Pulse™ solution, a cutting-edge deepfake detection technology that helps our enterprise customers detect AI-generated voices in their call centers. Since then, we have collaborated with news organizations, governments, the music and entertainment industry, and corporate security teams to assess hundreds of suspected deepfakes. From AI-generated robocalls aimed at voter suppression to sophisticated smear campaigns, and from general misinformation in conflicts worldwide to attempts to distort public perception, each case underscores the critical need for robust deepfake detection mechanisms.
The implications of these deepfakes are profound: they threaten the integrity of news organizations, social media platforms, and elections worldwide. The potential for misinformation to sway public opinion and disrupt social order is a stark reality that we now face.
In response to these grave threats, we’re thrilled to announce Pindrop® Pulse™ Inspect in Preview, an audio deepfake detection solution that assists fact-checkers, misinformation experts, security departments, trust and safety teams, and social media platforms. As a forensics tool, Pindrop Pulse is designed to detect AI-generated speech in audio or video media, including both digital media (e.g., deepfakes on social media) and phone call media (e.g., voicemails). Users log in to the web application, upload their media files, and within seconds receive a determination on whether the content contains AI-generated speech. Additionally, users can integrate the award-winning Pindrop Pulse deepfake detection technology programmatically into their own workflows via our simple-to-use APIs.
A Rapidly Growing Problem
Simply stated, ‘deepfakes’ are AI-altered images, text, video, and audio files.
Specifically for speech, this means creating highly realistic audio clips that can convincingly mimic someone’s voice by training an AI model on their publicly available speech.
This problem is growing for several reasons. First, the technology has advanced so significantly that the quality of synthetic speech is remarkably high. Second, commercial platforms offering these services have become incredibly affordable. Third, the number of available tools for deepfake creation, i.e., Text-to-Speech (TTS) and Speech-to-Speech (STS) systems, has exploded over the past two years, to the point that there are now close to 2,000 open-source Text-to-Speech tools on Hugging Face alone.
Humans are notoriously bad at detecting deepfakes. In a study, humans were only able to detect fake audio 54.5% of the time, and in the real world, distinguishing between genuine and fake audio is even more challenging. Scammers who are creating these deepfakes are becoming increasingly sophisticated, often adding background noise or music, or using very short clips of speech to make detection more challenging. These fraudsters are continuously evolving their techniques, making it imperative for us to stay one step ahead in the fight against misinformation.
Over the past 13 years, Pindrop has built a platform based on real-time analysis of more than 5 billion audio interactions. We have over 270 patents on voice and security, and 25 patents on audio deepfake detection alone. Today, we’re proud to package our experience and technology into a tool that helps combat the most deceptive audio deepfakes, particularly for the news media and other organizations that rely on the accuracy of their content to maintain customer trust and credibility.
Good AI to Fight Bad AI in the Media
Pindrop has partnered with some of the market and technology leaders fighting misinformation online. For example, TrueMedia.org was among the first adopters to test our solution in their workflows and reported that the Pindrop Pulse audio deepfake detection had better accuracy than other alternatives in detecting synthetic speech.
According to Oren Etzioni, CEO of TrueMedia.org, “TrueMedia.org is a non-profit, non-partisan AI project to fight disinformation in political campaigns by identifying manipulated media. Our comprehensive evaluation found Pindrop’s audio deepfake detection has better accuracy than other alternatives in detecting synthetic speech. We are excited to partner with Pindrop in this mission, and add Pindrop’s deepfake detection technology in the solution for our customers and users across the world.”
Pulse Inspect offers trust and safety teams a forensics tool to enhance their disinformation detection workflows.
- Best-in-class Performance: Pindrop has trained its deepfake detection model on over 370 deepfake generation tools and over 20M statements (both genuine and synthetic), enabling us to achieve over 99% accuracy against previously seen deepfake models and to detect 90% of “zero-day” attacks that use new or previously unseen tools. Third parties have also confirmed that our solution had over 40 percentage points higher accuracy than competing solutions on audio.
- Resilience: News and social media are global businesses and need support to detect deepfakes across various languages. Pindrop® Pulse™ Inspect is language agnostic, and its underlying models have been tested and validated on over 40 languages that cover over 90% of the internet’s spoken languages. The technology is also resilient to adversarial attacks such as added noise, reverberation, or speech changes.
- Breadth of Audio: The same Pindrop Pulse technology that identifies over a million social engineering attempts in the call center has now expanded to digital media. Pulse Inspect supports both phone call audio (8 kHz) and high-fidelity social media audio (44.1 kHz). It also provides detection capabilities irrespective of whether synthetic speech is created using text-to-speech, speech-to-speech, or voice conversion techniques.
- Video Support: Pulse Inspect supports audio deepfake detection in videos. The platform analyzes video files for AI-generated speech by extracting audio content out of video media types.
- Explainability: Pulse Inspect offers segmental analysis of uploaded media to aid in the detection of partial deepfakes. This feature provides a visual indicator to users to help determine which segment in a long-form media file is synthetically generated vs. segments which most likely do not contain synthetic speech.
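To make the idea of segmental analysis concrete, here is a minimal Python sketch of how per-segment results could be turned into flagged time ranges for a partial deepfake. It is an illustration only and does not describe how Pulse Inspect works internally: the `Segment` type, the per-segment scores, the `flag_synthetic_ranges` helper, and the 0.5 threshold are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float  # segment start time, in seconds
    end_s: float    # segment end time, in seconds
    score: float    # hypothetical synthetic-speech score in [0, 1]


def flag_synthetic_ranges(segments, threshold=0.5):
    """Merge consecutive segments scoring >= threshold into (start, end) ranges."""
    ranges = []
    current = None  # the flagged range currently being extended, if any
    for seg in sorted(segments, key=lambda s: s.start_s):
        if seg.score >= threshold:
            if current is not None and seg.start_s <= current[1]:
                # Touches or overlaps the open range: extend it.
                current = (current[0], max(current[1], seg.end_s))
            else:
                if current is not None:
                    ranges.append(current)
                current = (seg.start_s, seg.end_s)
        elif current is not None:
            # A likely-genuine segment closes the open range.
            ranges.append(current)
            current = None
    if current is not None:
        ranges.append(current)
    return ranges


# Made-up scores for a 20-second clip split into 4-second segments.
segments = [
    Segment(0.0, 4.0, 0.05),
    Segment(4.0, 8.0, 0.92),
    Segment(8.0, 12.0, 0.88),
    Segment(12.0, 16.0, 0.10),
    Segment(16.0, 20.0, 0.97),
]
print(flag_synthetic_ranges(segments))  # [(4.0, 12.0), (16.0, 20.0)]
```

In this made-up example, the clip would be shown with two suspicious stretches (4 to 12 seconds and 16 to 20 seconds), while the rest is treated as most likely genuine speech. A real tool would choose its segment length and threshold based on its own model and validation data.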
Free Trial
With Pulse Inspect in Preview, we invite those who are responsible for identifying and reporting on deepfakes to evaluate our technology, at no cost.
Request access to a free trial here.
1. https://www.pindrop.com/blog/pindrop-named-a-winner-in-the-ftc-voice-cloning-challenge
2. https://synthical.com/article/c51439ac-a6ad-4b8d-82ed-13cf98040c7e
3. https://www.pindrop.com/blog/exposing-the-truth-about-zero-day-deepfake-attacks-metas-voicebox-case-study
4. In the NPR study, Pindrop detected 81 out of a possible 84 (96.4%) voice samples correctly, compared to the nearest competitor, which detected 47 out of 84 (56%; excludes samples identified as inconclusive).
5. Statista: Languages most frequently used for web content as of January 2024
6. Terms and conditions apply.