
How Can AI Agents Perform Common Voice Scams?

Laura Fitzgerald

25th February 2025

7 minute read time

Advances in artificial intelligence (AI) have changed how we interact with technology, but they have also opened new avenues for fraud.

For instance, phone scams, already a serious problem in the U.S. and globally, have evolved into a more sophisticated, AI-aided threat targeting individuals and businesses with greater precision.

Voice-enabled AI agents, powered by widely accessible AI tools, can now perform common scams at scale. Meanwhile, the worldwide market for voice-enabled AI agents is projected to reach USD 31.9 billion between 2024 and 2033.

In this article, we’ll explore how and why voice-enabled AI agents are used to perform common scams, why this presents a growing concern, and what steps you can take to protect yourself and your organization.

Why are voice-enabled AI scams a growing concern today?

The most significant risk posed by voice-enabled AI is how effectively these attacks can scale. Previously, fraudsters would collaborate to form fraud rings targeting several banks or other institutions simultaneously. However, the availability of advanced AI tools alongside realistic voice cloning technology has shifted their strategies.

An individual fraudster can now orchestrate attacks of equal or greater magnitude using a generative AI toolkit. This can involve:

  • Creating multiple synthetic (artificially generated) voices to interact with targets.
  • Training an AI model to have automatic and simultaneous conversations with a target, such as a contact center agent. 
  • Calling and socially engineering multiple organizations simultaneously.
  • Avoiding detection by voice recognition systems, as synthetic voices mimic natural human inflection.

The fraudster does not even have to use their authentic voice, and the low cost and ease of access to these tools make such attacks highly accessible. While these methods are imperfect, advances in open-source AI tools suggest that large-scale, targeted attacks will only become more likely.

Common phone scams explained

In the context of phone scammers targeting companies and contact centers, there are a few common examples of how they achieve this with AI:

  • Synthetic account reconnaissance: A fraudster utilizes a synthetic (artificially created) voice to collect account details and maneuver through a company’s interactive voice response (IVR) system. After obtaining the target’s account information, the fraudster contacts the contact center agent, pretending to be the victim, to take control of the victim’s account.
  • Using synthetic voice for authentication: The fraudster uses machine-generated voice to circumvent IVR authentication for selected accounts. They correctly respond to security questions and provide one-time passwords (OTP). The individuals orchestrating the attack follow up to carry out the fraud with the contact center agent.
  • OTP phishing: The fraudster initiates multiple calls with a synthetic voice to instruct a contact center agent to alter the victim’s information, such as their email or mailing address. Once this change is made, the fraudster can receive the OTP or request a new card sent to their address.
  • Voice spoofing or impersonation: The fraudster develops and trains a voice bot to replicate the intended target’s voice, including that of an organization’s intelligent virtual assistant (IVA). The voice bot collects internal information from organizations, including employee details, allowing it to evade fraud detection methods.

How voice-enabled agents perform common scams

Agent design and architecture

AI agents utilize advanced Text-to-Speech (TTS) tools to replicate realistic human voices. With access to public speech samples via social media or the dark web, fraudsters train these tools to mimic a person’s vocal nuances. The result? AI-generated voices that sound indistinguishable from their real counterparts.

Customer impersonation

Fraudsters impersonate individuals using stolen personal information such as names, addresses, phone numbers, and account details. Over 300 million records were compromised in 2023 alone, and this information is readily available on the dark web. Combined with TTS tools, fraudsters craft compelling synthetic voices and scenarios.

Ability to answer complex questions

AI models trained on stolen data can carry believable conversations, especially when combined with realistic-sounding synthetic voices. These agents can:

  • Navigate complex queries.
  • Provide convincing answers based on the victim’s leaked information.
  • Adapt to real-time conversational changes, making detection increasingly tricky.

The other side of the equation is humans’ limited ability to distinguish between AI and human voices. Studies show we are only 54% accurate at identifying deepfake audio, and this number is expected to decline as AI technology advances.

The dangers of voice-enabled agents in authentication

Identity verification and authentication solutions have been considered effective and secure for a long time, but that is changing. According to Gartner, by 2026, 30% of enterprises will consider identity verification solutions unreliable due to the rise of AI-generated deepfakes. But why is this happening?

One reason is the rapid advancement in digital injection attacks, where AI-generated deepfakes bypass current standards for presentation attack detection (PAD). While PAD mechanisms in face biometrics assess a user’s liveness, these systems are not equipped to handle digitally injected synthetic media, making them vulnerable to AI-powered deception.

Voice biometrics authentication systems also face limitations in detecting synthetic voices. While they can identify some threats, offering partial protection, they are insufficient as a standalone solution. However, combining voice analysis with liveness detection and other authentication factors drastically improves reliability, as the sketch below illustrates.
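To make that layering concrete, here is a minimal, hypothetical sketch of fusing independent signals into one authentication decision. The factor names, weights, and threshold are illustrative assumptions, not Pindrop’s actual scoring model.

```python
# Minimal sketch of multi-layered call risk scoring (illustrative only).
from dataclasses import dataclass

@dataclass
class CallSignals:
    liveness_score: float     # 0.0 (synthetic) .. 1.0 (live human)
    voice_match_score: float  # similarity to the enrolled voiceprint
    metadata_risk: float      # 0.0 (clean) .. 1.0 (suspicious call data)

def authenticate(signals: CallSignals, threshold: float = 0.75) -> bool:
    """Fuse independent factors; one weak factor drags the score down."""
    fused = (
        0.45 * signals.liveness_score
        + 0.35 * signals.voice_match_score
        + 0.20 * (1.0 - signals.metadata_risk)
    )
    return fused >= threshold

# A cloned voice may match the enrolled voiceprint yet fail liveness,
# pulling the fused score below the threshold:
print(authenticate(CallSignals(liveness_score=0.2,
                               voice_match_score=0.9,
                               metadata_risk=0.4)))  # False
```

The point of this design is that no single factor, however convincing, can clear the threshold on its own.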

According to Pindrop’s response to the University of Waterloo study, this combined approach increases detection accuracy to an unmatched 99.2%, even against sophisticated signal-modified deepfakes. Pindrop’s Liveness Detection technology outperformed leading benchmarks, demonstrating superior performance against adversarially modified spoofed utterances.

This exceptional accuracy highlights the importance of leveraging multi-layered solutions to mitigate the growing risks posed by voice-enabled agents in authentication processes.

How to protect yourself from common phone scams

Protecting yourself and your organization requires a multi-layered approach. Combining advanced technologies with awareness and best practices can significantly reduce the risk of falling victim to these scams. Here’s how:

Caller authentication

Modern authentication systems can identify anomalies indicating synthetic voices by analyzing vocal features alongside other data points.

Deepfake fraud detection software

Advanced deepfake detection software for call centers is a vital component in combating AI voice scams. These tools analyze subtle features in audio recordings to identify signs of synthetic generation, such as:

  • Inconsistent tone or pitch.
  • Artifacts from audio processing.
  • Content-agnostic patterns that differ from natural human speech.
These tools can also provide real-time alerts and live risk scoring for every call.
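For illustration only, the snippet below shows how a detector might extract a few of the low-level audio features listed above, using the open-source librosa library. Real products rely on trained models over far richer representations; the function name and feature choices here are our assumptions, not Pindrop’s method.

```python
# Illustrative feature extraction for synthetic-speech screening.
import librosa
import numpy as np

def suspicion_features(path: str) -> dict:
    y, sr = librosa.load(path, sr=16000)
    # Spectral flatness can surface artifacts left by audio processing.
    flatness = librosa.feature.spectral_flatness(y=y)
    # A pitch track; unnaturally low pitch variability can be a red flag.
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
    # MFCCs summarize timbre independently of what is being said.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return {
        "mean_flatness": float(np.mean(flatness)),
        "pitch_variability": float(np.std(f0)),
        "mfcc_variance": float(np.mean(np.var(mfcc, axis=1))),
    }

# In practice, features like these would feed a trained classifier
# rather than fixed thresholds.
```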

Multifactor authentication (MFA)

MFA provides a layer of security by requiring users to verify their identity through multiple independent factors. These include:

  • Something you know: Security questions or PINs.
  • Something you have: A one-time password (OTP) sent to a mobile device or email.
  • Something you are: A fingerprint, face recognition, or voice analysis.

MFA is designed to ensure that additional barriers protect sensitive accounts and systems even if one layer is compromised. For example, with the Five9 + Pindrop® integration, businesses can streamline MFA and fraud detection processes. This integration enables quick and secure authentication of inbound calls, enhances automation in IVAs, and detects fraudulent activity in real-time, making it an invaluable tool for safeguarding customer interactions.
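As a simplified illustration of this layered design, the sketch below models caller verification that requires at least two independent factors. Every name and check here is hypothetical; this is not the Five9 + Pindrop® API.

```python
# Hypothetical layered MFA check for an inbound call (illustrative only).
from dataclasses import dataclass

@dataclass
class InboundCall:
    pin_ok: bool       # something you know: PIN or security question
    otp_ok: bool       # something you have: OTP to a registered device
    voice_ok: bool     # something you are: voiceprint match
    liveness_ok: bool  # the voice factor only counts if liveness passes

def verify_caller(call: InboundCall, required: int = 2) -> bool:
    factors = [
        call.pin_ok,
        call.otp_ok,
        call.voice_ok and call.liveness_ok,  # a cloned voice alone fails
    ]
    return sum(factors) >= required

# A fraudster with a cloned voice and a phished PIN, but no device OTP:
print(verify_caller(InboundCall(pin_ok=True, otp_ok=False,
                                voice_ok=True, liveness_ok=False)))  # False
```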

Protect your business from common scams with Pindrop Solutions

Voice-enabled AI scams are an evolving threat, but with innovative Pindrop solutions, you are better positioned to stay ahead. Pindrop® Pulse and Pindrop® Pulse™ Inspect leverage cutting-edge liveness detection software to analyze vocal features unique to humans, effectively identifying synthetic voices and mitigating the risks of AI-generated fraud.

  • Pindrop® Pulse™ Tech: Specifically intended for contact centers, this liveness detection solution enhances security and must be paired with Pindrop® Passport or Pindrop® Protect to deliver multifactor authentication and fraud detection capabilities.
  • Pindrop® Pulse™ Inspect: A standalone liveness detection solution tailored for media companies to determine if audio is synthetic or human, helping you restore integrity before distribution.
  • Pindrop® Passport: Provides comprehensive multifactor authentication by combining voice analysis with additional layers of security to support verification of user identities.

With these tools, Pindrop Solutions offer a robust defense against AI-driven scams, empowering businesses to better protect their operations and maintain customer trust. Ready to experience the difference? Request a demo today.
