Voice cloning attacks powered by generative artificial intelligence have moved from Hollywood special effects to an everyday weapon for fraud. What once required hours of audio samples and technical expertise now takes seconds and costs nearly nothing. But when employees can no longer trust the person on the other side of the line, what’s left to catch the fraud? Fortunately, the countermeasures are simpler than you might expect.
Attackers Can Clone Any Voice in Minutes
Launching an AI voice clone attack is so straightforward that we can walk through the entire process in this short section:
- Obtain a voice sample: Attackers need surprisingly little audio to work with. According to McAfee Labs research, just three seconds of audio can produce a clone with 85% voice accuracy. Ten to thirty seconds yields professional-quality results. The necessary sample can often be found in YouTube videos, podcast appearances, and social media clips. For executives who haven’t left a public audio footprint, attackers might simply call the target’s office with an innocent question and record the answer.
- Generate the clone: Commercial voice cloning platforms have made this step trivially easy. For example, ElevenLabs offers instant voice cloning starting at $5 per month and Resemble AI can create high-quality clones from just ten seconds of audio. For those unwilling to pay, open-source tools like Real-Time-Voice-Cloning on GitHub can produce convincing results from five seconds of speech. The user uploads an audio file, waits a few minutes, and receives a voice model that can say anything they type.
- Deploy the attack: The simpler but less effective method uses pre-recorded audio. The attacker types out a script, generates the fake audio, and plays it during a phone call or sends it as a voicemail. More sophisticated attacks use real-time voice conversion, where the attacker speaks naturally while software transforms their voice into the target’s voice instantaneously. Tools like Deep-Live-Cam enable this capability and have tutorials circulating on criminal forums. Real-time conversion allows attackers to respond naturally to questions and objections, which makes detection far more difficult.
The tools that make AI voice cloning attacks possible are easier to use and more capable than ever before, but they’re not exactly new. The first major AI voice fraud case happened in 2019, when criminals cloned the voice of a German CEO and used it to call the head of the company’s UK subsidiary. The British executive heard his boss’s familiar voice urgently requesting a €220,000 wire transfer to a supplier. He complied, and the money vanished into accounts in Hungary and Mexico before anyone realized what had happened.
A more recent example of an AI voice cloning attack comes from 2024. An employee at British engineering firm Arup joined what appeared to be a routine video call with the company’s CFO and several colleagues. The only problem was that every person on that call was an AI-generated deepfake. Over the course of a week, the employee authorized 15 transactions totaling $25.6 million, which remains the largest documented deepfake fraud to date.
These high-profile cases represent just the tip of the iceberg. Deepfake-enabled fraud (which includes AI voice cloning attacks) has surged by 1,300% in 2024 alone, and security researchers now detect one deepfake attempt every five minutes. Small and mid-sized businesses shouldn’t assume their size makes them less attractive targets. In fact, SMB employees experience 350% more social engineering attacks than their enterprise counterparts, largely because attackers know smaller organizations have fewer security resources and less formal verification procedures.
What Works Against AI Voice Cloning Attacks
Traditional cybersecurity tools are designed to catch malicious code, suspicious links, and dangerous attachments. AI voice cloning attacks contain none of these things. The “payload” is simply spoken words delivered through a legitimate phone call.
Another problem is that we as humans are wired to trust familiar voices. When an employee receives an email from their CEO, their brain remains in analytical mode, so it’s easy to scan for common red flags associated with phishing. But when that same employee hears their CEO’s voice on the phone, recognition triggers trust automatically, and it can be very difficult to watch for subtle signs of AI voice cloning as the person on the other end speaks.
Since reliable detection remains an unsolved problem even for researchers, the FBI, NSA, and CISA jointly recommend building verification procedures that work regardless of how convincing the fake appears.
Use Caller ID as a Starting Point
Caller ID provides a useful first filter. For example, a call supposedly from your CFO that displays an international number or an unknown area code should immediately raise suspicion. That said, spoofing caller ID is trivial, so caller ID shouldn’t be trusted on its own.
Promote Callback Verification
The simplest defense against voice cloning is to hang up and call back using a number you already have on file. Attackers can spoof caller ID and clone voices, but they cannot intercept calls to phone numbers they don’t control. To make callback verification easy, organizations should maintain verified contact lists specifically for this purpose, kept separate from incoming requests.
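The logic of callback verification can be captured in a few lines. The sketch below is purely illustrative (the directory, role names, and phone numbers are hypothetical placeholders): the key idea is that the callback number always comes from a separately maintained list, never from the incoming request itself.

```python
# Hypothetical verified contact directory, maintained out of band and
# never updated based on an incoming call or email.
VERIFIED_DIRECTORY = {
    "cfo": "+1-202-555-0142",  # placeholder example numbers
    "ceo": "+1-202-555-0137",
}

def callback_number(role: str, incoming_number: str) -> str:
    """Return the number to call back for a given role.

    The number the request arrived from is deliberately ignored,
    because caller ID can be trivially spoofed.
    """
    verified = VERIFIED_DIRECTORY.get(role)
    if verified is None:
        raise ValueError(f"No verified contact on file for role: {role}")
    return verified
```

Even if the attacker spoofs the CFO’s real number, the employee hangs up and dials the directory entry, reaching the genuine CFO rather than the attacker.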
Make It Okay to Question the CEO
Most voice cloning attacks succeed because employees feel uncomfortable pushing back on someone who sounds like their boss. The solution is to establish verification as standard procedure, as a 2024 incident at Ferrari demonstrates.
When scammers targeted Ferrari in July 2024 using a cloned voice of CEO Benedetto Vigna, the targeted executive asked a simple verification question the attacker couldn’t answer: “What book did you recently recommend to me?” The scammer hung up immediately.
Leadership teams should establish code words or challenge questions in advance. Security experts recommend choosing phrases that can’t be guessed from publicly available information (so no pet names, birthdays, or anything discoverable on social media).
Verify Through a Different Channel
Out-of-band verification means confirming a request through a different communication method than the one used to make it. If you receive a phone call requesting a wire transfer, verify via email, Slack, or text. If the request came by email, pick up the phone.
The Center for Internet Security recommends establishing explicit policies that require out-of-band verification for all financial transactions and sensitive data requests because compromising two separate channels simultaneously is significantly harder than fooling someone on a single call.
Implement Multi-Person Authorization
Requiring more than one person to approve high-value transactions eliminates the single point of failure that voice cloning attacks exploit. An attacker might successfully fool one employee, but deceiving two people independently becomes exponentially harder.
In fact, many cyber insurance policies already mandate dual authorization controls for coverage, so implementing multi-person authorization proactively can strengthen your coverage position while reducing your actual risk.
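The control itself is simple to express. This is a minimal sketch, not a production approval system; the class and field names are hypothetical, and the point is only that duplicate approvals from the same person don’t count.

```python
# Minimal sketch of multi-person authorization: funds are released only
# after approvals from the required number of *distinct* employees.
class Transaction:
    def __init__(self, amount: float, required_approvers: int = 2):
        self.amount = amount
        self.required_approvers = required_approvers
        self.approvals: set[str] = set()

    def approve(self, employee_id: str) -> None:
        # A set ignores repeated approvals from the same employee,
        # so one deceived person cannot approve twice.
        self.approvals.add(employee_id)

    def is_authorized(self) -> bool:
        return len(self.approvals) >= self.required_approvers
```

An attacker who fools one employee into approving still can’t move the money, because the second approver provides an independent chance to catch the fraud.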
Set Clear Thresholds and Document Procedures
Verification protocols only work if everyone knows when to apply them. Organizations should establish specific dollar thresholds above which callback verification and multi-person authorization become mandatory, regardless of who appears to be making the request.
These policies need to be written down, communicated clearly, and enforced consistently. An informal understanding that “big transfers need approval” isn’t enough when an attacker creates artificial urgency.
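A written policy can map cleanly onto a simple decision rule. The thresholds below are hypothetical examples, not recommendations; each organization should set its own limits.

```python
# Hypothetical policy sketch: given a requested transfer amount,
# return the verification controls that become mandatory.
def required_controls(amount: float) -> list[str]:
    controls = ["caller_id_check"]  # applies to every request
    if amount >= 10_000:
        controls.append("callback_verification")
    if amount >= 50_000:
        controls.append("multi_person_authorization")
    return controls
```

Because the rule depends only on the amount, artificial urgency or an apparently senior requester can’t talk an employee out of the required steps.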
Conclusion
AI voice cloning attacks will only become more convincing and accessible, but the countermeasures are straightforward. The only challenge is to implement them before your organization becomes a case study in what went wrong. That’s something we at OSIbeyond can help with. Schedule a conversation to discuss your security posture and how we can help your organization become better protected against AI voice cloning and other modern cyber threats.