How Cybercriminals Can Perform Virtual Kidnapping Scams Using AI Voice Cloning Tools and ChatGPT


Virtual Kidnapping: How AI Voice Cloning Tools and ChatGPT are Being Used to Aid Cybercrime and Extortion Scams
An overview of the elements of virtual kidnapping and how malicious actors use social engineering tactics and abuse AI voice cloning tools and ChatGPT to launch these attacks.


By Craig Gibson and Josiah Hagen

New technologies, such as artificial intelligence (AI) and machine learning (ML), are typically developed to boost productivity, increase efficiency, and make our lives easier. Unfortunately, cybercriminals have also found ways to exploit them for illicit gain. Recently, malicious actors have abused AI technology to accurately impersonate real people as part of their attacks and scams.

Cases in which AI technologies are exploited for malicious purposes are becoming increasingly common. Earlier this month, the Federal Bureau of Investigation (FBI) warned the public about how cybercriminals use deepfake technology to manipulate benign photos and videos in sextortion schemes, which have proved lucrative for cybercriminals. According to the Federal Trade Commission, impostor scams accounted for the second-highest reported losses in 2022, amounting to US$2.6 billion.

One of these impostor scams involves the use of fake, AI-generated voice files, also referred to as deepfake audio. These can be generated from very small amounts of biometric information harvested from personal content published in public sources such as TikTok, Facebook, Instagram, and other platforms, including government portals. AI tools such as VoiceLab can process a person’s voice biometrics to produce a deepfake voice that sounds exactly like that specific person. This process is also referred to as voice cloning, and the harvested voice biometrics can then be used for ransom, extortion, and fraud.

Malicious actors who are able to create a deepfake voice of someone’s child can use an input script (possibly one that’s pulled from a movie script) to make the child appear to be crying, screaming, and in deep distress. The malicious actors could then use this deepfake voice as proof that they have the targeted victim’s child in their possession to pressure the victim into sending large ransom amounts.

This article gives an overview of the elements of virtual kidnapping and how malicious actors use social engineering tactics and abuse AI voice cloning tools and ChatGPT to launch these attacks.

A real-life case of virtual kidnapping

In April 2023, an Arizona-based woman named Jennifer DeStefano reported that an anonymous caller claimed to have kidnapped her 15-year-old daughter and demanded a US$1 million ransom. The caller threatened that if she failed to pay up, her child would be drugged and raped.

According to DeStefano, she clearly heard her daughter’s crying, yelling, and pleading voice in the background, but the caller refused to let her talk to her daughter on the phone.

After a few minutes of negotiation, the ransom amount dropped to US$50,000. Thankfully, before any ransom was paid, DeStefano was able to verify that her daughter was safe and had not been kidnapped. The matter was immediately reported to the police, who identified the call as a common scam.

Virtual kidnapping is an emerging cybercrime that abuses AI technologies to manipulate victims' decision-making. Malicious actors exploit AI to introduce negative stimuli that unscrupulously manipulate human emotions for illicit gain. In the aforementioned real-life virtual kidnapping case, the malicious actors banked on the torment and trauma that a parent of a supposedly abducted child experiences to persuade the victim to pay up.

To further drive up victims’ feelings of despair and fear, malicious actors can also employ SIM jacking, a scheme wherein attackers gain access to a victim’s mobile number and redirect all calls and data to a device that they control. The SIM-jacked phone number can no longer send telemetry data to a security system or accept calls and messages. When virtual kidnappers use this scheme on the supposedly kidnapped person, that person's phone number becomes unreachable, which increases the chances of a successful ransom payout.

The elements of a virtual kidnapping attack

As early adopters of emerging technology and fast-rising social platforms, young people and public figures are more prone to having their biometrics harvested for use in virtual kidnapping attacks. Social networking sites such as TikTok, Facebook, and Instagram make it even more convenient for criminals to search for victims and get targeted context to make the scam as believable as possible.

Virtual kidnapping, in essence, is a deception campaign that uses misinformation to trick victims into paying a ransom. Victims don’t just lose money from this scheme; they also suffer great emotional distress. Even if they don’t pay the ransom and are quickly able to debunk the fraud, believing, however momentarily, that one's child has been kidnapped is deeply traumatic for parents. Unfortunately, virtual kidnappers can launch attacks on countless victims (and, sadly, subject all of them to extreme emotional distress) and only need to succeed very infrequently to make a lot of money.

The typical elements of a virtual kidnapping attack are as follows:

  1. Identifying a potential victim (relative of a kidnapee). This is someone who is capable of paying ransom. In the previously mentioned real-life virtual kidnapping case, this would be Mrs. DeStefano.  
  2. Identifying a potential virtual kidnapping victim (kidnapee). In the same real-life virtual kidnapping, this would be the 15-year-old daughter.
  3. Creating a story. The more emotionally manipulative the story is, the more impaired a victim’s judgment and critical thinking would be. It is highly likely that a frightened person will behave with more immediacy and less forethought.
  4. Harvesting voice biometrics from the virtual kidnapping victim’s social media posts. Malicious actors can also harvest a movie actor’s voice from a frightening kidnapping movie scene and use deepfake technology to make audio that sounds like the subject has been kidnapped and is saying words from a movie.
  5. Identifying time and logistic elements. Based on social media updates from the virtual kidnapping victim, the malicious actors will launch the scam when the subject is physically away from the ransom victim for a long enough period. This can hinder the ransom victim from quickly verifying if the child is safe, allowing the attack and ransom payment to go through successfully.
  6. Making the call. The attackers may use free voice modulation software to make their voices sound scarier or more menacing. During the call, the attackers simultaneously play the deepfake audio of the supposed kidnapee to lend credibility to their ransom demand.
  7. Initiating post-call activities. These include, but are not limited to, laundering the ransom payment, deleting all relevant files, and discarding the burner phone used.

Apart from money laundering, these steps do not require much knowledge or practical skill. Advertising analytics can extract common behavior elements from large populations and attach expected behaviors to the members of these populations. This target sorting and rating method increases the efficiency of the attack. Much of the work in this attack can be further automated with artificial intelligence tools such as ChatGPT.

The abuse of AI-powered chat tools in virtual kidnapping schemes

Aside from AI-powered voice cloning tools, another AI tool, the natural language processing chatbot ChatGPT, can be abused to bridge attacker skill gaps and help scale what would typically be manual and time-consuming processes in their attack chains. The data processing needed to find victims requires filtering large bodies of victim data. By using ChatGPT, an attacker can fuse large datasets of potential victims with not only voice and video information but also other signal data, such as geolocation, via application programming interface (API) connectivity. The data is sent to ChatGPT and then to the target; the attacker receives the target's response and can reply with a message that has been refined using ChatGPT. These kinds of advertising analytics are available from a variety of public sources and can be further filtered and prioritized based on potential revenue and the likelihood of payout. All of this can result in a risk-based scoring system for selecting victims, making this type of attack even more profitable and scalable than it currently is.

In the future, or with a large enough research investment at present, malicious actors could even create audio files of ChatGPT-generated text using a text-to-speech app or software. In doing so, both the attacker's voice and that of the virtual kidnapping victim (a voice clone of an actual person) would be fully synthetic. When these virtual files are distributed using a mass calling service, virtual kidnapping can become more effective and far-reaching.

New technologies and approaches and how they could potentially help or hurt virtual kidnapping scammers

The elements of virtual kidnapping are similar to those of social network analysis and propensity (SNAP) modeling. SNAP attempts to predict which steps or actions people in a social network are most likely to take. For example, SNAP can be used to create a report that predicts which individuals are most likely to purchase a particular item or service, and that report can then be used to sell social media advertising services specifically designed to increase the company’s sales or profit.

SNAP is a massively automated and scalable approach in which content (social media advertisements) is served to targets (people with purchasing power who are most likely to buy a company’s products or services) via emotionally driven and aesthetically pleasing posts, usually fronted by attractive or relatable celebrities or influencers who are shown doing things that prompt an action (such as a purchase). These steps closely mirror those of virtual kidnapping schemes. In this case, the “purchasing power” is a parent's ability to pay the ransom, and the aesthetically pleasing elements are replaced with aesthetically displeasing ones, such as screams, threats, and scary voice-modulated intimidation.
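To make the benign, marketing side of SNAP-style modeling more concrete, the following is a minimal Python sketch of propensity scoring under stated assumptions: the profile fields, weights, and sample audience are hypothetical and purely illustrative, and are not taken from any real advertising platform or from the attacks described in this article.

```python
# Hypothetical sketch of SNAP-style propensity scoring in a benign marketing
# context: combine a few behavioral signals into a purchase-likelihood score
# and rank an audience by it. All fields, weights, and data are illustrative.
from dataclasses import dataclass


@dataclass
class Profile:
    recent_views: int        # views of similar products in the last month
    peer_purchases: int      # purchases by close contacts in the social graph
    engagement_score: float  # 0.0-1.0 engagement with past advertisements


def purchase_propensity(p: Profile) -> float:
    """Combine simple behavioral signals into a 0.0-1.0 purchase likelihood."""
    score = (
        0.4 * min(p.recent_views / 10, 1.0)
        + 0.3 * min(p.peer_purchases / 5, 1.0)
        + 0.3 * p.engagement_score
    )
    return round(score, 2)


# Rank a (hypothetical) audience so advertising spend targets likely buyers.
audience = {
    "user_a": Profile(recent_views=12, peer_purchases=3, engagement_score=0.8),
    "user_b": Profile(recent_views=1, peer_purchases=0, engagement_score=0.2),
}
ranked = sorted(audience, key=lambda u: purchase_propensity(audience[u]), reverse=True)
print(ranked)  # ['user_a', 'user_b']
```

The point of the sketch is the shape of the approach: weighted behavioral signals turned into a ranked list, which is what makes SNAP-style modeling effective for advertisers and, as described above, attractive to scammers looking to prioritize targets.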

In the future, the criminal use of victim propensity modeling will likely enable the extraction of large lists of potential victims. It will also likely allow scammers to scale and automate their attacks by cold-calling victims, much like how some businesses attempt to connect with potential buyers. Malicious actors can also farm out parts of their operation, such as buying off-the-shelf SIM-jacking exploits, purchasing harvested credentials from data breaches, and obtaining money mule services, on the diverse and commoditized dark web.

Virtual kidnapping can be thought of as an AI-weaponized scam whose elements share similarities with both benign marketing tactics and malicious phishing schemes. It is an emerging tier of AI-enabled and emotionally driven extortion that will likely go through phases of evolution similar to what we saw, and are still seeing, with ransomware attacks. Virtual kidnapping scams rely on voice and video files, which are not normally policed by security software, to extort victims.

However, as data-context-aware networks become more sophisticated in the future, security tools may soon be able to take multiple telemetry types and apply “signals” to these high-context abuses. It should be noted that “data-context-aware” means that decisions are made based on relationships between data points, not on simple triggers based on a single value.

For example, a multilayered identity-aware system might be able to determine if virtual kidnapping subjects (the individuals who are supposedly abducted by kidnappers) are moving their phones (which can be detected by the phones’ onboard accelerometer sensor) and are using them consistently or normally — which they won’t be able to do if they’ve been truly kidnapped.
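As a rough illustration of that kind of data-context-aware, identity-aware check, the following Python sketch considers several telemetry signals from the supposedly kidnapped person's phone together instead of reacting to any single value. It is only a sketch under assumptions: the telemetry fields, thresholds, and decision logic are hypothetical and do not describe any existing security product.

```python
# Hypothetical sketch of a data-context-aware check: several telemetry signals
# from the supposed kidnapee's phone are considered together, rather than
# triggering on any single value. Fields and thresholds are illustrative only.
from dataclasses import dataclass


@dataclass
class PhoneTelemetry:
    accelerometer_active: bool    # the phone is moving as it normally does
    screen_unlocks_last_hour: int
    messages_sent_last_hour: int
    reachable_on_network: bool    # SIM jacking often makes the number unreachable


def kidnapping_claim_contradicted(t: PhoneTelemetry) -> bool:
    """Return True if telemetry looks like normal, everyday phone use,
    which contradicts the claim that the phone's owner has been abducted."""
    normal_usage = (
        t.accelerometer_active
        and t.screen_unlocks_last_hour > 0
        and t.messages_sent_last_hour > 0
    )
    # Normal, reachable use of the device during a "kidnapping" call points to
    # a virtual kidnapping scam rather than a real abduction.
    return normal_usage and t.reachable_on_network


telemetry = PhoneTelemetry(
    accelerometer_active=True,
    screen_unlocks_last_hour=4,
    messages_sent_last_hour=7,
    reachable_on_network=True,
)
print("Claim contradicted by telemetry:", kidnapping_claim_contradicted(telemetry))
```

The decision here rests on the relationship between several signals taken together; any one of them alone would be a weak indicator, which is precisely the distinction between data-context-aware analysis and single-value triggers.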

Conclusion and security recommendations

As with any extortion scheme, when victims pay the ransom, they inadvertently encourage malicious actors to continue launching attacks on other victims. Additionally, when victims pay the ransom, their information is added to a list of profitable victims, which is then sold to other attackers. Unfortunately, this leads to more victims suffering from other cyberattacks.

As virtual kidnapping scams become more popular, traditional ransom techniques used in cybercrime, such as those seen in ransomware attacks, will move toward harder-to-block communication paths such as voice and video, and even toward new environments such as the metaverse.

Because these high-context communication paths and environments involve a level of abstraction that goes beyond what typical “router-level” security solutions can handle, identity-aware antifraud techniques will become increasingly necessary. As more virtual kidnapping attacks happen, the amount of available telemetry also increases; this data can be used to make pertinent improvements in security analytics, which could then be executed in identity-aware security sensors.
