Foreseeing a New Era: Cybercriminals Using Machine Learning to Create Highly Advanced Threats

To get a clearer picture of what is possible and what is already a reality with regard to machine learning-powered cyberthreats, we compiled a rundown of proofs of concept (PoCs) and real-life attacks in which machine learning was weaponized.

December 18, 2019

Cybersecurity companies use machine learning technology to enhance threat detection capabilities that help fortify organizations’ defense against malware, exploit kits, phishing emails, and even previously unknown threats. The Capgemini Research Institute conducted a study on the usage of machine learning for security and found that of 850 senior executive respondents based in 10 countries, around 20% started using the technology before 2019, and about 60% will be using it by year’s end.

The use of machine learning in cybersecurity, not to mention in many other fields across various industries, has proven to be beneficial. This technology, however, is also at risk of being used by threat actors. While widespread machine learning weaponization may still be far off, research in this area, particularly on the use of deepfake technology to extort and misinform, has recently become a topic of interest for the IT community and the general public.

To get a clearer picture of what is possible and what is already a reality with regard to machine learning-powered cyberthreats, here’s a rundown of related PoCs and real-life attack cases that we have seen over the past few years:

Machine learning-powered malware

Research on machine learning-powered malware is still surprisingly scarce, considering that some experts have long regarded it as a type of threat that can possess advanced capabilities. In fact, only one PoC of such a threat has been publicized, which was unveiled at Black Hat USA 2018. IBM presented a variant named DeepLocker that can deploy untraceable malicious applications within a benign data payload. The malware variant is supported by deep neural networks (DNNs), also known as deep learning, a form of machine learning. The use of a DNN disguises the malware’s trigger conditions, which are pieces of information that security solutions need in order to detect a malicious payload.

DeepLocker is designed to stay hidden until it detects a specific victim. In the demonstration, DeepLocker waited stealthily for a specific action that would trigger its ransomware payload. That trigger was the body movement of the targeted victim when they looked directly at a laptop webcam operated by a webcam application embedded with malicious code. The application of machine learning in this attack can be considered limited, but it showed how highly evasive and targeted malware variants can become when infused with machine learning.

Deepfake video and audio

Experts are increasingly warning the public about deepfake videos, which are fake or altered clips that contain hyperrealistic images. Produced with generative adversarial networks (GANs) that generate new images from existing image datasets, deepfake videos can challenge people’s perception of reality and erode their ability to tell what is true from what is false.

Today, deepfake technology is mostly used in videos involving pornography, political propaganda, and satire. As for the impact of these videos, a Medium article published in May claimed that there were about 10,000 deepfake videos online, with House Speaker Nancy Pelosi and Hollywood star Scarlett Johansson being two of their most popular subjects/victims.

Regarding the use of this technology in profit-driven cybercrime, it can be surmised that deepfake videos may be used to create a new variation of business email compromise (BEC) or CEO fraud. In this variation of the scheme, a deepfake video can be used as one of its social engineering components to further deceive victims.

But the first reported use of deepfake technology in CEO fraud came in the form of audio. In September, scammers used deepfake audio to trick a U.K.-based executive into wiring US$243,000 to a fraudulently set-up account. The victim company’s insurance firm stated that the voice on the phone call imitated not only the voice of the executive being spoofed, but also that executive’s tonality, punctuation, and accent.

A tool for guessing passwords

Brute force and social engineering methods are old but popular techniques that cybercriminals use to steal passwords and hack user accounts. New ways to do this could be inadvertently aided by user information shared on social media, since some users still embed publicly shared information into their account passwords. Additionally, machine learning research on password cracking is an area of concern that users and enterprises should pay close attention to.

Back in 2017, one of the early proofs of machine learning’s susceptibility to abuse was publicized in the form of PassGAN, a program that can generate high-quality password guesses. Using a GAN made up of two machine learning systems, experts from the Stevens Institute of Technology, New Jersey, USA, were able to use the program to guess more user account passwords than the popular password cracking tools Hashcat and John the Ripper.

To compare PassGAN with Hashcat and John the Ripper, the developers fed their machine learning system more than 32 million passwords collected from the 2010 RockYou data breach, and let it generate millions of new passwords. They then used these generated passwords to try to crack a hashed list of passwords taken from the 2016 LinkedIn data breach.

On its own, PassGAN generated 12% of the passwords in the LinkedIn set, while the other tools generated between 6% and 23%. But when PassGAN and Hashcat were combined, 27% of the passwords from the LinkedIn set were cracked. If cybercriminals are able to devise a similar or enhanced version of this methodology, it could be a potentially reliable way to hijack user accounts.
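Why combining two tools can crack more passwords than either one alone comes down to simple set coverage: each tool guesses a partly different slice of the target list. The sketch below is a minimal illustration of that arithmetic, not the researchers’ evaluation code. The password lists are made up, and `match_rate` is a hypothetical helper that compares plaintext sets rather than hashes.

```python
def match_rate(guesses, targets):
    """Return the fraction of unique target passwords found among the guesses."""
    targets = set(targets)
    if not targets:
        return 0.0
    return len(targets & set(guesses)) / len(targets)


# Made-up data for illustration only (not taken from the study).
targets = {"sunshine1", "dragon99", "letmein", "qwerty12", "iloveyou2"}
model_guesses = {"sunshine1", "iloveyou2", "password7"}  # stand-in for a generative model
rule_guesses = {"letmein", "sunshine1", "monkey123"}     # stand-in for a rule-based tool

model_rate = match_rate(model_guesses, targets)
rule_rate = match_rate(rule_guesses, targets)
# The union covers every target that either tool guessed.
combined_rate = match_rate(model_guesses | rule_guesses, targets)

print(model_rate, rule_rate, combined_rate)
```

In this toy data each tool matches two of the five targets, but only one match is shared, so the combined guess list matches three.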

Adversarial machine learning

Adversarial machine learning is a technique that threat actors can use to cause a machine learning model to malfunction. They can do so by crafting adversarial samples, which are modified inputs fed to the machine learning system to degrade its ability to predict accurately. In essence, this technique, also called an adversarial attack, turns the machine learning system against itself and the organization running it.
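A toy example can make the idea of an adversarial sample concrete. The sketch below assumes a tiny linear classifier with hand-picked weights (all values are hypothetical and not drawn from any real security product) and nudges each input feature by a small, bounded amount in the direction that lowers the model’s score, in the spirit of a gradient-sign perturbation.

```python
def score(weights, bias, x):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, x, epsilon):
    """Shift every feature by at most epsilon against the sign of its weight.

    For a linear model the gradient of the score with respect to the input
    is the weight vector, so this step lowers the score as much as an
    epsilon-bounded change per feature can.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model parameters and input, chosen only for illustration.
weights = [2.0, -1.0, 0.5]
bias = -0.5
x = [0.6, 0.2, 0.4]

x_adv = perturb(weights, x, epsilon=0.3)

print(predict(weights, bias, x), predict(weights, bias, x_adv))
```

No feature moves by more than 0.3, yet the predicted class changes from 1 to 0, which is the kind of small, targeted modification that adversarial samples rely on.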

This method has been proven capable of causing machine learning models for security to perform poorly, for example, by making them produce higher false positive rates. Attackers can do this by injecting malware samples that are similar to benign files to poison machine learning training sets.

Machine learning models used for security can also be tricked using infected benign Portable Executable (PE) files or benign source code compiled with malicious code. This technique can make a malware sample appear benign to models, preventing security solutions from accurately detecting it as malicious since its structure is still mostly composed of the original benign file.

How can machine learning-powered cyberthreats be addressed?

Enhancing monitoring and data analysis solutions is a step in the right direction when it comes to detecting and blocking sophisticated threats such as those powered by machine learning. When these solutions are backed by a stronger capability to track network and server activity, even sophisticated or previously unknown threats can be detected, and weaknesses in the platform can be identified. Organizations can then fix those weaknesses, which, in turn, paves the way for a more secure IT environment.

When it comes to dealing with advanced password cracking tools such as the machine learning-powered PassGAN, users and organizations can move towards two-factor authentication schemes to reduce their reliance on passwords. One approach to this is using a one-time password (OTP) — an automatically generated string of characters that authenticates the user for a single login session or transaction.
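Two widely used ways to generate such codes are the HMAC-based one-time password algorithm (HOTP, RFC 4226) and its time-based variant (TOTP, RFC 6238). The following is a minimal sketch using only Python’s standard library. The function names, the six-digit length, and the 30-second time step are illustrative choices, and a production system should rely on a vetted library instead.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password as described in RFC 4226."""
    message = struct.pack(">Q", counter)            # 8-byte big-endian counter
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation offset
    chunk = struct.unpack(">I", digest[offset:offset + 4])[0]
    code = chunk & 0x7FFFFFFF                       # drop the sign bit
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, step=30, now=None):
    """Time-based variant (RFC 6238): the counter is the current time window."""
    now = time.time() if now is None else now
    return hotp(secret, int(now // step))


def verify(secret, submitted, now=None):
    """Compare in constant time to avoid leaking how many digits matched."""
    return hmac.compare_digest(totp(secret, now=now), submitted)


# Shared secret taken from the RFC 4226 test vectors (Appendix D).
secret = b"12345678901234567890"
print(hotp(secret, 0))  # 755224, per the RFC test vectors
```

Because each code depends on a counter or time window as well as the shared secret, a guessed or stolen password alone is not enough to log in.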

Meanwhile, technologies are continuously being developed to defend against deepfakes. To detect deepfake videos, experts from projects initiated by the Pentagon and SRI International are feeding samples of real and deepfake videos to computers. This way, computers can be trained to detect fakes. To detect deepfake audio, experts are training computers to recognize visual inconsistencies. And as for the platforms where deepfakes can creep in, Facebook, Google, and Amazon, among other organizations, are joining forces to detect them via the DeepFake Detection Challenge (DFDC) — a project that invites people around the world to build technologies that can help detect deepfakes and other forms of manipulated media.

Adversarial attacks, on the other hand, can be prevented by making machine learning systems more robust. This can be done in two steps: first, by spotting potential security holes early in the system’s design phase and making every parameter accurate, and second, by generating adversarial samples and retraining models on them to strengthen the machine learning system. Reducing the attack surface of the machine learning system can also ward off adversarial attacks. Since cybercriminals modify samples in order to probe a machine learning system, cloud-based solutions, such as products with Trend Micro™ XGen™ security, can be used to detect and block malicious probing.
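The retraining step is often called adversarial training. The sketch below is a toy illustration under stated assumptions: a simple perceptron stands in for the security model, the two-feature data set is made up, and the helper names are hypothetical. It trains a model, crafts bounded perturbations against that model, and retrains on the clean and perturbed samples together.

```python
def sign(v):
    return (v > 0) - (v < 0)


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def train_perceptron(data, epochs=1000):
    """Classic perceptron; stops once an epoch produces no mistakes."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            if predict(w, b, x) != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def craft_adversarial(w, x, y, epsilon):
    """Move each feature by at most epsilon toward the wrong side of the boundary."""
    return [xi - epsilon * y * sign(wi) for wi, xi in zip(w, x)]


# Made-up, linearly separable data: label +1 or -1 per sample.
clean = [([3.0, 1.0], 1), ([1.0, 3.0], 1), ([-0.6, -0.2], -1)]

w0, b0 = train_perceptron(clean)
adversarial = [(craft_adversarial(w0, x, y, 0.5), y) for x, y in clean]
fooled_before = sum(predict(w0, b0, x) != y for x, y in adversarial)

# Retrain on clean and adversarial samples together, keeping the true labels.
w1, b1 = train_perceptron(clean + adversarial)
fooled_after = sum(predict(w1, b1, x) != y for x, y in adversarial)

print(fooled_before, fooled_after)
```

The retrained model is only hardened against the perturbations it was shown, so in practice sample generation and retraining are repeated as new attack variations appear.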

Governments and private organizations, particularly cybersecurity companies, should anticipate a new era where cybercriminals use advanced technologies such as machine learning to power their attacks. As they have done in the past, cybercriminals will continue to develop more advanced and new forms of threats to be one step ahead. In this light, technologies for combating these threats should likewise continue to evolve. However, while it would be a good choice to implement a tailor-fit technology to detect such threats, a multilayered security defense (one that combines a variety of technologies) and the consistent application of cybersecurity best practices are still the most effective ways to defend against a wide range of threats.


Posted in Cybercrime & Digital Threats, Machine Learning