Machine learning is more than just a buzz-word — it is a technological tool that operates on the concept that a computer can learn information without human mediation. It uses algorithms to examine large volumes of information or training data to discover unique patterns. This system analyzes these patterns, groups them accordingly, and makes predictions. With traditional machine learning, the computer learns how to decipher information as it has been labeled by humans — hence, machine learning is a program that learns from a model of human-labeled datasets.
It is unique in how it becomes, in a way, intuitive. Through repetition, it learns by inference without a need to be deliberately programmed each and every time. However, a caveat: Machine learning can make mistakes and appropriate caution should be used. 1
Machine learning proves to be useful especially in today’s big data world. We come into contact with machine learning on a daily basis. It supports technologies such as identifying voice commands on our phones, recommending which songs to listen to on Spotify or which items to purchase next on Amazon, and even determining the fastest way to reach your destination on Waze, to name a few.
How Machine Learning Can Help Businesses
Machine Learning helps protect businesses from cyberthreats. However, it works best as part of a multilayered security solution.
Machine learning is also used in healthcare, helping doctors make better and faster diagnoses of diseases, and in financial institutions, detecting fraudulent activity that doesn’t fall within the usual spending patterns of consumers.
Machine Learning Algorithm Types
Supervised Machine Learning
The traditional machine learning type is called supervised machine learning, which necessitates guidance or supervision on the known results that should be produced. In supervised machine learning, the machine is taught how to process the input data. It is provided with the right training input, which also contains a corresponding correct label or result. From the input data, the machine is able to learn patterns and, thus, generate predictions for future events. A model that uses supervised machine learning is continuously taught with properly labeled training data until it reaches appropriate levels of accuracy.
Unsupervised Machine Learning
Unsupervised machine learning, through mathematical computations or similarity analyses, draws unknown conclusions based on unlabeled datasets.An unsupervised machine learning model learns to find the unseen patterns or peculiar structures in datasets. In unsupervised machine learning, the machine is able to understand and deduce patterns from data without human intervention. It is especially useful for applications where unseen data patterns or groupings need to be found or the pattern or structure searched for is not defined. This also refers to clustering.
Instance-Based Machine Learning
Another type is instance-based machine learning, which correlates newly encountered data with training data and creates hypotheses based on the correlation. To do this, instance-based machine learning uses quick and effective matching methods to refer to stored training data and compare it with new, never-before-seen data. It uses specific instances and computes distance scores or similarities between specific instances and training instances to come up with a prediction. An instance-based machine learning model is ideal for its ability to adapt to and learn from previously unseen data.
Machine Learning and Cybersecurity
The emergence of ransomware has brought machine learning into the spotlight, given its capability to detect ransomware attacks at time zero.
Evolution is malware’s game. A few years ago, attackers used the same malware with the same hash value — a malware’s fingerprint — multiple times before parking it permanently. Today, these attackers use some malware types that generate unique hash values frequently. For example, the Cerber ransomware can generate a new malware variant — with a new hash value every 15 seconds.This means that these malware are used just once, making them extremely hard to detect using old techniques. Enter machine learning. With machine learning’s ability to catch such malware forms based on family type, it is without a doubt a logical and strategic cybersecurity tool.
Machine learning algorithms are able to make accurate predictions based on previous experience with malicious programs and file-based threats. By analyzing millions of different types of known cyber risks, machine learning is able to identify brand-new or unclassified attacks that share similarities with known ones.
From predicting new malware based on historical data to effectively tracking down threats to block them, machine learning showcases its efficacy in helping cybersecurity solutions bolster overall cybersecurity posture.
And though machine learning has become a major talking point in cybersecurity fairly recently, it has already been an integrated tool in Trend Micro’s security solutions since 2005 — way before the buzz ever started.
Machine Learning-powered Threats
Advanced technologies such as machine learning and AI are not just being utilized for good — malicious actors are also abusing these for nefarious purposes. In fact, in recent years, IBM developed a proof of concept (PoC) of an ML-powered malware called DeepLocker, which uses a form of ML called deep neural networks (DNN) for stealth.
There are other ways in which cybercriminals exploit these technologies. A popular example are deepfakes, which are fake hyperrealistic audio and video materials that can be abused for digital, physical, and political threats. Deepfakes are crafted to be believable — which can be used in massive disinformation campaigns that can easily spread through the internet and social media. Deepfake technology can also be used in business email compromise (BEC), similar to how it was used against a UK-based energy firm. Cybercriminals sent a deepfake audio of the firm’s CEO to authorize fake payments, causing the firm to transfer 200,000 British pounds (approximately US$274,000 as of writing) to a Hungarian bank account.
We discuss the current and possible future ML- and AI-powered threats here:
We listed a rundown of PoCs and real-life attacks where machine learning was weaponized to get a clearer picture of what is possible and what is already a reality with regard to machine learning-powered cyberthreats.
We discuss the present state of the malicious uses and abuses of AI and ML and the plausible future scenarios in which cybercriminals might abuse these technologies for ill gain.
How Does Trend Micro Use Machine Learning?
Machine learning is a key technology in the Trend Micro™ XGen™ security, a multi-layered approach to protecting endpoints and systems against different threats, blending traditional security technologies with newer ones and using the right technique at the right time.
For over a decade, Trend Micro has been harnessing the power of machine learning to eliminate spam emails, calculate web reputation, and chase down malicious social media activity. Trend Micro continuously develops the latest machine learning algorithms to analyze large volumes of data and predict the maliciousness of previously unknown file types.
A Connected Threat Defense for Tighter Security
Learn how Trend Micro’s Connected Threat Defense can improve an organizations security against new, 0-day threats by connecting defense, protection, response, and visibility across our solutions. Automate the detection of a new threat and the propagation of protections across multiple layers including endpoint, network, servers, and gateway solutions.
Trend Micro’s Machine Learning Milestones
Machine Learning Milestone
As early as 2005, Trend Micro has utilized machine learning to combat spam emails via the Trend Micro Anti Spam Engine (TMASE) and Hosted Email Security (HES) solutions.
To accurately assign reputation ratings to websites (from pornography to shopping and gambling, among others), Trend Micro has been using machine learning technology in its Web Reputation Services since 2009.
Trend Micro's Script Analyzer, part of the Deep Discovery™ solution, uses a combination of machine learning and sandbox technologies to identify webpages that use exploits in drive-by downloads.
Trend Micro developed Trend Micro Locality Sensitive Hashing (TLSH), an approach to Locality Sensitive Hashing (LSH) that can be used in machine learning extensions of whitelisting. It generates hash values that can be analyzed for whitelisting purposes. In 2013, Trend Micro open sourced TLSH via GitHubto encourage proactive collaboration.
Machine learning algorithms enable real-time detection of malware and even unknown threats using static app information and dynamic app behaviors. These algorithms used in Trend Micro’s multi-layered mobile security solutions are also able to detect repacked apps and help capacitate accurate mobile threat coverage in the TrendLabs Security Intelligence Blog.
Predictive Machine Learning Engine was developed in 2016 and is a key part of the Trend Micro XGen solution. It uses two types of machine learning: pre-execution machine learning that identifies malicious files based on the file structure, and run-time machine learning for files that execute malicious behavior.
AV-TEST featured Trend Micro Antivirus Plus solution on their MacOS Sierra test, which aims to see how security products will distinguish and protect the Mac system against malware threats. Trend Micro’s product has a detection rate of 99.5 percent for 184 Mac-exclusive threats, and more than 99 percent for 5,300 Windows test malware threats. It also has an additional system load time of just 5 seconds more than the reference time of 239 seconds.
Overall, at 99.5 percent, AV-TEST reported that Trend Micro’s Mac solution “provides excellent detection of malware threats and is also well recommended” with its minimal impact on system load (something more than 2 percent).
On February 7, 2017, Trend Micro further solidified its position at the forefront of machine learning technology — by being the first standalone next-generation intrusion prevention system (NGIPS) vendor to use machine learning in detecting and blocking attacks in-line in real time.
The patent-pending machine learning capabilities are incorporated in the Trend Micro™ TippingPoint® NGIPS solution, which is a part of the Network Defense solutions powered by XGen security.
Through advanced machine learning algorithms, unknown threats are properly classified to be either benign or malicious in nature for real-time blocking — with minimal impact on network performance.
Machine learning at the endpoint, though relatively new, is very important, as evidenced by fast-evolving ransomware’s prevalence. This is why Trend Micro applies a unique approach to machine learning at the endpoint — where it’s needed most.
Pre-execution machine learning, with its predictive ability, analyzes static file features and makes a determination of each one, blocks off malicious files, and reduces the risk of such files executing and damaging the endpoint or the network. Run-time machine learning, meanwhile, catches files that render malicious behavior during the execution stage and kills such processes immediately.
Both machine learning techniques are geared towards noise cancellation, which reduces false positives at different layers.
A high-quality and high-volume database is integral in making sure that machine learning algorithms remain exceptionally accurate. Trend Micro™ Smart Protection Network™ provides this via its hundreds of millions of sensors around the world. On a daily basis, 100 TB of data are analyzed, with 500,000 new threats identified every day. This global threat intelligence is critical to machine learning in cybersecurity solutions.
The Trend Micro™ XGen page provides a complete list of security solutions that use an effective blend of threat defense techniques — including machine learning.
Data is vital to machine learning. Traditional machine learning models get inferences from historical knowledge, or previously labeled datasets, to determine whether a file is benign, malicious, or unknown.
We developed a patent-pending innovation, the TrendX Hybrid Model, to spot malicious threats from previously unknown files faster and more accurately. This machine learning model has two training phases — pre-training and training — that help improve detection rates and reduce false positives that result in alert fatigue.
Learn more about how we utilize both static and dynamic features to accurately and efficiently analyze unknown files here:
We have developed a machine learning model called TrendX Hybrid Model that uses two training phases — pre-training and training — and allows us to correlate static and behavior features to improve detection rates and reduce false positives.
Machine Learning vs. the Hype
How Is Big Data Relevant to Machine Learning?
The prevalence of the internet and the Internet of Things (IoT) — from devices, smart homes, and connected cars to smart cities — has made available large amounts of digital data that are generated on a daily basis, all available for collecting, analyzing, and utilizing.
These large amounts of data is aptly called big data. It is a combination of structured data (searchable by algorithms and databases) and unstructured data (hard or impossible to search via machine algorithm, such as macro files, emails, web searches, and images) that continue to grow at a highly accelerated pace. In fact, it is predicted that by 2025, 180 zettabytes (180 trillion gigabytes) of data will be generated.
Big data is being harnessed by enterprises big and small to better understand operational and marketing intelligences, for example, that aid in more well-informed business decisions. However, because the data is gargantuan in nature, it is impossible to process and analyze it using traditional methods.
Machine learning plays a pivotal role in addressing this predicament. Machine learning algorithms enable organizations to cluster and analyze vast amounts of data with minimal effort. But it’s not a one-way street — Machine learning needs big data for it to make more definitive predictions. Essentially, big data is necessary for machine learning to exist.
An understanding of how data works is imperative in today’s economic and political landscapes. And big data has become a goldmine for consumers, businesses, and even nation-states who want to monetize it, use it for power, or other gains.
The world of cybersecurity benefits from the marriage of machine learning and big data. As the current cyberthreat environment continues to expand exponentially, organizations can utilize big data and machine learning to gain a better understanding of threats, determine fraud and attack trends and patterns, as well as recognize security incidents almost immediately — without human intervention.
Cognizant of these benefits, Trend Micro has partnered up with Hadoop developers to help improve its security model. Hadoop is a popular big data framework used by giant tech companies such as Amazon Web Services, IBM, and Microsoft.
Despite their similarities, data mining and machine learning are two different things. Both fall under the realm of data science and are often used interchangeably, but the difference lies in the details — and each one’s use of data.
Data mining is defined as the process of acquiring and extracting information from vast databases by identifying unique patterns and relationships in data for the purpose of making judicious business decisions. Data mining is effectively used for different purposes. A clothing company, for example, can use data mining to learn which items their customers are buying the most, or sort through thousands upon thousands of customer feedback, so they can adjust their marketing and production strategies.
Machine learning, on the other hand, uses data mining to make sense of the relationships between different datasets to determine how they are connected. Machine learning uses the patterns that arise from data mining to learn from it and make predictions.
To simplify, data mining is a means to find relationships and patterns among huge amounts of data while machine learning uses data mining to make predictions automatically and without needing to be programmed.
Can end-to-end deep learning solutions replace expert-supported AI solutions?
ML- and AI-powered solutions make use of expert-labeled data to accurately detect threats. However, some believe that end-to-end deep learning solutions will render expert handcrafted input to become moot. There have already been prior research into the practical application of end-to-end deep learning to avoid the process of manual feature engineering. However, deeper insight into these end-to-end deep learning models — including the percentage of easily detected unknown malware samples — is difficult to obtain due to confidentiality reasons.
In an attempt to discover if end-to-end deep learning can sufficiently and proactively detect sophisticated and unknown threats, we conducted an experiment using one of the early end-to-end models back in 2017. Based on our experiment, we discovered that though end-to-end deep learning is an impressive technological advancement, it less accurately detects unknown threats compared to expert-supported AI solutions.
Learn more about our experiment that measured the detection rates of end-to-end deep learning technology here:
We look into developments in end-to-end deep learning for cybersecurity and provide insights into its current and future effectiveness.
Is Machine Learning a Security Silver Bullet?
Machine learning is a useful cybersecurity tool — but it is not a silver bullet. While others paint machine learning as a magical black box or a complicated mathematical system that can teach itself to generate accurate predictions from data with possible false positives, we at Trend Micro view it as one valuable addition to other techniques that make up our multi-layer approach to security.
Machine learning has its strengths. It is effective in catching ransomware as-it-happens and detecting unique and new malware files. It is not the sole cybersecurity solution, however. Trend Micro recognizes that machine learning works best as an integral part of security products alongside other technologies.
Trend Micro takes steps to ensure that false positive rates are kept at a minimum. Employing different traditional security techniques at the right time provides a check-and-balance to machine learning, while allowing it to process the most suspicious files efficiently.
A multi-layered defense to keeping systems safe — a holistic approach — is still what’s recommended. And that’s what Trend Micro does best.
Exploiting AI: How Cybercriminals Misuse and Abuse AI and ML
Generative Malware Detection
Ahead of the Curve: A Deeper Understanding of Network Threats Through Machine Learning
Web Defacement Campaigns Uncovered: Gaining Insights From Deface Pages Using DefPloreX-NG
Exploring the Long Tail of (Malicious) Software Downloads
An In-Depth Analysis of Abuse on Twitter
Automatic Classifying of Mac OS X Samples
Automatic Extraction of Indicators of Compromise for Web Applications
Machine Learning and Next-Generation Intrusion Prevention System (NGIPS)
Targeted Attacks Detection With SPuNge
TLSH - A Locality Sensitive Hash
Using Machine Learning to Stop Exploit Kits In-line in Real-Time: Statistical Models Identify Obfuscated HTML
Using Randomization to Attack Similarity Digests
Real-Time Detection of Malware Downloads via Large-Scale URL-File-Machine Graph Mining