Machine Learning

What Is Machine Learning?

Machine learning is more than just a buzz-word — it is a technological tool that operates on the concept that a computer can learn information without human mediation. It uses algorithms to examine large volumes of information or training data to discover unique patterns. This system analyzes these patterns, groups them accordingly, and makes predictions. With traditional machine learning, the computer learns how to decipher information as it has been labeled by humans — hence, machine learning is a program that learns from a model of human-labeled datasets.

It is unique in how it becomes, in a way, intuitive. Through repetition, it learns by inference without a need to be deliberately programmed each and every time. However, a caveat: Machine learning can make mistakes and appropriate caution should be used. 1

Machine learning proves to be useful especially in today’s big data world. We come into contact with machine learning on a daily basis. It supports technologies such as identifying voice commands on our phones, recommending which songs to listen to on Spotify or which items to purchase next on Amazon, and even determining the fastest way to reach your destination on Waze, to name a few.

Machine learning is also used in healthcare, helping doctors make better and faster diagnoses of diseases, and in financial institutions, detecting fraudulent activity that doesn’t fall within the usual spending patterns of consumers.

Machine Learning Algorithm Types

Supervised Machine Learning

The traditional machine learning type is called supervised machine learning, which necessitates guidance or supervision on the known results that should be produced. In supervised machine learning, the machine is taught how to process the input data. It is provided with the right training input, which also contains a corresponding correct label or result. From the input data, the machine is able to learn patterns and, thus, generate predictions for future events. A model that uses supervised machine learning is continuously taught with properly labeled training data until it reaches appropriate levels of accuracy.

Unsupervised Machine Learning

Unsupervised machine learning, through mathematical computations or similarity analyses, draws unknown conclusions based on unlabeled datasets. An unsupervised machine learning model learns to find the unseen patterns or peculiar structures in datasets. In unsupervised machine learning, the machine is able to understand and deduce patterns from data without human intervention. It is especially useful for applications where unseen data patterns or groupings need to be found or the pattern or structure searched for is not defined. This also refers to clustering.

Instance-Based Machine Learning

Another type is instance-based machine learning, which correlates newly encountered data with training data and creates hypotheses based on the correlation. To do this, instance-based machine learning uses quick and effective matching methods to refer to stored training data and compare it with new, never-before-seen data. It uses specific instances and computes distance scores or similarities between specific instances and training instances to come up with a prediction. An instance-based machine learning model is ideal for its ability to adapt to and learn from previously unseen data.

Machine Learning and Cybersecurity

The emergence of ransomware has brought machine learning into the spotlight, given its capability to detect ransomware attacks at time zero.

Evolution is malware’s game. A few years ago, attackers used the same malware with the same hash value — a malware’s fingerprint — multiple times before parking it permanently. Today, these attackers use some malware types that generate unique hash values frequently. For example, the Cerber ransomware can generate a new malware variant — with a new hash value every 15 seconds.This means that these malware are used just once, making them extremely hard to detect using old techniques. Enter machine learning. With machine learning’s ability to catch such malware forms based on family type, it is without a doubt a logical and strategic cybersecurity tool. 

Machine learning algorithms are able to make accurate predictions based on previous experience with malicious programs and file-based threats. By analyzing millions of different types of known cyber risks, machine learning is able to identify brand-new or unclassified attacks that share similarities with known ones.

From predicting new malware based on historical data to effectively tracking down threats to block them, machine learning showcases its efficacy in helping cybersecurity solutions bolster overall cybersecurity posture.

[Read: 5 ways machine learning can be used for security today]

And though machine learning has become a major talking point in cybersecurity fairly recently, it has already been an integrated tool in Trend Micro’s security solutions since 2005 — way before the buzz ever started.

How Does Trend Micro Use Machine Learning?

Machine learning is a key technology in the Trend Micro™ XGen™ security, a multi-layered approach to protecting endpoints and systems against different threats, blending traditional security technologies with newer ones and using the right technique at the right time. 

For over a decade, Trend Micro has been harnessing the power of machine learning to eliminate spam emails, calculate web reputation, and chase down malicious social media activity. Trend Micro continuously develops the latest machine learning algorithms to analyze large volumes of data and predict the maliciousness of previously unknown file types.

Trend Micro’s Machine Learning Milestones

Year Machine Learning Milestone

2005

As early as 2005, Trend Micro has utilized machine learning to combat spam emails via the Trend Micro Anti Spam Engine (TMASE) and Hosted Email Security (HES) solutions.

2009

To accurately assign reputation ratings to websites (from pornography to shopping and gambling, among others), Trend Micro has been using machine learning technology in its Web Reputation Services since 2009.

2010

Trend Micro's Script Analyzer, part of the Deep Discovery™ solution, uses a combination of machine learning and sandbox technologies to identify webpages that use exploits in drive-by downloads.

2012

With the goal of helping law enforcement with cybercriminal investigations dealing specifically with targeted attacks, Trend Micro has developed SPuNge, a system that uses a combination of clustering and correlation techniques to “identify groups of machines that share a similar behavior with respect to the malicious resources they access and the industry in which they operate.”

2013

Trend Micro developed Trend Micro Locality Sensitive Hashing (TLSH), an approach to Locality Sensitive Hashing (LSH) that can be used in machine learning extensions of whitelisting. It generates hash values that can be analyzed for whitelisting purposes. In 2013, Trend Micro open sourced TLSH via GitHub to encourage proactive collaboration.

2015

In 2015, Trend Micro successfully employed machine learning in its Mobile App Reputation Service (MARS) for both iOS and Android, as well as in its mobile security products (Trend Micro™ Mobile Security for Android™ for end users and Trend Micro™ Mobile Security for Enterprise for organizations).

Machine learning algorithms enable real-time detection of malware and even unknown threats using static app information and dynamic app behaviors. These algorithms used in Trend Micro’s multi-layered mobile security solutions are also able to detect repacked apps and help capacitate accurate mobile threat coverage in the TrendLabs Security Intelligence Blog.

Since 2015, Trend Micro has topped the AV Comparatives’ Mobile Security Reviews. The machine learning initiatives in MARS are also behind Trend Micro’s mobile public benchmarking continuously being at a 100 percent detection rate — with zero false warnings — in AV-TEST's product review and certification reports in 2017.

2017

Predictive Machine Learning Engine was developed in 2016 and is a key part of the Trend Micro XGen solution. It uses two types of machine learning: pre-execution machine learning that identifies malicious files based on the file structure, and run-time machine learning for files that execute malicious behavior.

 

AV-TEST featured Trend Micro Antivirus Plus solution on their MacOS Sierra test, which aims to see how security products will distinguish and protect the Mac system against malware threats. Trend Micro’s product has a detection rate of 99.5 percent for 184 Mac-exclusive threats, and more than 99 percent for 5,300 Windows test malware threats. It also has an additional system load time of just 5 seconds more than the reference time of 239 seconds.

Overall, at 99.5 percent, AV-TEST reported that Trend Micro’s Mac solution “provides excellent detection of malware threats and is also well recommended” with its minimal impact on system load (something more than 2 percent).

 

On February 7, 2017, Trend Micro further solidified its position at the forefront of machine learning technology — by being the first standalone next-generation intrusion prevention system (NGIPS) vendor to use machine learning in detecting and blocking attacks in-line in real time.

The patent-pending machine learning capabilities are incorporated in the Trend Micro™ TippingPoint® NGIPS solution, which is a part of the Network Defense solutions powered by XGen security.

Through advanced machine learning algorithms, unknown threats are properly classified to be either benign or malicious in nature for real-time blocking — with minimal impact on network performance.

[ Read: Machine Learning Masters ]

Trend Micro’s Dual Approach to Machine Learning

Machine learning at the endpoint, though relatively new, is very important, as evidenced by fast-evolving ransomware’s prevalence. This is why Trend Micro applies a unique approach to machine learning at the endpoint — where it’s needed most.

Pre-execution machine learning, with its predictive ability, analyzes static file features and makes a determination of each one, blocks off malicious files, and reduces the risk of such files executing and damaging the endpoint or the network. Run-time machine learning, meanwhile, catches files that render malicious behavior during the execution stage and kills such processes immediately.

Both machine learning techniques are geared towards noise cancellation, which reduces false positives at different layers.

A high-quality and high-volume database is integral in making sure that machine learning algorithms remain exceptionally accurate. Trend Micro™ Smart Protection Network™ provides this via its hundreds of millions of sensors around the world. On a daily basis, 100 TB of data are analyzed, with 500,000 new threats identified every day. This global threat intelligence is critical to machine learning in cybersecurity solutions.

The Trend Micro™ XGen page provides a complete list of security solutions that use an effective blend of threat defense techniques — including machine learning.

Machine Learning vs. the Hype

How Is Big Data Relevant to Machine Learning?

The prevalence of the internet and the Internet of Things (IoT) — from devices, smart homes, and connected cars to smart cities — has made available large amounts of digital data that are generated on a daily basis, all available for collecting, analyzing, and utilizing.

These large amounts of data is aptly called big data. It is a combination of structured data (searchable by algorithms and databases) and unstructured data (hard or impossible to search via machine algorithm, such as macro files, emails, web searches, and images) that continue to grow at a highly accelerated pace. In fact, it is predicted that by 2025, 180 zettabytes (180 trillion gigabytes) of data will be generated.

Big data is being harnessed by enterprises big and small to better understand operational and marketing intelligences, for example, that aid in more well-informed business decisions. However, because the data is gargantuan in nature, it is impossible to process and analyze it using traditional methods.

Machine learning plays a pivotal role in addressing this predicament. Machine learning algorithms enable organizations to cluster and analyze vast amounts of data with minimal effort. But it’s not a one-way street — Machine learning needs big data for it to make more definitive predictions. Essentially, big data is necessary for machine learning to exist.

An understanding of how data works is imperative in today’s economic and political landscapes. And big data has become a goldmine for consumers, businesses, and even nation-states who want to monetize it, use it for power, or other gains.

Read: Knowledge is Power: The societal and business impact of big data ]
Read: Big data analytics in the real world: Unique big data use cases ]

The world of cybersecurity benefits from the marriage of machine learning and big data. As the current cyberthreat environment continues to expand exponentially, organizations can utilize big data and machine learning to gain a better understanding of threats, determine fraud and attack trends and patterns, as well as recognize security incidents almost immediately — without human intervention.

Read: Big data and machine learning: A perfect pair for cyber security? ]
Read: Machine learning and the fight against ransomware ]
Read: Artificial intelligence could remake cyber security – and malware ]

Cognizant of these benefits, Trend Micro has partnered up with Hadoop developers to help improve its security model. Hadoop is a popular big data framework used by giant tech companies such as Amazon Web Services, IBM, and Microsoft.

Read: Securing Big Data and Hadoop ]

Are Data Mining and Machine Learning the Same?

Despite their similarities, data mining and machine learning are two different things. Both fall under the realm of data science and are often used interchangeably, but the difference lies in the details — and each one’s use of data.

Data mining is defined as the process of acquiring and extracting information from vast databases by identifying unique patterns and relationships in data for the purpose of making judicious business decisions. Data mining is effectively used for different purposes. A clothing company, for example, can use data mining to learn which items their customers are buying the most, or sort through thousands upon thousands of customer feedback, so they can adjust their marketing and production strategies. 

Machine learning, on the other hand, uses data mining to make sense of the relationships between different datasets to determine how they are connected. Machine learning uses the patterns that arise from data mining to learn from it and make predictions.

To simplify, data mining is a means to find relationships and patterns among huge amounts of data while machine learning uses data mining to make predictions automatically and without needing to be programmed.

Is Machine Learning a Security Silver Bullet?

Machine learning is a useful cybersecurity tool — but it is not a silver bullet. While others paint machine learning as a magical black box or a complicated mathematical system that can teach itself to generate accurate predictions from data with possible false positives, we at Trend Micro view it as one valuable addition to other techniques that make up our multi-layer approach to security.

Machine learning has its strengths. It is effective in catching ransomware as-it-happens and detecting unique and new malware files. It is not the sole cybersecurity solution, however. Trend Micro recognizes that machine learning works best as an integral part of security products alongside other technologies.

Trend Micro takes steps to ensure that false positive rates are kept at a minimum. Employing different traditional security techniques at the right time provides a check-and-balance to machine learning, while allowing it to process the most suspicious files efficiently.

A multi-layered defense to keeping systems safe — a holistic approach — is still what’s recommended. And that’s what Trend Micro does best.

You may find various research on machine learning from Trend Micro researchers below:

 Ahead of the Curve: A Deeper Understanding of Network Threats Through Machine Learning View A Deeper Understanding of Network Threats Through Machine Learning

Web Defacement Campaigns Uncovered View Web
Defacement Campaigns
Uncovered

Trend Micro Locality Sensitive Hash View Real-Time
Detection of Malware
Downloads

Trend Micro Locality Sensitive Hash View TLSH -
A Locality Sensitive
Hash

Trend Micro Analysis of Twitter Abuse View An
In-Depth Analysis of
Abuse on Twitter

Machine Learning and Next-Generation Intrusion Prevention System (NGIPS) View Machine Learning and Next-Generation Intrusion Prevention System (NGIPS)

Automatic Classifying of Mac OS X Samples View Automatic
Classifying of
Mac OS X Samples

Automatic Extraction of Indicators of Compromise for Web Applications View Automatic Extraction of Indicators of Compromise for Web Applications

Targeted Attacks Detection with SPuNge View Targeted
Attacks Detection
with SPuNge

Using Machine Learning to Stop Exploit Kits In-line in Real-Time View Using Machine Learning to
Stop Exploit Kits In-line
in Real-Time

View Using Randomization to Attack Similarity Digests View Using Randomization
to Attack Similarity
Digests

View Exploring the Long Tail of (Malicious) Software Downloads View Exploring the Long Tail
of (Malicious) Software
Downloads