Network Threats Examined: Clustering Malicious Network Flows with Machine Learning

Evasive network threats pose serious risks to enterprises. Learn about malicious network flow clustering, a machine learning-powered method for addressing these threats.


By Joy Nathalie Avelino, Jessica Patricia Balaquit, and Carmi Anne Loren Mora

Network threats pose risks to enterprises regardless of industry. Now that cybercriminals increasingly use evasion tactics to bypass rule-based detection, proactive techniques are needed to discover a malware infection before it leads to financial loss, reputational damage, or disruption of business operations. One approach to this problem is network flow clustering powered by machine learning.

A flow is a “unidirectional stream of Internet Protocol (IP) packets that share a set of common properties: typically, the IP-five-tuple of protocol, source and destination IP addresses, source and destination ports.” Flow data is worth examining when discovering and analyzing network anomalies, because it describes the traffic composition of the different applications and services on the network.
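To make the five-tuple definition concrete, here is a minimal Python sketch that groups packets into unidirectional flows. The `Packet` record and its field names are hypothetical stand-ins for whatever a capture tool provides; the point is only that packets sharing a five-tuple land in the same flow, while the reply traffic (source and destination swapped) forms a separate flow.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical packet record; a real capture library will use its own field names.
    protocol: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload: bytes


# The IP five-tuple: protocol, source/destination IP addresses, source/destination ports.
FiveTuple = tuple[str, str, str, int, int]


def group_into_flows(packets: list[Packet]) -> dict[FiveTuple, list[Packet]]:
    """Group packets into unidirectional flows keyed by their five-tuple."""
    flows: dict[FiveTuple, list[Packet]] = defaultdict(list)
    for pkt in packets:
        key = (pkt.protocol, pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port)
        flows[key].append(pkt)
    return dict(flows)


packets = [
    Packet("TCP", "10.0.0.5", "203.0.113.7", 49152, 80, b"GET / HTTP/1.1\r\n"),
    Packet("TCP", "203.0.113.7", "10.0.0.5", 80, 49152, b"HTTP/1.1 200 OK\r\n"),
    Packet("TCP", "10.0.0.5", "203.0.113.7", 49152, 80, b"Host: example.com\r\n"),
]
flows = group_into_flows(packets)
print(len(flows))  # 2: the client-to-server and server-to-client directions
```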

Machine learning is then applied to cluster malicious network flows. This helps analysts see how different malware families relate to one another and how they differ.

Network Threat Clustering Results on Exploit Kits

In research that used a semi-supervised model to cluster similar malicious network flows, with the raw byte stream augmented by handcrafted features as input, Trend Micro was able to filter and classify a cluster composed entirely of exploit kit detections.
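The study's exact input representation is not spelled out here, so the following is only a sketch of the general idea of combining a raw byte stream with handcrafted features. It assumes a flow is given as a list of payloads, keeps the first `n_bytes` bytes (scaled to [0, 1]) as the raw part, and appends three handcrafted statistics that are hypothetical choices, not the paper's features.

```python
def flow_features(payloads: list[bytes], n_bytes: int = 16) -> list[float]:
    """Raw-byte features followed by handcrafted features for one flow (illustrative)."""
    stream = b"".join(payloads)
    # Raw part: first n_bytes of the byte stream, zero-padded so every flow has the
    # same length, with each byte scaled from 0..255 to 0.0..1.0.
    head = stream[:n_bytes].ljust(n_bytes, b"\x00")
    raw = [b / 255.0 for b in head]
    # Handcrafted part (hypothetical choices): packet count, total bytes, mean payload size.
    count = len(payloads)
    total = len(stream)
    mean = total / count if count else 0.0
    return raw + [float(count), float(total), mean]


vec = flow_features([b"GET ", b"/x"], n_bytes=8)
print(vec[-3:])  # [2.0, 6.0, 3.0]
```

Fixing the raw part to a constant length is what lets flows of different sizes be compared as vectors of the same dimension.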

The five malware families in this cluster were Rig, FlashPack, Angler, Neutrino, and Blacole, all of which target applications through specific file types. This is expected, since exploit kits are known to exploit their target applications through file formats such as Shockwave/Flash, PDF, and JavaScript (JS).

To show how the machine learning model sees the network flows, Figure 1 uses different colors for the structural attributes determined by the features passed to the model. In a rule-based detection environment, where one rule is created per malware family to cover the varying flow characteristics on the network, a change in a family's network traffic can render its rule ineffective until the rule is modified. Machine learning can therefore be a key tool for clustering network threats and surfacing insights about patterns in malicious traffic.

NOTE: Each color represents one characteristic.
Figure 1. Raw network data of each malware family
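The fragility of per-family rules can be illustrated with a toy comparison. The sketch below uses a hypothetical exact-substring signature and a hypothetical landing-page request; a one-character change in the URL defeats the rule, while a simple byte-histogram cosine similarity, used here only as a stand-in for a learned similarity and not as the study's model, still rates the variant as very close to the original.

```python
import math
from collections import Counter


def rule_matches(signature: bytes, flow_bytes: bytes) -> bool:
    # Rule-style detection: the flow must contain the exact byte signature.
    return signature in flow_bytes


def histogram_similarity(a: bytes, b: bytes) -> float:
    """Cosine similarity between the byte-frequency histograms of two flows."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical signature and requests, not taken from real exploit kit traffic.
signature = b"landing.php?id="
known = b"GET /landing.php?id=1337 HTTP/1.1"
variant = b"GET /landing.php?uid=1337 HTTP/1.1"

print(rule_matches(signature, known))    # True
print(rule_matches(signature, variant))  # False: one added character breaks the rule
print(histogram_similarity(known, variant) > 0.95)  # True: still very similar
```

A byte histogram discards ordering, so it is far too crude for real detection; it only shows why similarity-based comparison degrades gradually where an exact rule fails outright.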

As we can see, the machine learning model was able to find similarities in the malicious network flows. From the multiple characteristics seen in each malware family, the model identified which ones form a profile shared by similar samples. Figure 2 illustrates, by analogy, how the model sees the characteristics the malware families have in common.


Figure 2. Malicious network flows as seen by the clustering model
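The article's model is semi-supervised; the sketch below swaps in plain unsupervised k-means (Lloyd's algorithm) purely to show how flows with similar feature vectors end up in the same cluster. The two-dimensional points are made-up feature vectors, not data from the study, and the farthest-point initialization is a simplifying choice that keeps the result deterministic.

```python
import math


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all existing centroids.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each flow joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up 2-D feature vectors for six flows.
flows = [[0.10, 0.20], [0.15, 0.22], [0.12, 0.18], [0.90, 0.85], [0.88, 0.90], [0.92, 0.87]]
print(kmeans(flows, k=2))  # [0, 0, 0, 1, 1, 1]
```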

At first, Blacole seems like an outlier, since the dataset labelled it a Trojan rather than specifically an exploit kit. However, examining its network traffic made it clear that the key similarity linking Blacole to the other exploit kits is that its malware routine exploited JS vulnerabilities. This means that in some cases clustering can yield a more specific description (exploit kit) than the initial labelling (Trojan), and that exploit kits can be identified without tailoring features to a specific attack instance.
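The Blacole observation suggests a simple cluster-level check: if most members of a cluster share one label, members with a different label are candidates for a more specific description. The sketch below uses hypothetical sample IDs and label strings, and its output is only a prompt for analyst review; in the study, the conclusion came from actually examining Blacole's network traffic.

```python
from collections import Counter


def suggest_relabels(labels: dict[str, str], clusters: dict[str, int]) -> dict[str, str]:
    """Suggest a cluster's majority label for members whose own label differs."""
    members_by_cluster: dict[int, list[str]] = {}
    for sample, cluster_id in clusters.items():
        members_by_cluster.setdefault(cluster_id, []).append(sample)
    suggestions: dict[str, str] = {}
    for members in members_by_cluster.values():
        # Majority label within the cluster (ties resolve to the first label seen).
        majority, _ = Counter(labels[s] for s in members).most_common(1)[0]
        for s in members:
            if labels[s] != majority:
                suggestions[s] = majority
    return suggestions


# Hypothetical sample IDs and labels, loosely mirroring the exploit kit cluster.
labels = {"rig_01": "exploit_kit", "angler_01": "exploit_kit",
          "neutrino_01": "exploit_kit", "blacole_01": "trojan"}
clusters = {"rig_01": 0, "angler_01": 0, "neutrino_01": 0, "blacole_01": 0}
print(suggest_relabels(labels, clusters))  # {'blacole_01': 'exploit_kit'}
```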

Making Sense of the Insights Formed from Clustering via Machine Learning

As seen in our analysis of exploit kit detections, clustering malicious network flows can yield insights into the network patterns of malicious traffic. Such insights can help augment rule creation for detecting network malware.
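One way clustering results might feed rule creation is by proposing a byte string common to every flow in a cluster as a candidate signature. The function below is a hypothetical, brute-force illustration (longest common substring across the cluster) and is not the method used in the study; any candidate would still need analyst review, since generic strings such as protocol headers would match benign traffic too.

```python
from typing import Optional


def common_signature(flows: list[bytes], min_len: int = 4) -> Optional[bytes]:
    """Longest byte string present in every flow of a cluster, or None if shorter than min_len."""
    if not flows:
        return None
    shortest = min(flows, key=len)
    # Try the longest candidates first so the first hit is the longest common substring.
    for length in range(len(shortest), min_len - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in flow for flow in flows):
                return candidate
    return None


# Hypothetical flows from one cluster.
cluster = [b"GET /gate.php?id=17 HTTP/1.1",
           b"GET /gate.php?id=42 HTTP/1.1",
           b"GET /gate.php?id=99 HTTP/1.0"]
print(common_signature(cluster))  # b'GET /gate.php?id='
```

The `min_len` threshold keeps the function from proposing trivially short strings that would match almost any traffic.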

The use of machine learning in this study showed how the technology can speed up the organization of large amounts of data and offer explanations that help analysts draw conclusions and provide time-zero protection.

To learn more about the network threats clustered in this study, and how machine learning can help analysts gain valuable insights on future trends, check out our research paper, “Ahead of the Curve: A Deeper Understanding of Network Threats Through Machine Learning,” presented at TENCON 2018 in Jeju, South Korea. An updated version will be available in the IEEE Xplore Digital Library.

Creative Commons License
The animated visualization for this work is licensed under a Creative Commons Attribution 3.0 Unported License.
