by Cedric Pernet, Senior Threat Researcher Trend Micro Cyber Safety Solutions Team
Phishing is one of the oldest scams on the internet. It has become so common that every user and business has likely seen multiple phishing pages, knowingly or not. A lot of best practices teach users how to prevent or defend against phishing attacks when they receive one, but how can users — and especially businesses — actively detect and thwart them before users even see them?
What’s in a (domain) name?
A lot of phishing schemes — especially ones that involve financial fraud and credit card information theft — employ hosted fraudulent pages on the web using copies of legitimate pages. For instance, a user might be tricked into clicking a link in a fraudulent email, believing he is clicking a legitimate one. He is diverted to a copy of the legitimate page that a fraudster controls. Once the user enters his login information and credit card number on that page, the cybercriminal intercepts and steals them.
Hosting fake web pages can be done several ways. One of the most common is to buy or register a brand new domain name that looks very similar to the legitimate one, then host the fraudulent content there. The scammers would then send phishing emails to attract people to their web content.
All of this can take hours or days, depending on how quickly the registrar and the hosting provider respond, and on the fraudster’s own schedule and technical know-how. That delay gives enterprises a window: it is possible to maintain a monitoring and detection system based on new domain registrations. But how does it work, exactly?
How are domain names handled?
A domain name is a string that defines a realm of administrative autonomy, authority or control within the internet. Domain names are formed by the rules and procedures of the Domain Name System (DNS), and are organized hierarchically. The hierarchy starts at the DNS root domain, followed by subordinate levels, or subdomains, separated by a “.” (dot) character.
The domain name trendmicro.com, for instance, shows a first level “.com” and a second level “trendmicro”. The first level is called the “top-level domain” (TLD). TLDs fall into a few categories, including:
gTLD — generic top-level domain; there were originally only 7 (.com, .edu, .gov, .int, .mil, .net, .org), but there are way more nowadays.
ccTLD — country-code top-level domain; these are the ones owned by countries, like .fr for France, .tw for Taiwan, etc.
Second-level and third-level domains are generally bought and registered by companies or individuals. Handling several levels is not difficult, but note that every deeper level belongs to whoever registered the second level. For example, technical.support.portal.trendmicro.com is valid and belongs to the entity that registered trendmicro.com. Things become more complicated when the owner of a second-level domain rents or offers the levels beneath it. Companies like DynDNS or afraid.org, for instance, have registered domain names and let their users pick any subdomain they want under them.
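To make the hierarchy concrete, here is a minimal Python sketch that splits a hostname into its levels, starting from the TLD. The function names are ours, and the "last two labels" rule is a deliberately naive assumption: it breaks for suffixes such as .co.uk and for dynamic DNS providers, where real tooling would rely on the Public Suffix List.

```python
def split_levels(hostname: str) -> list[str]:
    """Return the labels of a hostname, ordered from the top level down."""
    # Drop the optional trailing dot that denotes the DNS root.
    labels = hostname.rstrip(".").lower().split(".")
    return list(reversed(labels))


def naive_registered_domain(hostname: str) -> str:
    """Naive assumption: the registered name is the last two labels.

    This is wrong for suffixes such as .co.uk; it is only meant to
    illustrate who owns the deeper levels in the simple case.
    """
    levels = split_levels(hostname)
    return ".".join(reversed(levels[:2]))


host = "technical.support.portal.trendmicro.com"
print(split_levels(host))
# ['com', 'trendmicro', 'portal', 'support', 'technical']
print(naive_registered_domain(host))
# trendmicro.com
```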
Examples of domains available for Dynamic DNS use – provided by AFRAID
How are domain names registered?
Domain name registration entails registrants or resellers (e.g., web hosting companies) working with registrars, the organizations that process the registration. Registrars then coordinate with registry operators, which keep a database of the domain names for each TLD. The whole system is ultimately overseen by the Internet Corporation for Assigned Names and Numbers (ICANN).
An authorization code (Auth-Code) is also created by the registrar to help identify the domain name holder, for example when a domain is transferred. For gTLDs, the registrar needs to be accredited by ICANN. Requirements vary for ccTLDs, and a number of them are reserved for use by citizens of the corresponding country.
Introducing zone files
Zone files, or “DNS zone files”, are large text files that contain DNS information for DNS zones. Zones are portions of the domain name space handled by a single manager. This means there is a DNS zone file for .com, for .net, for .fr, and so forth.
A number of these zone files are shared and freely available on the internet. As one can imagine, there are many different zone files (1,574 so far). ICANN’s Centralized Zone Data Service (CZDS) provides a centralized way to download a large number of them. Unfortunately, not all zone files can be accessed via CZDS.
Some zone files are only provided or sold to selected organizations. As an example, the Association française pour le nommage Internet en coopération (AFNIC), which is the registry for .FR (and a few more TLDs), provides a license for daily access to the whole WHOIS database it maintains. This license is only granted to legal entities that can prove they have existed for more than three years and have an office in France or in the territory of another European Union member state. Several other elements, including the purpose for which the data will be used, are subject to a desk review. The information also comes at a cost.
Hunting in zone files
Once the zone files are downloaded, it is possible to start looking for fraud within them. The format of the zone data text files is very straightforward, which makes them easy to sift through. Since the text files are so big and contain millions of domain names, parsing them by hand is obviously not possible. Automated scripts can be developed to look for suspicious domains. They need to parse all domain names and look for similarities or patterns that would raise a red flag.
For a banking company, for instance, this usually entails looking for variants of its brands that were registered by third parties. A good way to go about it is to build a list of strings considered a potential threat. For Trend Micro, for example, suspicious strings in zone files can include: trendmicro, trendmicr0, trendm1cro, trendm1cr0, trend-micro, trend-micr0, trend-m1cro, trend-m1cr0, and so on. As one can see, that is already quite a list of possibilities. But opportunistic fraudsters will look for ways to find the one you forgot to hunt.
Using regular expressions (also known as regex or regexp) helps address this. Regular expressions are sequences of characters that define search patterns. The whole list of trendmicro strings given here, for instance, can be expressed with one single regexp: trend.*m[i1]cr[o0].
A written regex can be tested against large texts using various online tools. trend.*m[i1]cr[o0] can be read this way: match any string containing “trend”, followed by any character 0 or more times, followed by “m”, then either “i” or “1”, then “cr”, followed by either “o” or “0”. The “any character 0 or more times” part is there to also detect registrations like “trend---micro”, for instance.
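As an illustration, here is a minimal Python sketch that applies this regular expression to zone file lines. It assumes each record starts with the domain name, followed by whitespace-separated fields, and that comment lines start with “;”. The sample records are made up for the example and use the reserved .example suffix.

```python
import re

# The pattern from the text; search() finds it anywhere in the name.
PATTERN = re.compile(r"trend.*m[i1]cr[o0]")


def domains_in_zone(lines):
    """Yield each unique owner name once (assumed: first field of a record)."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        owner = line.split()[0].rstrip(".").lower()
        if owner not in seen:
            seen.add(owner)
            yield owner


def hunt(lines, pattern=PATTERN):
    """Return the domain names that match the pattern."""
    return [name for name in domains_in_zone(lines) if pattern.search(name)]


sample_zone = [
    "; illustrative records only, not real zone data",
    "trendmicr0.example.\t3600\tin\tns\tns1.example.net.",
    "trend---micro.example.\t3600\tin\tns\tns1.example.net.",
    "trend---micro.example.\t3600\tin\tns\tns2.example.net.",
    "trendm1cro-support.example.\t3600\tin\tns\tns1.example.net.",
    "unrelated.example.\t3600\tin\tns\tns1.example.net.",
]
print(hunt(sample_zone))
# ['trendmicr0.example', 'trend---micro.example', 'trendm1cro-support.example']
```

Only the owner name is tested, not the rest of the record, so a name server that happens to contain the brand does not trigger a detection. For a real zone file, the lines would come from the downloaded file read line by line rather than from a list in memory.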
Note that fraudsters also add legitimate-sounding words to the brand name to lend their malicious domains credibility, e.g., trendmicro-support.com. While it appears legitimate, it is not an officially recognized domain of the company.
To be more efficient, zone files should be downloaded often so that analysts can find new domains as soon as possible. Analysts also need a system to filter out old domains they have already scrutinized. They should be aware, too, that domains already inspected can be released at any time and registered again by someone else, so domain name registration dates must be taken into consideration when trying to detect fraud.
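One way to keep track of what has already been reviewed is sketched below in Python. It compares each day's list of matching domains with what was seen before. The function name and the use of a first-seen date as a stand-in for the registration date are our assumptions; a real system would likely check the actual registration date as well.

```python
from datetime import date


def update_seen(seen, todays_domains, today):
    """Record first-seen dates and return the domains that need review.

    `seen` maps a domain name to the date it was first observed. Names
    that have vanished from the zone are forgotten, so a later
    re-registration shows up as new again.
    """
    todays = set(todays_domains)
    new = sorted(todays - set(seen))
    for name in list(seen):
        if name not in todays:
            del seen[name]  # released: review it again if it comes back
    for name in new:
        seen[name] = today
    return new


seen = {}
print(update_seen(seen, ["trendmicr0.example", "unrelated.example"], date(2024, 1, 1)))
# ['trendmicr0.example', 'unrelated.example']
print(update_seen(seen, ["unrelated.example"], date(2024, 1, 2)))
# []
print(update_seen(seen, ["trendmicr0.example", "unrelated.example"], date(2024, 1, 3)))
# ['trendmicr0.example']
```

On the third day, the domain that disappeared and came back is flagged again, which is the behavior wanted for names that are released and re-registered by someone else.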
What can be done to detect fraudulent domain names on TLDs that do not provide their zone files? Sending regular “hunting” DNS requests helps keep an eye on those. In this scenario, a company like Trend Micro would send multiple DNS requests matching different suspicious or fraudulent naming schemes. Taking the Trend Micro example once again, automated DNS requests (where < tld > stands for any TLD not providing its zone file) could be sent to try to resolve domains such as:
trendmicro.< tld >
trendmicr0.< tld >
trendm1cro.< tld >
trendm1cr0.< tld >
trendmicro-support.< tld >
trendm1cro-support.< tld >
trendmicr0-support.< tld >
trendm1cr0-support.< tld >
trendmicro-supp0rt.< tld >
trendm1cro-supp0rt.< tld >
trendmicr0-supp0rt.< tld >
trendm1cr0-supp0rt.< tld >
trendmicro-solutions.< tld >
As one can immediately see, the list can go on and contain hundreds of different domain names. The problem with this kind of detection is that regular expressions cannot be used directly to hunt for newly registered domains: real DNS requests have to be sent, so each domain name has to be spelled out in full.
An alternative is to use a script or a service so that one still only writes regular expressions, while the script generates every matching possibility to send as DNS requests. Some DNS servers do not take kindly to being queried heavily, so waiting a bit between DNS requests is generally a good idea.
Once a suspicious domain has been discovered, and analysis has confirmed that the domain was not registered by the company or individual concerned, or that it is connected to fraudulent activity, a takedown process should be launched immediately. This process allows entities to shut the domain down or render it useless for the fraudsters.
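The idea can be sketched in Python as follows. Since “.*” matches an unbounded number of strings, this sketch does not expand a regular expression directly. Instead it assumes the analyst lists the alternatives for each position, which mirrors trend.*m[i1]cr[o0] for a couple of separators and suffixes. The function names, the PARTS list and the one-second default delay are illustrative choices, not a recommendation from any DNS operator.

```python
import itertools
import socket
import time


def expand(parts):
    """Build every name from a list of alternatives per position."""
    return ["".join(combo) for combo in itertools.product(*parts)]


def resolves(name):
    """Return True if the system resolver finds an address for the name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def hunt_by_dns(names, tlds, delay=1.0, resolver=resolves):
    """Resolve each name under each TLD, pausing between requests."""
    hits = []
    for tld in tlds:
        for name in names:
            fqdn = f"{name}.{tld}"
            if resolver(fqdn):
                hits.append(fqdn)
            time.sleep(delay)  # be gentle with the DNS servers
    return hits


# Alternatives per position (hypothetical list, to be extended as needed).
PARTS = [["trend"], ["", "-"], ["m"], ["i", "1"], ["cr"], ["o", "0"],
         ["", "-support", "-supp0rt"]]
candidates = expand(PARTS)
print(len(candidates))  # 2 * 2 * 2 * 3 = 24 names per TLD
```

A call such as hunt_by_dns(candidates, ["fr"]) would then send the actual requests. Keep in mind that a name that does not resolve may still be registered, since a registrant can hold a domain without publishing any address records, so a miss here is not proof that the domain is free.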
Detecting suspicious domains
Once one or several good regexes have been written for each brand to protect, zone files can be parsed more efficiently and the detection rate becomes satisfactory. Note that all of this needs to be automated as much as possible. A recommended practice is to perform the following steps regularly (daily, if possible):
Download all data via CZDS/other solutions.
Hunt for suspicious domains using regular expressions; any detected domain name should then be examined.
Update/improve the regular expressions in case of false positives or mistakes.
Update all intelligence no matter how it is stored: Excel sheets, MISP, own tooling, etc.
Although one should be able to get the majority of TLD zone files, the problem remains for the missing ones. For those TLDs, regularly launching DNS requests for all combinations of suspicious strings is recommended.
A good technique for hunting and detecting suspicious domains is to turn the cybercriminals' own habit against them: patterns. Passive DNS data (i.e., a historical record of DNS resolution data), for instance, gives information security professionals and system administrators insight into how a particular domain changes over time. Not only does this help them correlate indicators of compromise, it also provides the context needed for identifying related or additional suspicious domains. Domain registration information also helps unmask a cybercriminal's infrastructure by correlating a specific suspicious domain with others registered using similar information.
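As a minimal sketch of that last correlation step, the Python snippet below groups domains that share the same registration detail. The record layout and the field name registrant_email are hypothetical, as are the sample values. Actual registration data varies by registry and is often redacted.

```python
from collections import defaultdict


def related_by(records, field):
    """Group domain names that share the same value for a field."""
    groups = defaultdict(set)
    for record in records:
        value = record.get(field)
        if value:
            groups[value.strip().lower()].add(record["domain"])
    # Only values shared by more than one domain are worth a pivot.
    return {value: sorted(names) for value, names in groups.items()
            if len(names) > 1}


# Hypothetical records, for illustration only.
records = [
    {"domain": "trendmicr0.example", "registrant_email": "a@mail.example"},
    {"domain": "trendm1cro-support.example", "registrant_email": "A@mail.example"},
    {"domain": "unrelated.example", "registrant_email": "b@mail.example"},
]
print(related_by(records, "registrant_email"))
# {'a@mail.example': ['trendm1cro-support.example', 'trendmicr0.example']}
```

The same function can pivot on any other shared attribute, such as a name server or a phone number, as long as the records carry that field.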