Chen-yu Dai leads the threat intelligence team at the global CSIRT of an internet conglomerate with diverse e-commerce, fintech, and telecom businesses.
Different markets and industries, including e-commerce, social media, fintech, airline and travel, and online ticket services, are losing billions of dollars each year because of fake web traffic generated by illicit scrapers, fake accounts, robot buyers, carders, and stuffers. According to a 2017 report, bots are so pervasive that they are responsible for 52% of all online traffic.
CAPTCHA services and IP reputation feeds are used to counter the abuse of bots and scripts, which can facilitate e-commerce fraud and account takeover. However, abusers and cybercriminals are using residential proxies and CAPTCHA-solving services to circumvent these security measures. Residential proxies, which can hide originating IP addresses, can be used for a plethora of fraudulent activities, including gift card, chargeback, and click fraud.
This article provides insights on how abusers and cybercriminals use residential proxies and CAPTCHA-solving services to enable bots, scrapers, and stuffers. We also provide in-depth technical analyses of popular residential proxies and CAPTCHA-defeating services, and propose ways to detect and defeat them.
The many problems associated with bots, scrapers, and stuffers
Fake web traffic in the form of API abuse, price and inventory scrapers, robot buyers, and stuffers affects a variety of businesses every day. A report by Arkose Labs highlighted some important statistics: 25% of newly registered accounts are fake, 20% of login attempts are attacks, and 86% of all attacks are carried out by bots. Late last year, the FBI even issued a public warning that shines a spotlight on how cybercriminals use proxies and configurations to carry out credential stuffing attacks that lead to “financial losses associated with fraudulent purchases, customer notifications, system downtime and remediation, as well as reputational damage.”
The global economic losses caused by fake web traffic are estimated to be in the tens of billions of dollars. According to identity verification company Veriff, account takeover attacks cost businesses as much as US$11.4 billion per year. Sadly, it’s significantly cheaper to launch these kinds of attacks than to defend against them: F5 estimates that it costs less than US$200 to launch 100,000 account takeover attempts.
Despite their prevalence, account takeover bots are not the only problem. Price and inventory scrapers can also cause unnecessary financial upheaval among vendors. Robot buyers, such as sneaker bots and scalper bots, deprive real buyers of the opportunity to buy discounted or limited-edition items, which can result in a negative buying experience.
Sneaker bots have become a big enough problem that Nike added antibot regulations to its terms of sale agreement, which allow for the cancellation of orders or the limiting of purchase quantities if it is determined that the sale was made using automation. Scalper bots, on the other hand, are used to secure tickets and services that are in high public demand. Last year, Akamai reported that scalper bots were being used to grab all available appointment slots for Israeli government services on the MyVisit platform. The operators of these scalper bots sold each appointment slot for more than US$100.
Since 2020, we have observed account takeover and scraping bots moving away from datacenter IP addresses. This is possibly because it’s relatively easy for an organization’s security operations team to block a few Class C (/24) IP ranges once a fraudulent IP address is detected. This trend is also echoed in tutorials and posts on underground forums.
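As a rough illustration of why datacenter-hosted bots are comparatively easy to block, the sketch below collapses flagged IPv4 addresses into their enclosing /24 (“Class C”) networks and keeps only the blocks that were flagged more than once. It is a minimal sketch that assumes the flagged addresses are already available as strings; the function name and the `min_hits` threshold are hypothetical choices for illustration.

```python
import ipaddress
from collections import Counter


def dense_slash24_blocks(flagged_ips, min_hits=2):
    """Collapse flagged IPv4 addresses into /24 networks (hypothetical helper).

    Returns the /24 blocks that contain at least `min_hits` flagged IPs,
    i.e. the kind of coarse block list that works against bots hosted
    on a handful of datacenter subnets.
    """
    counts = Counter(
        ipaddress.ip_network(f"{ip}/24", strict=False) for ip in flagged_ips
    )
    return sorted(str(net) for net, hits in counts.items() if hits >= min_hits)


flagged = ["203.0.113.5", "203.0.113.77", "203.0.113.201", "198.51.100.9"]
print(dense_slash24_blocks(flagged))  # ['203.0.113.0/24']
```

Residential proxy exit nodes rarely cluster like this: consecutive hits tend to land in different /24 networks owned by different ISPs, so the same rule either blocks almost nothing or starts blocking legitimate home users.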
To counter this, abusers use residential proxies, which are tricky to block because they are sparsely distributed and residential users are often allocated dynamic IPs. Typically, website owners do not want to risk losing a potential customer or incorrectly flagging a transaction as fraudulent simply because an IP was previously used by a shady bot script. Residential proxy providers have started selling residential IP services and advertising them as web testing or localization testing services. Some high-end and more expensive residential IP providers allow their users to specify their preferred region, city, and even ISP, which helps them bypass a credit card issuer’s antifraud checks. This makes it possible for carders and stuffers to use stolen credit cards to purchase and sell items.
Meanwhile, the Federal Trade Commission’s (FTC) 2021 Consumer Protection Data Spotlight report revealed that consumers lost US$148 million to gift card fraud and that gift cards are scammers’ preferred payment method. Chargeback fraud is expected to cost businesses around US$38.1 billion by 2024 and US$49.3 billion by 2030.
The e-commerce industry has developed a standard countermeasure against these attacks. If a user’s IP has a bad reputation or originates from a VPN service or a datacenter, a Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is used to confirm that a human, not a bot, is on the other end of the connection. Compliance and risk control systems in credit card payment gateways are also triggered to reduce the probability of risky transactions.
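The decision logic described above can be sketched as a simple gate. This is an illustrative sketch only: the `IpContext` fields and the 0.0 to 1.0 reputation scale are assumptions standing in for whatever an IP reputation feed actually returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpContext:
    """Signals about a client IP (field names are hypothetical)."""
    reputation: float   # 0.0 = known bad, 1.0 = clean (assumed scale)
    is_vpn: bool
    is_datacenter: bool


def needs_captcha(ctx: IpContext, min_reputation: float = 0.5) -> bool:
    """Return True if the request should be challenged with a CAPTCHA."""
    return ctx.reputation < min_reputation or ctx.is_vpn or ctx.is_datacenter


print(needs_captcha(IpContext(reputation=0.9, is_vpn=False, is_datacenter=False)))  # False
print(needs_captcha(IpContext(reputation=0.9, is_vpn=False, is_datacenter=True)))   # True
```

A clean residential IP passes all three checks, which is precisely the gap that residential proxies are used to exploit.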
However, malicious actors and abusers have developed tools to counter these defensive measures. Residential proxy services can be used to bypass IP reputation services, and such bypassing techniques are under regular review by the scraping community. When a CAPTCHA challenge appears, CAPTCHA-solving services can be used to defeat the verification. We discussed three interesting residential proxy and CAPTCHA-solving service cases in a recently published blog entry.
Which businesses are targeted by residential proxies and CAPTCHA-defeating services?
Residential proxies are mostly powered by proxyware, a type of software that is commonly marketed as a “network bandwidth sharing” service. At times, proxyware comes bundled with downloaded and repacked shareware and pirated software. Proxyware allows other users to route traffic through the device it runs on, turning that device into an exit node. Proxyware is either marketed as a passive income stream (upon installing it, users get paid for the bandwidth they share) or installed unknowingly through questionable software packages. The proxyware provider then makes money by selling access to these exit nodes, marketing them as residential proxies. We delve into what proxyware is and how it opens users up to risks in this blog entry.
Proxyway, a blog site dedicated to all things proxyware, provides insights on targeted sectors and big brand names, such as AliExpress, Amazon, Craigslist, Home Depot, Walmart, Booking.com, Indeed, Bing, Google, Yahoo, Facebook, and Instagram. Meanwhile, proxy sellers such as proxy-sale.com directly offer proxies that are packaged according to specific use cases, as seen in Figure 1.
Figure 1. Proxy-sale.com offers customized packages tailored for specific use cases.
Meanwhile, CAPTCHA-solving services, which we discussed in detail in our blog entry, attempt to blur the distinction between human- and bot-originated web traffic. Based on the data that we gathered using the Trend Micro™ Smart Protection Network™ from January to August 2022, business websites from different industries, including social commerce, online gaming, cryptocurrency, and travel, were affected by CAPTCHA-solving services.
Figure 2. The top 20 websites affected by CAPTCHA-solving services from January to August 2022 based on the Smart Protection Network
Source: Trend Research
Although airline websites are not included in the top 20 list, it’s important to note that such websites are heavily affected by bad bot activity. According to a 2019 Imperva report, bad bots account for 43.9% of the traffic on airline websites.
In the following sections, we will give examples of scrapers, robot buyers, carders and stuffers, and account takeover attacks that use residential proxies and CAPTCHA-solving services in their operations.
Web scrapers are automated tools that are designed to collect, or scrape, information in bulk from targeted websites. These tools are often built to pass as human users and use CAPTCHA-solving services and residential proxies to evade IP-based blocking. In 2018, web scrapers reportedly cost e-commerce websites an estimated 2% of their online revenue.
Kasada, in its 2022 State of the Bot Mitigation report, states that nearly 40% of companies with a bot mitigation solution lost 10% or more of their revenue because of bot-driven web scraping. The problem is especially serious for travel sites, where scraping accounts for up to 45% of the traffic. Abusers do not always have to build their tools from scratch: there are web scraping services available for as little as US$89 per month that already bundle IP proxies and CAPTCHA-solving services.
Figure 3. Price list of a web scraping service bundled with IP proxies and CAPTCHA-solving services
We have observed scraper bots extracting merchandise prices, hotel room prices and availability, ticket prices, and even the complete details of tour packages. These scraper bots can strain businesses’ bandwidth and server resources and skew their analytics.
From an e-commerce business owner’s perspective, scraper bots can be hard to detect. This is especially true if the abusers behind them use residential IP providers that support sticky IPs, where a single session can keep the same IP for anywhere from five minutes to an hour. However, some providers still change IPs with every request (rotating IPs). In that case, a company’s antifraud team can see that a single user (tracked by session ID, user ID, or even redirection chains) is using multiple, if not hundreds of, IPs.
In our investigation, we set up a website to study the behavior of residential IP proxies (which we will tackle in detail in an upcoming blog entry). The Nginx access log for a rotating IP proxy can be seen in Figure 4.
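A minimal sketch of the rotating-IP check follows, assuming the antifraud team can export (session ID, client IP) pairs from its application logs. The function name and the `max_ips` threshold are hypothetical; in practice the threshold needs headroom for legitimate users whose mobile devices switch networks.

```python
from collections import defaultdict


def sessions_with_ip_fanout(events, max_ips=3):
    """Flag sessions seen from more than `max_ips` distinct client IPs.

    `events` is an iterable of (session_id, client_ip) pairs.
    Returns {session_id: number_of_distinct_ips} for flagged sessions.
    """
    ips_by_session = defaultdict(set)
    for session_id, client_ip in events:
        ips_by_session[session_id].add(client_ip)
    return {
        session_id: len(ips)
        for session_id, ips in ips_by_session.items()
        if len(ips) > max_ips
    }


# One session rotating through ten IPs, one session staying on a single IP.
events = [("s1", f"198.51.100.{i}") for i in range(10)] + [("s2", "192.0.2.7")] * 5
print(sessions_with_ip_fanout(events))  # {'s1': 10}
```

Sticky-IP sessions that hold one address for five minutes to an hour will stay under any reasonable threshold, which is why this check works mainly against per-request rotation.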
Figure 4. HTTP access log shows a different rotated IP per request
We have also seen a buggy scraper bot in which one thread had successfully logged in while another had failed, which resulted in both successful and failed responses to the same query. This scraper bot generated unnecessary web traffic that wasted Booking.com’s bandwidth and server CPU time. Website operators can use this kind of inconsistent activity to detect and block scraper bots.
Figure 5. A scraper bot is scraping Booking.com with one logged in session and one failed session.
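One way to turn this kind of inconsistency into a detection signal is to look for the same request being served both as a logged-in response and as a failed-login response within a few seconds. The sketch below assumes each request has been reduced to a (timestamp, key, authenticated) tuple; the grouping key (for example, the query string combined with a device fingerprint), the function name, and the 5-second window are hypothetical choices.

```python
from collections import defaultdict


def keys_with_mixed_auth(requests, window_s=5.0):
    """Return keys answered both as logged in and as failed login
    within `window_s` seconds of each other.

    `requests` is an iterable of (timestamp_s, key, authenticated) tuples.
    """
    # key -> ([timestamps of logged-in responses], [timestamps of failures])
    by_key = defaultdict(lambda: ([], []))
    for ts, key, authenticated in requests:
        by_key[key][0 if authenticated else 1].append(ts)
    flagged = []
    for key, (ok_ts, failed_ts) in by_key.items():
        # Pairwise comparison is quadratic: fine for a sketch, not for huge logs.
        if any(abs(a - b) <= window_s for a in ok_ts for b in failed_ts):
            flagged.append(key)
    return sorted(flagged)


log = [
    (100.0, "hotel=42&checkin=2023-03-01", True),
    (101.2, "hotel=42&checkin=2023-03-01", False),
    (300.0, "hotel=7&checkin=2023-03-01", True),
]
print(keys_with_mixed_auth(log))  # ['hotel=42&checkin=2023-03-01']
```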
A robot buyer is a kind of automation software that is used to purchase discounted, limited-edition, or high-demand items such as event tickets, collectibles, or sneakers. Resellers used bots to purchase high-demand event tickets, as happened during the presale of Taylor Swift’s "Eras" tour in November 2022. According to The Guardian, these concert tickets were resold online at highly inflated prices that reached a whopping US$22,700. In a US Senate Judiciary Committee hearing, the president of Live Nation shared that scalper bots plagued Ticketmaster’s site and caused technical glitches that hindered real fans from purchasing concert tickets.
Meanwhile, sneaker-buying robots, otherwise referred to as sneaker bots or shoe bots, are used to buy limited-edition or rare sneakers to resell them at a higher price. Though the financial losses caused by sneaker bots are unknown, it’s safe to surmise that they account for a significant portion of the multibillion-dollar sneaker resale market, which is expected to reach US$30 billion by 2030.
Nike recently announced that it is cracking down on robot buyers by blocking suspicious orders and even refusing refunds for items purchased by sneaker bots. Sneaker bots seem to be a very popular use case for proxyware services. In fact, IPRoyal, a residential IP service, offers a special package for sneaker bots, as seen in Figure 6. Proxyway has also written an article that features six sneaker proxies and bots for those who want to buy limited-edition sneakers.
Figure 6. A residential IP service that has an option to get sneaker proxies
Trend Micro has observed access to sneaker-selling sites, such as kith.com and Nike, both through proxyware services like Honeygain and directly from several sneaker bot programs. This allowed us to compare and verify the traffic generated directly by sneaker bots against the traffic relayed through proxyware exit nodes. For example, we have observed traffic from a tool named “nikeshoes” (SHA-256: 52c6a76c0c0847e798c646288ba039509c5f8672812d622d638b04cdb08c21a5) that consistently browses a sneaker-selling site. Other popular sneaker bot programs that we observed in our investigation are listed in Table 1.
Table 1. A list of sneaker bot software that we’ve observed in our investigation
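For teams that collect suspected bot binaries, checking samples against a published hash such as the “nikeshoes” SHA-256 above is straightforward. The sketch below uses Python’s standard `hashlib` module and hashes in chunks so large files need not fit in memory; the helper names and the `suspect.exe` filename are hypothetical.

```python
import hashlib
import io

# SHA-256 of the "nikeshoes" sample cited in this article.
NIKESHOES_SHA256 = "52c6a76c0c0847e798c646288ba039509c5f8672812d622d638b04cdb08c21a5"


def sha256_of_stream(stream, chunk_size=1 << 16):
    """Hash a binary stream in fixed-size chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(stream, known_hashes):
    """True if the stream's SHA-256 appears in `known_hashes` (case-insensitive)."""
    return sha256_of_stream(stream) in {h.lower() for h in known_hashes}


# With a real (hypothetical) sample file:
#   with open("suspect.exe", "rb") as fh:
#       print(matches_known_hash(fh, [NIKESHOES_SHA256]))
print(sha256_of_stream(io.BytesIO(b"abc")))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```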
Numerous data leaks have provided attackers with a wealth of usernames, email addresses, and passwords that they can use to launch credential stuffing and account takeover attacks. However, an attacker can only perform a certain number of login attempts before additional security measures, such as IP blocking and CAPTCHA challenges, take effect.
This is where residential proxies and CAPTCHA-defeating services come into play. These tools, in addition to the availability of data leaks, enable account takeover by bypassing brute-force restrictions that have been put into place by online platforms.
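Because residential proxies spread login attempts across many IPs, per-IP limits are easy to sidestep. A common complement is to count failures per targeted account instead. The sketch below is a minimal in-memory version with assumed parameters (`max_failures`, `window_s`) and a hypothetical class name; a production system would need shared storage and would ideally trigger a step-up challenge such as a CAPTCHA or MFA prompt rather than a hard lock, since attackers can abuse lockouts to deny service to legitimate users.

```python
from collections import defaultdict, deque


class AccountFailureLimiter:
    """Track failed logins per account in a sliding time window,
    regardless of how many source IPs the attempts came from."""

    def __init__(self, max_failures=5, window_s=900.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures = defaultdict(deque)  # username -> failure timestamps

    def _prune(self, q, now):
        # Drop failures that fell out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()

    def record_failure(self, username, now):
        q = self._failures[username]
        q.append(now)
        self._prune(q, now)

    def is_locked(self, username, now):
        q = self._failures[username]
        self._prune(q, now)
        return len(q) >= self.max_failures


limiter = AccountFailureLimiter(max_failures=3, window_s=60.0)
for t in (0.0, 10.0, 20.0):  # three failures, possibly from three different IPs
    limiter.record_failure("alice", t)
print(limiter.is_locked("alice", 25.0))   # True
print(limiter.is_locked("alice", 100.0))  # False
```

This does not help against credential stuffing that tries each leaked username and password pair only once; that pattern has to be caught with aggregate signals across accounts.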
From November 2022 to March 2023, we saw brute-force attacks, launched through a residential proxy network, against accounts on an online bank (Standard Bank), an online payment system (Venmo), and a social media site (TikTok).
Figure 10. An account takeover bot brute-forcing Bitwarden, an open-source password management service, via Infatica, a residential proxy service
Figure 11. An account takeover bot brute-forcing Standard Bank, a South African financial institution, via IPRoyal, a residential proxy service
It is estimated that account takeover costs businesses US$11.4 billion per year, mostly affecting businesses related to social media, e-commerce, and financial services. Fraudsters who launch account takeover attacks are motivated by financial gain. They use stolen accounts to enable further fraud (such as using a stolen social media account to perpetrate fraud against the victim’s friends and family), to monetize points or credits (there have been numerous incidents in which loyalty reward points were redeemed by the fraudster who took over the victim’s account), or to steal money directly from their victims (such as by taking over a victim’s e-commerce or financial service account).
When businesses suffer from account takeover attacks, the risk of losing brand value and customer trust increases. In fact, a 2020 Sift report states that 28% of customers stopped using a site or service after experiencing an account takeover attack.
Conclusion and recommendations
Online store abuse is an umbrella term for a series of strategies that fraudsters use to take advantage of e-commerce systems. These attacks are often enabled by residential IP services powered by proxyware that unsuspecting users install, thinking that they are only “sharing unused bandwidth” from home. When abused for illicit gain, residential IP services can enable attacks that cause the e-commerce industry to lose US$20 billion to US$25 billion each year.
E-commerce business owners should put countermeasures in place against residential IPs and CAPTCHA-solving services. We also urge online services to look for alternative ways to mitigate automated abuse rather than limiting themselves to IP-based blocking. We foresee an escalating cyber arms race, as abusers and fraudsters are relentless in finding ways around security measures. This means that our proposals below might work only for a certain period, so due diligence and continued research in this space are needed. E-commerce companies invest significant resources in dealing with automated activity against their web platforms, and antifraud and anti-abuse systems need continuous adjustment and fine-tuning.
Some services are designed both to prevent abuse by proxies and to reduce the problems posed by earlier antifraud systems. These include third-party IP reputation services such as Spur.us. While such services do not guarantee 100% accuracy in proxyware detection, our tests show that this service provides an adequate level of accuracy.
Figure 12. An accurate third-party IP reputation service can detect residential proxies
However, IP reputation services are hampered by the very dynamic nature of residential proxies. Since many residential users are allocated dynamic IPs, the use, or misuse, of an IP address can change quickly over time. If a residential proxy provider has enough IPs to distribute, an IP reputation service becomes much less effective.
E-commerce business owners can also make use of the country distribution of proxyware exit IPs. If traffic is coming from a country with a high concentration of proxyware exit nodes, additional challenges or anti-abuse measures should be put into place. A comprehensive list of proxyware distribution data by country, based on our telemetry, will be featured in an upcoming blog post.
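One way to account for this churn, offered here as an assumption rather than a description of how any particular reputation service works, is to let a residential IP’s risk score decay with the time since it was last seen misbehaving. The sketch uses exponential decay with a hypothetical 24-hour half-life.

```python
def decayed_risk(initial_risk, age_hours, half_life_hours=24.0):
    """Halve an IP's risk score for every `half_life_hours` since it was last flagged."""
    return initial_risk * 0.5 ** (age_hours / half_life_hours)


print(decayed_risk(0.8, 0))   # 0.8
print(decayed_risk(0.8, 48))  # 0.2
```

A decaying score lets a stale listing fade out once the IP has likely been reassigned to a different household, at the cost of giving a returning abuser a fresh start.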
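A minimal sketch of this country-based step-up logic follows. The function names are hypothetical, the counts and the 2% threshold are made-up illustrative values rather than Trend Micro telemetry, and the country codes `XA` and `XB` are placeholders.

```python
def exit_node_share(exit_nodes_by_country, observed_ips_by_country):
    """Fraction of observed client IPs per country that are known proxyware exit nodes."""
    return {
        country: exit_nodes_by_country.get(country, 0) / total
        for country, total in observed_ips_by_country.items()
        if total > 0
    }


def needs_step_up(country, shares, threshold=0.02):
    """Require an extra challenge when a country's exit-node share is high."""
    return shares.get(country, 0.0) >= threshold


# Illustrative numbers only.
shares = exit_node_share({"XA": 300, "XB": 5}, {"XA": 10_000, "XB": 10_000})
print(needs_step_up("XA", shares), needs_step_up("XB", shares))  # True False
```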
User behavior analysis, especially when combined with machine learning algorithms, can be effective in the proactive detection of proxyware nodes. For example, it is possible to analyze IP traffic based on the following internet service provider (ISP)-level metrics:
- If an IP frequently accesses IP-checking services. Many proxyware services frequently check the IP addresses of their exit nodes, because the user or automation behind them does not know which IP address is currently in use.
- If an IP frequently varies its user-agents. This metric should be viewed cautiously, since some users install plugins that change their user-agent to protect their privacy.
- If there is a mismatch between the OS declared in the user-agent and the OS indicated by the network fingerprint. Note that certain CAPTCHA-solving services, such as Anti-Captcha, are aware of this and force their users to use specific user-agents.
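The three ISP-level metrics above can be combined into simple per-IP flags. The sketch below is a heuristic starting point, not a trained model: the `Request` fields, the function name, the example IP-checking hosts, and all thresholds are assumptions, and the network-fingerprint OS is assumed to come from a separate passive fingerprinting tool.

```python
from dataclasses import dataclass

# Example public IP-echo services; real deployments would build this list from telemetry.
IP_CHECK_HOSTS = {"api.ipify.org", "ifconfig.me", "ipinfo.io"}


@dataclass(frozen=True)
class Request:
    host: str        # destination host contacted from this IP
    user_agent: str  # User-Agent header sent
    ua_os: str       # OS claimed by the User-Agent header
    net_os: str      # OS inferred from a network (TCP/IP) fingerprint


def proxyware_signals(requests, ip_check_ratio=0.05, max_agents=5,
                      mismatch_ratio=0.2):
    """Return heuristic proxyware flags for all requests seen from one IP."""
    n = len(requests)
    if n == 0:
        return {}
    ip_checks = sum(r.host in IP_CHECK_HOSTS for r in requests) / n
    agents = len({r.user_agent for r in requests})
    mismatches = sum(r.ua_os != r.net_os for r in requests) / n
    return {
        "frequent_ip_checks": ip_checks >= ip_check_ratio,
        "many_user_agents": agents > max_agents,  # privacy plugins can also trip this
        "os_mismatch": mismatches >= mismatch_ratio,
    }
```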
Matching the CAPTCHA-solving worker’s IP against the IP that accesses the website can help detect CAPTCHA-solving services efficiently. Fraud prevention platforms such as GeeTest Adaptive CAPTCHA and Arkose Labs are, so far, the only ones that provide the solver’s IP address, which can then be matched against the IP address that triggered the CAPTCHA challenge. An IP mismatch (that is, the solver’s IP differs from the IP address accessing the website) is a sign that the CAPTCHA was solved by a CAPTCHA-defeating service.
In some cases, there is a significant delay between the CAPTCHA challenge and the corresponding solution, which can also serve as a good indicator that a CAPTCHA-solving service was used. We think that a service that accurately provides the solver’s IP address is more effective at thwarting CAPTCHA-solving services than extremely hard CAPTCHAs that even humans cannot solve.
Some residential IP providers sell both residential proxy services and lists of residential IPs. However, buying data from such providers poses a moral dilemma: they sell residential proxy services to users who run bots that harm the e-commerce industry, while also selling data about their own users. We therefore recommend holding an internal discussion to weigh the ethics against the potential benefits before making such a purchase.
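The two CAPTCHA signals above, solver IP mismatch and slow solving, can be checked together once a CAPTCHA vendor reports the solver’s IP. The function below is a hypothetical sketch; it does not reflect the actual API of GeeTest, Arkose Labs, or any other vendor, and the 60-second delay threshold is an assumed value.

```python
import ipaddress


def captcha_solve_flags(challenge_ip, solver_ip, issued_at_s, solved_at_s,
                        max_delay_s=60.0):
    """Return fraud signals for one solved CAPTCHA.

    challenge_ip: IP that loaded the page and triggered the challenge.
    solver_ip:    IP that submitted the solution, as reported by the vendor.
    """
    # Parsing normalizes textual forms, so equivalent addresses compare equal.
    ip_mismatch = ipaddress.ip_address(challenge_ip) != ipaddress.ip_address(solver_ip)
    slow_solve = (solved_at_s - issued_at_s) > max_delay_s
    return {"ip_mismatch": ip_mismatch, "slow_solve": slow_solve}


print(captcha_solve_flags("203.0.113.5", "198.51.100.23", 0.0, 95.0))
# {'ip_mismatch': True, 'slow_solve': True}
```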