While the internet is an integral and pervasive part of society (cloud-based services are making the internet a homogenized experience no matter where you go, whether at home, at work, or even on the other side of the globe), many don’t understand the structure upon which it’s built. The internet was built in a similar manner to a house, with architects and engineers who design, shape, and create its structure. However, while we expect the internet to be built securely and solidly, we keep encountering security issues that make us shake our heads, often forgetting that the backbone of the internet is constructed primarily on technology that's more than 30 years old. Forget zero-days: the internet has enough existing holes to last a lifetime.
Ideally, an outward network connection should look similar to the diagram on the left, but the reality is that after a while, most data centers (where the actual physical wired connections are made) often look like the image on the right.
The Way Data Travels
Because of the way routing works, data does not generally move in a straight line. Although protocols exist to allow for direct connectivity, they are unwieldy at scale. The reality is that the internet is a collection of networks that each connect to one another, and packets of data bounce around these networks until they find the one that can deliver the data. The bulk of internet traffic is routed by the Border Gateway Protocol (BGP), which is designed to allow routers to create relationships with each other that tell them how to pass along data, effectively forming the largest of all peer-to-peer networks.
Basically, the internet addresses that computers or mobile devices use (IPs, both v4 and v6) are put together in aggregated address blocks called prefixes (sometimes also called CIDRs) and assigned to different organizations. Organizations then create bundles of their IP address space, each referred to as an Autonomous System (AS) and assigned a number (ASN), and use a protocol like BGP to send data between ASNs. In layman’s terms, think of the ASN as your city, the prefix as your postal/ZIP code, the IP address as your house address, and BGP as the in-car navigation system that tells you how to drive from Point A to Point B.
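To make the address hierarchy concrete, here is a minimal Python sketch using the standard ipaddress module. The prefixes are ranges reserved for documentation, the ASNs come from the block reserved for documentation, and both the table and the function name origin_asn are hypothetical rather than real registry data.

```python
import ipaddress

# Hypothetical prefix-to-ASN assignments (documentation ranges, not real data).
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
    ipaddress.ip_network("2001:db8::/32"): 64498,
}

def origin_asn(ip_text):
    """Return the ASN whose prefix contains the address, or None if no prefix matches."""
    ip = ipaddress.ip_address(ip_text)
    for prefix, asn in PREFIX_TO_ASN.items():
        # Only compare an address with prefixes of the same IP version.
        if ip.version == prefix.version and ip in prefix:
            return asn
    return None

print(origin_asn("192.0.2.55"))   # house address -> ZIP code -> city
print(origin_asn("2001:db8::1"))
```

In the analogy above, the lookup goes from the house address (the IP) to the postal code (the prefix) to the city (the ASN). Real routers hold hundreds of thousands of prefixes and use far more efficient data structures than a linear scan.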
A variety of protocols govern how data is transmitted between IPs and between ASNs, along with international guidelines and standards on how all this interaction should happen. Not only might the data bounce between many ASNs on its path from the source to the destination, but most protocols also break the data into small chunks, called packets, each of which might take its own path. The transmission protocols then have to reassemble them at the opposite end.
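The split-and-reassemble idea can be illustrated with a toy model. This is a simplified sketch, not how TCP or any real protocol is implemented: the function names are hypothetical, and real protocols also handle loss, duplication, and retransmission.

```python
import random

def split_into_packets(data: bytes, size: int):
    """Break data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets):
    """Sort packets by sequence number and join the chunks back together."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"hello, internet routing"
packets = split_into_packets(message, 4)

# Simulate packets taking different paths and arriving out of order.
random.shuffle(packets)

print(reassemble(packets) == message)
```

The sequence number is what lets the receiving end restore the original order, regardless of which path each packet took.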
The issue is that none of the standards on interoperating or routing data are legally binding. So as long as two AS network operators are willing to connect their networks, neither of them has to follow any of the international guidelines on how to do so.
The challenge then is in knowing where your data has been with all this bouncing around going on.
Visualizations by Country
When it comes to internet routing, geopolitical awareness does not exist. When the internet was first designed, there was nothing within the protocols to enforce geolocation identification. As data routes across various AS network clouds, it does so without consideration of the geographic location it is routing through. Nor does it consider the country in which each AS network operator is incorporated. In an era of growing concerns regarding data privacy, data access, and data localization, we decided to do some internet traffic routing cartography to visualize how much routing stays within a given country, and how many network connections route out of the country.
We achieved this by collecting data on BGP announcements from a variety of sources, then creating a peer relationship table from these via inbound and outbound route advertisements. Because a router can only “see” the announcements occurring within its own proximity, a BGP route announcement collection is only as good as the number of routes that pass through its collection point. Hence, data was collected from around the world, and the sources were then compared to one another to understand the quality and incremental value that each data source provided.
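One way to derive such a peer table can be sketched as follows. This is an illustration under stated assumptions, not the exact method used for this research: it assumes each announcement has already been reduced to its AS path (a list of ASNs), and it treats any two ASNs that appear next to each other in a path as peers. The function name and the sample paths are hypothetical.

```python
from collections import defaultdict

def build_peer_table(as_paths):
    """Map each ASN to the set of ASNs seen directly next to it in any AS path."""
    peers = defaultdict(set)
    for path in as_paths:
        for left, right in zip(path, path[1:]):
            if left == right:
                # The same ASN repeated in a row is path prepending, not a peering.
                continue
            peers[left].add(right)
            peers[right].add(left)
    return dict(peers)

# Hypothetical AS paths using documentation ASNs.
paths = [
    [64496, 64497, 64498],
    [64499, 64497, 64497, 64500],
]
table = build_peer_table(paths)
print(sorted(table[64497]))
```

Merging paths from several collection points into one table is what fills in the relationships that any single vantage point cannot see.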
As one can expect, this data set became rather large, so we created subsets using the country assignment in the whois registration for each ASN to produce lists of ASNs per country. Finally, we created visualizations for each country based on a snapshot in time, showing all ASNs identified for that particular country and each of their BGP peers, regardless of whether the BGP peer was identified as part of that country. All the ASNs local to the country are represented in red, while ASNs that are not local to the country but appear because they peer with a local ASN are shown in black. ASNs for the country that had no peers are shown as single dots around the central visualization.
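The coloring rule can be expressed in a few lines. The sketch below assumes a peer table (ASN to set of peer ASNs) and a per-country ASN list are already available; the function name, the dictionary keys, and the sample data are hypothetical.

```python
def classify_asns(country_asns, peer_table):
    """Split a country's view into local ASNs, foreign peers, and isolated local ASNs."""
    local = set(country_asns)
    foreign_peers = set()
    isolated = set()
    for asn in local:
        neighbors = peer_table.get(asn, set())
        if not neighbors:
            isolated.add(asn)               # drawn as a single dot
        foreign_peers |= neighbors - local  # drawn in black
    return {"local": local, "foreign_peers": foreign_peers, "isolated": isolated}

# Hypothetical data: 64496-64498 are registered to the country, 64500 is not.
peer_table = {64496: {64497, 64500}, 64497: {64496}, 64500: {64496}}
result = classify_asns([64496, 64497, 64498], peer_table)
print(result["foreign_peers"], result["isolated"])
```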
These visualizations, where the red parts refer to the local country and the black parts are operated by a foreign ASN, show how impossible it is to keep data within one’s own country. As an example, we adjusted the US cartography to indicate how many ASNs were based in the US (green) and how many were in Europe (red). The results show a lot of overlap:
Country by Country Observations
Looking at the routing landscape for each country, we observed some distinct differences in approach.
In the US, the peering seems to cluster on three main ASNs, and many of the “blooms” in the visualization show large clusters of ASNs that have only a single BGP peership. These cases leave the country vulnerable to BGP attacks. If the single point to which any of the larger blooms connects is BGP-hijacked, all the single-peer ASN nodes behind it will lose their routing to the rest of the globe. Ultimately, their upstream ASN will also fall victim to the BGP hijack. Canada shows the same kind of clustering and single-peer blooms, but its clustering is not as tight as in the US.
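Finding these blooms in a peer table is straightforward. The following sketch assumes a mapping of each ASN to the set of its peers and groups every ASN that has exactly one peer under that sole peer; the function name and sample data are hypothetical.

```python
def single_homed_by_upstream(peer_table):
    """Group ASNs that have exactly one peer under that one peer (a 'bloom')."""
    blooms = {}
    for asn, neighbors in peer_table.items():
        if len(neighbors) == 1:
            (upstream,) = neighbors
            blooms.setdefault(upstream, set()).add(asn)
    return blooms

# Hypothetical peer table using documentation ASNs.
peer_table = {
    64496: {64497, 64498, 64499},
    64497: {64496},
    64498: {64496},
    64499: {64496, 64500},
    64500: {64499},
}
for upstream, dependents in single_homed_by_upstream(peer_table).items():
    print(upstream, sorted(dependents))
```

The larger the set under a single upstream, the more networks lose connectivity if routes through that upstream are hijacked or withdrawn.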
We observed the exact opposite in Russia. ASN peering is highly distributed, which makes the internet space in that region highly stable. However, it will also make data localization particularly challenging so long as the bulk of the ASNs in Russia have non-Russian ASN peering.
Interestingly enough, despite the perception that China is segregated from the rest of the world’s infrastructure, China-based ASNs can be observed to have a number of foreign ASN peers. The assumption is that the country uses network-based traffic filtering rather than BGP-based filtering to enforce the data wall put in place. Additionally, China can be observed as having ASNs that are not otherwise part of the core China-based internet but that peer directly with foreign-based ASNs. In all of these cases, the foreign-based ASNs appear to be subsidiaries or partners of China-based companies and likely received special approval for such connections.
Japan’s infrastructure is elegant in its simplicity. Despite the large number of users and IPs in the country, these all appear to use only a few ASNs that then connect to the rest of the world. The one weakness is that, like in countries other than Russia, there appears to be a susceptibility to BGP hijacking given how few active ASNs there are.
The German infrastructure was particularly interesting to observe, especially in light of the GDPR’s data-in-transit regulations. It has heavy peering with non-German-based ASNs, and if these ASNs are not European-based, they may have GDPR regulatory requirements that they’re unaware of.
Globally, there are thousands of ASNs not in use. When studying the day over day data, we could observe that occasionally these ASNs that have long been abandoned would start to announce IP space and have peers for a short period of time -- and then stop again. In many cases, this is due to the use of the ASN as part of a criminal scheme to do xx announcements of IP space.
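A simple way to surface this pattern from day-over-day data is to look for ASNs that reappear after a long silence. The sketch below assumes one set of announcing ASNs per day, oldest first; the function name and the 30-day default threshold are hypothetical choices, not values from this research.

```python
def reactivated_asns(daily_active, min_dormant_days=30):
    """Return {asn: [days]} for ASNs that announce again after a long silent gap.

    daily_active is a list of sets, one set of announcing ASNs per day.
    """
    last_seen = {}
    flagged = {}
    for day, active in enumerate(daily_active):
        for asn in active:
            if asn in last_seen:
                silent_days = day - last_seen[asn] - 1
                if silent_days >= min_dormant_days:
                    flagged.setdefault(asn, []).append(day)
            last_seen[asn] = day
    return flagged

# Hypothetical week: ASN 64497 is silent for five days, then reappears.
days = [{64496, 64497}] + [{64496}] * 5 + [{64496, 64497}]
print(reactivated_asns(days, min_dormant_days=5))
```

A reappearance is only a signal worth investigating, not proof of abuse; legitimate operators also bring old ASNs back into service.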
The visualizations immediately raised a number of data regulatory and localization questions.
How is the Internet Standardized and Regulated?
The majority of specifications and guidelines for the engineering of the internet were (and continue to be) developed by the Internet Engineering Task Force (IETF.org). The IETF is a non-profit organization formed in 1986 that is open to anyone, with no specific membership requirements, in order to broaden participation in standards development and maintenance. Through a variety of Special Interest Working Groups, the IETF has published a series of voluntary standards over the years; these generally are one of three types: RFCs (Request for Comments), BCPs (Best Current Practices), and STDs (Standards). The Internet Architecture Board (IAB), whose role is to ensure consistency across the Working Groups and develop long-term planning for voluntary standards development, oversees the IETF.
However, the IETF's standards are not legally binding. These are voluntary standards that may or may not be followed, although successful integration with the rest of the internet is fairly dependent on compliance.
Relevant Laws and Regulations
If the IETF standards are not legally binding regulations for internet routing, then what is? The issue of laws, regulations, and jurisdictions becomes complicated when put into the context of cyberspace. Long gone are the days of “my data is stored on this disk.” Can anyone even say for sure where their data has been, let alone attest to it for compliance audits? Do people know for sure that those who handle their data are compliant with various regulations on privacy and data protection? Are they even subject to regulations by countries not their own?
Privacy, law enforcement access, and data localization: these three concepts all focus on the same thing, but from different perspectives. How do we protect, or gain access to, data on individuals’ activities online, when “online” could mean anywhere around the globe? This makes matters even more complex. Multiple factors make it a challenge to determine where your data resides and where it is during transit, whether because organizations are moving to cloud-based systems or because of the way the internet's building blocks connect (via routing / BGP) in a completely geopolitically agnostic way.
After 9/11, global concerns on safety and security moved many countries to strengthen their surveillance laws in order to give law enforcement and other investigative bodies easier access to information that can assist them with their cases. This is especially true in the US, where the first of these types of laws, the Patriot Act, was enacted. The Patriot Act granted law enforcement greater authority in tracking and intercepting communications, with the caveat that it was limited for the most part to data and communications within the US. More recently, the US passed the Cloud Act, a law that has caused apprehension among privacy advocates. The Cloud Act was meant to address the inability to access data on US citizens stored outside of the US, or data stored outside of the US by companies with a US presence. The latter concept raises concerns because of the range of possible interpretations of “presence” in a digital world. In other words, this could be applied to just about any data stored on or traversing the internet anywhere around the world, if one can make the case for the need to access it.
In response to privacy concerns raised by these new access laws, many countries enacted stronger privacy regulations to protect their citizens from these types of unreasonable intrusions into their digital lives. Perhaps the biggest and most notable one is the European General Data Protection Regulation (GDPR). The GDPR was intended to ensure that personal data is handled with data protection by design and by default, that is, with the highest level of protection. Data on and about the individual must be stored separately, and can only be retained with informed consent. The GDPR takes the approach that it is sufficient for the data to pertain to its citizens in order for it to be applicable, rather than just being physically stored within a European country. It contains specific articles (Articles 44-50) on the transfer of data outside of the European Union. Specifically:
Article 44 outlines that any transfer of personal data must be done with the highest levels of protection, and can only be done in a manner that does not undermine any of the protections outlined in GDPR.
Article 48 states that a third-party country cannot access data protected under GDPR unless the access falls under an international agreement such as an MLAT; otherwise, it constitutes a violation of GDPR.
Clearly, the regulations of GDPR and similar privacy protection laws are in direct opposition to those in access laws, creating a confusing situation where one set of regulations is non-compliant with the other. And although GDPR and the Cloud Act are used here as examples, many countries have enacted similar kinds of regulations. Mapping which regulations are applicable to your organization should be a top priority given the high cost of fines for non-compliance.
Strong: Explicit requirements that data must be stored on servers within the country. Countries: Brunei, China, Indonesia, Nigeria, Russia, Vietnam.
De Facto: Laws that create such large barriers to the transfer of data across borders that they effectively act as data localization requirements.
Partial: Wide range of measures, including regulations applying only to certain domain names and regulations requiring the consent of an individual before data about them is transferred internationally. Countries: Belarus, India, Kazakhstan, Malaysia, South Korea.
Mild: Restrictions on international data transfer under certain conditions. Countries: Argentina, Brazil, Colombia, Peru, Uruguay.
Sector-specific: Tailored to specific sectors, including healthcare, telecom, finance, and national security. Countries: Australia, Canada, New Zealand, Taiwan, Turkey, Venezuela.
None: No known data localization laws.
Are Regulations Consistent With the Way the Internet Operates?
The problem with many data privacy and localization laws and regulations is that they do not take into account how internet routing actually works.
In the case of data privacy regulations, most tend to treat data routing with a client/server, application-layer mentality. Most treat the concept of “data goes from Point A, the server, to Point B, the client” as if there were three boxes: data stored at one location, data stored at the end location, and then “the middle bit.” There seems to be a misunderstanding of what happens between Point A and Point B, ranging from privacy standards that treat the middle as if it were a local data store as well, to those that treat it as if data were transported in a straight and direct path between the two, with no intersections or stops in between. The technical reality is that “the middle bit” is a series of data transports followed by data stores (when the data hits the next-hop router in its path, similar to a red light at a road traffic intersection), then another set of transports and stores, until it makes all the necessary hops to the final destination. Most global privacy standards do not take into account the geopolitically agnostic nature of routing, and often do not truly understand how frequently data crosses various international borders. And those few that do take into account how data bounces around require textbook high-end encryption, without understanding that implementing it compliantly at this scale would be so cost-prohibitive that it would prevent adoption.
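To see how quickly “the middle bit” adds jurisdictions, consider a path whose hops have already been mapped to countries. The sketch below is illustrative only: the path and country codes are made up, the function names are hypothetical, and mapping a router hop to a country is itself imprecise in practice.

```python
def border_crossings(hop_countries):
    """Count how many times two consecutive hops sit in different countries."""
    return sum(1 for a, b in zip(hop_countries, hop_countries[1:]) if a != b)

def jurisdictions_touched(hop_countries):
    """Return the countries on the path in first-seen order."""
    seen = []
    for country in hop_countries:
        if country not in seen:
            seen.append(country)
    return seen

# Hypothetical path: source and destination are both in Germany.
path = ["DE", "DE", "NL", "US", "US", "DE"]
print(border_crossings(path))       # number of border crossings on this path
print(jurisdictions_touched(path))  # every country whose rules may apply
```

Even though both endpoints are in the same country in this made-up example, the data is briefly held by routers in two others along the way.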
Turning to data localization laws, that is, laws implemented to avoid the multijurisdictional data traversal issues raised above, there have been many famous cases where countries attempted to cut themselves off from the internet in order to control their citizens' data traffic:
Syria: Internet connectivity between Syria and the outside world was shut down in late November 2011 and again in early May 2013 in order to control information regarding the civil war ongoing at the time. However, news of the events was still able to flow in and out of the country.
Pakistan: In one of the best examples of issues arising from network route manipulation, an attempt to control its citizens' access to YouTube resulted in Pakistan inadvertently disabling access for most of the globe to the popular web service.
Yemen: Rebels managed to disrupt nearly 80% of all internet traffic to the country in July 2017 when they cut one of the largest fiber cables into the country. Single points of failure in the design of the country’s infrastructure led to what could have been a very serious national security threat.
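Route manipulation incidents like the Pakistan case are commonly explained by longest-prefix matching: when several announced prefixes cover a destination, routers prefer the most specific one. The sketch below illustrates that rule only. It assumes the mechanism applies, uses documentation prefixes and ASNs rather than the real ones involved, and the function name best_route is hypothetical.

```python
import ipaddress

def best_route(destination, routes):
    """Return the (prefix, origin_asn) pair with the longest prefix covering destination."""
    ip = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in routes if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)

routes = [
    (ipaddress.ip_network("203.0.113.0/24"), 64496),  # legitimate announcement
    (ipaddress.ip_network("203.0.113.0/25"), 64511),  # more-specific announcement
]

# The /25 wins for addresses it covers, even though the /24 is legitimate.
print(best_route("203.0.113.10", routes))
# Addresses outside the /25 still follow the legitimate /24.
print(best_route("203.0.113.200", routes))
```

Because BGP speakers generally accept what their peers announce, a more-specific announcement made in one country can propagate and redirect traffic far beyond its borders.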
Additionally, some countries have heavily segregated the internet infrastructures within the country from outside connections by design. These include countries such as:
These countries have implemented strict technical and legislative controls within their borders in order to limit their citizens' access to content from other countries. However, China’s internet map, for example, shows that digital segregation does not completely cut off access from outside connections (shown in black).
Ironically, our research found that it seems the more robust a network a country has, the harder it is to enact data localization and geographic network segregation. Russia, by far, has one of the most robust routing maps of all countries we have investigated thus far:
The visualization above shows that the number of peering points within its network minimizes the risk of national internet outages. Unfortunately, this works against the country's attempt to implement Federal Law No. 242-FZ, which requires that all Russian citizens’ personal data be collected, stored, and processed fully within Russian borders. This regulation also applies to multinational companies that operate in Russia, even if they have no current physical presence but are used by or contain information about Russian citizens. This has forced many large organizations such as Amazon, Google, and Facebook to set up data centers in Russia in order to continue their operations there. Additionally, Russia has been experimenting with models that disconnect it from the internet for several years now, and, based on the global routes, this was attempted again earlier this year. While not a complete success, Russia is now publicly stating that its attempts at localization have been successful enough to allow it to lock out the West should political tensions rise sufficiently.
In an era so fraught with political tensions, these regulations are only going to become more complex. It would behoove any organization to carefully consider which of these many global regulations it must adhere to when designing its network architecture and outbound connection points. This includes considering what types of data will be flowing across its network peering points, as well as understanding where its own data goes and the data path it will take across the internet. Any mistake could be costly; the GDPR includes fines of up to 4% of global revenues, or up to €20 million, whichever is higher, for such omissions, and it’s been shown that data breaches can irreparably damage the reputation of a brand or organization.
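The “whichever is higher” rule is simple arithmetic, sketched below. The function name is hypothetical, the revenue figures are made up, and the result is only the upper bound described above, not a prediction of any actual penalty.

```python
def max_gdpr_fine_eur(global_annual_revenue_eur):
    """Upper bound described above: the higher of EUR 20 million or 4% of global revenue."""
    four_percent = global_annual_revenue_eur * 4 / 100
    return max(20_000_000, four_percent)

# For smaller organizations the EUR 20 million figure is the binding one.
print(max_gdpr_fine_eur(100_000_000))
# For larger organizations the 4% figure takes over.
print(max_gdpr_fine_eur(1_000_000_000))
```

The crossover sits at €500 million in global revenue, where 4% equals €20 million.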
There is a mismatch between the nature of the internet and the ways we regulate it. We are making a call to action for both governments and organizations to recognize that this is a global issue that cannot be solved by any one government or organization, as each plays a vital role in the solution. It will take a global united effort to address the issues raised by our current routing infrastructure:
International routing guidelines are not legally binding, but routing does not adhere to geopolitical boundaries.
Despite various regulations, data is still prone to theft when misrouted to unintended destinations via BGP hijacking.
Threat actors are recognizing and exploiting flaws in the foundational components of internet infrastructure, as shown by a marked increase in BGP-based attacks (BGP hijacking, IP spoofing, etc.).
Governments should have a better understanding of the nuances of data routing across the globe, including the realization that internet cabling and routing do not allow for geopolitical distinction. They should also understand the complicated interrelationships between internet engineering, data privacy considerations, and cross-geopolitical impacts of regulations, especially since the nature of data routing means that legislation from one country will also apply to others outside of that country.
For organizations, there is a need to understand how regulations affect them, and to recognize that poor network engineering on their part contributes to global security and privacy issues.
The issue of digital sovereignty is still far from polished — organizations and individuals that create laws and regulations need to collaborate with those that design and implement how the internet works in their respective countries.
For governments, there is a need for a more realistic approach to data protection, both for privacy and localization. In an ideal scenario, they would implement regulations that consider several factors:
Realizing that retrofitting the internet would cost billions, and is therefore neither feasible nor enforceable.
Realizing that it will be nearly impossible to implement true data localization, so truly private information must be encapsulated in some form of encryption.
Requiring ISPs and telcos to have stronger identification and request-authentication standards for IP space reassignments.
Funding national CIRTs to monitor BGP for the ASNs within their country, so that hijacked routes can be detected, alerted on, and mitigated.
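The kind of monitoring described in the last item can be sketched as a comparison between observed announcements and a list of expected origins. This is a minimal illustration, not a production monitor: it assumes announcements arrive as (prefix, origin ASN) pairs, the expected-origin table is maintained by the operator, and all names and data here are hypothetical.

```python
import ipaddress

def find_suspicious(announcements, expected_origins):
    """Flag announcements inside a monitored prefix that come from an unexpected origin ASN."""
    alerts = []
    for prefix_text, origin in announcements:
        announced = ipaddress.ip_network(prefix_text)
        for monitored_text, allowed in expected_origins.items():
            monitored = ipaddress.ip_network(monitored_text)
            if announced.version != monitored.version:
                continue  # never compare IPv4 with IPv6
            if announced.subnet_of(monitored) and origin not in allowed:
                kind = "origin-change" if announced == monitored else "more-specific"
                alerts.append((prefix_text, origin, kind))
    return alerts

# Hypothetical monitored prefix and observed announcements.
expected = {"192.0.2.0/24": {64496}}
observed = [
    ("192.0.2.0/24", 64496),     # expected origin, no alert
    ("192.0.2.0/24", 64511),     # same prefix, different origin
    ("192.0.2.128/25", 64510),   # more-specific from another origin
    ("198.51.100.0/24", 64500),  # not monitored
]
for alert in find_suspicious(observed, expected):
    print(alert)
```

An alert is a prompt for investigation rather than proof of a hijack, since legitimate changes such as a new upstream or a traffic-scrubbing service can look the same.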
As the operators of the bulk of the ASNs within a given country, the role of private sector enterprises is even more important. In this case, the private sector can play an even bigger role than governments in protecting their country’s national interests, thus they should keep in mind the following:
Ensuring that the organization peers directly with its outsourced cloud service providers, or is no more than one hop away from them. This will help promote transparency in data transmission.
Implementing systems so that there are at least two or more outbound peers, in order to minimize the potential damage should one peering point become unavailable.
Assessing the transmission path for data to determine the geographic paths the data will traverse.
Ensuring that the network has BCP 38 and BCP 84 implemented, to limit the potential of spoofed IP addresses traversing the network.
Building strong network route monitoring. This will help organizations quickly detect whether data is being routed, accidentally or maliciously, to unintended places, which can lead to serious consequences (a quick look at the website BGPMon.net will show how prevalent this issue is).
Ensuring that peering partners have BCP 38 and BCP 84 implemented as part of an organization’s peering agreements.
Leveraging influence throughout the supply chain by including these same requirements in supplier contracts.
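The anti-spoofing recommendation in the list above rests on a simple idea, network ingress filtering: traffic arriving from a customer is accepted only if its source address belongs to the prefixes assigned to that customer. The sketch below shows that check in isolation. It is not router configuration, and the function name and prefixes are hypothetical.

```python
import ipaddress

def permit_packet(source_ip, customer_prefixes):
    """Ingress filter: accept only packets whose source address is in the customer's prefixes."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(prefix) for prefix in customer_prefixes)

# Hypothetical customer assigned a single documentation prefix.
customer = ["192.0.2.0/24"]
print(permit_packet("192.0.2.10", customer))    # source belongs to the customer
print(permit_packet("198.51.100.7", customer))  # spoofed source, should be dropped
```

When every network applies this check at its edge, a packet with a forged source address is dropped before it ever reaches the wider internet.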