Mining Through Mountains of Information and Risk: Containers and Exposed Container Registries

In this entry, we continue delving into an investigation of exposed registries and look at the types of files and information that malicious actors can access and compromise from these.

October 09, 2023

By Alfredo de Oliveira and David Fiser

In a previous article, we discussed the potential risks of open registries, where our investigation revealed a large amount of exposed data that threat actors could exploit.

The communication between registries and container runtime 

We know that container registries can be likened to libraries where container images are stored and cataloged. However, to better understand how various environments can pull these images from registries, we need to look at the role that application programming interfaces (APIs) play. When a container is spawned, an API request is sent to the registry to pull an image, which is composed of application files, environment variables, configuration files, and libraries. Once fetched, the image from the registry is sent back to the runtime which uses it to create the container.

Figure 1. The interaction between a container runtime and the API

Figure 2. The layers that compose a container image
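
To make this exchange concrete, the following is a minimal sketch of the URLs a client would request from a registry, assuming the registry implements the Docker Registry HTTP API V2. The function names and the registry hostname are hypothetical and used only for illustration; the sketch builds the URLs and does not contact any server.

```python
from urllib.parse import quote


def registry_endpoints(base_url: str, repository: str, reference: str = "latest") -> dict:
    """Build the Registry HTTP API V2 URLs involved in discovering and pulling an image."""
    base = base_url.rstrip("/")
    repo = quote(repository, safe="/")
    return {
        # Confirms that the server speaks the V2 API (and whether it demands authentication).
        "version_check": f"{base}/v2/",
        # Lists repositories; on an open registry this is readable by anyone.
        "catalog": f"{base}/v2/_catalog",
        # Lists the tags (versions) stored for one repository.
        "tags": f"{base}/v2/{repo}/tags/list",
        # Returns the manifest, which references the image config and layer digests.
        "manifest": f"{base}/v2/{repo}/manifests/{reference}",
    }


def blob_url(base_url: str, repository: str, digest: str) -> str:
    """Build the URL for one layer or config blob referenced by a manifest."""
    return f"{base_url.rstrip('/')}/v2/{quote(repository, safe='/')}/blobs/{digest}"


if __name__ == "__main__":
    # Hypothetical registry address and repository name.
    for name, url in registry_endpoints("https://registry.example.internal:5000", "team/app").items():
        print(f"{name}: {url}")
```

A runtime pulling an image typically fetches the manifest first and then each blob it lists. When a registry answers these requests without authentication, the same sequence is available to anyone who can reach it.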

Diving deeper into open registries 

We looked further into the open registries that we discovered during our research and found a wealth of information, both in volume and sensitivity. In the following sections, we scrutinize the contents we found to underline how perilous unsecured registries can be. 

First, we noticed that some of these private registries were used as a versioning keeper: a backup of numerous versions of the same image for developed products. In our previous entry, we mentioned finding 9.31 TB worth of downloaded images, 197 dumped unique registries, and 20,503 dumped images; however, these numbers come from only the latest three versions of repositories that had versioning behavior, suggesting that the actual numbers might be considerably larger. 

Figure 3. Various versions are kept in the registry as seen online.

Figure 4. Pulled data on the latest three versions of repositories that are publicly accessible from the registry revealed one image sample containing 42 versions.

The volume of data was challenging not only to download and store, but also to sift through for sensitive data, which proved to be a tedious task. To search these nested archiving layers, first we used regular expressions to search the images for matches on secrets and other sensitive data. Next, we used a tool to search the container images for sets of files that are not direct leaks but could potentially expose the security of the container or the project running on it. In the following section, we discuss a detailed breakdown of our search. 
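
The regular expression step can be illustrated with a minimal sketch. It is not the tooling used in this research: the three patterns are a small, illustrative rule set, and the function names are hypothetical. The sketch treats one image layer as a tar archive held in memory and reports which files match which rule.

```python
import io
import re
import tarfile

# Illustrative patterns only; real scanners rely on much larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(rb"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(rb"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "jwt_like_token": re.compile(rb"\beyJ[A-Za-z0-9_-]{5,}\.[A-Za-z0-9_-]{5,}\.[A-Za-z0-9_-]{5,}"),
}


def scan_layer(layer_bytes: bytes, max_file_size: int = 1_000_000) -> list[tuple[str, str]]:
    """Return (file name, rule name) pairs for every pattern match inside one layer archive."""
    findings = []
    # "r:*" lets tarfile detect gzip or plain tar transparently.
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r:*") as layer:
        for member in layer:
            if not member.isfile() or member.size > max_file_size:
                continue  # skip directories, links, and very large files
            handle = layer.extractfile(member)
            if handle is None:
                continue
            data = handle.read()
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(data):
                    findings.append((member.name, rule))
    return findings


def build_demo_layer(files: dict[str, bytes]) -> bytes:
    """Create a small gzip-compressed tar archive in memory to stand in for a layer."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as layer:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            layer.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


if __name__ == "__main__":
    # The key below is the placeholder value used in AWS documentation, not a real credential.
    demo = build_demo_layer({"app/.env": b"AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"})
    print(scan_layer(demo))
```

Because every image is a stack of such archives, the same scan has to be repeated for each layer of each version, which is part of why the volume of data made the search tedious.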

Figure 5. Image base layer operating system distribution

As we sifted through the data of the container images, we also took a close look at base images. Base images work like the backbone of these containers: They provide the core capabilities of the operating system and create the environment where the containerized applications run. They also play an integral role in the security, speed, and performance of the containers. 

Among the base images we found, Alpine, Debian, and Ubuntu represented more than 80% of the containers we looked at. Although these distributions are widely favored, it is worth examining whether they allow for building secure and production-ready applications.  

Lightweight Alpine is often preferred by developers who want to keep their containers lean; this base image is slimmed-down and tiny, best for simple applications. This also means it might not have the muscle and security features necessary for more complex and heavy-duty applications. From the data we were able to look at, we observed that the vast majority of the Alpine base images were older and possibly vulnerable versions, with only 27 images in their latest version at the time of writing. 

Figure 6. Alpine Base layer operating system version distribution

On the other hand, heavyweights Debian and Ubuntu are larger and can support more complex applications. But this capability also comes with risk: Every additional component within a base image can be leveraged by threat actors for entry. Application developers should therefore be careful when managing dependencies in these base images to prevent vulnerabilities from being exploited.  

Another notable observation from our research was that very few distroless images were present in the open registries. Distroless images are designed to be both lightweight and secure, providing just the bare necessities to run an application. They cut out extra components that bulk up a base image, which means fewer potential security weak spots. Despite these benefits, which we elaborated on in a comparative study, distroless images aren’t widely used, as seen in this investigation.  

Source code and sensitive information leaks 

Another disturbing finding from our research was source code leakage, a serious security issue because leaked code can reveal security configurations that allow attackers to generate valid tokens or obtain open authorization (OAuth) flow secrets and so gain unauthorized access to systems and data. This could potentially allow a malicious actor to impersonate application users, further escalating the potential damage. Source code leakage also poses a business risk, as it exposes proprietary algorithms or business logic that can be exploited either by competitors to gain a competitive advantage or by malicious actors to disrupt operations.  

The most alarming discovery in our investigation was the leakage of sensitive information, including secrets found within the file system and within container image metadata, where environment variables are set through ENV directives. The following list shows the types of sensitive data we found within the images: 

S3 Keys. These keys could potentially give attackers access to cloud storage buckets, which could contain more sensitive data or be used to host malicious content. 

CSP access keys. These keys could give an attacker access to cloud service providers (CSPs), potentially allowing them to spin up resources at the organization's expense, access more data, or carry out attacks from the organization's cloud environment. 

Database (DB) authentication credentials. These credentials could give an attacker access to databases, potentially allowing them to steal, modify, or delete data. 

Application credentials. These could give an attacker access to private applications, potentially allowing them to carry out actions as a legitimate user or access more sensitive data. 

JSON Web Tokens (JWT). These are often used for authentication and could give attackers access to systems or data. 
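
Secrets stored in image metadata do not require unpacking any layer to read. As a minimal sketch, assuming the image configuration follows the OCI image config layout, where environment variables are listed under config.Env as NAME=value strings, the following flags variables whose names suggest a secret. The name pattern and the function name are hypothetical choices for illustration.

```python
import json
import re

# Hypothetical heuristic: variable names that commonly hold credentials.
SENSITIVE_NAME = re.compile(r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|ACCESS_?KEY)", re.IGNORECASE)


def suspicious_env_vars(image_config_json: str) -> list[str]:
    """Return names of non-empty environment variables that look like secrets."""
    config = json.loads(image_config_json)
    env_entries = (config.get("config") or {}).get("Env") or []
    flagged = []
    for entry in env_entries:
        name, _, value = entry.partition("=")
        # Only report variables that actually carry a value in the image metadata.
        if value and SENSITIVE_NAME.search(name):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = json.dumps({"config": {"Env": ["PATH=/usr/bin", "DB_PASSWORD=example-only"]}})
    print(suspicious_env_vars(sample))
```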

Figure 7. Distribution of files of interest that contained possible sensitive data in clear text

In the following section, we look closely at each of the file types in the preceding figure and the potential risk they pose if accessed by malicious actors. 

.Env 

.Env files are perhaps the most sensitive type of files we found in this research. These files are widely used in programming environments to store environment variables and can include database credentials, API keys, and other secrets that the application needs to run. They exist so that these values are not hard-coded into the application's source code, but a .env file that ships inside an image exposes the same secrets.  

If a .env file is exposed, an attacker could gain access to the aforementioned resources, which could lead to unauthorized database access, unauthorized API calls, or other unauthorized actions not only on the application but on other services associated to the project, such as cloud buckets and remote databases. The potential impact could range from data theft to service disruption and even infrastructure takeover, depending on the permissions associated with the compromised credentials. 

Figure 8. The images above show a small amount of the sensitive information exposed in clear text on .env files

Dockerfile 

Dockerfiles are the backbone of Docker images and containers. They contain a set of instructions that Docker uses to build an image, which can then be used to run containers. These instructions can include the base image to be used, the commands to be run during the build process, the ports to be exposed, the files and directories to be copied to the image, and the command to be run when a container is started from the image. Dockerfiles can also contain sensitive information such as environment variables, which can include API keys, database credentials, or other secrets.  

If a Dockerfile is exposed, it could reveal potential security vulnerabilities in the software stack, such as outdated or vulnerable base images or dependencies. It could also provide an attacker with information about the internal structure of the Docker image, including the locations of important files or directories, the commands used to run the application, and any exposed ports. 
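
A minimal sketch of what a reader, whether an attacker or a defender, could extract from a recovered Dockerfile follows. It collects the base images named in FROM instructions and the ENV or ARG variables whose names suggest a secret. The parsing is deliberately simplified (quoted values containing spaces are not handled), and the name pattern and function name are hypothetical.

```python
import re

# Hypothetical heuristic: variable names that commonly hold credentials.
SENSITIVE_NAME = re.compile(r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|ACCESS_?KEY)", re.IGNORECASE)


def review_dockerfile(text: str) -> dict:
    """Collect base images and secret-looking ENV/ARG variable names from a Dockerfile."""
    report = {"base_images": [], "suspect_variables": []}
    # Join lines that are continued with a trailing backslash.
    logical = re.sub(r"\\[ \t]*\n", " ", text)
    for raw in logical.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        tokens = parts[1].split() if len(parts) > 1 else []
        if instruction == "FROM":
            # Skip flags such as --platform=... and take the image reference.
            image = next((t for t in tokens if not t.startswith("--")), None)
            if image:
                report["base_images"].append(image)
        elif instruction in ("ENV", "ARG") and tokens:
            if "=" in tokens[0]:
                names = [t.split("=", 1)[0] for t in tokens if "=" in t]
            else:
                names = [tokens[0]]  # "ENV NAME value" or "ARG NAME" form
            for name in names:
                if SENSITIVE_NAME.search(name):
                    report["suspect_variables"].append((instruction, name))
    return report


if __name__ == "__main__":
    sample = "FROM alpine:3.12\nENV DB_PASSWORD=example-only APP_ENV=prod\n"
    print(review_dockerfile(sample))
```

The base image tag alone can indicate whether the stack is outdated, while any flagged variable points to a value that is also stored in the image metadata.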

Figure 9. Sensitive information exposed in clear text on Dockerfile files

Application_default_credentials.json 

These files are specific to Google Cloud Platform (GCP) environments and contain service account credentials. These credentials are used to authenticate applications running on GCP, allowing them to interact with other GCP services. The file typically contains information on the type of account, the client ID and client secret, the authorization URI, the token URL, the authentication provider x509 certificate URL, and the client x509 certificate URL.  

If this file is exposed, an attacker could potentially gain access to services associated with the cloud platform. This could lead to unauthorized access to sensitive data stored in GCP services, unauthorized modifications to GCP resources, or unauthorized actions such as starting or stopping services. The potential impact could range from data theft to service disruption or even infrastructure takeover, depending on the permissions associated with the compromised service account. 
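
As a minimal sketch, the following checks whether a recovered JSON file carries credential material. It assumes the field names commonly seen in GCP credential files (private_key, client_secret, and refresh_token, alongside a type field); the function name is hypothetical, and the check is a heuristic rather than a validation of the file.

```python
import json
from typing import Optional

# Field names assumed to hold secret material in GCP credential files.
SECRET_FIELDS = ("private_key", "client_secret", "refresh_token")


def summarize_credentials(text: str) -> Optional[dict]:
    """Return the account type and the secret fields present, or None if nothing sensitive is found."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    present = [field for field in SECRET_FIELDS if data.get(field)]
    if not present:
        return None
    return {"type": data.get("type", "unknown"), "secret_fields": present}


if __name__ == "__main__":
    sample = json.dumps({"type": "service_account", "private_key": "example-only"})
    print(summarize_credentials(sample))
```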

Figure 10. Sensitive information exposed in clear text on “application_default_credentials.json” files

Manifest.yml 

These files are often used in cloud applications, specifically in platform-as-a-service (PaaS) offerings. These contain information about the application's structure, services, and sometimes even environment-specific configurations. This could include the number of instances, the memory limit for each instance, the route (URL) of the application, and the services to be created and bound to the application.  

If exposed, manifest.yml files could reveal application dependencies, service names, and other metadata that could be used to map the application's architecture and exploit potential vulnerabilities. An attacker could gain insights into the application's structure, dependencies, and environment configurations, which could be exploited to perform targeted attacks. 

Id_rsa 

These files are typically private Secure Shell (SSH) keys, used for SSH public key authentication. The corresponding public key is kept on the systems that the user wants to access, and the private key is kept secret by the user. When the user tries to connect to a system, the user's SSH client uses the private key to sign data tied to the session, and the system verifies that signature with the stored public key. If the signature is valid, the system grants access.  

If an id_rsa file is exposed, an attacker could use it to gain unauthorized access to any system where the corresponding public key is installed. This could potentially lead to data theft, system disruption, or further network intrusion. The attacker could also potentially escalate privileges or move laterally within the network, depending on the permissions associated with the compromised SSH key. 

Jenkinsfile 

Jenkinsfiles are used in the open-source automation server Jenkins to define the continuous integration and continuous delivery or deployment (CI/CD) pipeline. They contain a set of instructions that Jenkins uses to build, test, and deploy applications. These instructions can include the stages of the pipeline, the steps to be performed in each stage, the nodes where the steps should be performed, and any environment variables or credentials to be used.  

If a Jenkinsfile is exposed, it could reveal sensitive information about the build and deployment process. This could include the locations of source code repositories, the commands used to build or test the application, the locations of deployment servers, or the credentials used to access these resources. An attacker could potentially use this information to introduce malicious code into the pipeline, disrupt the application's delivery process, or gain unauthorized access to source code or deployment servers. 

Package-lock.json 

These files are used in Node.js applications and are automatically generated or updated by the Node package manager (npm) whenever the npm install command is run and a package.json file is present. They describe the exact tree that was generated when the application's dependencies were installed, including the exact versions of each dependency and their respective dependencies.  

If a package-lock.json file is exposed, it could reveal potential security vulnerabilities in the dependencies. An attacker could potentially use this information to exploit known vulnerabilities in the specified versions of the dependencies. The file could also provide an attacker with information about the application's structure, including the names and versions of all dependencies and their interdependencies. 
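
To show how directly this information can be read, the following is a minimal sketch that lists the pinned versions recorded in a package-lock.json file. It assumes the two layouts npm has used: a flat "packages" map keyed by install path (lockfileVersion 2 and 3) and a nested "dependencies" map (lockfileVersion 1). The function name is hypothetical, and when a package is installed at several versions only one of them is kept.

```python
import json


def pinned_dependencies(lockfile_text: str) -> dict[str, str]:
    """Map each dependency name to one exact version recorded in the lockfile."""
    lock = json.loads(lockfile_text)
    pinned: dict[str, str] = {}
    # lockfileVersion 2 and 3: keys look like "node_modules/express"; "" is the project itself.
    for path, meta in (lock.get("packages") or {}).items():
        if path and "version" in meta:
            name = path.rsplit("node_modules/", 1)[-1]
            pinned[name] = meta["version"]
    # lockfileVersion 1: walk the nested "dependencies" maps instead.
    if not pinned:
        stack = [lock.get("dependencies") or {}]
        while stack:
            for name, meta in stack.pop().items():
                if "version" in meta:
                    pinned.setdefault(name, meta["version"])
                if "dependencies" in meta:
                    stack.append(meta["dependencies"])
    return pinned


if __name__ == "__main__":
    sample = json.dumps({"lockfileVersion": 3, "packages": {"node_modules/express": {"version": "4.17.1"}}})
    print(pinned_dependencies(sample))
```

The resulting name and version pairs are exactly what would be compared against public vulnerability advisories.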

Azure-pipelines.yml 

These files are used in Azure DevOps applications and define the build and deployment pipeline with a set of instructions that Azure Pipelines uses to build, test, and deploy the application. These instructions can include the stages of the pipeline, the jobs to be performed in each stage, the steps to be performed in each job, and any variables or resources to be used.  

If an azure-pipelines.yml file is exposed, it could reveal sensitive information about the build and deployment process. This could include the locations of source code repositories, the commands used to build or test the application, the locations of deployment servers, or the credentials used to access these resources. An attacker could potentially use this information to introduce malicious code into the pipeline, disrupt the application's delivery process, or gain unauthorized access to source code or deployment servers. 

Ansible.cfg 

These files are used in Ansible applications to configure various settings for the Ansible command-line tool and the Ansible playbook runner. They can contain settings related to connection types, privilege escalation, forks, timeouts, persistent connections, and plug-in paths, among others.  

If an ansible.cfg file is exposed, it could reveal sensitive information such as remote system details, privilege escalation settings, or other configuration details. An attacker could potentially use this information to exploit the Ansible system or the systems it manages. The potential impact could range from unauthorized command execution to data theft or system disruption, depending on the permissions associated with the compromised Ansible configuration. 

Service.yaml 

Service.yaml files are often used in Kubernetes. They define a service, an abstraction that defines a logical set of pods and a policy to access them. Pods are the smallest and simplest unit in the Kubernetes object model. The service.yaml file can contain information about the type of service, the ports to be exposed, the selector to determine which pods will receive the traffic, and any session affinity settings.  

If a service.yaml file is exposed, it could reveal the internal structure of the Kubernetes service. An attacker could potentially use this information to map the application's architecture, exploit potential vulnerabilities in the service or the pods it manages, or disrupt the service's operation. 

Docker-compose.yml 

These files are used in Docker Compose, a tool for defining and running multi-container Docker applications. Docker-compose.yml defines services, networks, and volumes for a Docker application in a YAML format and can contain information about the containers to be run, the networks to be created, the volumes to be mounted, and any environment variables to be set.  

If a docker-compose.yml file is exposed, it could reveal the internal structure of the Docker environment. An attacker could potentially use this information to map the application's architecture, exploit potential vulnerabilities in the containers, networks, or volumes, or disrupt the application's operation. 

Figure 11. Sensitive information exposed in clear text on “docker-compose.yml” files

.Gitlab-ci.yml 

These files are used in GitLab CI/CD and define the build and deployment pipeline for an application. They contain a set of instructions that GitLab Runner uses to build, test, and deploy the application. These instructions can include the stages of the pipeline, the jobs to be performed in each stage, the scripts to be run in each job, and any variables or artifacts to be used. 

If a .gitlab-ci.yml file is exposed, it could reveal sensitive information about the build and deployment process. This could include the locations of source code repositories, the commands used to build or test the application, the locations of deployment servers, or the credentials used to access these resources. An attacker could potentially use this information to introduce malicious code into the pipeline, disrupt the application's delivery process, or gain unauthorized access to source code or deployment servers. 

Figure 12. Sensitive information exposed in clear text on “.gitlab-ci.yml” files

.Dockerignore

These files are used in Docker to specify a pattern of files and directories that should be ignored when building a Docker image. A .dockerignore file functions similarly to a .gitignore file in Git, but instead of ignoring files from version control, it ignores files from the Docker build context.  

If a .dockerignore file is exposed, it could reveal sensitive information about the structure of the Docker image. An attacker could potentially use this information to infer the locations of important files or directories, the structure of the application's source code, or the application's dependencies. 

Server.js 

These files are often the main entry point in Node.js applications. They typically contain the code to start a server and define any routes for the application.  

If a server.js file is exposed, it could reveal sensitive information about the application's structure and functionality. This could include the locations of other source code files, the routes and endpoints exposed by the application, and any middleware used by the application. An attacker could potentially use this information to exploit vulnerabilities in the application's code, disrupt the application's operation, or gain unauthorized access to the application's data. 

.Npmrc

These files are used in Node.js applications to configure npm settings. They can contain settings related to the registry, the cache, the prefix, and other npm behaviors. They can also contain authentication tokens for private registries or scoped packages.  

If a .npmrc file is exposed, it could reveal sensitive information such as private registry credentials or other configuration details. An attacker could potentially use this information to exploit the NPM environment, gain unauthorized access to private packages, or disrupt the application's operation. 
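
As a minimal sketch, the following scans the contents of a .npmrc file for credential settings and distinguishes literal values from references to environment variables written as ${NAME}. The list of key suffixes is an assumption covering common credential keys such as _authToken, and the function name and registry hostnames are hypothetical.

```python
# Key suffixes assumed to mark credential settings in a .npmrc file.
CREDENTIAL_SUFFIXES = ("_authToken", "_auth", "_password")


def npmrc_credentials(text: str) -> list[tuple[str, bool]]:
    """Return (key, is_literal) pairs for credential settings found in .npmrc content."""
    findings = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not key=value settings.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.endswith(CREDENTIAL_SUFFIXES) and value:
            # A ${NAME} value is resolved from the environment, so no secret is stored in the file.
            is_literal = not (value.startswith("${") and value.endswith("}"))
            findings.append((key, is_literal))
    return findings


if __name__ == "__main__":
    sample = "//npm.example.internal/:_authToken=${NPM_TOKEN}\n"
    print(npmrc_credentials(sample))
```

Only the literal entries represent a leak by themselves; a ${NAME} reference indicates that the token was meant to be supplied from the environment at install time.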

Main.yml 

These files are often used in GitHub Actions and define the workflow for an application. They contain a set of instructions that GitHub uses to build, test, and deploy the application. These instructions can include the events that trigger the workflow, the jobs to be run, the steps to be performed in each job, and any environment variables or secrets to be used.  

If a main.yml file is exposed, it could reveal sensitive information about the build and deployment process. This could include the locations of source code repositories, the commands used to build or test the application, the locations of deployment servers, or the credentials used to access these resources. An attacker could potentially use this information to introduce malicious code into the workflow, disrupt the application's delivery process, or gain unauthorized access to source code or deployment servers. 

.Travis.yml

These files are used in Travis CI and define the build and test process for an application. They contain a set of instructions that Travis CI uses to build, test, and sometimes deploy the application. These instructions can include the language, the version of the language, the services to be used, the script to be run, and any environment variables or secrets to be used.  

If a .travis.yml file is exposed, it could reveal sensitive information about the build and test process. This could include the locations of source code repositories, the commands used to build or test the application, the locations of deployment servers, or the credentials used to access these resources. An attacker could potentially use this information to introduce malicious code into the process, disrupt the application's delivery process, or gain unauthorized access to source code or deployment servers. 

Figure 13. Sensitive information exposed in clear text on “.travis.yml” files

Vagrantfile 

These files are used in Vagrant to define and configure virtual machines with a set of instructions that are used to create, configure, provision, and control virtual machines. These instructions can include the base box to be used, the provider, the network configuration, the synced folders, the provisioners, and any environment variables or secrets to be used.  

If a Vagrantfile is exposed, it could reveal sensitive information about the virtual environment. An attacker could potentially use this information to exploit vulnerabilities in the virtual machines, disrupt the virtual environment's operation, or gain unauthorized access to the virtual machines. 

Figure 14. Sensitive information exposed in clear text on Vagrantfile files

Self-hosted versus cloud service-hosted registries 

The amount of data revealed in this research raises the question of how to secure registries, as well as whether they should be self-hosted or hosted on cloud services. Both options offer their own set of advantages and disadvantages. Hosting your own registry provides you with more control over access, storage, and the location of images. However, it also means you bear the responsibility of manually configuring your registry’s security and maintaining the server it is hosted on.  

Conversely, using a cloud service can be more convenient as it can cover server maintenance and provide several built-in security features, with the trade-off being that you have less control over the registry. 

Figure 15. Difference between self-hosted and cloud service registries

The risk that comes with convenience 

There is a stark contrast between the traditional model of deploying applications on separate workloads and the modern approach of using containers. This difference can have significant implications for security, and understanding these implications is crucial for anyone planning on developing, deploying, or securing applications. 

In an infrastructure without containers where application services are typically spread across multiple workloads that could be virtual machines or physical servers, each service runs on its own workload and is therefore isolated from the others. This separation can provide a degree of security, as an attacker would need to compromise multiple workloads to gain access to all the services of an application, and only then would they obtain full comprehension of their target. Breaching every single workload could be time-consuming and a challenging task that requires a high level of skill and resources. 

However, the advent of containers has changed this landscape dramatically. Developers package application services into a single unit that allows for increased portability, scalability, and efficiency within a stable environment. Because containers come with several isolation features, it can be easy to assume that they are secure by default. Containers are even used beyond specialized workloads. But while they offer many benefits, containers also introduce security challenges that could be overlooked. 

DevOps practitioners influence the security of container ecosystems the most. There are consequences in using containers for building containerized applications, hard-coding secrets needed for successful builds, and then pushing them into the same container image registry as the build container image. The build container image, which has all the secrets referenced, should ideally only be accessed at runtime. However, the image that builds the application can access the secrets from within the same container image registry because they reside there together. This gives anyone who has access to the image repository access to these very sensitive tokens. While our investigation revealed secure implementation of secrets references within the application image, its build images contained these secrets anyway. 

Numerous containers we found in the open registry were used for CI/CD pipelines, including target applications, builds, and deployments. By nature, these require access to secrets such as credentials for the systems used to pull source code or API tokens for repositories, as well as cloud service access and log systems, among others. We found these secrets hard-coded within the container images, either in the file system or as ENV directives within Dockerfiles; the latter means they are saved within the container image metadata, which is the same as storing them within a file. These pose a high risk for information exposure when pushed to unsecure container registries.  

Whenever containers are used within the CI/CD pipeline, careful attention should be paid to managing secrets, as this entry shows. In particular, threat actors can exploit vulnerabilities in poor security designs of container images, as these designs offer them effortless access to the application’s architectural information and an extended attack vector. It is worth noting that this scenario has been observed in multiple cases with unsecure image registries.  

On the other hand, exposed container registries, if misconfigured, can provide public access to multiple container images. An attacker who gains access to these images can examine them in detail, as well as learn about the application's structure, dependencies, configuration, and even source code. This underlines the attention that should be given to finding and identifying misconfigurations, as when exploited, they leave a greater impact on victims, especially those operating in cloud environments.  

The shift from traditional servers to containers has transformed the security landscape. However, while containers offer many benefits, their wide use calls for a new approach to security built on care and diligence.  

While the traditional model of fortifying each server separately does not apply in the world of containers, this system can be imitated by securing containers as if each were a complete application, with its own set of defenses. This entails encrypting images when possible and restricting access through access control lists (ACLs), firewalls, or virtual private networks (VPNs). DevOps practitioners should always take caution when considering the need to use containers, not to mention explore other more secure solutions. 

Conclusion 

In this research, the main security issue started with open container registries. Nonetheless, thoroughly investigating them revealed associated risks that development operations should be aware of.  

We recommend the following to mitigate security risks: 
Do not hard-code secrets within files inside container images. 
Use vaults to store secrets and their references. 
When using environment variables for secrets, do not store them within Dockerfiles or .env files. Instead, inject them at the container runtime. 
Make sure your private container registries are not accessible to the public. 
Encrypt container images.
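
The recommendation to inject secrets at runtime can be sketched from the application's side. In this minimal example, the application reads a secret from its process environment when it starts and fails immediately if the value was not supplied, so nothing needs to be written into the image. The variable name DB_PASSWORD, the exception class, and the function name are hypothetical, and how the value reaches the environment (an orchestrator, a vault agent, or a secrets manager) is left open.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided to the running container."""


def require_secret(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a secret injected into the environment at runtime, or fail loudly."""
    source = os.environ if environ is None else environ
    value = source.get(name)
    if not value:
        # Failing at startup avoids silently falling back to a default baked into the image.
        raise MissingSecretError(f"secret {name!r} was not injected at runtime")
    return value


if __name__ == "__main__":
    try:
        password = require_secret("DB_PASSWORD")
        print("secret received; length:", len(password))  # never print the value itself
    except MissingSecretError as error:
        print("startup aborted:", error)
```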

Container image registries should be regularly checked for misconfigurations and constantly scanned for vulnerabilities, malware, and secrets. For a lightweight and secure way to run applications, go distroless and use a secrets manager to inject secrets into container runtimes. 

By understanding these challenges and taking proactive steps to mitigate them, developers and security professionals can harness the power of containers without exposing their applications to unnecessary risk. 

