Using Linux Containers for tenant isolation, rather than the OS-level isolation of VMs or non-virtualized hosts, exposes unique challenges. The Cloud Foundry team has worked on problems like this quite a few times over the last few years. Recently we have been working on what we believe to be an unsolved isolation issue in Linux Containers: the /dev/random device. The issue came to light after we investigated reports of poor user experience on the mailing lists and from attendees of Pivotal’s Cloud Platform road show. Sharing /dev/random is a challenge for any Linux container technology, including Cloud Foundry Garden, Docker, and LXC. We hope to work with the Linux community to weigh the various trade-offs and improve the experience and isolation of multi-tenant Linux Containers.
During the public road shows, attendees learn about Cloud Foundry and walk through an exercise that deploys a demo Java application running on Apache Tomcat. The application typically starts in several seconds, but we began getting reports that users consistently could not start it within the default 60-second timeout, sometimes even after multiple attempts. Why would a simple application that normally starts in several seconds suddenly take over 60 seconds?
A standard Java debugging technique is to take a JVM thread dump, which shows what each JVM thread is working on at a given point in time. The technique is especially useful for identifying where something is stuck when you compare several thread dumps taken in quick succession. Once we looked at thread dumps from one of the instances that was slow to start, we realized what was happening. When Tomcat initializes, it starts several security subsystems, such as the one responsible for generating session identifiers. These subsystems use the random number API from the standard Java libraries, which relies on the JVM configuration to choose a random number implementation. On Linux, the JVM is configured by default to read from the /dev/random device.
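Part of that diagnosis can be scripted. The sketch below is a hypothetical helper, not part of any Cloud Foundry or Tomcat tooling: it scans a set of stack traces, such as those returned by the standard `Thread.getAllStackTraces()` call, and lists threads whose frames pass through `java.security.SecureRandom` or the JDK's `sun.security.provider` package. Matching on those class-name prefixes is an assumption, since frame names vary between JDK releases, and a thread that appears in the list is not necessarily blocked. Because it runs inside the JVM it examines, it would have to be wired into the application; from outside the process, the JDK's `jstack` tool produces the equivalent text dump.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: lists threads whose stacks pass through SecureRandom code. */
final class ThreadDumpScan {
    // Heuristic class-name prefixes (an assumption; frame names differ between JDK releases).
    private static final String[] PREFIXES = {
        "java.security.SecureRandom", "sun.security.provider."
    };

    static List<String> threadsInSecureRandom(Map<Thread, StackTraceElement[]> dump) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
            if (touchesSecureRandom(entry.getValue())) {
                hits.add(entry.getKey().getName());
            }
        }
        Collections.sort(hits); // name order makes successive samples easy to compare
        return hits;
    }

    private static boolean touchesSecureRandom(StackTraceElement[] frames) {
        for (StackTraceElement frame : frames) {
            for (String prefix : PREFIXES) {
                if (frame.getClassName().startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample several times, one second apart; a thread that shows up in every
        // sample is a good candidate for being stuck waiting on entropy.
        for (int i = 0; i < 3; i++) {
            System.out.println(threadsInSecureRandom(Thread.getAllStackTraces()));
            Thread.sleep(1000);
        }
    }
}
```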
The /dev/random device is a blocking cryptographically secure pseudorandom number generator (CSPRNG). The Linux kernel gathers measurements of unpredictable events from device drivers for mice, keyboards, and other sources into an entropy pool, which is used to generate random numbers. Reading /dev/random drains the entropy pool, and when the pool is empty, reads block until enough noise has been gathered to satisfy the requested amount of data. Virtualized servers typically have no keyboards, mice, or similar sources of unpredictable physical events attached, so the operating systems running on them acquire entropy at a lower rate.
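On Linux, the kernel's current estimate of the pool size is exposed in `/proc/sys/kernel/random/entropy_avail`. The minimal sketch below reads that estimate so it can be watched while tenants start up. It assumes a Linux host (the file does not exist elsewhere, and the sketch then reports nothing), and newer kernels have reworked this accounting, so the number is mainly informative on kernels of the generation this post describes. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

/** Minimal sketch: reports the kernel's entropy estimate, in bits, on Linux. */
final class EntropyProbe {
    static final Path ENTROPY_AVAIL = Paths.get("/proc/sys/kernel/random/entropy_avail");

    /** Parses the file's contents, e.g. "3015\n"; empty if the text is not a number. */
    static OptionalInt parse(String contents) {
        try {
            return OptionalInt.of(Integer.parseInt(contents.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    /** Reads the estimate; empty on non-Linux hosts or if the file cannot be read. */
    static OptionalInt read(Path file) {
        try {
            return parse(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt bits = read(ENTROPY_AVAIL);
        System.out.println(bits.isPresent()
                ? bits.getAsInt() + " bits available"
                : "entropy_avail not readable on this host");
    }
}
```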
Low entropy on virtualized servers becomes a bigger problem because many road show attendees try to start the application at roughly the same time, so Linux Containers running on the same host compete for a limited supply of entropy. This type of problem is sometimes called a thundering herd. The /dev/random device is a scarce, shared system resource that Linux Container tenants likely do not realize they are sharing. When they all try to use it at the same time, they effectively cause a denial of service for each other.
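A toy model makes the effect of simultaneous start-ups concrete. The sketch below is not a kernel simulation: it assumes the pool starts empty, refills at a constant rate, has no other consumers, and hands entropy to blocked readers first-come first-served. All parameter values are hypothetical, chosen only to show how the wait grows with the number of tenants relative to a fixed start-up timeout such as the 60-second default.

```java
import java.util.Arrays;

/** Toy model (not a kernel simulation) of tenants draining one shared entropy pool. */
final class EntropyStampede {
    /**
     * Returns, for each tenant, the second at which its blocking read completes,
     * or -1 if it is still blocked after horizonSeconds. Assumes an initially empty
     * pool, a constant refill rate, and first-come first-served wakeups.
     */
    static int[] completionSeconds(int tenants, long bitsPerTenant,
                                   long refillBitsPerSecond, int horizonSeconds) {
        int[] finished = new int[tenants];
        Arrays.fill(finished, -1);
        long[] stillNeeded = new long[tenants];
        Arrays.fill(stillNeeded, bitsPerTenant);
        int head = 0; // first tenant that is still blocked
        for (int second = 1; second <= horizonSeconds && head < tenants; second++) {
            long pool = refillBitsPerSecond; // entropy gathered during this second
            while (head < tenants && pool > 0) {
                long take = Math.min(pool, stillNeeded[head]);
                stillNeeded[head] -= take;
                pool -= take;
                if (stillNeeded[head] == 0) {
                    finished[head] = second;
                    head++;
                }
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 10 tenants, 1024 bits each, 128 bits/s, 60 s timeout.
        System.out.println(Arrays.toString(completionSeconds(10, 1024, 128, 60)));
        // prints [8, 16, 24, 32, 40, 48, 56, -1, -1, -1]
    }
}
```

Under these made-up numbers, a lone tenant would finish in 8 seconds, but when ten start together the last three are still blocked at the 60-second mark.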
Similar problems have come up before, such as Bitcoin miners trying to claim as much shared CPU as possible to generate virtual currency. To account for Bitcoin miners and other CPU-heavy workloads, we use Linux cgroup CPU shares. Cgroups grant each tenant a fair-share allocation of the scarce resource, in this case CPU, and so prevent tenants from using more than their fair share. (When you run a public service with a free tier or trial, you will undoubtedly need to contend with Bitcoin miners at some point.)
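Under cgroup v1, CPU shares are relative weights: when every group is busy, each one receives roughly its `cpu.shares` value divided by the sum across all busy groups, and an idle group's portion is left for the others. The sketch below computes that idealized split. It ignores scheduler details, and the group names and share values are hypothetical (1024 is the cgroup v1 default).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Idealized proportional-share arithmetic for cgroup v1 cpu.shares (busy groups only). */
final class CpuShares {
    /** Fraction of CPU each busy group receives when all of them are contending. */
    static Map<String, Double> fractions(Map<String, Integer> sharesOfBusyGroups) {
        long total = 0;
        for (int shares : sharesOfBusyGroups.values()) {
            total += shares;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : sharesOfBusyGroups.entrySet()) {
            result.put(entry.getKey(), total == 0 ? 0.0 : (double) entry.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> busy = new LinkedHashMap<>();
        busy.put("tenant-a", 1024);      // hypothetical groups and values
        busy.put("tenant-b", 512);
        busy.put("bitcoin-miner", 512);
        System.out.println(fractions(busy));
        // prints {tenant-a=0.5, tenant-b=0.25, bitcoin-miner=0.25}
    }
}
```

The miner can be as greedy as it likes; under contention it still gets only its quarter.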
The Java community in particular has used several approaches to mitigate these types of issues. Some people advocate using /dev/urandom, a non-blocking and therefore much faster source of pseudorandom numbers, and it is easy to change the JVM settings so that SecureRandom uses /dev/urandom. However, this approach requires changing the default configuration of software running inside the containers, and we have found that people are generally very apprehensive about changing default JVM and Tomcat settings without strong assurance that the new configuration is as secure as the default.
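For a JVM, a commonly cited way to make that change without touching application code is the launch option `-Djava.security.egd=file:/dev/./urandom` (the `/./` spelling works around older JDKs that treat the literal `file:/dev/urandom` specially). Code that creates its own generators can instead request a non-blocking one explicitly. The sketch below asks for the `NativePRNGNonBlocking` algorithm, which the SUN provider offers on Linux and other Unix-like systems from JDK 8 onward, and falls back to the platform default elsewhere. It illustrates the approach only; it is not how Tomcat itself is configured, and it is not a recommendation for generating long-term keys.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

/** Sketch: prefer a /dev/urandom-backed generator, fall back to the JVM default. */
final class NonBlockingRandom {
    static SecureRandom create() {
        try {
            // SUN provider on Linux and other Unix-like systems (JDK 8+):
            // both nextBytes() and generateSeed() read /dev/urandom.
            return SecureRandom.getInstance("NativePRNGNonBlocking");
        } catch (NoSuchAlgorithmException e) {
            // e.g. on Windows: use whatever the JVM is configured to provide.
            return new SecureRandom();
        }
    }

    public static void main(String[] args) {
        SecureRandom rng = create();
        byte[] sessionIdBytes = new byte[16];
        rng.nextBytes(sessionIdBytes);
        System.out.println(rng.getAlgorithm() + " produced " + sessionIdBytes.length + " bytes");
    }
}
```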
Another approach is to use a daemon such as haveged, timer_entropyd, or randomsound to increase the rate at which entropy is generated for /dev/random. This approach would be transparent to software running in Linux Containers. However, our research raised concerns that haveged and timer_entropyd rely on a high-precision timer CPU instruction that may be virtualized, which makes any virtualized OS depend on the hypervisor's capabilities and configuration. Other researchers have also argued that the PRNG construction behind /dev/random is “not robust”.
So for the time being, we are not making changes to the Linux Containers or the hosts they run on. Instead, we are documenting this behavior as a known issue and sharing two work-arounds: increase the application startup timeout, or use /dev/urandom.
Even if we change Java and Tomcat behavior and improve the user experience for this particular scenario, the underlying problem remains: Linux Containers share a scarce random device without any control over access to it. For example, non-Java workloads can also read Linux’s /dev/random device. Linux cgroups allocate CPU cycles with shares, which addresses both noisy-neighbor and denial-of-service concerns. One option we could consider implementing is a resource-sharing wrapper around /dev/random that limits each tenant to its fair share of data when there is contention.
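One way to picture such a wrapper is weighted max-min fairness, the same idea behind CPU shares: when the pool cannot satisfy every request, tenants asking for less than their weighted share get everything they asked for, and what remains is split by weight among the rest. The sketch below computes that split for a single round of contention. It describes a proposal, not an existing kernel or Garden interface; the class, its parameters, and the notion of per-container weights for /dev/random are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical weighted max-min fair split of a scarce entropy budget among tenants. */
final class FairEntropyShare {
    /**
     * @param availableBits entropy that can be handed out this round
     * @param demands       bits each tenant is waiting for
     * @param weights       relative weight of each tenant (e.g. mirroring CPU shares), all > 0
     * @return bits granted to each tenant, never more than its demand
     */
    static double[] allocate(double availableBits, double[] demands, double[] weights) {
        double[] granted = new double[demands.length];
        List<Integer> active = new ArrayList<>();
        for (int i = 0; i < demands.length; i++) {
            if (demands[i] > 0) active.add(i);
        }
        double remaining = availableBits;
        while (remaining > 0 && !active.isEmpty()) {
            double totalWeight = 0;
            for (int i : active) totalWeight += weights[i];
            // Tenants whose whole demand fits inside their weighted share are satisfied outright.
            List<Integer> satisfied = new ArrayList<>();
            for (int i : active) {
                if (demands[i] <= remaining * weights[i] / totalWeight) satisfied.add(i);
            }
            if (satisfied.isEmpty()) {
                // Everyone wants more than their share: split what is left by weight.
                for (int i : active) granted[i] = remaining * weights[i] / totalWeight;
                remaining = 0;
            } else {
                for (int i : satisfied) {
                    granted[i] = demands[i];
                    remaining -= demands[i];
                }
                active.removeAll(satisfied); // removes by value, not by index
            }
        }
        return granted;
    }
}
```

For example, with 100 bits available and demands of 80, 80, and 10 bits at weights 2, 1, and 1, the small request is filled in full and the remaining 90 bits are split 60/30 between the two large requests.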
There may be other ideas to consider as well, such as generating entropy from commonly available services like NTP. We want to collaborate with others on this problem, so please reach out to us on our mailing list if you would like to work with us on this and other problems in the Linux Container space.
Challenge: How to resolve contention over /dev/random among multi-tenant Linux Containers in low-entropy environments
| Approach | Pros | Cons |
|---|---|---|
| Adjust container software configuration to use /dev/urandom | Each user can adjust the software configuration in the container as makes sense for them. Commonly recommended for JVMs that are not generating long-term cryptographic keys. Not dependent on generating additional entropy. | Non-default configuration for most software; requires user awareness of the issue and user intervention. |
| Provide supplemental entropy to the Linux host /dev/random | Each Linux Container transparently gets additional entropy without adjusting software configurations inside the container. | Concerns about whether the mechanisms this approach depends on are as secure as users expect. For example, entropy daemons may use virtualized CPU instructions that the virtualization provider may not implement securely. |
| Wrap /dev/random in Linux Containers with a fair-share control or similar | Protects containers from noisy neighbors and DoS attacks. | Does not address the scarcity of entropy on virtualized cloud servers. |
| Provide an alternative /dev/random implementation in Linux Containers that uses a source of entropy other than the host /dev/random | Isolates the host /dev/random from the implementation in the Linux Containers. | Non-trivial burden of finding a secure source of entropy; dependencies on hardware or external services. |
| Do nothing: document this as a known issue and keep standard defaults and status quo behavior | Users get the defaults they expect and can find documentation with potential work-arounds. | Default user experience is sometimes poor, and the root cause is difficult for new users to troubleshoot and identify. |
Pivotal R&D Collaborators on This Issue
Mark Thomas, Ben Hale, Dieu Cao, Zach Robinson, Ryan Morgan, Matthew Kocher, Dmitriy Kalinin, Vik Rana, Andrew Shafer, Cornelia Davis, Glyn Normington, Steve Powell, James Bayer and Alex Jackson.
Thank you to VMware’s Glen McCready for providing feedback on this post.