Analysing Linus Torvalds’ Critique of Docker

December 3, 2024

This article looks at Docker’s security flaws, particularly its shared-kernel model, and contrasts it with traditional VMs for better isolation. It discusses Linus Torvalds’ concerns, explores mitigation techniques, and proposes a roadmap for building a more secure containerisation platform using hardware-assisted virtualisation, true isolation, and a robust orchestration layer.

Docker revolutionised the way we think about software deployment. It’s a lightweight, portable, and scalable solution for containerising applications. But there’s a red flag: Linus Torvalds. Or, more precisely, Linus’s apprehensions about this tech. I have been in the tech space for a good 15 years now. And as far as I can tell, Linus Torvalds’ intuition about a piece of technology has never failed him.

Take blockchain for instance. When everyone was going gaga over the technology back in 2020, Linus didn’t seem all that excited. The sheer complexity of the technology bothered him, and he could already see the issues with scalability of such technologies. Similarly, consider his current stance on the AI boom. While he is impressed by the incredible developments taking place, he is not too sold on the whole AGI hype. It’s easy to see that he has a nose for smelling tech ‘bs’ from a mile away and I trust that.

So when it comes to his critique of Docker, I decided to take it seriously and pay close attention to the aspects of the technology that seem to bother him. My hope is that by the end of this article, I may be able to better articulate the issues in Docker from Linus’s perspective, while also providing potential solutions and next steps for this tech.

Architecture

To understand Docker’s security limitations, we need to examine its core architecture, which revolves around Linux features like namespaces and cgroups (control groups). These components are crucial for container isolation, but they’re not designed to provide the kind of security guarantees you’d expect from full virtualisation.

Docker utilises Linux namespaces to create the illusion of isolation by partitioning kernel resources. Here’s a breakdown of how each namespace contributes.

PID namespace: Provides separate process ID trees, so each container believes it has its own PID space.
NET namespace: Gives each container its own network interfaces, routes, and firewall rules.
MNT namespace: Isolates the filesystem, allowing each container to have its own directory structure.
IPC namespace: Manages inter-process communication isolation, preventing containers from accessing each other’s message queues and semaphores.
UTS namespace: Offers hostname and domain isolation, making each container feel like it’s running on a unique machine.
USER namespace: Maps user IDs from the host to container-specific ranges, potentially offering a layer of privilege isolation.

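User namespace remapping is not a per-container switch. It is enabled on the Docker daemon (with the --userns-remap option, or the userns-remap key in /etc/docker/daemon.json), after which every new container has its root user mapped to an unprivileged UID range on the host. Here’s a simple example that starts the daemon with remapping enabled and then runs a container.

dockerd --userns-remap=default
docker run --rm -it busybox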

While namespaces do a decent job of separating resources, they don’t offer airtight isolation. For example, if a container exploits a kernel vulnerability, it could potentially access other namespaces or even the host system. This risk makes namespaces inherently weaker than the isolation provided by hypervisors in virtual machines.
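The namespaces listed above are ordinary kernel objects attached to processes, and on Linux they can be inspected under /proc. The following is a minimal sketch, assuming /proc is mounted; the function name list_namespaces is made up for illustration.

```shell
#!/bin/sh
# Print the namespace identifiers the kernel has attached to a process.
# Each entry under /proc/<pid>/ns is a symlink such as "pid:[4026531836]";
# two processes in the same namespace show the same number.
list_namespaces() {
    target_pid="${1:-self}"
    for ns in pid net mnt ipc uts user; do
        link="/proc/${target_pid}/ns/${ns}"
        if [ -e "$link" ]; then
            printf '%s\n' "$(readlink "$link")"
        else
            printf '%s: unavailable\n' "$ns"
        fi
    done
}

list_namespaces
```

Running it once on the host and once inside a container should show different numbers for the namespaces Docker has unshared, while the kernel underneath stays the same.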

Control groups (cgroups)

Cgroups limit and account for a container’s resource usage (CPU, memory, I/O, etc.). This ensures that a container can’t hog system resources, providing stability in multi-container environments. For instance, you can limit a container’s memory as follows.

docker run -m 512m my-container

However, cgroups aren’t designed to enhance security. They are more about controlling resource usage. If a container is compromised, cgroups won’t prevent it from attempting to exploit the shared kernel or other containers.
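To make the -m 512m limit concrete: Docker treats the m suffix as mebibytes and writes the resulting byte count into the container’s cgroup (memory.max on cgroup v2, memory.limit_in_bytes on cgroup v1). The sketch below only reproduces that arithmetic; to_bytes is a hypothetical helper, not part of Docker.

```shell
#!/bin/sh
# Convert a Docker-style memory size (e.g. 512m, 2g) to the byte count
# that ends up in the container's cgroup memory limit.
to_bytes() {
    size="$1"
    number="${size%[bkmgBKMG]}"   # strip a single trailing unit letter, if any
    unit="${size#"$number"}"      # whatever was stripped is the unit
    case "$unit" in
        ""|b|B) factor=1 ;;
        k|K)    factor=1024 ;;
        m|M)    factor=$((1024 * 1024)) ;;
        g|G)    factor=$((1024 * 1024 * 1024)) ;;
    esac
    echo $((number * factor))
}

to_bytes 512m   # prints 536870912
```

Inside a container started with -m 512m on a cgroup v2 host, reading /sys/fs/cgroup/memory.max should show the same number. Note that this is an accounting limit, not a security boundary.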

A cross-comparison with traditional virtualisation

I know that many people criticise Docker for the elements of a traditional virtual machine (VM) that it is missing. But that’s the thing — Docker’s not trying to be a VM! Virtual machines operate with a hypervisor sitting directly on the hardware, creating completely independent environments. Each VM has its own dedicated OS, kernel, and everything else, which means they’re genuinely isolated from one another. Even if you manage to break into one VM, you’re still miles away from the others. That’s security in the real sense.

With Docker, you’ve got containers that run on a shared OS kernel. There’s no hypervisor acting as a guard and no full OS instance per container. They’re just processes running on the same machine, isolated by Linux namespaces and cgroups.

Here’s an example to put it in perspective. In a VM, the process of creating an isolated environment might look something like this (the installer ISO path is a placeholder):

virt-install \
--name=my-vm \
--memory=2048 \
--vcpus=2 \
--disk path=/var/lib/libvirt/images/my-vm.qcow2,size=10 \
--cdrom=/path/to/installer.iso \
--os-variant=generic

That’s a full-blown instance with its own kernel. Now, in Docker, the equivalent setup is as simple as the following example.

docker run -d --name=my-container nginx

Sure, that container is up and running in seconds, but it’s still using the host’s kernel, sharing it with every other container on the system. That’s why Docker’s isolation isn’t quite what it seems.
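The shared kernel is easy to observe. The sketch below compares the kernel release reported on the host with the one reported inside a container; it assumes the docker CLI is installed and uses the alpine image purely as a small example, and it skips the container side if docker is not available.

```shell
#!/bin/sh
# Compare the kernel release seen on the host with the one seen inside a
# container. Because containers share the host kernel, the two should match.
host_kernel="$(uname -r)"
echo "host kernel:      $host_kernel"

if command -v docker >/dev/null 2>&1; then
    # "alpine" is just a convenient small image; any image with uname works.
    container_kernel="$(docker run --rm alpine uname -r)"
    echo "container kernel: $container_kernel"
else
    echo "docker not found; skipping the container side of the comparison"
fi
```

Running the same comparison against a VM guest would normally show the guest’s own kernel release instead.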

Security concerns

When you share a kernel, you increase the attack surface. If a container finds a way to exploit that shared kernel, it’s game over. That’s not me being dramatic; container escapes have happened before. Dirty COW (CVE-2016-5195), a kernel privilege-escalation bug, was shown to be exploitable from inside a container, and CVE-2019-5736, a flaw in the runc runtime rather than in the kernel itself, let attackers overwrite the host’s runc binary and gain root access on the host. A VM guest is far less exposed to these kinds of risks because each one runs its own kernel behind a hypervisor.

You may argue that tools like AppArmor, SELinux, and seccomp profiles can mitigate these risks in Docker. But let’s be real: adding layers to compensate for a fundamentally shared architecture isn’t the same as having true isolation. When all containers are tied to the same kernel, any vulnerability there becomes an open door to all containers on that host. Here’s a quick-and-dirty snippet showing how a breakout might look from inside a container that was started with --privileged, where the host’s block devices are visible (the device name /dev/sda1 is an assumption and varies between hosts).

mkdir /host
mount /dev/sda1 /host
chroot /host

What’s happening here? We’re mounting the host’s root filesystem inside the container and then switching into it with chroot, effectively bypassing the container’s filesystem isolation. A default, unprivileged container lacks the CAP_SYS_ADMIN capability that mount needs, so this particular trick depends on an over-permissive configuration. Even so, it shows how thin the boundary is when the kernel is shared: one loosened setting or one kernel bug, and the host is within reach. In a VM, this would never fly.

By default, Docker containers run as root. Is that a good thing? That’s a whole different debate, but for the sake of this topic, no it’s not. And while you can mitigate this by running containers as non-root users, many developers skip that step (I, too, do this) because we are lazy. When you run as root, and an attacker compromises your container, they’re not just in the front door; they’ve got the keys to the entire house. If you’re sceptical, try it yourself. Spin up a container and check the user.

docker run -it ubuntu bash
whoami

Nine times out of ten, you’ll see root. This means any vulnerability in that container has the potential to escalate privileges to the host level if it exploits a kernel flaw. Docker’s architecture just isn’t built for zero-trust environments or situations where security is non-negotiable. You can stack on AppArmor, SELinux, seccomp, and all the security frameworks you want, but these are band-aids, not solutions.
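For comparison, here is a hedged sketch of what the non-root alternative might look like. It only prints a command line rather than running it; the helper name nonroot_run_cmd and the UID:GID 1000:1000 are assumptions for illustration, while --user and --security-opt no-new-privileges are standard docker run options.

```shell
#!/bin/sh
# Print a docker run command line that avoids the root-by-default behaviour.
# The UID:GID 1000:1000 is an illustrative assumption.
nonroot_run_cmd() {
    image="$1"
    shift
    printf 'docker run --rm'
    printf ' --user 1000:1000'                  # unprivileged UID:GID
    printf ' --security-opt no-new-privileges'  # block setuid-style escalation
    printf ' %s' "$image" "$@"
    printf '\n'
}

nonroot_run_cmd ubuntu id -u
```

Running the printed command should report UID 1000 rather than 0. It narrows what a compromised process can do, but the kernel is still shared, which is the point of the band-aid remark above.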

Linus’s argument

Just so I am clear, Linus Torvalds can, in no way, be classified as anti-Docker. He does see potential in this technology, and he is right to do so. Docker has made life ultra simple for nimble programming and deployment of software across devices and platforms. Having said that, Linus has been pretty vocal about the fact that Docker’s reliance on shared kernels isn’t true virtualisation.

Linus was quick to criticise Docker’s dependence on user namespaces as a supposed solution to privilege isolation. In theory, user namespaces allow containers to map root privileges to non-root privileges on the host, which sounds like a solid security measure. But in practice it’s a leaky bucket. If you’ve got a process running as root inside a container, even with user namespaces mapping it to a non-root user on the host, a kernel exploit still has the potential to bypass this and gain root access on the host machine. We discussed this at length in the previous section.
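The mapping itself is visible in /proc/<pid>/uid_map, where each line holds three numbers: the first UID inside the namespace, the first UID outside it, and the length of the range. The sketch below just spells such a line out in words; describe_uid_map is a hypothetical helper, and the sample values are assumed rather than taken from a real system.

```shell
#!/bin/sh
# Explain each line of a uid_map file, whose three columns are:
# first UID inside the namespace, first UID outside it, and range length.
describe_uid_map() {
    while read -r inside outside length; do
        echo "UIDs ${inside}..$((inside + length - 1)) in the namespace map to ${outside}..$((outside + length - 1)) outside"
    done
}

# Example line as it might appear in a remapped container (values assumed).
echo "0 100000 65536" | describe_uid_map
```

Feeding it the real file with describe_uid_map < /proc/self/uid_map shows the mapping for the current process. The mapping changes which host UID the container’s root corresponds to; it does not change which kernel handles its syscalls.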

Linus has also criticised how Docker containers share kernel modules and libraries with the host. Any vulnerability in these shared resources potentially opens up every container to exploitation. Docker tries to handle this with layered filesystems, but that’s more like a patchwork solution, something that someone as meticulous as Linus can never be fond of. This becomes even more of a concern when we start layering multiple containers on a single host, which is standard in most deployments. One crack in that foundation, and you’re dealing with a full-scale breach.

I am not trying to put words in Linus Torvalds’ mouth, but I get the sense that his argument is more akin to the idea that Docker is pretending to be something it’s not. VMs, with their hypervisors and dedicated kernels, offer the kind of security that Docker just can’t replicate. And when it comes to environments where isolation is crucial, Linus would argue that Docker is simply out of its depth. He isn’t entirely wrong. Docker’s isolation model is weak compared to VMs, and its dependence on shared kernel resources is a ticking time bomb in high-stakes environments. But does that mean Docker’s worthless? Not at all. It’s a matter of understanding the risks and using Docker where it makes sense, not pretending it’s a one-size-fits-all solution.

Something better than Docker

To wrap this whole thing up, I would like to quickly suggest some potential solutions. I titled this section ‘Something better than Docker’ because I understand that some of these solutions might require a ground-up rebuild of Docker itself, which might be as undesirable as it is impractical. So think of this section as a wish list for building something that has the benefits of Docker but also mitigates the potential risks. A perfect containeriser, if you will (if there ever was one). If we want to build something that truly outclasses Docker, we need to address its weaknesses head-on and rethink containerisation from the ground up.

1. First off, we ditch the shared-kernel approach entirely. We need to build a micro-hypervisor model, where each container runs its own minimal kernel. This ensures that every container is genuinely isolated, similar to a lightweight VM but without the bloat. By giving each container a stripped-down guest kernel, you’re essentially granting it its own mini-OS that only loads essential components, drastically reducing the attack surface. This step eliminates the primary flaw of Docker’s shared-kernel model.

2. Next, leverage hardware-assisted virtualisation like Intel VT-x or AMD-V to handle isolation efficiently. This is where we’ll differentiate ourselves from Docker’s reliance on namespaces. With hardware support, each container will get near-native performance while maintaining strict separation. For example, instead of relying solely on a shared Linux kernel for separation, containers will sit behind hardware-enforced boundaries, meaning an exploit would have to break through the hypervisor, not just the kernel, to jump from one container to another.

3. We can’t ignore orchestration. Rather than bolting on security later, build an orchestration layer that enforces strict security policies from the get-go. This orchestration tool, think Kubernetes but with security baked in, will enforce seccomp, AppArmor, and SELinux profiles automatically based on container configurations. For instance, before launching a container, the orchestration layer could analyse its dependencies and generate a security profile dynamically, ensuring that each container only has access to the resources it needs.

4. Let’s go beyond the crude root vs non-root distinction Docker offers. Implement a permission system that assigns containers fine-grained capabilities, like capabilities management in modern OSes. You’ll create an RBAC model that defines precisely what a container can or cannot access such as network resources, storage, specific hardware, etc. Imagine having a declarative YAML file that specifies, down to the syscalls, which capabilities each container is granted, ensuring it only gets what it genuinely needs to function.

5. Containers shouldn’t be changing their state once they’re up and running. We must enforce immutable infrastructure, meaning containers are rebuilt from scratch for every update rather than being patched live. This prevents attackers from persisting inside a compromised container. Think of this as the container equivalent of the “write once, run anywhere” mantra: build once, deploy everywhere. It never truly worked for Java (also a technology that Linus absolutely hates), but it might just work for a containeriser. Changes require redeployment, not modification, thus ensuring that every running instance is identical to the tested version.

6. Build in real-time vulnerability scanning and automated patching. Containers should be scanned continuously, not just at build time. If a vulnerability is found, the system will either patch it in the background or alert you to rebuild the affected containers. This means integrating tools like Clair or Trivy directly into the platform, ensuring that no container runs outdated or vulnerable code.
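Points 3 to 5 above can be approximated today with existing Docker options, which gives a feel for what such a platform would automate. The following is a rough sketch under stated assumptions: every function name, the sample policy, the profile file name, and the all-zero digest are hypothetical, and a real seccomp allowlist would need far more syscalls than shown here. It only prints the resulting command line.

```shell
#!/bin/sh
# Sketch of a pre-launch policy check covering points 3 to 5 above.
# All function names and sample values are hypothetical.

# Point 5: accept only images pinned to a content digest, not a mutable tag.
is_pinned() {
    printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Point 3: emit a deny-by-default seccomp profile that allows only the
# syscalls given as arguments.
make_seccomp_profile() {
    names=""
    for syscall in "$@"; do
        names="${names}${names:+, }\"${syscall}\""
    done
    printf '{ "defaultAction": "SCMP_ACT_ERRNO", "syscalls": [ { "names": [%s], "action": "SCMP_ACT_ALLOW" } ] }\n' "$names"
}

# Point 4: read one capability per line from stdin and turn the list into
# flags that drop everything and add back only what was requested.
policy_to_flags() {
    flags="--cap-drop ALL"
    while IFS= read -r line; do
        case "$line" in
            ""|"#"*) continue ;;   # skip blank lines and comments
        esac
        flags="$flags --cap-add $line"
    done
    printf '%s\n' "$flags"
}

# Combine the checks into the command line the platform would run.
launch_cmd() {
    image="$1"
    profile_path="$2"
    if ! is_pinned "$image"; then
        echo "refusing unpinned image: $image" >&2
        return 1
    fi
    cap_flags="$(policy_to_flags)"
    printf 'docker run -d --read-only %s --security-opt seccomp=%s %s\n' \
        "$cap_flags" "$profile_path" "$image"
}

# Placeholder digest made of 64 zeros; it does not refer to a real image.
image="nginx@sha256:$(printf '%064d' 0)"
make_seccomp_profile read write exit_group
printf 'NET_BIND_SERVICE\n' | launch_cmd "$image" profile.json
```

The design choice worth noting is that everything is deny-by-default: capabilities, syscalls, and mutable tags are all rejected unless the policy explicitly allows them. None of this removes the shared kernel, which is why points 1 and 2 still matter.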
