Cilium: The De Facto Kubernetes Networking Layer and Its Exciting Future
Discover why Cilium, the first CNCF-graduated cloud-native networking project, is revolutionizing Kubernetes networking, security, and observability.
Cilium is an eBPF-based project that was originally created by Isovalent, open-sourced in 2015, and has since become the center of gravity for cloud-native networking and security. With 700 active contributors and more than 18,000 GitHub stars, Cilium is the second most active project in the CNCF (behind only Kubernetes), and in Q4 2023 it became the first project to graduate in the CNCF's cloud-native networking category. A week ahead of KubeCon EU, where Cilium and the recent 1.15 release are expected to be among the most popular topics with attendees, I caught up with Nico Vibert, Senior Staff Technical Engineer at Isovalent, to learn why this is just the beginning for the Cilium project.
Q: Cilium recently became the first CNCF project to graduate in the “cloud native networking” category — why do you think Cilium was the right project at the right time in terms of the next-generation networking requirements of cloud native?
A: I would say there are multiple factors that explain why Cilium has become the de facto Kubernetes networking layer.
1. The emergence of eBPF happened simultaneously with the exponential growth in the popularity of Kubernetes. For those not familiar with eBPF, it’s a groundbreaking technology that lets you hook into the Linux kernel and run sandboxed programs, such as custom networking logic, without changing kernel source code or loading kernel modules.
Kubernetes itself relied on underlying technologies like iptables that were not built for the scale and churn of microservices. As the size of Kubernetes deployments increased, so did performance issues. For once, it wasn’t a solution looking for a problem: eBPF and Cilium were a perfect fit for Kubernetes. By using eBPF programs, Cilium was able to address many of the networking, security, and observability shortcomings inherent to Kubernetes. (For the curious, a minimal sketch of an eBPF program follows this answer.)
2. The other reason why Cilium became so popular is the people and the community behind it. The founding team of Isovalent (the creators of Cilium and eBPF) had deep expertise in the Linux kernel, open-source technologies, and software-defined networking from their time at Cisco, Red Hat, and Nicira, the start-up whose network virtualization software became NSX after VMware acquired the company. They kick-started Cilium and made it an exciting community and project to work on. The cast of engineers that built and improved Cilium is stellar. Open-sourcing Cilium and eventually donating it to the CNCF were also clearly influential in the project’s success.
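To make the eBPF point above concrete, here is a minimal sketch of an XDP program, the class of kernel hook that eBPF datapaths such as Cilium's build on. This is an illustration only, not Cilium's actual code, and it assumes a libbpf-style toolchain (clang with the bpf target):

```c
// Hypothetical, minimal XDP program (illustration only, not Cilium source).
// Build sketch: clang -O2 -g -target bpf -c xdp_pass.c -o xdp_pass.o
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass_all(struct xdp_md *ctx)
{
    /* A real program would parse packet headers between ctx->data and
     * ctx->data_end and return a verdict such as XDP_DROP or XDP_REDIRECT;
     * this one simply passes every packet up the stack. */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

Once compiled, the object can be attached to an interface with iproute2 (ip link set dev eth0 xdp obj xdp_pass.o sec xdp), at which point it runs for every incoming packet at the driver level, far earlier and more cheaply than a long iptables rule chain.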
Q: Which of the major cloud providers’ Kubernetes offerings currently support Cilium as the default CNI?
A: Back in 2020, Google announced that GKE would use Cilium for its networking data plane. The following year, Amazon picked Cilium for its EKS Anywhere distribution. In 2022, Microsoft chose Cilium for Azure networking with its Azure CNI Powered by Cilium mode. It’s also used by the likes of DigitalOcean, Alibaba Cloud, and many other providers around the world. Cilium has been adopted across so many cloud providers and Kubernetes distributions that it’s almost easier to list where it’s not the default!
Q: Cilium is one of the most active CNCF projects in terms of contributions, right up there with K8s and OpenTelemetry. How would you describe what those contributions are, and where all the activity is coming from?
A: Cilium has over 18,000 GitHub stars and 700 contributors. The pace of innovation around Cilium can be overwhelming. Every week, I record a demo of a new Cilium feature and I never run out of ideas! What I find most pleasing about the development of Cilium is that it’s not just Isovalent engineers who contribute to the project. On the contrary, the recently released Cilium 1.15 had lots of external contributors from the likes of Google, Elastic, and Booking.com. I interviewed some of these new contributors on a recent livestream and they praised how helpful and welcoming the community was. And even with Cisco’s upcoming acquisition of Isovalent, I don’t foresee any changes to the inclusivity of the Cilium community.
As mentioned, some Cilium features require eBPF skills, which remain scarce. But many contributions are in the more approachable Go language, or are simple documentation enhancements that don’t require any programming skills.
Q: How would you explain Cilium’s journey in recent years from being the CNI for K8s to evolving into other areas like runtime security and other advanced networking use cases? What is it about Cilium that makes it such a dynamic project for addressing parts of the cloud-native stack beyond that CNI use case?
A: I observed in my recent networking predictions that many of the sessions at KubeCon in Chicago were actually about use cases outside Kubernetes. As popular as Kubernetes is, it’s not quite ubiquitous yet. There are millions of workloads running on bare metal and virtual machines, and I don’t expect them all to be migrated to Kubernetes, if ever.
Therefore we’re seeing a new category emerging: heterogeneous networking, the ability to connect, secure, and observe workloads regardless of their form (VMs, bare metal, containers, even serverless).
Because Cilium can be deployed pretty much anywhere with a Linux kernel and because eBPF lets us innovate rapidly, it’s going to be the perfect vehicle to connect and secure these workloads, regardless of their nature.
Q: Who are some of the most notable users and maintainers in the Cilium project?
A: We’ve already talked about how most cloud providers use Cilium by default. There are probably thousands of notable companies that don’t even know they use Cilium! Cilium has been adopted across most industries: Bell Canada in telco, the New York Times and Sky in media, Bloomberg in financial services, and e-commerce companies like Trip.com.
If I had to pick my favorite Cilium user story, it would be Seznam’s; their performance test results with Cilium compared to other solutions were so far off the charts that they thought they had made an error in their measurements.
Q: As we approach KubeCon EU, what are some of the big theme areas where you see the Cilium community really excited? Anything you can preview for DZone readers about Cilium-related topics that might be on display at the event in Paris this month?
A: I am really looking forward to KubeCon Paris and being back in my hometown. I am co-presenting an introduction to Cilium — one of 27 Cilium sessions throughout the week. I am particularly looking forward to the user stories and hearing some of the creative real-world use cases that Cilium underpins.
But I think the big theme this year for most KubeCon attendees is simplicity. I could be wrong, but I don’t think attendees are necessarily looking for more products and more tools to manage.
I suspect many operators of Kubernetes clusters are actually dealing with tool fatigue. Over the past few years, many one-dimensional solutions have been created to address Kubernetes networking requirements — how do I encrypt traffic in my cluster? How do I secure traffic? How do I load balance traffic? How do I connect microservices together? How do I route ingress traffic into the cluster?
What we ended up with was a mishmash of niche projects and the operational headache of managing them all.
And that’s what I think the Cilium community is excited about: how can we make Cilium the platform for most if not all the networking needs you may have in Kubernetes?
Q: Cilium sub-project Tetragon is exposing a lot of cool new use cases in runtime security and compliance. What is Tetragon, and how has it evolved since its early releases? How would you describe the evolution of runtime security as a discipline within cloud native?
A: Tetragon was initially part of Isovalent’s enterprise offering of Cilium. It was eventually open-sourced at the CNCF alongside Cilium. Tetragon is a Kubernetes-native security observability and runtime enforcement tool.
If you look at the runtime security landscape, you’d see that most tools are great at collecting data (usually at a performance cost) but poor at giving you relevant insights. Tools would report network flows and IP addresses — fine in the traditional networking space but irrelevant in cloud-native environments, where workloads are ephemeral and IPs change constantly.
Where Tetragon is different is that it can tie a network flow to a Kubernetes object — label, pod name, namespace — down to the process that generates the network traffic.
What’s also impressive about Tetragon is its overhead — there’s barely any. Again, this is where eBPF shines. Events are collected and filtered in the kernel, and only events of interest are transferred into user space; in other tools, a large number of CPU cycles are spent moving every event from kernel space to user space. This means that Tetragon’s footprint is pretty much invisible.
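As a rough sketch of that in-kernel filtering pattern (hypothetical code, not Tetragon's actual implementation), an eBPF program attached to a kernel function can discard uninteresting events before they ever leave the kernel, forwarding only matches to user space through a ring buffer:

```c
// Hypothetical illustration of in-kernel event filtering, not Tetragon code.
// The kprobe target and the PID filter below are arbitrary examples.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20); /* 1 MiB buffer shared with user space */
} events SEC(".maps");

struct event {
    __u32 pid;
    char comm[16];
};

SEC("kprobe/__x64_sys_execve") /* x86-64 execve entry point (assumption) */
int trace_exec(void *ctx)
{
    __u32 pid = bpf_get_current_pid_tgid() >> 32;
    struct event *e;

    /* In-kernel filter: drop events we don't care about right here, so
     * they never cost a kernel-to-user copy. A real policy would match
     * on namespaces, binaries, arguments, and so on. */
    if (pid < 1000)
        return 0;

    e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;

    e->pid = pid;
    bpf_get_current_comm(&e->comm, sizeof(e->comm));
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char _license[] SEC("license") = "GPL";
```

User space then reads only the pre-filtered events from the ring buffer, which is exactly the difference described above between doing the work in the kernel and shipping every event across the boundary first.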
The earlier versions of Tetragon weren’t necessarily easy to consume unless you had deep Linux skills but that’s changed significantly in the past few releases. It’s now become very simple to deploy — in a couple of minutes, you can get a rich data view of what’s happening inside your cluster.