Extended Berkeley Packet Filter (eBPF) for Cloud Computing
Explore eBPF, a technology that executes code in the Linux kernel and is essential for Kubernetes network observability, monitoring, auditing, and traffic routing.
eBPF, or extended Berkeley Packet Filter, is a revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged context such as the operating system kernel.
eBPF is increasingly being integrated into Kubernetes for various purposes, including network observability, security, and performance monitoring.
With eBPF, Kubernetes users can gain deep insights into network traffic, enforce security policies, and optimize resource utilization within their clusters. It offers a powerful toolset for managing and troubleshooting Kubernetes environments.
In Kubernetes clusters, monitoring the various containers and routing traffic based on resource availability are both necessary for applications to function efficiently, and eBPF enables both.
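To make "sandboxed programs in the kernel" concrete, here is a minimal eBPF sketch in C (libbpf style). The hook choice (XDP), the map name, and the program name are illustrative assumptions, not anything Kubernetes or eBPF mandates; the kernel verifier checks a program like this before it is ever allowed to run.

```c
// Minimal eBPF program: counts packets at the XDP hook, then lets
// them continue into the normal network stack. The kernel verifier
// sandboxes this code before it ever executes.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
    if (count)
        __sync_fetch_and_add(count, 1); /* atomic: many CPUs run this */
    return XDP_PASS; /* hand the packet to the normal stack */
}

char LICENSE[] SEC("license") = "GPL";
```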
What Are Kubernetes Clusters?
Kubernetes clusters contain one master node and any number of worker nodes and can be either physical or virtual machines. The master node is responsible for controlling the state of the cluster and is the origin of task assignments. Worker nodes manage the components that run the applications. Namespaces allow operators to organize multiple virtual clusters within one physical cluster and divide resources amongst different teams.
Components of Kubernetes Clusters
- Scheduler: Assigns containers according to defined resource requirements and metrics; when pods have no assigned node, it autonomously selects one for them to run on.
- API server: Exposes a REST interface to Kubernetes resources, essentially acting as the front end of the Kubernetes control plane
- Kubelet: Ensures that containers are fully operational within a given pod
- Kube-proxy: Maintains all network rules across nodes and manages network connectivity across every node in a cluster
- Controller manager: Executes controller processes and ensures consistency between the desired state and the actual state; it manages all node controllers, replication controllers, and endpoint controllers.
- etcd: An open-source distributed key-value store used to hold and manage critical information for distributed systems; etcd manages the configuration data, state data, and metadata for Kubernetes.
What Are the Advantages of eBPF?
Using eBPF for Kubernetes services has numerous advantages that help these processes run optimally. These benefits include:
Convenience
One doesn’t have to create kernel modules to perform the Kubernetes operations mentioned. With eBPF, one just has to create and manage the sandboxed programs, which is much more convenient and simple.
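As a rough sketch of that convenience, the user-space side can be a short libbpf loader instead of a kernel module. The object file name below (count_packets.bpf.o, assumed to be compiled from the earlier sketch) and the interface index are assumptions for illustration.

```c
// User-space loader sketch (libbpf): load a compiled eBPF object and
// attach it to a network interface at runtime -- no kernel module,
// no reboot, and a clean detach on exit.
#include <stdio.h>
#include <bpf/libbpf.h>

int main(void)
{
    struct bpf_object *obj = bpf_object__open_file("count_packets.bpf.o", NULL);
    if (!obj)
        return 1;
    if (bpf_object__load(obj)) { /* the in-kernel verifier runs here */
        fprintf(stderr, "load failed\n");
        return 1;
    }
    struct bpf_program *prog =
        bpf_object__find_program_by_name(obj, "count_packets");
    if (!prog)
        return 1;
    /* ifindex 1 is usually loopback; use a real NIC in practice */
    struct bpf_link *link = bpf_program__attach_xdp(prog, 1);
    if (!link)
        return 1;
    printf("attached; press Enter to detach\n");
    getchar();
    bpf_link__destroy(link); /* cleanly removes the program */
    bpf_object__close(obj);
    return 0;
}
```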
Singular Framework
eBPF acts as a single framework for Kubernetes-oriented operations. Admins can use it to gain insight into details such as which containers are in use, control packet traffic, execute auditing commands, and more.
Security
eBPF is more secure than running a kernel module in privileged processor mode, which could potentially be exploited by malicious code to cause a denial of service or other types of attacks. eBPF can also be utilized within the Security Profiles Operator to ensure consistent, scalable security for each container, regardless of the size of the rollout.
Troubleshooting in Real-Time
eBPF can also be used as a debugger. It doesn’t have to stop any running program to do so; instead, it troubleshoots without interrupting the process, resulting in less downtime.
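A minimal sketch of that idea: the kprobe program below attaches to a live kernel function and counts file opens per process, without pausing anything. The traced function name do_sys_openat2 is an assumption (it varies across kernel versions), and the map layout is illustrative.

```c
// Live-troubleshooting sketch: attach a kprobe to a running kernel
// and count open calls per PID. Nothing is stopped or restarted;
// detaching the program removes the instrumentation just as cleanly.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);   /* PID */
    __type(value, __u64); /* number of opens observed */
} opens_by_pid SEC(".maps");

SEC("kprobe/do_sys_openat2")
int trace_open(void *ctx)
{
    __u32 pid = bpf_get_current_pid_tgid() >> 32; /* upper half = TGID */
    __u64 one = 1;
    __u64 *count = bpf_map_lookup_elem(&opens_by_pid, &pid);
    if (count)
        __sync_fetch_and_add(count, 1);
    else
        bpf_map_update_elem(&opens_by_pid, &pid, &one, BPF_ANY);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```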
While these are a few pros of using eBPF, there are others, including rich programmability, high speed, and efficient performance.
Before I go further, let's see the scenarios where eBPF can be used.
Scenarios Where eBPF Is Used
Kernel Observability
Numerous cloud monitoring tools can provide real-time insights into Kubernetes containers around the clock. However, issues such as request latency can still arise, and to diagnose and prevent such complications, eBPF is used at the kernel layer. As mentioned previously, it is fast and functions efficiently.
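One hedged illustration of kernel-layer latency observability: pair an entry probe and a return probe and subtract timestamps. The traced function (vfs_read) and the use of bpf_printk for output are assumptions for this sketch; a real tool would aggregate into a histogram map instead of printing.

```c
// Latency-measurement sketch: record a timestamp when vfs_read is
// entered, and compute the elapsed time when it returns.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, __u64);   /* pid_tgid of the calling thread */
    __type(value, __u64); /* entry timestamp in nanoseconds */
} start_ts SEC(".maps");

SEC("kprobe/vfs_read")
int on_enter(void *ctx)
{
    __u64 id = bpf_get_current_pid_tgid();
    __u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem(&start_ts, &id, &ts, BPF_ANY);
    return 0;
}

SEC("kretprobe/vfs_read")
int on_exit(void *ctx)
{
    __u64 id = bpf_get_current_pid_tgid();
    __u64 *ts = bpf_map_lookup_elem(&start_ts, &id);
    if (!ts)
        return 0;
    __u64 delta_ns = bpf_ktime_get_ns() - *ts;
    bpf_map_delete_elem(&start_ts, &id);
    bpf_printk("vfs_read took %llu ns", delta_ns); /* demo output */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```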
Routing Network Traffic
Usually, packets traveling in a network only know that they leave point A to reach point B; the routes they take may not be the most optimal. With eBPF, routing decisions in the kernel can select the shortest, fastest, and essentially best paths for packets to travel, reducing overhead and increasing efficiency.
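A hedged sketch of in-kernel traffic steering: an XDP program can redirect packets out of another interface through a device map, bypassing most of the stack. The map contents (filled from user space) and the fixed slot choice are assumptions; real path selection would inspect packet headers. Using XDP_PASS as the fallback action of bpf_redirect_map also requires a reasonably recent kernel.

```c
// Traffic-steering sketch: forward packets out of an interface chosen
// via a DEVMAP. User space populates the map with target ifindexes.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_DEVMAP);
    __uint(max_entries, 8);
    __type(key, __u32);
    __type(value, __u32); /* egress ifindex, set from user space */
} tx_ports SEC(".maps");

SEC("xdp")
int steer(struct xdp_md *ctx)
{
    __u32 slot = 0; /* in practice: chosen by parsing packet headers */
    /* redirect via the map; fall back to the normal stack on a miss */
    return bpf_redirect_map(&tx_ports, slot, XDP_PASS);
}

char LICENSE[] SEC("license") = "GPL";
```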
Tracing Programs
While eBPF is used for monitoring operations running in Kubernetes containers, it is also necessary to keep track of the programs that enable them. After all, any defects in them can result in a defect in the monitoring operation.
Tracking TCP Connections
The Weave Scope tool gives periodic reports on a container-based system and its performance. While most operations are carried out by the tool itself, eBPF is leveraged for visibility into TCP connections, such as socket events.
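As a sketch of what visibility into socket events can look like (illustrative, not Weave Scope's actual code), an eBPF program can hook the sock:inet_sock_set_state tracepoint and observe every TCP state transition. The argument struct below is hand-written from the tracepoint's format file, which is an assumption to verify on your kernel.

```c
// TCP-connection-tracking sketch: log every TCP state transition
// (e.g., SYN_SENT -> ESTABLISHED) observed by the kernel.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Field layout taken from
 * /sys/kernel/debug/tracing/events/sock/inet_sock_set_state/format */
struct set_state_args {
    __u64 _common;        /* common tracepoint header fields */
    const void *skaddr;
    int oldstate;
    int newstate;
    __u16 sport;
    __u16 dport;
    __u16 family;
    __u16 protocol;
    __u8 saddr[4];
    __u8 daddr[4];
    __u8 saddr_v6[16];
    __u8 daddr_v6[16];
};

SEC("tracepoint/sock/inet_sock_set_state")
int on_state_change(struct set_state_args *args)
{
    bpf_printk("tcp state %d -> %d dport=%d",
               args->oldstate, args->newstate, args->dport);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```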
Pod and Container Statistics
eBPF, in general, gives users in-depth visibility into Kubernetes systems. Linux 4.10 introduced cgroup-based eBPF hooks, providing a hierarchical grouping that maps to the container and pod levels. eBPF can then provide network statistics for each of these groups, giving complete details of how different pods and containers are functioning.
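A hedged sketch of per-pod accounting built on those cgroup hooks: a BPF_PROG_TYPE_CGROUP_SKB program attached to a pod's cgroup sees only that pod's traffic. The map layout is illustrative, and attaching the program to the right cgroup path is left to user-space tooling.

```c
// Per-cgroup statistics sketch: count egress bytes for whichever
// cgroup (pod or container) this program is attached to.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} egress_bytes SEC(".maps");

SEC("cgroup_skb/egress")
int count_egress(struct __sk_buff *skb)
{
    __u32 key = 0;
    __u64 *bytes = bpf_map_lookup_elem(&egress_bytes, &key);
    if (bytes)
        __sync_fetch_and_add(bytes, skb->len);
    return 1; /* 1 = allow the packet, 0 would drop it */
}

char LICENSE[] SEC("license") = "GPL";
```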
Commonly Used eBPF Tools
Following are some of the prominent tools that use eBPF technology:
- bcc: A toolkit and library for writing eBPF-based tracing and monitoring tools
- bpftrace: A high-level tracing language for writing short eBPF programs and one-liners
- Cilium: Networking, security, and observability for cloud-native environments (covered below)
- Calico: A networking and security solution with an optional eBPF data plane (covered below)
- Falco: A runtime security tool that can use eBPF to monitor system calls
- Hubble: A network observability layer built on top of Cilium and eBPF
Real-World Examples
Let's look at some real-world examples of how successful organizations have implemented eBPF:
Netflix: Observability
Netflix has developed a network observability sidecar called Flow Exporter that uses eBPF tracepoints to capture TCP flows in near real time. At much less than 1% of CPU and memory on the instance, this highly performant sidecar provides flow data at scale for network insight. The cloud network infrastructure that Netflix utilizes today consists of AWS services such as VPC, DirectConnect, VPC Peering, Transit Gateways, and NAT Gateways, as well as Netflix-owned devices. Netflix's software infrastructure is a large distributed ecosystem of specialized functional tiers operated on AWS and Netflix-owned services. While Netflix strives to keep the ecosystem simple, the inherent nature of leveraging a variety of technologies leads technologists to challenges such as:
App Dependencies and Data Flow Mappings
With the number of microservices growing by the day, and without understanding of and visibility into an application’s dependencies and data flows, it is difficult for both service owners and centralized teams to identify systemic issues.
Pathway Validation
Netflix's velocity of change within the production streaming and studio environment can result in the inability of services to communicate with other resources.
Service Segmentation
The ease of cloud deployments has led to the organic growth of multiple AWS accounts, deployment practices, interconnection practices, etc. Without network visibility, it’s difficult to improve reliability, security, and capacity posture.
Network Availability
The expected continued growth of the ecosystem makes it difficult to understand network bottlenecks and the potential limits that may be reached.
Walmart: Traffic Mirroring
The best way to succeed in a business is by providing an amazing customer experience. The quality of the overall experience is often what influences customers when they shop online, so Walmart wants visibility into how its customers interact with its site.
Walmart has a few analytics solutions that can operate on the data streams and provide the needed analysis. But these solutions need the data of interest, and that interest changes from time to time. There is an opportunity to save valuable time and money by automating the process of collecting this data.
Walmart needs effective ways of collecting this data of interest in the public cloud from the edge proxy servers. However, the edge proxy is also a critical hop that handles all of the ingress traffic to the site and is performance-sensitive.
So, Walmart started exploring some of the commercial solutions, a few of which are listed here:
- Running a stand-alone agent that would mirror 100% of traffic on the proxy VMs. However, this would incur:
  - Significant traffic expenses, as Walmart would mirror 100% of data
  - Additional licensing costs to manage
  - Overhead on the resources of the host
- Using traffic mirroring services that are offered natively by the public cloud. However, this isn’t a consistent solution, as many flavors of the public cloud either do not offer such a service or do not offer the capability to filter the data of interest.
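This is the gap an eBPF approach can fill: mirror only the data of interest, in the data path itself. The sketch below is a generic illustration of that technique, not Walmart's actual implementation; the port filter and MIRROR_IFINDEX are placeholder assumptions.

```c
// Selective-mirroring sketch: a TC egress program that clones only
// packets destined to TCP port 443 onto a mirror interface, instead
// of mirroring 100% of traffic.
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define MIRROR_IFINDEX 10 /* placeholder: ifindex of the mirror device */

SEC("tc")
int mirror_https(struct __sk_buff *skb)
{
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return TC_ACT_OK;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end || ip->protocol != IPPROTO_TCP ||
        ip->ihl < 5)
        return TC_ACT_OK;

    struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > data_end)
        return TC_ACT_OK;

    if (tcp->dest == bpf_htons(443))
        bpf_clone_redirect(skb, MIRROR_IFINDEX, 0); /* copy to mirror */

    return TC_ACT_OK; /* the original packet continues unmodified */
}

char LICENSE[] SEC("license") = "GPL";
```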
Implementation of eBPF
Cilium
Cilium is an open-source project to provide networking, security, and observability for cloud-native environments such as Kubernetes clusters and other container orchestration platforms. At the foundation of Cilium is the new Linux kernel technology called eBPF, which enables the dynamic insertion of powerful security, visibility, and networking control logic into the Linux kernel. eBPF is used to provide high-performance networking, multi-cluster and multi-cloud capabilities, advanced load balancing, transparent encryption, extensive network security capabilities, transparent observability, and much more.
Cilium comprises four key components:
1. Cilium Agent
The agent, running on all cluster nodes, configures networking, load balancing, policies, and monitoring via Kubernetes or APIs that describe networking, service load-balancing, network policies, and visibility and monitoring requirements.
2. Cilium Client Command Line Tool
The client tool, bundled with the agent, inspects and manages the local agent's status, offering direct access to eBPF maps.
3. Cilium Operator
The operator centrally manages cluster tasks, handling them collectively rather than per node.
4. Cilium CNI Plugin
The CNI plugin, invoked by Kubernetes during pod scheduling or termination, interacts with the node's Cilium API to configure necessary datapaths for networking, load balancing, and network policies.
Calico
Calico Open Source is a networking and security solution for containers, virtual machines, and native host-based workloads. Calico supports a broad range of platforms, including Kubernetes, OpenShift, Docker EE, OpenStack, and bare metal services. Whether you use Calico’s eBPF data plane, Linux’s standard networking stack, or the Windows data plane, Calico delivers blazing-fast performance with true cloud-native scalability.
Calico comprises three key components:
1. Calico/Node Agent
This entity consists of three components: felix, bird, and confd.
- The primary responsibility of felix is to program the host iptables and routes to provide the connectivity that you want to and from the pods on that host.
- bird is an open-source BGP agent for Linux that is used to exchange routing information between the hosts. The routes that are programmed by felix are picked up by bird and distributed among the cluster hosts.
- confd monitors the etcd data store for changes to the BGP configuration, such as IP Address Management (IPAM) information and autonomous system (AS) number. It also changes the bird configuration files and triggers bird to reload these files on each host.
The calico/node agent creates veth pairs to connect the pod network namespace with the host's default network namespace.
2. Calico/CNI
The CNI plug-in provides the IPAM functions by provisioning IP addresses for the pods that are hosted on the nodes.
3. Calico/Kube-Controller
The calico/kube-controller watches Kubernetes network policy objects and keeps the Calico data store in sync with the Kubernetes objects. The calico/node agent that runs on each node uses the information in the Calico etcd data store to program the local iptables.
Comparison
Now that we have seen that both Cilium and Calico use eBPF as a foundational technology, let's have a quick comparison between the two:
| | Calico | Cilium |
|---|---|---|
| Technology Stack | Supports eBPF, Linux iptables, Windows HNS, and VPP dataplanes | Based solely on an eBPF dataplane |
| Network Security | Offers network security policies at both the application and network levels | Also offers network security policies at both the application and network levels |
| Load Balancing and Networking | Efficient load balancing with an eBPF dataplane for routing and overlay networks | Similar approach to load balancing and networking |
| Container Orchestrator Integration | Broad integration, including Kubernetes, OpenShift, Docker EE, etc. | Mostly focused on Kubernetes and container orchestration platforms |
| Observability and Monitoring | Extensive visibility with integration options like Prometheus, Grafana, Istio, and Jaeger | Uses Hubble for observability; might have limitations in data export |
| Scalability and Performance | Highly scalable with minimal performance overhead; supports large-scale deployments | Scalable, but limited by identities in packet headers and eBPF map sizes |
| Encryption | Supports WireGuard and mTLS (with Istio) | Supports WireGuard and IPsec |
| Architecture | Flexible architecture with multiple dataplane options | Single eBPF-based dataplane; focuses on security identities |
| Policy Management | Advanced policy management with the Calico API, calicoctl, and enhanced options in Enterprise and Cloud versions | Basic policy management; lacks advanced lifecycle management |
| Kubernetes Platform Support | Supports a range of platforms and maintains compatibility with Kubernetes versions | Primarily supports Kubernetes |
| Multi-Cluster Management | Advanced multi-cluster management, especially in Enterprise and Cloud versions | Standard multi-cluster management with kubectl and Hubble |
| Cluster Mesh | Flexible multi-cluster setup using the BGP protocol | Supports up to 255 clusters in a cluster mesh |
| Deployment and Configuration | Deployed via the Tigera operator or Calico manifests | Deployed via the Cilium CLI utility |
Conclusion
In this article, we discussed eBPF, its benefits, and its use cases, then looked at eBPF implementations like Cilium and Calico, along with an overview and comparison of the two.