BRO SRE

Reliability practices, infrastructure, automation


eBPF for Observability: From Theory to Practice Without a PhD

2026-02-14 · eBPF, Observability, Linux

eBPF has been described as the most significant Linux kernel technology of the past decade. It has also been surrounded by so much hype that many engineers dismiss it as something only kernel developers need to understand. The reality is more practical: eBPF enables observability capabilities that were previously impossible or prohibitively expensive, and you do not need to write a single line of kernel code to benefit from it.

This article bridges the gap between eBPF theory and practical observability, focusing on what SRE and infrastructure teams can deploy today.

What eBPF Actually Does at the Kernel Level

eBPF (extended Berkeley Packet Filter) allows you to run sandboxed programs inside the Linux kernel without modifying kernel source code or loading kernel modules. Think of it as a safe, programmable extension point for the kernel.

When you load an eBPF program, several things happen:

  1. The eBPF bytecode is submitted to the kernel via the bpf() system call.
  2. The kernel's verifier analyzes the program to guarantee safety: no unbounded loops, no out-of-bounds memory access, no unsafe pointer arithmetic. Programs that fail verification are rejected.
  3. The verified bytecode is JIT-compiled to native machine code, so it runs at near-native speed.
  4. The compiled program is attached to a specific hook point in the kernel.
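
The bytecode in step 1 is a sequence of fixed-size 8-byte instructions. As a rough illustration (not tied to any particular loader library; real tooling like clang's BPF target and libbpf does this for you), the smallest program the verifier will accept -- `r0 = 0; exit` -- can be encoded by hand:

```python
import struct

def bpf_insn(opcode: int, dst: int = 0, src: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Pack one 8-byte eBPF instruction: u8 opcode, u8 regs (src<<4 | dst), s16 off, s32 imm."""
    return struct.pack("<BBhi", opcode, (src << 4) | dst, off, imm)

BPF_MOV64_IMM = 0xB7   # move a 64-bit immediate into a register
BPF_EXIT = 0x95        # return r0 to the kernel

# "r0 = 0; exit" -- two instructions, 16 bytes total
prog = bpf_insn(BPF_MOV64_IMM, dst=0, imm=0) + bpf_insn(BPF_EXIT)
assert len(prog) == 16
```

This byte string is what a loader hands to the bpf() system call in step 1; the verifier in step 2 walks exactly these instructions.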

The key insight for observability is that eBPF programs can observe kernel and user-space events (system calls, network packets, function entries, scheduler decisions) with negligible performance impact and without requiring application instrumentation. No agents to install inside containers. No library dependencies. No application restarts.

Hook Points That Matter for Observability

eBPF programs attach to specific hook points. Understanding these is essential for knowing what you can observe.

Kprobes and Kretprobes

Kprobes allow you to attach eBPF programs to virtually any kernel function. A kprobe fires on function entry; a kretprobe fires on function return. This gives you access to function arguments and return values.

For observability, the most useful kprobe targets include:

  - tcp_v4_connect / tcp_v6_connect -- outbound connection attempts
  - tcp_retransmit_skb -- retransmissions, on kernels without the equivalent tracepoint
  - vfs_read / vfs_write -- file I/O latency and volume
  - finish_task_switch -- run-queue latency and scheduler analysis

The caveat with kprobes is that they attach to internal kernel functions, which may be renamed, inlined, or removed between kernel versions. A kprobe that works on 5.15 might not exist on 6.1. For production observability, prefer tracepoints where available.
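
Whether a given kprobe target exists on your kernel can be checked before deployment by scanning /proc/kallsyms. A minimal sketch (tcp_retransmit_skb is just an example target):

```python
def kprobe_target_exists(symbol: str, kallsyms_text: str) -> bool:
    """Return True if `symbol` appears as a kernel symbol name.

    Each /proc/kallsyms line looks like: "<address> <type> <name> [module]".
    """
    for line in kallsyms_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] == symbol:
            return True
    return False

# Real usage reads the live file:
#   with open("/proc/kallsyms") as f:
#       ok = kprobe_target_exists("tcp_retransmit_skb", f.read())
sample = "ffffffff81000000 T tcp_retransmit_skb\nffffffff81000010 t __tcp_retransmit_skb\n"
print(kprobe_target_exists("tcp_retransmit_skb", sample))  # True
```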

Tracepoints

Tracepoints are stable, documented hook points in the kernel. Unlike kprobes, tracepoints are treated as a stable interface and are safe to rely on across upgrades. Key tracepoints for SRE work:

  - tcp:tcp_retransmit_skb -- per-connection TCP retransmissions
  - sched:sched_switch and sched:sched_wakeup -- run-queue and scheduling latency
  - block:block_rq_issue / block:block_rq_complete -- disk I/O latency
  - syscalls:sys_enter_* / syscalls:sys_exit_* -- per-syscall counts and latency
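
The tracepoints available on a given kernel can be enumerated from tracefs (/sys/kernel/tracing/available_events), one subsystem:event per line. A small helper to filter them (the tcp subsystem here is just an example):

```python
def tracepoints_in_subsystem(available_events_text: str, subsystem: str) -> list[str]:
    """Filter tracefs available_events (lines of "subsystem:event") by subsystem."""
    out = []
    for line in available_events_text.splitlines():
        line = line.strip()
        if line.startswith(subsystem + ":"):
            out.append(line)
    return out

# Real usage reads /sys/kernel/tracing/available_events (root required);
# here we demonstrate on a sample:
sample = "tcp:tcp_retransmit_skb\ntcp:tcp_probe\nsched:sched_switch\n"
print(tracepoints_in_subsystem(sample, "tcp"))
# ['tcp:tcp_retransmit_skb', 'tcp:tcp_probe']
```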

XDP (eXpress Data Path)

XDP hooks run at the earliest possible point in the network stack, before the kernel allocates a socket buffer. This makes XDP programs extremely fast for packet-level operations. For observability, XDP enables wire-speed packet counting, protocol distribution analysis, and traffic classification without tcpdump overhead. XDP is also the foundation for high-performance load balancing in tools like Cilium and Katran.
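
An XDP program sees the raw frame before any kernel parsing, so protocol classification amounts to reading the EtherType at a fixed offset. The program itself would be written in restricted C and compiled for the kernel; the logic it performs can be sketched in user space (offsets per Ethernet II framing):

```python
import struct

# Well-known EtherType values
ETHERTYPES = {0x0800: "IPv4", 0x86DD: "IPv6", 0x0806: "ARP"}

def classify_frame(frame: bytes) -> str:
    """Mimic the first check an XDP packet counter performs:
    read the 2-byte EtherType at offset 12 (after dst + src MAC)."""
    if len(frame) < 14:          # an XDP program must bounds-check too,
        return "runt"            # or the verifier rejects it
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    return ETHERTYPES.get(ethertype, f"0x{ethertype:04x}")

# 14-byte Ethernet header: dst MAC + src MAC + EtherType 0x0800
frame = b"\x00" * 12 + b"\x08\x00"
print(classify_frame(frame))  # IPv4
```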

Practical Tools You Can Deploy Today

Cilium Hubble

If you run Kubernetes with Cilium as the CNI, Hubble is the most mature eBPF observability tool available. Hubble provides:

  - L3/L4 flow visibility with Kubernetes identity (pod, namespace, labels) attached to every flow
  - L7 protocol visibility (HTTP, gRPC, DNS, Kafka) where protocol parsing is enabled
  - drop visibility, including why a packet was dropped (policy denied, unroutable, and so on)
  - a live service dependency map in the Hubble UI
  - Prometheus metrics exported from all of the above

Hubble collects this data using eBPF programs attached to the kernel's network stack, so it works for any application regardless of language, framework, or whether the application has been instrumented. A Go microservice and a legacy C++ daemon get the same visibility.

# Observe HTTP flows to a specific service in real-time
hubble observe --namespace production \
  --to-label app=api-gateway \
  --protocol http \
  --verdict FORWARDED

# Export flow data as Prometheus metrics
# (configured in Cilium Helm values)
hubble:
  metrics:
    enabled:
      - dns
      - drop
      - tcp
      - flow
      - icmp
      - httpV2:exemplars=true;labelsContext=source_namespace,destination_namespace

Pixie (CNCF Project)

Pixie takes a different approach: it uses eBPF to capture full application-level protocol traces (HTTP, MySQL, PostgreSQL, Cassandra, Redis, Kafka) without any application instrumentation. It reconstructs request/response pairs by observing the data passed through kernel socket functions, intercepting TLS traffic at the OpenSSL/BoringSSL boundary so it can see plaintext even in mTLS environments.

The result is application-level observability -- request latency, error rates, throughput -- derived entirely from kernel-level observation. For debugging purposes, you can inspect individual request and response payloads. This is particularly valuable for services you cannot instrument easily: third-party software, legacy applications, or sidecar-averse teams.

# PxL script to show HTTP latency by service
import px

df = px.DataFrame(table='http_events', start_time='-5m')
df.service = df.ctx['service']
df.latency_ms = df.resp_latency_ns / 1e6
df.failure = df.resp_status >= 400
df = df.groupby('service').agg(
    latency_quantiles=('latency_ms', px.quantiles),
    error_rate=('failure', px.mean),
    throughput=('latency_ms', px.count),
)
df.p50 = px.pluck_float64(df.latency_quantiles, 'p50')
df.p99 = px.pluck_float64(df.latency_quantiles, 'p99')
px.display(df)

The limitation worth noting: Pixie works best with Go, C/C++, and Rust binaries. JVM and Python support exists but is less mature due to how these runtimes handle TLS internally.

Tetragon (Security Observability)

Tetragon, also from the Cilium project, focuses on security-relevant observability. It uses eBPF to monitor process execution, file access, network connections, and privilege changes at the kernel level. For SRE teams, Tetragon answers questions like:

  - Which process (and which parent, in which pod) executed this binary?
  - What touched /etc/shadow, and when?
  - Which workload opened an unexpected outbound connection?
  - Did any container process gain new privileges at runtime?

# TracingPolicy to monitor sensitive file access
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: sensitive-file-access
spec:
  kprobes:
    - call: "security_file_open"
      syscall: false
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Prefix"
              values:
                - "/etc/shadow"
                - "/etc/passwd"
                - "/root/.ssh/"

Tetragon can also enforce policies, not just observe. A TracingPolicy can be configured to kill a process that violates a rule (for example, terminating any process that attempts to execute a binary downloaded from the internet inside a container). This blurs the line between observability and runtime security, which is exactly where SRE and security teams need to collaborate.

TCP Retransmit Metrics Without Agents

One of the most immediately useful eBPF applications for SRE teams is TCP retransmit monitoring. Retransmissions are the earliest signal of network degradation, often appearing minutes before application-level latency increases become noticeable.

Traditionally, monitoring retransmits required either parsing /proc/net/snmp counters (which are system-wide, not per-connection) or running packet captures (which are expensive and generate enormous data volumes). With eBPF, you can get per-connection, per-pod retransmit metrics with source and destination context at negligible cost.
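
For reference, the system-wide counters mentioned above live on the Tcp line of /proc/net/snmp: RetransSegs over OutSegs gives the node-level retransmit ratio, with none of the per-connection context eBPF adds. A parsing sketch (the sample below is abbreviated; parsing by header name keeps it robust to the full field list):

```python
def tcp_retransmit_ratio(snmp_text: str) -> float:
    """Compute RetransSegs / OutSegs from /proc/net/snmp content.

    The file contains a "Tcp:" header line naming the fields, then a
    "Tcp:" value line; the two are paired up by position.
    """
    lines = [l for l in snmp_text.splitlines() if l.startswith("Tcp:")]
    header, values = lines[0].split()[1:], lines[1].split()[1:]
    tcp = dict(zip(header, (int(v) for v in values)))
    return tcp["RetransSegs"] / tcp["OutSegs"]

# Real usage: tcp_retransmit_ratio(open("/proc/net/snmp").read())
sample = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs\n"
    "Tcp: 1 200 120000 -1 10 5 0 0 3 1000 2000 4\n"
)
print(tcp_retransmit_ratio(sample))  # 0.002
```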

If you are running Cilium, enabling retransmit metrics is a configuration change:

# In Cilium Helm values
hubble:
  metrics:
    enabled:
      - tcp:sourceContext=pod;destinationContext=pod

This exposes hubble_tcp_flags_total, which counts observed TCP flags (SYN, SYN-ACK, FIN, RST) per source and destination. Retransmissions are not a TCP flag, so the metric is a proxy rather than a direct retransmit counter: a rising share of RSTs relative to new connections is a practical degradation signal you can alert on in Prometheus:

- alert: HighTCPResetRate
  expr: |
    sum(rate(hubble_tcp_flags_total{flag="RST",
      source_namespace="production"}[5m]))
    /
    sum(rate(hubble_tcp_flags_total{flag="SYN",
      source_namespace="production"}[5m]))
    > 0.01
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "TCP RST rate exceeds 1% of new connections in production"

For environments without Cilium, bpftrace provides ad-hoc retransmit tracing with a one-liner:

bpftrace -e 'tracepoint:tcp:tcp_retransmit_skb {
  @retransmits[ntop(args->saddr), ntop(args->daddr)] = count();
}'

Continuous Profiling with eBPF

eBPF-based continuous profilers (Parca, Pyroscope with eBPF backend, Polar Signals Cloud) capture CPU stack traces across all processes on a node by attaching to perf events. Unlike traditional profilers that require per-application instrumentation or agent injection, eBPF profilers see everything on the node with typically under 1% CPU overhead.

The practical value for SRE teams: when a service's CPU usage spikes, you can immediately view a flame graph showing where CPU time is being spent, even if the service has no profiling instrumentation. This turns "the service is slow" from a multi-hour investigation into a five-minute diagnosis.
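
The flame graphs these profilers render are built from exactly this kind of aggregation: each sampled stack is folded into a semicolon-joined string and counted (Brendan Gregg's collapsed-stack format). A minimal sketch of the folding step, on made-up sample data:

```python
from collections import Counter

def fold_stacks(samples: list[list[str]]) -> Counter:
    """Aggregate raw stack samples into collapsed-stack counts.

    Each sample is one stack, outermost frame first; identical stacks
    collapse into a single "a;b;c" key whose count determines the
    frame width in the rendered flame graph.
    """
    return Counter(";".join(stack) for stack in samples)

# Hypothetical samples from a sampling CPU profiler:
samples = [
    ["main", "handleRequest", "encodeJSON"],
    ["main", "handleRequest", "encodeJSON"],
    ["main", "handleRequest", "queryDB"],
]
for stack, count in fold_stacks(samples).most_common():
    print(f"{stack} {count}")
# main;handleRequest;encodeJSON 2
# main;handleRequest;queryDB 1
```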

# Deploy Parca agent via DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: parca-agent
  namespace: parca
spec:
  selector:
    matchLabels:
      app: parca-agent
  template:
    metadata:
      labels:
        app: parca-agent
    spec:
      containers:
        - name: parca-agent
          image: ghcr.io/parca-dev/parca-agent:v0.32.0
          args:
            - /bin/parca-agent
            - --store-address=parca-server.parca:7070
            - --node=$(NODE_NAME)
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            privileged: true
          volumeMounts:
            - name: proc
              mountPath: /proc
            - name: sys-kernel
              mountPath: /sys/kernel
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys-kernel
          hostPath:
            path: /sys/kernel

We identified a 15% CPU regression in a Go service within hours of deployment using Parca -- the flame graph showed unexpected time in runtime.mallocgc due to a loop that allocated slices instead of reusing a buffer pool. Without continuous profiling, this would have gone unnoticed until capacity planning flagged the trend weeks later.

Where eBPF Observability Falls Short

eBPF is not a replacement for application-level instrumentation. It cannot observe business logic metrics (order conversion rates, feature flag usage, user journey completions). It cannot inject distributed trace context into requests. It has limited visibility into encrypted payloads when TLS is terminated at the application layer using non-standard libraries.

Think of eBPF observability as the infrastructure layer of your observability stack: it provides universal, zero-instrumentation visibility into network, system, and runtime behavior. Application metrics, structured logging, and distributed tracing remain necessary for understanding business-level behavior. The most effective observability stacks combine both: eBPF for infrastructure-level signals and OpenTelemetry for application-level signals, correlated through shared context like pod labels and trace IDs.