A Gentle Introduction to XDP

By Daniel Lavie

XDP, short for “eXpress Data Path”, is a Linux Kernel framework introduced in version 4.8 that processes packets at the lowest possible level of the software stack, thus allowing for high-performance packet processing.

Kernel

Generally, when an incoming packet arrives at the NIC (Network Interface Card), it gets transferred into a ring buffer in the Kernel called rx_ring. Then, the packet is copied to the Kernel network stack - to a struct called sk_buff - which contains the raw data along with other metadata (See here for the full struct specs). Once in the sk_buff struct, the packet is directed to the appropriate packet handler function (e.g. ip_rcv for IPv4 packets).

Copying the packet to the sk_buff is a costly operation, and traditional packet processing solutions such as Netfilter hooks can suffer in terms of performance, especially for use cases like DDoS protection and firewalls. What if we could somehow process the packet before it gets copied to the sk_buff struct? 🤔

XDP program to the rescue

An XDP program is an eBPF-based program that is attached to a hook point in the NIC driver (there are a couple of exceptions to this, more on this later) and can decide the fate of a network packet as soon as it arrives.

An XDP program operates on the rx_ring buffer before the packet gets copied to the Kernel network stack. This allows for fast packet processing and filtering with all the benefits of an eBPF program.

The following diagram illustrates the packet flow in the Kernel and you can see where the XDP program is operating at the bottom left:

[Figure: packet flow in the Kernel]

By Jan Engelhardt - Own work, Origin SVG PNG, CC BY-SA 3.0, File:Netfilter-packet-flow.svg - Wikimedia Commons

The mechanics of XDP programs

Before we dive in - It’s important to explicitly state the following about XDP programs:

  • An XDP program is attached to a specific NIC in a machine, meaning that if there are 2 or more NICs, the program will only see the packets coming from the specific NIC it was attached to.

  • An XDP program sees only ingress packets.

A basic flow of an XDP program is generally as follows:

  1. Get the raw packet as input (given by the xdp_md struct).

  2. Do some processing on the packet (gain some insights based on the data, change it, and so on).

  3. Decide what to do with the packet (pass/drop it, for example).

Let’s look at the simplest XDP program (think of it as the “hello world” program of XDP):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_hello_world(struct xdp_md *ctx)
{
    return XDP_PASS;
}

Ignoring all of the eBPF mechanics (SEC, includes, etc), we can see that all this program does is return a code called XDP_PASS. In this case, step 2 is empty (no processing is done on the packet). The XDP_PASS return code lets the packet continue through the network stack as one would expect. So basically, this XDP program does nothing useful 😊.

Let’s spice things up a bit and change the previous program to get the following:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_hello_world_all_spiced_up(struct xdp_md *ctx)
{
    return XDP_DROP;
}

AH-HA! We replaced the return value of the function with XDP_DROP. This means that every packet that arrives at the NIC will be dropped 😈. While it can be funny in some use cases, I wouldn’t recommend loading such a program, especially in your production environment.

The following return codes are available for an XDP program:

  • XDP_PASS, XDP_DROP - as we have already seen

  • XDP_TX - send the packet back out of the same NIC it arrived on

  • XDP_ABORTED - used to indicate that the XDP program hit an error during execution (the packet will be dropped)

  • XDP_REDIRECT - indicates that the XDP program has redirected the packet to a different interface or to an eBPF map

For more information about the different return codes, check out XDP actions — Prototype Kernel 0.0.1 documentation.
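To make the list above a bit more concrete, here is a small userspace sketch of the five verdicts. The enum is a local mirror of the kernel's enum xdp_action from linux/bpf.h, redefined here only so the snippet compiles without kernel headers. The two helper functions are hypothetical names invented for this illustration and are not part of any XDP API.

```c
#include <stdbool.h>

/* Local mirror of the kernel's enum xdp_action (linux/bpf.h).
 * Redefined only so this sketch builds in plain userspace. */
enum xdp_action_sketch {
    SKETCH_XDP_ABORTED = 0, /* program error, packet is dropped */
    SKETCH_XDP_DROP,        /* drop the packet */
    SKETCH_XDP_PASS,        /* hand the packet to the network stack */
    SKETCH_XDP_TX,          /* send it back out of the receiving NIC */
    SKETCH_XDP_REDIRECT,    /* hand it to another interface or a map */
};

/* Hypothetical helper: printable name of a verdict. */
static const char *xdp_action_name(int action)
{
    switch (action) {
    case SKETCH_XDP_ABORTED:  return "XDP_ABORTED";
    case SKETCH_XDP_DROP:     return "XDP_DROP";
    case SKETCH_XDP_PASS:     return "XDP_PASS";
    case SKETCH_XDP_TX:       return "XDP_TX";
    case SKETCH_XDP_REDIRECT: return "XDP_REDIRECT";
    default:                  return "unknown";
    }
}

/* Hypothetical helper: true for the verdicts that discard the packet. */
static bool xdp_action_drops_packet(int action)
{
    return action == SKETCH_XDP_ABORTED || action == SKETCH_XDP_DROP;
}
```

Note that XDP_ABORTED and XDP_DROP both end with the packet being discarded. The difference is that XDP_ABORTED signals an error in the program rather than a deliberate decision.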

So far we assumed that the XDP program runs in the NIC driver. There are 2 other cases for where an XDP program can run:

  • Some NICs provide the ability to offload the XDP program to the NIC itself, so the XDP program runs very early and super fast (without bothering the CPU).

  • Some NIC drivers lack an implementation of XDP, so an XDP program cannot be loaded natively on that specific NIC. In this case, the XDP program can be loaded as a generic program that works at the network stack level. The downside is that the performance will be slower.

Check out the BCC repository for a list of drivers that support XDP.
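At attach time, the three modes (native driver, offloaded, generic) are selected with flag bits. The sketch below mirrors the XDP_FLAGS_* mode bits from linux/if_link.h as local constants so it builds without kernel headers. The enum and the xdp_mode_to_flags function are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Local mirrors of the attach-mode bits defined in linux/if_link.h. */
#define SKETCH_XDP_FLAGS_SKB_MODE (1U << 1) /* generic mode, network stack level */
#define SKETCH_XDP_FLAGS_DRV_MODE (1U << 2) /* native mode, inside the NIC driver */
#define SKETCH_XDP_FLAGS_HW_MODE  (1U << 3) /* offloaded to the NIC hardware */

/* Hypothetical enum naming the three places an XDP program can run. */
enum xdp_mode_sketch {
    MODE_NATIVE,
    MODE_OFFLOAD,
    MODE_GENERIC,
};

/* Hypothetical helper: pick the flag bit for the requested mode. */
static uint32_t xdp_mode_to_flags(enum xdp_mode_sketch mode)
{
    switch (mode) {
    case MODE_NATIVE:  return SKETCH_XDP_FLAGS_DRV_MODE;
    case MODE_OFFLOAD: return SKETCH_XDP_FLAGS_HW_MODE;
    case MODE_GENERIC: return SKETCH_XDP_FLAGS_SKB_MODE;
    }
    return 0; /* no explicit mode requested */
}
```

When no mode bit is set, the kernel is free to choose. Setting one explicitly is useful when you want the attach to fail instead of silently falling back to a slower mode.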

XDP programs, like all other eBPF programs, can communicate with other eBPF programs or with userspace through a concept called eBPF maps. For more information about eBPF maps, check out the eBPF maps — Prototype Kernel 0.0.1 documentation.

Let’s build a simple XDP program

We are going to build an XDP program and an XDP runner. This program will output a list of protocol names along with the number of packets that were captured for each protocol. It is based on gobpf/xdp_drop.go at master · iovisor/gobpf and it uses the BCC framework.

You can find the full code in our eBPF training repo. It also contains the instructions on how to load and test the program as well as a docker image for easier setup.

xdp_prog.c

Let’s begin by looking at the first piece of code:

int xdp_counter(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    ...

This is the function that we are going to load as an eBPF program. data and data_end are pointers to the beginning and end of the packet’s raw memory. Note that ctx->data and ctx->data_end are of type __u32, so we have to perform the casts (most XDP programs will start like this).

We continue:

    struct ethhdr *eth = data;
    uint64_t network_header_offset = sizeof(*eth);
    if (data + network_header_offset > data_end)
    {
        return RETURN_CODE;
    }

ethhdr is a struct that defines an Ethernet frame header, so we can easily access the fields of that header. For our purpose - we are interested in the h_proto field, which will tell us if we need to parse the packet as an IPv4 or an IPv6 packet. We also save the offset to the end of the Ethernet header, which is where the network-layer header begins.

Next, we check that the offset added to the data pointer does not exceed the data_end pointer and if it does, we return the RETURN_CODE (which is defined as XDP_PASS). If we don't do this, we'll get the following error while trying to run the program:

[Screenshot: the eBPF verifier rejecting the program with a permission denied error]

We get a permission denied error because the eBPF verifier refuses to load our program. From the verifier's point of view, we are trying to access a memory area that might lie outside of the packet. Hence, we must add these bounds checks so our program can be loaded properly. This is a common pattern in XDP programs and we will see it again soon.
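The bounds check can be tried out in plain userspace as well. The sketch below uses a minimal stand-in for the kernel's ethhdr struct and a hypothetical header_fits helper. Neither name comes from the original program. It expresses the check as a length comparison, which gives the same answer as the data + offset > data_end form used in the XDP code while staying well-defined for ordinary C buffers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's struct ethhdr (14 bytes). */
struct eth_hdr_sketch {
    uint8_t  h_dest[6];   /* destination MAC address */
    uint8_t  h_source[6]; /* source MAC address */
    uint16_t h_proto;     /* EtherType, in network byte order */
};

_Static_assert(sizeof(struct eth_hdr_sketch) == 14,
               "an Ethernet header is 14 bytes long");

/* Hypothetical helper: true when header_len bytes starting at data
 * lie entirely before data_end. This is the same condition as
 * !(data + header_len > data_end) in the XDP program. */
static bool header_fits(const void *data, const void *data_end,
                        size_t header_len)
{
    size_t available = (size_t)((const uint8_t *)data_end -
                                (const uint8_t *)data);
    return available >= header_len;
}
```

A 14-byte buffer passes the check for an Ethernet header, while a 10-byte buffer does not. In the real XDP program, the 10-byte case is exactly the one where we return early instead of reading h_proto.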

Let’s continue:

    uint16_t h_proto = eth->h_proto;
    int protocol_index;
    if (h_proto == htons(ETH_P_IP))
    {
        protocol_index = parse_ipv4(data + network_header_offset, data_end);
    }
    else if (h_proto == htons(ETH_P_IPV6))
    {
        protocol_index = parse_ipv6(data + network_header_offset, data_end);
    }
    else
    {
        protocol_index = 0;
    }

Nothing fancy here - we use the h_proto field to decide what parsing we should do, and we call a function accordingly, which returns a protocol_index value.
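One detail worth spelling out is the htons call. The h_proto field holds the EtherType exactly as it appeared on the wire (network byte order), so we convert the constant instead of the field. Here is a userspace sketch of the same dispatch. The SKETCH_ constants mirror ETH_P_IP and ETH_P_IPV6, and classify_frame and enum l3_kind are hypothetical names for illustration only.

```c
#include <arpa/inet.h> /* htons */
#include <stdint.h>
#include <string.h>

/* Local mirrors of ETH_P_IP and ETH_P_IPV6 from linux/if_ether.h. */
#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_ETH_P_IPV6 0x86DD

/* Hypothetical result type for the dispatch. */
enum l3_kind {
    L3_OTHER = 0,
    L3_IPV4,
    L3_IPV6,
};

/* Hypothetical helper. The caller must already have checked that
 * frame holds at least 14 bytes (a full Ethernet header). */
static enum l3_kind classify_frame(const uint8_t *frame)
{
    uint16_t h_proto;

    /* The EtherType sits at byte offset 12, after the two MAC
     * addresses. Copy it out unchanged, in wire byte order. */
    memcpy(&h_proto, frame + 12, sizeof(h_proto));

    if (h_proto == htons(SKETCH_ETH_P_IP))
        return L3_IPV4;
    if (h_proto == htons(SKETCH_ETH_P_IPV6))
        return L3_IPV6;
    return L3_OTHER;
}
```

Comparing against htons(constant) works on both little-endian and big-endian machines, which is why XDP programs usually write the comparison this way.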

We move on to:

    if (protocol_index == 0)
    {
        return RETURN_CODE;
    }
    long *protocol_count = protocol_counter.lookup(&protocol_index);
    if (protocol_count)
    {
        lock_xadd(protocol_count, 1);
    }
    return RETURN_CODE;

We check if protocol_index is 0 and, in this case, we just return the RETURN_CODE.

Then we look up the protocol_index from the previous step in a map called protocol_counter, which maps a protocol index to a count of packets. If the lookup succeeds, we atomically increment the count for that protocol_index and return the RETURN_CODE.
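The declaration of protocol_counter is not shown in this post, so the following is only a userspace sketch of the lookup-then-increment idea, assuming an array-like map with one slot per 8-bit IP protocol number. The names protocol_counter_sketch, counter_lookup and count_packet are hypothetical. The atomic add uses the compiler builtin __sync_fetch_and_add (available in GCC and Clang), which plays the role that lock_xadd plays in the XDP program.

```c
#include <stddef.h>

/* Assumption: one slot per possible 8-bit IP protocol number. */
#define PROTOCOL_SLOTS 256

/* Stand-in for the eBPF map: protocol index -> packet count. */
static long protocol_counter_sketch[PROTOCOL_SLOTS];

/* Like a map lookup: returns a pointer to the slot, or NULL when
 * the key does not exist in the map. */
static long *counter_lookup(int protocol_index)
{
    if (protocol_index < 0 || protocol_index >= PROTOCOL_SLOTS)
        return NULL;
    return &protocol_counter_sketch[protocol_index];
}

/* Look up the slot and bump it atomically. The NULL check mirrors
 * the one the eBPF verifier insists on before dereferencing the
 * result of a map lookup. */
static void count_packet(int protocol_index)
{
    long *count = counter_lookup(protocol_index);

    if (count)
        __sync_fetch_and_add(count, 1);
}
```

The increment has to be atomic because the same XDP program can run on several CPUs at once, and they would otherwise race on the same counter.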

Let’s look at one of the utility functions that parse a network protocol:

static inline int parse_ipv4(void *ip_data, void *data_end)
{
    struct iphdr *ip_header = ip_data;
    if ((void *)&ip_header[1] > data_end)
    {
        return 0;
    }
    return ip_header->protocol;
}

We cast the data to an iphdr pointer to easily extract the packet’s protocol. To make the eBPF verifier happy again, we check that the header does not extend past the end of the packet, and then return the protocol number. The parse_ipv6 function is similar, but it uses the ipv6hdr struct instead.
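Since parse_ipv6 is not shown here, below is a sketch of what an equivalent function might look like. The real one in the repo may differ. To keep the snippet buildable in userspace, it uses a minimal stand-in for the kernel's ipv6hdr struct. Both ipv6_hdr_sketch and parse_ipv6_sketch are names made up for this example.

```c
#include <stdint.h>

/* Minimal stand-in for the kernel's struct ipv6hdr (40 bytes). */
struct ipv6_hdr_sketch {
    uint8_t  version_class_flow[4]; /* version, traffic class, flow label */
    uint16_t payload_len;           /* network byte order */
    uint8_t  nexthdr;               /* type of the header that follows */
    uint8_t  hop_limit;
    uint8_t  saddr[16];             /* source address */
    uint8_t  daddr[16];             /* destination address */
};

_Static_assert(sizeof(struct ipv6_hdr_sketch) == 40,
               "the fixed IPv6 header is 40 bytes long");

/* Same shape as parse_ipv4: bounds check first, then read one field.
 * Returns 0 when the header does not fit inside the packet. */
static inline int parse_ipv6_sketch(void *ip_data, void *data_end)
{
    struct ipv6_hdr_sketch *ip_header = ip_data;

    if ((void *)&ip_header[1] > data_end)
    {
        return 0;
    }
    return ip_header->nexthdr;
}
```

One caveat: in IPv6, nexthdr can name an extension header instead of the transport protocol. This sketch does not walk extension headers, so such packets would be counted under the extension header's number.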

That is it! The main logic of our XDP program is ready.

xdp_runner.go

OK, time to move to the runner. Our runner gets 2 arguments:

  1. The XDP program file path (xdp_prog.c in our case).

  2. The device name to attach the XDP program to. The easiest way will be to use lo, the loopback interface, but you can also run ip link to show all interfaces available on the machine.

Here’s the first snippet from the beginning of the main function:

    bpfSourceCodeFile := os.Args[1]
    bpfSourceCodeContent, err := ioutil.ReadFile(bpfSourceCodeFile)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Failed to read bpf source code file %s with error: %v\n", bpfSourceCodeFile, err)
        os.Exit(1)
    }
    module := bcc.NewModule(string(bpfSourceCodeContent), nil)
    defer module.Close()
    fn, err := module.Load("xdp_counter", C.BPF_PROG_TYPE_XDP, bpfDefaultLogLevel, bpfLogSize)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Failed to load xdp program: %v\n", err)
        os.Exit(1)
    }
    device := os.Args[2]
    err = module.AttachXDP(device, fn)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Failed to attach xdp program: %v\n", err)
        os.Exit(1)
    }
    defer func() {
        if err := module.RemoveXDP(device); err != nil {
            fmt.Fprintf(os.Stderr, "Failed to remove XDP from %s: %v\n", device, err)
        }
    }()

All this code does is use the BCC framework to compile and load the XDP program, attach it to the given device, and register a deferred function that detaches it once we’re done. Pretty straightforward.

Next:

    protocolCounter := bcc.NewTable(module.TableId("protocol_counter"), module)
    <-sig
    fmt.Printf("\n{IP protocol}: {total number of packets}\n")
    for it := protocolCounter.Iter(); it.Next(); {
        key := protocols[bcc.GetHostByteOrder().Uint32(it.Key())]
        if key == "" {
            key = "Unknown"
        }
        value := bcc.GetHostByteOrder().Uint64(it.Leaf())
        if value > 0 {
            fmt.Printf("%v: %v packets\n", key, value)
        }
    }

Once again, we use the BCC framework, this time to get an eBPF table called protocol_counter (we’ve seen it in the XDP program). Once the runner is signaled to stop, we iterate over all entries in the map and print the count of every protocol that saw at least one packet.

We encourage you to follow the README of the workshop and try to run it for yourself!

Conclusion

We have learned about and built a very simple XDP program that can run blazingly fast and with minimum overhead on the system.

The key takeaways are:

  • XDP allows creating high-performance packet processing programs powered by eBPF technology

  • Creating and loading an XDP program can be done pretty easily using existing tooling and frameworks

  • XDP programs have many use cases, for example, observability tools, DDoS protection, firewalls, and load balancers

Come and join our eBPF quest!

Seekret is one of the leading IL companies in the eBPF ecosystem; come and join our eBPF IL Community.