
Performance Testing our eBPF Traffic Capturing Tool Using K6

By Daniel Lavie
5 min read
eBPF, Testing

This is the second article in our eBPF testing series. Feel free to check out the first article here.

The time has finally come. We are ready to release a new version of our eBPF capturing tool. We ran all of our functional tests and we feel confident in our product. We are going to press the “release” button and celebrate with some beer afterwards.

But wait, how can we make sure that our capturing tool can handle the amount of traffic that it’s going to see in a production environment? And if it can handle it, what will our success rate for capturing traffic be? And how much memory and CPU will it consume?

Performance tests to the rescue

Performance Testing is a software testing process that is designed to evaluate how a system performs in terms of speed, response time, stability, reliability, scalability, and resource usage under a particular workload.

Types of performance tests

  • Load testing - evaluates the performance of a system under an anticipated load

  • Stress testing - assesses the limits and stability of a system under extreme conditions

    • Spike testing - a specific type of stress test that checks the system’s reaction to sudden, large spikes in load (a k6 sketch of this shape follows below)

    • Breakpoint testing - a specific type of stress test that finds the point of system failure by gradually increasing the number of requests until the system becomes unstable

  • Soak/Endurance testing - evaluates the reliability and performance of a system over an extended period of time

[Figure: a visualization of the different test types, taken from Team Merling's blog post]
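To make these shapes a bit more concrete, here is a minimal k6 sketch of a spike-test profile using ramping stages. The durations, targets, and the ADDRESS environment variable are illustrative assumptions, not values from our actual scripts:

import http from "k6/http";

// Hypothetical spike-test profile: ramp to a baseline, jump suddenly to a
// much higher VU count, hold it briefly, then ramp back down.
export const options = {
  stages: [
    { duration: "1m", target: 100 },    // baseline load
    { duration: "30s", target: 2000 },  // sudden spike
    { duration: "1m", target: 2000 },   // hold the spike
    { duration: "30s", target: 100 },   // back to baseline
    { duration: "1m", target: 0 },      // ramp down
  ],
};

export default function () {
  // Placeholder request; a real test would hit the service under test
  http.get(`http://${__ENV.ADDRESS}/`);
}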

So what did we measure in the end?

  • Data integrity - the actual amount of traffic our tool captured versus the amount of traffic we sent during the test (see the sketch after this list)

  • System metrics:

    • CPU usage

    • Memory usage

    • Cluster stability - i.e. that no pods were restarted or crashed during the tests

  • Our custom metrics - our capturing tool collects and reports different metrics while it is running. After running a performance test, we use these metrics to get additional insights into the behavior of our tool under different scenarios. For example, how much traffic did we process in the kernel but lose in the user-mode agent due to an overload?
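As a rough sketch of the data-integrity check (a hypothetical helper with illustrative numbers, not code from our tool), the idea boils down to comparing what k6 reports it sent with what our tool reports it captured:

// Hypothetical helper, not part of our tool: compare the request count k6
// reports (http_reqs) with the count our capturing tool reports.
function captureRate(requestsSentByK6, requestsReportedByTool) {
  return (requestsReportedByTool / requestsSentByK6) * 100;
}

// Illustrative numbers only: k6 sent 100,000 requests, the tool reported 99,250.
console.log(`capture rate: ${captureRate(100000, 99250).toFixed(2)}%`); // 99.25%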

How did we perform the test?

One of the protocols that our eBPF tool can capture is HTTP. For the sake of simplicity, we will only discuss the HTTP scenario.

The setup

[Diagram: the eBPF testing setup]

We used GKE to install and manage our HTTP test server and our eBPF capturing tool. In practice, we run several instances of both the HTTP server and the capturing tool, so when measuring performance within a test we need to take the number of instances of each component into account (this can affect the results, so we kept it consistent between test runs).

Tools of the trade

For this protocol, we use a tool called k6. k6 is an open-source tool by Grafana Labs that enables us to perform our performance tests without too much hassle. It does most of the heavy lifting, such as simulating VUs (Virtual Users), changing the number of requests over time, and collecting and displaying the test results and metrics in a convenient way. Moreover, they have great, easily digestible documentation about the topic of performance testing, which can be found here.

Let’s take a look at one of our k6 scripts for a load test:

import http from "k6/http";
import { check } from "k6";

export default function () {
  // POST a payload whose size field is a random multiple of 1 KB, between 1 KB and 19 KB
  const response = http.post(
    `http://${__ENV.ADDRESS}/customResponse`,
    JSON.stringify({ size: 1024 * (Math.floor(Math.random() * 19) + 1) }),
    {
      headers: { "Content-Type": "application/json" },
      timeout: "5m",
    }
  );

  // check() returns false if any of the specified conditions fail
  const checkRes = check(response, {
    "status is 200": (r) => r.status === 200,
  });
}

All this script does is create an HTTP POST request to our test server (which exposes the customResponse endpoint). One thing to note is that the request size is a random number between 1 KB and 19 KB, so the data is more varied throughout the test.

Now let’s see how we can run this script using the k6 command.

k6 run -e ADDRESS={server_address} -u 2500 -i 100000 --rps {rps} --summary-trend-stats "min,avg,med,max,p(90),p(95),p(99)" k6_script.js

This is where things get interesting. Here are the flags we used to instruct k6:

  1. The -u flag tells k6 to use 2500 Virtual Users

  2. The -i flag tells k6 to run the script for 100K iterations. This means that, in total, we will send 100K requests to the test server

  3. The --rps flag caps the number of requests per second that k6 is allowed to send during the test. We use different values for the same test to check how our tool behaves in different scenarios

  4. The --summary-trend-stats flag tells k6 which stats to display when the test is finished; we chose these values as the most interesting ones for our use case
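If you prefer keeping the run configuration in the script itself, the same settings can also be expressed in k6's options object. The sketch below reflects our understanding of these options; the rps value is a placeholder, since we varied it between runs, and CLI flags override script options when both are given:

import http from "k6/http";

export const options = {
  vus: 2500,            // equivalent to -u 2500
  iterations: 100000,   // equivalent to -i 100000
  rps: 500,             // placeholder value; equivalent to --rps {rps}
  summaryTrendStats: ["min", "avg", "med", "max", "p(90)", "p(95)", "p(99)"],
};

// Minimal placeholder workload; in our case this is the default function
// from the script shown earlier.
export default function () {
  http.get(`http://${__ENV.ADDRESS}/`);
}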

The aftermath

Let’s take a look at the output of such a test:

running (00m13.0s), 0000/2500 VUs, 100000 complete and 0 interrupted iterations
default[======================================] 2500 VUs 00m13.0s/10m0s 100000/100000 shared iters
✓ status is 200
checks.........................: 100.00% ✓ 100000      ✗ 0
data_received..................: 1.0 GB 80 MB/s
data_sent......................: 16 MB 1.2 MB/s
http_req_blocked...............: min=912ns avg=426.31µs med=1.73µs max=124.69ms p(90)=2.68µs p(95)=3.65µs p(99)=13.92ms
http_req_connecting............: min=0s avg=406.32µs med=0s max=105.18ms p(90)=0s p(95)=0s p(99)=13.84ms
http_req_duration..............: min=413.7µs avg=55.04ms med=8.93ms max=1.41s p(90)=71.84ms p(95)=331.2ms p(99)=940.35ms
{ expected_response:true }...: min=413.7µs avg=55.04ms med=8.93ms max=1.41s p(90)=71.84ms p(95)=331.2ms p(99)=940.35ms
http_req_failed................: 0.00%   ✓ 0           ✗ 100000
http_req_receiving.............: min=9.07µs avg=19.61ms med=694.38µs max=1.37s p(90)=29.29ms p(95)=70.08ms p(99)=517.42ms
http_req_sending...............: min=4.12µs avg=91.81µs med=9.01µs max=72.85ms p(90)=14.05µs p(95)=19.5µs p(99)=1.14ms
http_req_tls_handshaking.......: min=0s avg=0s med=0s max=0s p(90)=0s p(95)=0s p(99)=0s
http_req_waiting...............: min=372.12µs avg=35.33ms med=5.28ms max=1.35s p(90)=48.78ms p(95)=187.75ms p(99)=650.01ms
http_reqs......................: 100000 7672.92068/s
iteration_duration.............: min=101.39ms avg=311.48ms med=289.26ms max=1.62s p(90)=336.81ms p(95)=577.67ms p(99)=1.2s
iterations.....................: 100000 7672.92068/s
vus............................: 158 min=158 max=2500
vus_max........................: 2500 min=2500 max=2500

One interesting metric for us is the http_req_duration, which is defined well by k6:

The total time for the request. It's equal to http_req_sending + http_req_waiting + http_req_receiving (i.e. how long did the remote server take to process the request and respond, without the initial DNS lookup/connection times).

To see the full list of metrics available and their explanations, check out the Metrics page on the K6 website.
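One related k6 feature worth mentioning is thresholds, which let you turn metrics like http_req_duration into an automatic pass/fail gate for the run. The limits below are purely illustrative, not the ones we enforce:

import http from "k6/http";

export const options = {
  thresholds: {
    // Fail the run if the 95th percentile exceeds 500ms or the 99th exceeds 1s
    http_req_duration: ["p(95)<500", "p(99)<1000"],
    // Fail the run if more than 1% of the requests fail
    http_req_failed: ["rate<0.01"],
  },
};

export default function () {
  // Placeholder request; a real test would hit the service under test
  http.get(`http://${__ENV.ADDRESS}/`);
}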

In addition, we used the GKE dashboard to check that no pod was restarted, and to monitor memory and CPU usage during the test:

[Image: k6 metrics]

We have other scripts and commands for testing different scenarios. For example, to perform an endurance test, we told k6 to send requests at a steady pace for 48 hours. This allowed us to verify that our tool can handle a load for a long period of time without any unexpected behaviors.
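As a rough sketch (with illustrative rate and VU numbers, not our exact script), such a soak profile can be expressed with k6's constant-arrival-rate executor, which keeps the request rate steady regardless of how fast the server responds:

import http from "k6/http";

export const options = {
  scenarios: {
    soak: {
      executor: "constant-arrival-rate",
      rate: 500,            // iterations per timeUnit (illustrative)
      timeUnit: "1s",
      duration: "48h",
      preAllocatedVUs: 200,
      maxVUs: 2500,
    },
  },
};

export default function () {
  // Placeholder request; a real test would hit the service under test
  http.get(`http://${__ENV.ADDRESS}/`);
}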

Performing such tests can take a large amount of time and effort, so we tried to run them at “strategic” points in the tool’s lifecycle, e.g. before releasing a major version. We can then compare the results with the previous version and see whether there was any degradation or improvement in any of the metrics.

Conclusion

Performance tests are a vital part of the testing process and one that shouldn’t be dismissed, especially for production-ready eBPF tools. They help you uncover bugs and performance issues that otherwise wouldn’t be discovered, and they are definitely worth the investment in the long run.