Measuring 10Gbit/sec Load Balancing Performance

This page serves several purposes:

  • It’s a 10Gbit/sec performance testing report showing the capabilities of the chosen hardware.
  • It’s an optimisation guide with instructions to follow on your hardware.
  • It’s an application example showing a generic protocol-less virtual server.

The Hardware

Testing Machines

  • Intel Core i7-7700 processor (4 cores/8 threads, 3600 MHz)
  • 32 GB RAM
  • 512 GB M.2 SSD
  • Debian 10 (server only)
  • On the BalanceNG machine: Intel X540-T2 NIC in a PCIe x16 slot
  • On the client and target machines: Asus XG-C100C NIC in a PCIe x16 slot

Choose a slot with enough PCIe lanes: the Intel X540-T2 needs 8 lanes and the Asus XG-C100C needs 4. Make sure the fan airflow can keep the Intel X540-T2 cool.

Examine /var/log/syslog; the available PCIe bandwidth is reported like this:

ixgbe 0000:01:00.0: Intel(R) 10 Gigabit Network Connection
ixgbe 0000:01:00.1: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8 XDP Queue count = 0
ixgbe 0000:01:00.1: 32.000 Gb/s available PCIe bandwidth (5 GT/s x8 link)
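
The negotiated PCIe link can also be checked with lspci. A minimal example, run as root and assuming the PCI address 01:00.0 taken from the syslog lines above (it will differ on other systems):

# show the negotiated PCIe speed and width of the NIC (address is an example)
lspci -vv -s 01:00.0 | grep -i lnksta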

The 10GbE Switch

The Netgear XS508M 8-Port 10G unmanaged switch was used for this testing setup.

The BalanceNG Configuration

It’s important to determine the optimal number of threads that BalanceNG starts for packet processing on the NIC. In this particular case it’s 4 threads (with 3 there isn’t enough processing power, and with 5 threads the scheduling overhead already starts to slow everything down).

Try different thread counts on your own BalanceNG machine (a sweep procedure is sketched below); most likely the optimal value matches the number of available cores.
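
A minimal sketch of such a sweep, assuming the "threads" value in the interface block is changed and BalanceNG is restarted between runs (the exact reload procedure is not shown here; see the BalanceNG documentation):

# For each candidate thread count (e.g. 3, 4, 5):
#   1. set "threads N" in the interface block of the BalanceNG configuration
#   2. restart/reload BalanceNG (procedure depends on your installation)
#   3. measure throughput through the virtual server, e.g. for 60 seconds:
iperf -c 10.0.0.232 -t 60
#   4. afterwards check dropped packets with "show interfaces" in "bng control"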

//        configuration taken Sun Jan 19 21:28:59 2020
//        BalanceNG 4.100 (created 2020/01/03)
modules   vrrp,arp,ping,hc,master,slb,tnat,nat,rt
interface 1 {
          name enp1s0f0
          access raw
          threads 4
}
register  interface 1
enable    interface 1
vrrp      {
          vrid 30
          priority 200
          network 1
}
network   1 {
          addr 10.0.0.0
          mask 255.255.0.0
          real 10.0.0.130
          virt 10.0.0.131
          interface 1
}
register  network 1
enable    network 1
server    1 {
          ipaddr 10.0.0.232
          target 1
}
register  server 1
enable    server 1
target    1 {
          ipaddr 10.0.0.31
          ping 3,8
          dsr enable
}
register  target 1
enable    target 1
//        end of configuration
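
Note that “dsr enable” selects Direct Server Return, so the target answers the client directly using the virtual server address as source. For this to work, the target typically has to accept traffic addressed to 10.0.0.232 without announcing that address via ARP; the lines below are a common Linux-side sketch of that preparation (check the BalanceNG documentation for the exact target requirements of your version):

# on the target 10.0.0.31 -- typical DSR target preparation (example only)
ip addr add 10.0.0.232/32 dev lo            # accept packets for the virtual server IP locally
sysctl -w net.ipv4.conf.all.arp_ignore=1    # only answer ARP for addresses on the receiving interface
sysctl -w net.ipv4.conf.all.arp_announce=2  # avoid advertising the virtual server IP in ARP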

Testing direct communication bandwidth with iperf

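These measurements assume an iperf server is already listening on the target; presumably it was started there beforehand in the usual way:

# on the target 10.0.0.31: iperf in TCP server mode (default port 5001)
iperf -s
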
root@u32:~# iperf -c 10.0.0.31 -t 600
------------------------------------------------------------
Client connecting to 10.0.0.31, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.32 port 45366 connected with 10.0.0.31 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-600.0 sec   658 GBytes  9.41 Gbits/sec
root@u32:~#

Testing bandwidth through BalanceNG with iperf

root@u32:~# iperf -c 10.0.0.232 -t 600
------------------------------------------------------------
Client connecting to 10.0.0.232, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.32 port 51608 connected with 10.0.0.232 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-600.0 sec   646 GBytes  9.24 Gbits/sec
root@u32:~#
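
For comparison with the direct measurement, the relative throughput loss through BalanceNG works out to:

(9.41 - 9.24) / 9.41 ≈ 0.018, i.e. roughly a 1.8 % reduction.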

Examining the reported number of dropped packets

You may examine the reported number of dropped packets with “show interfaces” within the “bng control” CLI like this:

root@u30:/var/log# bng control
BalanceNG: connected to PID 1481
bng-MASTER# sh interfaces

interface 1 (enp1s0f0)
  link detection: TRUE
  address:
    a0:36:9f:21:4b:90
  received:
    packets 478823564
    bytes   724937307932
  sent:
    packets 478823352
    bytes   724937266856
  dumped:
    bytes   0
  packet_statistics:
    fd 15
    tp_packets 478889375
    tp_drops 65811

bng-MASTER#

The “tp_drops” value is reset with each invocation of “show interfaces”, so running the command once after the iperf test has finished shows the totals accumulated since the previous invocation, i.e. for the test run. Ideally, the tp_drops value is 0. Overall, a drop rate of about 0.013% is quite acceptable at the measured TCP bandwidth of 9.24 Gbits/sec through BalanceNG (compared with 9.41 Gbits/sec directly). For this particular Core i7 machine this appears to be the optimum that can be achieved.
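
For reference, the drop rate follows directly from the packet statistics above:

tp_drops / tp_packets = 65811 / 478889375 ≈ 0.000137, i.e. a drop rate of roughly 0.013 %.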

Conclusion

  • The optimal number of BalanceNG packet-processing threads needs to be determined for each particular machine.
  • The Intel X540-T2 dual-port 10GbE NIC is our recommendation for building BalanceNG 10GbE load balancers.
  • On an Intel Core i7-7700, iperf (TCP over IPv4) measures a 10-minute bandwidth of 9.24 Gbits/sec through BalanceNG.