
Bad scaling due to flooding and overhead of copying packets in limactl #58

nirs opened this issue Oct 5, 2024 · 3 comments

nirs commented Oct 5, 2024

Throughput decreases and CPU usage increases significantly when adding more VMs connected to the same socket_vmnet daemon.

Tested using:

  • host: running iperf3 -c ...
  • server VM: running iperf3 -s
  • 1-4 additional idle VMs
vms  bitrate (Gbits/sec)  cpu (%)
  1                 3.52    51.23
  2                 2.42    58.17
  3                 1.22    81.28
  4                 0.81    93.07

Expected behavior

  • Performance and CPU usage should remain the same when adding more idle VMs
  • Packets sent to one VM should not be forwarded to other VMs
  • Packets should be copied directly to the vz datagram socket in socket_vmnet, bypassing limactl (see the sketch below)
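For illustration, a minimal sketch of what that last bullet could look like, assuming socket_vmnet held a datagram socket connected directly to the vz network device. The `forward_frame_dgram` helper and `dgram_fd` parameter are hypothetical, not part of the current code:

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical: forward one ethernet frame straight to a connected
// SOCK_DGRAM fd consumed by vz, with no limactl hop in between.
static int forward_frame_dgram(int dgram_fd, const void *frame, size_t len) {
  // One frame per datagram; message boundaries replace the 4-byte
  // big-endian length header used on the current stream socket.
  ssize_t sent = send(dgram_fd, frame, len, 0);
  if (sent < 0) {
    perror("send");
    return -1;
  }
  return 0;
}
```

With message boundaries provided by the datagram socket, both the length header of the stream protocol and the extra copy through limactl disappear.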

Why it happens

When we have multiple VMs connected to socket_vmnet:

  • every packet sent from the vmnet interface is forwarded to every VM, instead of only to the VM with the matching MAC address (the destination MAC is right in the frame header; see the sketch below)
  • every packet sent from any VM is forwarded to all other VMs and to the vmnet interface, instead of to the single destination VM or only to the vmnet interface
  • when a packet is forwarded to a VM, it is copied to the vz datagram socket via a socket pair in limactl
  • packets forwarded from limactl to vz are copied and processed in the guest, where they are dropped (since the packets are not addressed to the guest)
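Note that the information needed to avoid the flooding is present in every frame: an ethernet frame begins with the 6-byte destination MAC, followed by the 6-byte source MAC. A minimal sketch of reading both (plain C, no socket_vmnet types involved):

```c
#include <stdint.h>
#include <string.h>

// An ethernet frame starts with the 6-byte destination MAC,
// followed by the 6-byte source MAC and the 2-byte EtherType.
static void frame_dst_mac(const uint8_t *frame, uint8_t dst[6]) {
  memcpy(dst, frame, 6);
}

static void frame_src_mac(const uint8_t *frame, uint8_t src[6]) {
  memcpy(src, frame + 6, 6);
}
```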

Flow when receiving a packet from vmnet with 4 vms

host iperf3 ->
  host kernel ->
    vmnet -> 
      socket_vmnet ->
        host kernel ->
          limactl ->
            host kernel ->
              vz -> 
                guest kernel ->
                  guest iperf3
        host kernel ->
          limactl ->
            host kernel ->
              vz -> 
                guest kernel (drop)
        host kernel ->
          limactl ->
            host kernel ->
              vz -> 
                guest kernel (drop)
        host kernel ->
          limactl ->
            host kernel ->
              vz -> 
                guest kernel (drop)

Flow when receiving a packet from a vm

guest iperf3 ->
  guest kernel ->
    vz ->
      host kernel ->
        limactl ->
          host kernel ->
            socket_vmnet ->
              vmnet ->
                host kernel ->
                  host iperf3
                host kernel ->
                  limactl ->
                    host kernel ->
                      vz -> 
                        guest kernel (drop)
                host kernel ->
                  limactl ->
                    host kernel ->
                      vz -> 
                        guest kernel (drop)
                host kernel ->
                  limactl ->
                    host kernel ->
                      vz -> 
                        guest kernel (drop)
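In both directions the per-packet work grows linearly with the number of VMs: with N VMs attached, every frame is written to N sockets, relayed by N limactl processes, and received by N guests, of which N-1 drop it. The measurements above are consistent with this: going from 1 to 4 VMs quadruples the per-packet copies while the bitrate falls from 3.52 to 0.81 Gbits/sec.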

CPU usage for all VM processes

Looking at the CPU usage of the socket_vmnet, VM service, and limactl processes, we see extreme CPU usage from processing partly or completely unrelated packets:

command           %cpu   related
com.apple.Virtua  136.9  yes
limactl           121.4  yes
iperf3-darwin      13.7  yes
socket_vmnet      106.6  partly
kernel_task        39.1  partly
com.apple.Virtua   83.5  no
com.apple.Virtua   81.0  no
com.apple.Virtua   77.4  no
limactl            67.1  no
limactl            65.6  no
limactl            62.9  no

Total cpu usage:

work       %cpu
related    272.0
partly     145.7
unrelated  437.5

Tested on M1 Pro (8 performance cores, 2 efficiency cores)

Full results

1 vm

% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[  5] local 192.168.105.1 port 60990 connected to 192.168.105.58 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd          RTT
[  5]   0.00-1.00   sec   460 MBytes  3.86 Gbits/sec    0   8.00 MBytes   9ms     
[  5]   1.00-2.00   sec   421 MBytes  3.53 Gbits/sec    0   8.00 MBytes   9ms     
[  5]   2.00-3.00   sec   435 MBytes  3.65 Gbits/sec    0   8.00 MBytes   10ms     
[  5]   3.00-4.00   sec   411 MBytes  3.45 Gbits/sec    0   8.00 MBytes   14ms     
[  5]   4.00-5.00   sec   317 MBytes  2.66 Gbits/sec    0   8.00 MBytes   9ms     
[  5]   5.00-6.00   sec   430 MBytes  3.61 Gbits/sec    0   8.00 MBytes   9ms     
[  5]   6.00-7.00   sec   423 MBytes  3.55 Gbits/sec    0   8.00 MBytes   9ms     
[  5]   7.00-8.00   sec   433 MBytes  3.63 Gbits/sec    0   8.00 MBytes   10ms     
[  5]   8.00-9.00   sec   437 MBytes  3.67 Gbits/sec    0   8.00 MBytes   9ms     
[  5]   9.00-10.00  sec   430 MBytes  3.61 Gbits/sec    0   8.00 MBytes   9ms     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  4.10 GBytes  3.52 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  4.10 GBytes  3.52 Gbits/sec                  receiver

cpu usage

CPU usage: 20.3% user, 31.19% sys, 48.77% idle 

PID    COMMAND          %CPU  #TH   
49183  com.apple.Virtua 166.3 19/3  
49173  limactl          100.0 16/2  
48954  socket_vmnet     64.4  5/1   
0      kernel_task      57.8  561/10
54694  iperf3-darwin    18.6  1/1   

2 vms

% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[  5] local 192.168.105.1 port 60997 connected to 192.168.105.58 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd          RTT
[  5]   0.00-1.00   sec   269 MBytes  2.26 Gbits/sec    0   8.00 MBytes   13ms     
[  5]   1.00-2.00   sec   299 MBytes  2.51 Gbits/sec    0   8.00 MBytes   14ms     
[  5]   2.00-3.00   sec   263 MBytes  2.21 Gbits/sec    0   8.00 MBytes   15ms     
[  5]   3.00-4.00   sec   296 MBytes  2.48 Gbits/sec    0   8.00 MBytes   13ms     
[  5]   4.00-5.00   sec   298 MBytes  2.50 Gbits/sec    0   8.00 MBytes   12ms     
[  5]   5.00-6.00   sec   284 MBytes  2.38 Gbits/sec    0   8.00 MBytes   13ms     
[  5]   6.00-7.00   sec   299 MBytes  2.51 Gbits/sec    0   8.00 MBytes   14ms     
[  5]   7.00-8.00   sec   298 MBytes  2.50 Gbits/sec    0   8.00 MBytes   14ms     
[  5]   8.00-9.00   sec   285 MBytes  2.39 Gbits/sec    0   8.00 MBytes   13ms     
[  5]   9.00-10.00  sec   298 MBytes  2.50 Gbits/sec    0   8.00 MBytes   12ms     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.82 GBytes  2.42 Gbits/sec    0             sender
[  5]   0.00-10.01  sec  2.82 GBytes  2.42 Gbits/sec                  receiver

cpu usage

CPU usage: 20.84% user, 37.32% sys, 41.83% idle 

PID    COMMAND          %CPU  #TH   
49183  com.apple.Virtua 132.9 18/2  
49173  limactl          92.2  16/3  
48954  socket_vmnet     77.0  6/1   
49905  com.apple.Virtua 74.2  18/1  
49900  limactl          57.3  16/1  
0      kernel_task      41.4  561/12
54259  iperf3-darwin    22.1  1/1   

3 vms

% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[  5] local 192.168.105.1 port 61004 connected to 192.168.105.58 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd          RTT
[  5]   0.00-1.00   sec   161 MBytes  1.35 Gbits/sec    0   2.91 MBytes   21ms     
[  5]   1.00-2.00   sec   138 MBytes  1.16 Gbits/sec    0   3.05 MBytes   17ms     
[  5]   2.00-3.00   sec   143 MBytes  1.20 Gbits/sec    0   3.15 MBytes   44ms     
[  5]   3.00-4.00   sec   139 MBytes  1.17 Gbits/sec    0   3.24 MBytes   19ms     
[  5]   4.00-5.00   sec   138 MBytes  1.16 Gbits/sec    0   3.30 MBytes   25ms     
[  5]   5.00-6.00   sec   144 MBytes  1.21 Gbits/sec    0   3.34 MBytes   22ms     
[  5]   6.00-7.00   sec   154 MBytes  1.29 Gbits/sec    0   3.37 MBytes   23ms     
[  5]   7.00-8.00   sec   145 MBytes  1.21 Gbits/sec    0   3.38 MBytes   15ms     
[  5]   8.00-9.00   sec   142 MBytes  1.19 Gbits/sec    0   3.39 MBytes   17ms     
[  5]   9.00-10.00  sec   154 MBytes  1.29 Gbits/sec    0   3.39 MBytes   23ms     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.42 GBytes  1.22 Gbits/sec    0             sender
[  5]   0.00-10.01  sec  1.42 GBytes  1.22 Gbits/sec                  receiver

cpu usage

CPU usage: 24.13% user, 57.13% sys, 18.72% idle 

PID    COMMAND          %CPU  #TH   
49183  com.apple.Virtua 145.8 18/2  
49173  limactl          120.5 15/2  
48954  socket_vmnet     99.8  7/2   
49905  com.apple.Virtua 82.9  18/1  
50380  com.apple.Virtua 82.1  18/1  
50375  limactl          63.4  16/1  
49900  limactl          61.7  16/1  
0      kernel_task      43.4  561/11
53677  iperf3-darwin    15.2  1/1   

4 vms

% caffeinate -d iperf3-darwin -c 192.168.105.58 -l 1m -t 10
Connecting to host 192.168.105.58, port 5201
[  5] local 192.168.105.1 port 61014 connected to 192.168.105.58 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd          RTT
[  5]   0.00-1.00   sec  99.8 MBytes   837 Mbits/sec    0   2.90 MBytes   26ms     
[  5]   1.00-2.00   sec  98.3 MBytes   824 Mbits/sec    0   2.53 MBytes   25ms     
[  5]   2.00-3.00   sec  98.2 MBytes   823 Mbits/sec    0   3.03 MBytes   69ms     
[  5]   3.00-4.00   sec  99.7 MBytes   836 Mbits/sec    0   3.04 MBytes   30ms     
[  5]   4.00-5.00   sec   103 MBytes   860 Mbits/sec    0   3.03 MBytes   22ms     
[  5]   5.00-6.00   sec  91.2 MBytes   765 Mbits/sec    0   3.03 MBytes   27ms     
[  5]   6.00-7.00   sec   100 MBytes   842 Mbits/sec    0   3.03 MBytes   61ms     
[  5]   7.00-8.00   sec   102 MBytes   858 Mbits/sec    0   3.04 MBytes   33ms     
[  5]   8.00-9.00   sec  98.2 MBytes   823 Mbits/sec    0   3.04 MBytes   31ms     
[  5]   9.00-10.00  sec   103 MBytes   862 Mbits/sec    0   3.04 MBytes   28ms     
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   993 MBytes   833 Mbits/sec    0             sender
[  5]   0.00-10.02  sec   991 MBytes   830 Mbits/sec                  receiver

cpu usage

CPU usage: 25.28% user, 67.77% sys, 6.93% idle 

PID    COMMAND          %CPU  #TH   
49183  com.apple.Virtua 136.9 18/2  
49173  limactl          121.4 15/2  
48954  socket_vmnet     106.6 8/1   
50380  com.apple.Virtua 83.5  18/2  
50731  com.apple.Virtua 81.0  18/1  
49905  com.apple.Virtua 77.4  18/2  
50375  limactl          67.1  16/1  
50726  limactl          65.6  16/1  
49900  limactl          62.9  16/1  
0      kernel_task      39.1  561/10
53126  iperf3-darwin    13.7  1     
AkihiroSuda commented

Yes, this is a long-standing TODO

socket_vmnet/main.c, lines 531 to 562 in 0b6aed9:

```c
// Flood the packet to other VMs in the same network too.
// (Not handled by vmnet)
// FIXME: avoid flooding
dispatch_semaphore_wait(state->sem, DISPATCH_TIME_FOREVER);
struct conn *conns = state->conns;
dispatch_semaphore_signal(state->sem);
for (struct conn *conn = conns; conn != NULL; conn = conn->next) {
  if (conn->socket_fd == accept_fd)
    continue;
  DEBUGF("[Socket-to-Socket i=%lld] Sending from socket %d to socket %d: "
         "4 + %d bytes",
         i, accept_fd, conn->socket_fd, header);
  struct iovec iov[2] = {
      {
          .iov_base = &header_be,
          .iov_len = 4,
      },
      {
          .iov_base = buf,
          .iov_len = header,
      },
  };
  ssize_t written = writev(conn->socket_fd, iov, 2);
  DEBUGF("[Socket-to-Socket i=%lld] Sent from socket %d to socket %d: %ld "
         "bytes (including uint32be header)",
         i, accept_fd, conn->socket_fd, written);
  if (written < 0) {
    perror("writev");
    continue;
  }
}
```
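One possible shape for that FIXME, sketched loosely here: learn each client's MAC address from the frames it sends, and forward unicast frames only to the matching connection, still flooding broadcast/multicast and unknown destinations. The `mac` field and the `conn_for_mac` helper are hypothetical additions, not part of the current code:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6

// Minimal stand-in for socket_vmnet's struct conn, extended with a
// hypothetical `mac` field learned from the client's source addresses.
struct conn {
  int socket_fd;
  uint8_t mac[ETHER_ADDR_LEN];
  struct conn *next;
};

static bool is_group_mac(const uint8_t mac[ETHER_ADDR_LEN]) {
  // Broadcast and multicast addresses have the group bit set in the
  // first octet; frames to them still go to every client.
  return (mac[0] & 0x01) != 0;
}

static struct conn *conn_for_mac(struct conn *conns,
                                 const uint8_t mac[ETHER_ADDR_LEN]) {
  for (struct conn *conn = conns; conn != NULL; conn = conn->next) {
    if (memcmp(conn->mac, mac, ETHER_ADDR_LEN) == 0)
      return conn;
  }
  return NULL; // unknown destination: caller falls back to flooding
}
```

With something like this in place, the flooding loop above would collapse to a single writev for known unicast destinations.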

tamird commented Nov 18, 2024

@nirs can you point to the code where the copy in limactl occurs? I don't understand why there are so many copies.

nirs commented Nov 18, 2024

The pipeline

Lima:

kernel <-vmnet-> socket_vmnet <-unixstream-> lima <-unixgram-> vz service <-virtio-> guest

QEMU:

kernel <-vmnet-> socket_vmnet <-unixstream-> qemu <-virtio-> guest

Receiving a packet from a vm

This happens in the thread forwarding packets from the client socket fd:

static void on_accept(struct state *state, int accept_fd, interface_ref iface) {

For each packet we read:

ssize_t received = read(accept_fd, buf, header);

We send the packet to the vmnet interface (copy 1):

vmnet_return_t write_status = vmnet_write(iface, &pd, &written_count);

and to all other sockets (N-1 copies):

ssize_t written = writev(conn->socket_fd, iov, 2);

Receiving a packet from vmnet

This happens in the vmnet handler block, called when some packets are ready on the vmnet interface:

vmnet_interface_set_event_callback(

We read multiple packets (up to 32 packets per call):

vmnet_return_t read_status = vmnet_read(iface, pdv, &received_count);

For each packet we iterate over all connections and write the packet to each connection (N copies):

ssize_t written = writev(conn->socket_fd, iov, 2);

Additional copies in lima

Each packet read from VZ is copied to the socket_vmnet socket via a socketpair:
https://github.com/lima-vm/lima/blob/1f0113c2b0ecd5b21a5c84f60cb83a09ffab0dee/pkg/vz/network_darwin.go#L68

Each packet read from socket_vmnet is copied to VZ via a socketpair:
https://github.com/lima-vm/lima/blob/1f0113c2b0ecd5b21a5c84f60cb83a09ffab0dee/pkg/vz/network_darwin.go#L75

This is done for every VM using lima:shared, lima:bridged, or socket, regardless of the actual packet destination; a rough sketch of this relay loop is below.
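For illustration only, a rough sketch (in C rather than lima's Go, with hypothetical fd names) of the socket_vmnet-to-vz direction of that relay: each frame arrives on the stream socket behind a 4-byte big-endian length header and is re-sent as one datagram toward vz.

```c
#include <arpa/inet.h> // ntohl
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

// Sketch only: `stream_fd` is a unix stream socket to socket_vmnet,
// `dgram_fd` a unix datagram socketpair end read by vz. A real
// implementation must also handle short reads on the stream socket.
static int relay_one_frame(int stream_fd, int dgram_fd) {
  uint32_t header_be;
  uint8_t buf[65536];

  // Read the 4-byte big-endian length header of the stream protocol.
  if (read(stream_fd, &header_be, sizeof(header_be)) !=
      (ssize_t)sizeof(header_be))
    return -1;
  uint32_t len = ntohl(header_be);
  if (len > sizeof(buf))
    return -1;

  // Read the frame payload (one copy into limactl) ...
  if (read(stream_fd, buf, len) != (ssize_t)len)
    return -1;

  // ... and re-send it as a single datagram for vz (one copy out).
  if (send(dgram_fd, buf, len, 0) < 0) {
    perror("send");
    return -1;
  }
  return 0;
}
```

Because socket_vmnet floods every frame to every connection, this loop runs in every VM's limactl for every packet on the network, which is where the per-VM limactl CPU usage in the tables above comes from.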
