Analysis of Rising System Soft Interrupts Caused by Inbound Multicast Packets

Problem background:

Recently, more than a dozen new inbound media streams were added to the bureau's streaming media server, after which the application process became blocked and the service malfunctioned.


Problem analysis:

Based on previous experience with similar problems, we first suspected a non-standard code stream that was blocking the streaming media software's processing. However, the other node of the dual-node pair showed no such blocking, and the subsequent analysis also ruled out any code stream problem.

Combined with the system monitoring logs, we found that whenever the streaming media software misbehaved, the soft interrupt load on several CPU cores was quite high, leaving the idle time of those cores at essentially 0.
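
A quick way to confirm this pattern is to watch the per-CPU soft interrupt load and which softirq type is firing; the commands below are illustrative, not taken from the original monitoring logs:

    # Per-CPU utilization; the %soft column shows soft interrupt load
    mpstat -P ALL 1

    # For packet reception, the NET_RX counters are the ones climbing
    watch -d cat /proc/softirqs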


Soft interrupts are generally related to packet reception, so we checked how the NIC was receiving the inbound packets. We found that some of the NIC's receive queues were handling 2-3 times as many packets as the other queues.
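
Per-queue packet counts can be read from the NIC statistics; the interface name and counter names below are illustrative and vary by driver:

    # Uneven rx_queue_*_packets counters mean RSS is concentrating
    # flows on a few receive queues
    ethtool -S eth0 | grep -E 'rx_queue_[0-9]+_packets'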

The troubleshooting then focused on the NIC's RSS hash algorithm.


We captured the online multicast packets and replayed them in the lab, but the soft interrupt spike did not reproduce. Professional stream-analysis software also found no abnormality in the media TS streams.

Later we noticed that the multicast streams we had captured all shared the same source and destination ports, the source IPs belonged to the same encoder, and only the multicast (destination) addresses differed. For such a traffic model, the RSS hash results are very likely to concentrate on a few receive queues. After adjusting the lab traffic model to match, the high soft interrupt problem reproduced immediately.
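
How the NIC maps hash results onto queues can be inspected via the RSS indirection table (interface name assumed):

    # Show the RSS hash key and the indirection table that maps
    # hash results to receive queues
    ethtool -x eth0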

So the problem was caused by the RSS hash. It looked like conclusive proof, but later I found out I was still too young, too naive :)


The network port's default RX flow hash for udp4 is computed over the source and destination IP addresses plus the source and destination ports, which can be verified with ethtool, as shown below:

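A minimal check of the current hash fields; the interface name is assumed and the output shown is typical rather than a capture from the original system:

    # Query which header fields feed the RSS hash for udp4 traffic
    ethtool -n eth0 rx-flow-hash udp4
    UDP over IPV4 flows use these fields for computing Hash flow key:
    IP SA
    IP DA
    L4 bytes 0 & 1 [TCP/UDP src port]
    L4 bytes 2 & 3 [TCP/UDP dst port]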

Since the source and destination ports are identical in this traffic model, we first changed the udp4 rx-flow-hash to use only the destination IP address. But after the inbound multicast traffic ramped up, the soft interrupts were still high. This result overturned the previous conclusion.
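
The adjustment itself is a one-liner (interface name assumed):

    # Hash udp4 flows on the destination IP only, so streams that
    # differ only in multicast address spread across receive queues
    ethtool -N eth0 rx-flow-hash udp4 d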


Since the lab test traffic was heavier than the actual online traffic, the soft interrupts were not concentrated on a few CPU cores; almost all of them were high. So we used perf top to look at the system hotspots:

The function __udp4_lib_mcast_deliver was being called frequently, and spin lock acquisition also showed up as a hotspot.
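
For reference, a typical invocation; the hotspot lines below are illustrative, not the original capture:

    # Sample kernel hotspots system-wide; -g adds call-graph context
    perf top -g
      ..%  [kernel]  [k] _raw_spin_lock
      ..%  [kernel]  [k] __udp4_lib_mcast_deliver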

 

The streaming media server runs kernel version 3.10.0-327.22.2.el7.x86_64. Compared with kernel version 3.10.0-693.21.1.el7.x86_64, the handling inside the __udp4_lib_mcast_deliver function differs noticeably.

So we decisively patched the 3.10.0-327.22.2.el7.x86_64 kernel with the newer logic, and the test results could not have been better :)


Comparative analysis of the __udp4_lib_mcast_deliver function:

struct udp_table definition:

    /**
     *	struct udp_table - UDP table
     *
     *	@hash:	hash table, sockets are hashed on (local port)
     *	@hash2:	hash table, sockets are hashed on (local port, local address)
     *	@mask:	number of slots in hash tables, minus 1
     *	@log:	log2(number of slots in hash table)
     */
    struct udp_table {
        struct udp_hslot    *hash;
        struct udp_hslot    *hash2;  //key point: hash table keyed on (local port, local address), i.e. the multicast address for multicast receivers
        unsigned int        mask;
        unsigned int        log;
    };
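
For context, the slot-lookup helpers behind these two tables look roughly like this in this kernel series (paraphrased from include/net/udp.h; treat it as a sketch, not the exact source):

    /* primary table: index by a hash of the local port */
    static inline struct udp_hslot *udp_hashslot(struct udp_table *table,
                                                 struct net *net, unsigned int num)
    {
        return &table->hash[udp_hashfn(net, num, table->mask)];
    }

    /* secondary table: index by a precomputed (port, address) hash */
    static inline struct udp_hslot *udp_hashslot2(struct udp_table *table,
                                                  unsigned int hash)
    {
        return &table->hash2[hash & table->mask];
    }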


3.10.0-327.22.2.el7.x86_64 kernel version:


    /*
     *	Multicasts and broadcasts go to each listener.
     *
     *	Note: called only from the BH handler context.
     */
    static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
                        struct udphdr  *uh,
                        __be32 saddr, __be32 daddr,
                        struct udp_table *udptable)
    {
        struct sock *sk, *stack[256 / sizeof(struct sock *)];
        struct udp_hslot *hslot = udp_hashslot(udptable, net, ntohs(uh->dest));  //Hash the packet's destination port to find the corresponding hash slot
        int dif;
        unsigned int i, count = 0;

        spin_lock(&hslot->lock);
        sk = sk_nulls_head(&hslot->head); //Get the first sock in the hash slot
        dif = skb->dev->ifindex;
        sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);  //Find the first matching sock
        while (sk) { //Put each matching sock into the stack array, to be handed the data later
            stack[count++] = sk;
            sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
                           daddr, uh->source, saddr, dif); //Find the next matching sock in the linked list, i.e. another sock receiving the same multicast group
            if (unlikely(count == ARRAY_SIZE(stack))) { //If the stack array is full, flush it
                if (!sk) /* possible */
                    break;
                flush_stack(stack, count, skb, ~0); //Pass the data to each sock's receive queue
                count = 0;
            }
        }
        /*
         * before releasing chain lock, we must take a reference on sockets
         */
        for (i = 0; i < count; i++)
            sock_hold(stack[i]);

        spin_unlock(&hslot->lock);

        /*
         * do the slow work with no lock held
         */
        if (count) {
            flush_stack(stack, count, skb, count - 1);

            for (i = 0; i < count; i++)
                sock_put(stack[i]);
        } else {
            kfree_skb(skb); //If there is no matching sock, free the skb
        }
        return 0;
    }

 

3.10.0-693.21.1.el7.x86_64 kernel version:


    /*
     *	Multicasts and broadcasts go to each listener.
     *
     *	Note: called only from the BH handler context.
     */
    static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
                        struct udphdr  *uh,
                        __be32 saddr, __be32 daddr,
                        struct udp_table *udptable)
    {
        struct sock *sk, *stack[256 / sizeof(struct sock *)];
        struct hlist_nulls_node *node;
        unsigned short hnum = ntohs(uh->dest);
        struct udp_hslot *hslot = udp_hashslot(udptable, net, hnum);  //Hash the packet's destination port to find the corresponding hash slot
        int dif = skb->dev->ifindex;
        unsigned int count = 0, offset = offsetof(typeof(*sk), sk_nulls_node);
        unsigned int hash2 = 0, hash2_any = 0, use_hash2 = (hslot->count > 10); //If the hash slot holds more than 10 socks, switch to hash2 in udptable, which hashes on the multicast packet's destination port and multicast address

        if (use_hash2) {
            hash2_any = udp4_portaddr_hash(net, htonl(INADDR_ANY), hnum) &
                    udp_table.mask;
            hash2 = udp4_portaddr_hash(net, daddr, hnum) & udp_table.mask;
    start_lookup:
            hslot = &udp_table.hash2[hash2];  //Get the hash slot from the hash2 table
            offset = offsetof(typeof(*sk), __sk_common.skc_portaddr_node);
        }

        spin_lock(&hslot->lock);
        sk_nulls_for_each_entry_offset(sk, node, &hslot->head, offset) { //Traverse the udp sock hash chain, hashed on (multicast port + multicast address) when hash2 is used
            if (__udp_is_mcast_sock(net, sk,
                        uh->dest, daddr,
                        uh->source, saddr,
                        dif, hnum)) {   //If this is a matching multicast sock, put it into the stack array to receive the skb data later
                if (unlikely(count == ARRAY_SIZE(stack))) {
                    flush_stack(stack, count, skb, ~0);  //Pass the skb to each sock's receive queue
                    count = 0;
                }
                stack[count++] = sk;
                sock_hold(sk);
            }
        }

        spin_unlock(&hslot->lock);

        /* Also lookup *:port if we are using hash2 and haven't done so yet. */
        if (use_hash2 && hash2 != hash2_any) {  //A sock bound only to the port (INADDR_ANY) can also receive this multicast packet, so look up that slot too
            hash2 = hash2_any;
            goto start_lookup;
        }

        /*
         * do the slow work with no lock held
         */
        if (count) {
            flush_stack(stack, count, skb, count - 1);   //Pass the skb to each sock's receive queue
        } else {
            kfree_skb(skb);
        }
        return 0;
    }

 

Analysis summary:

Comparing the __udp4_lib_mcast_deliver functions of the two kernels: the 3.10.0-327.22.2.el7.x86_64 kernel only ever traverses the port-based hash table in udptable. If the multicast packets received by the server all share the same destination port, every receiving socket lands in the same hash slot, so the more multicast groups the server joins, the longer each packet's traversal of that slot takes and the more soft interrupt time it consumes. With a dozen or so streams on one port, every inbound packet walks a chain of a dozen or so sockets while holding the slot's spin lock.

The 3.10.0-693.21.1.el7.x86_64 kernel, by contrast, switches to the hash2 table, keyed on (local port, local address), whenever the port-based hash slot holds more than 10 sockets. This avoids the time-consuming traversal of same-port sockets and makes delivering inbound multicast packets to the socket layer more efficient; and because __udp4_lib_mcast_deliver runs in the soft interrupt bottom half, it also reduces soft interrupt occupancy.
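
To make the traffic model concrete, here is a minimal, hedged reproduction sketch: a userspace receiver that joins N multicast groups on the same port, which is exactly the pattern that packs N sockets into one port-based udptable slot. The group addresses, port, and count are invented for illustration:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define NGROUPS 16   /* illustrative stream count */
    #define PORT    5000 /* all groups share one destination port */

    int main(void)
    {
        int i;

        for (i = 0; i < NGROUPS; i++) {
            int fd = socket(AF_INET, SOCK_DGRAM, 0);
            int one = 1;
            struct sockaddr_in addr;
            struct ip_mreq mreq;
            char group[32];

            /* allow many sockets to bind the same port */
            setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(PORT);
            if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("bind");
                return 1;
            }

            /* join group 239.1.1.(i+1): one socket per multicast group,
             * all hashed into the same port-based udptable slot */
            snprintf(group, sizeof(group), "239.1.1.%d", i + 1);
            mreq.imr_multiaddr.s_addr = inet_addr(group);
            mreq.imr_interface.s_addr = htonl(INADDR_ANY);
            if (setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                           &mreq, sizeof(mreq)) < 0) {
                perror("IP_ADD_MEMBERSHIP");
                return 1;
            }
        }

        pause(); /* keep the memberships alive while traffic is replayed */
        return 0;
    }

On a pre-hash2 kernel, replaying same-port multicast traffic at this receiver should drive the NET_RX soft interrupt load up, since every packet scans all NGROUPS sockets in one slot.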

