Congestion Detection Optimization

With the rapid development of key computing infrastructure, and to meet the growing demand for transmitting massive amounts of data across wide area networks, network link bandwidth has been continuously upgraded from 10 Gbps to 25 Gbps, 100 Gbps, 200 Gbps, 400 Gbps, and even 800 Gbps. However, traditional transport protocols (e.g., TCP) experience a sharp decline in goodput as transmission distances increase and packet loss rates rise. Designing a high-throughput data transmission solution for wide area networks that improves transmission efficiency is therefore of significant importance. Traffic control and congestion control determine the efficiency of data transmission and are key technologies in high-throughput network transmission. Identifying congestion points in the network quickly, accurately, and at low cost is essential to both mechanisms.

Traditional schemes for identifying congestion points fall into two categories:
1. Schemes based on packet loss, delay, and similar signals require a detection time of at least one RTT (Round-Trip Time).
2. Schemes based on active feedback from intermediate congested nodes can reduce the detection time to less than one RTT (depending on the location of the congested node), but they introduce new problems:
   2.1 Feedback messages consume additional link bandwidth, which lowers bandwidth utilization and can even cause new congestion on the return path, pushing the detection time above one RTT.
   2.2 To ensure timely detection, congestion status report messages are sent frequently, which wastes bandwidth.
   2.3 Maintaining per-flow state at intermediate nodes requires high-performance equipment at those nodes, which limits scalability.

This draft introduces a mechanism that adapts to how often adjacent nodes communicate. When they communicate frequently (adjustable threshold; by default, an interval of less than 0.5 RTT between two consecutive service packets), normal service packets carry the congestion information along with the flow. When communication is infrequent (adjustable threshold; by default, an interval of no less than 0.5 RTT between two consecutive service packets), the node actively generates congestion indication packets. This ensures zero bandwidth overhead under heavy load and timely perception of the downstream node's congestion under light load or when idle. When normal service packets carry the information, this can be done by reusing certain fields in the packet header, such as the flow label of an IPv6 packet. When congestion indication packets are actively generated, the node directly generates a packet that the traffic source recognizes, such as a RoCEv2 CNP message.
Define some value (e.g., 0xA55A) as the congestion indication magic number for the case where normal service packets carry the congestion information. The magic number can be transmitted in the ToS field of two consecutive IPv4 service packets or in the Traffic Class (TC) field of two consecutive IPv6 service packets, one byte per packet. If a service packet carrying 0xA5 has been sent and no further service packet follows within 0.5 RTT, the congested node copies the header of the packet that carried 0xA5, constructs a 64-byte packet with an all-zero payload, and sets the ToS or TC field to 0x5A, completing the transmission of the magic number. If the congested node has no service packets to send for 0.5 RTT or more, it proactively generates a congestion indication packet, such as a CNP, and sends it back.
The sending frequency of the two congestion indication methods is outside the scope of this draft and can follow the mechanisms of existing congestion control algorithms, for example by determining the packet sending frequency from the degree of congestion in the queue.
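
The selection rule and the two-packet magic number can be illustrated with a minimal Python sketch. It is not a normative implementation. It assumes that "frequent" communication means an inter-packet interval below 0.5 RTT (consistent with the rule that packets are actively generated after 0.5 RTT or more without service traffic), that the 64 bytes are the total packet length including the header, and that 0xA55A is the example magic number. The names Packet, choose_method, make_filler, and MagicDetector are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAGIC = 0xA55A                      # example magic number from the text
MAGIC_FIRST = MAGIC >> 8            # 0xA5, carried by the first packet
MAGIC_SECOND = MAGIC & 0xFF         # 0x5A, carried by the second packet


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet model (not a real wire format)."""
    header_len: int   # header length in bytes
    tos: int          # IPv4 ToS or IPv6 Traffic Class byte
    payload: bytes


def choose_method(interval: float, rtt: float, factor: float = 0.5) -> str:
    """Pick the indication method from the gap between service packets.

    Assumption: gaps shorter than factor * rtt count as frequent traffic.
    """
    return "piggyback" if interval < factor * rtt else "active"


def make_filler(first: Packet, total_len: int = 64) -> Packet:
    """Build the packet that completes the magic number.

    Copies the header of the packet that carried 0xA5, sets the ToS/TC
    byte to 0x5A, and pads with zeros up to total_len bytes (assumed to
    include the header).
    """
    zeros = bytes(total_len - first.header_len)
    return replace(first, tos=MAGIC_SECOND, payload=zeros)


class MagicDetector:
    """Receiver side: report 0xA5 followed directly by 0x5A."""

    def __init__(self) -> None:
        self._prev: Optional[int] = None

    def feed(self, tos: int) -> bool:
        hit = self._prev == MAGIC_FIRST and tos == MAGIC_SECOND
        # Reset after a match so one pair is reported only once.
        self._prev = None if hit else tos
        return hit


if __name__ == "__main__":
    first = Packet(header_len=20, tos=MAGIC_FIRST, payload=b"data")
    filler = make_filler(first)
    detector = MagicDetector()
    print(choose_method(interval=0.2, rtt=1.0))
    print([detector.feed(p.tos) for p in (first, filler)])
```

In this sketch the detector keeps only one byte of state per neighbor, not per flow, which matches the draft's goal of avoiding per-flow state at intermediate nodes. How a real receiver distinguishes the magic number from ordinary ToS or TC values is not covered here.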

TBD.