TCP Congestion Control

Background 📚

What is TCP?

TCP (Transmission Control Protocol) is a standard that defines how to establish and maintain a network conversation by which applications can exchange data. TCP is a connection-oriented protocol, which means a connection is established and held until the applications at each end have finished exchanging messages.

3-Way Handshake

To establish a connection, the client and server exchange three segments before any data is sent. The client chooses an initial sequence number and sends it in the first SYN packet. The server chooses its own initial sequence number and sends it in the SYN/ACK packet. Each side acknowledges the other's sequence number by incrementing it by one; the result is the acknowledgment number. Sequence and acknowledgment numbers allow both sides to detect missing or out-of-order segments. Once the connection is established, ACKs typically follow for each segment.
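As a rough illustration of the numbering described above, here is a minimal Python sketch. The function name `three_way_handshake` and the dictionary layout are made up for this example; real TCP segments carry many more fields.

```python
import random

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as simple dicts (illustrative only)."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN/ACK",
        "seq": server_isn,
        # The server acknowledges the client's SYN by incrementing its number.
        "ack": (syn["seq"] + 1) % SEQ_SPACE,
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_SPACE,
        # The client acknowledges the server's SYN the same way.
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    client_isn = random.randrange(SEQ_SPACE)
    server_isn = random.randrange(SEQ_SPACE)
    for segment in three_way_handshake(client_isn, server_isn):
        print(segment)
```

With a client number of 100 and a server number of 5000, the SYN/ACK carries acknowledgment number 101 and the final ACK carries 5001.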

Congestion Collapse 🥴

Consider a host that does not receive ACKs within the expected time. It assumes the data was lost and retransmits, introducing more and more copies of the same datagram into the network. The network is now in serious trouble 😮‍💨. Eventually, all available buffers are exhausted, and this is a congestion collapse. In simple terms, more and more packets are dropped, causing trouble not just for the sender and receiver but also for the underlying network.

This became a major problem in the 1980s, and the TCP protocol was enhanced to deal with these scenarios.

Flow Control 🏄🏻‍♂️

Flow control is a mechanism that prevents the sender from overwhelming the receiver with data it may not be able to handle. To achieve this, each side of the TCP connection advertises its receive window (rwnd), which communicates the size of the buffer space available to hold incoming data. When the connection is first established, both sides initialize their rwnd values using their system default settings.

If, for any reason, one of the sides is not able to keep up, then it can advertise a smaller window to the sender. If the window reaches zero, then it is treated as a signal that no more data should be sent until the existing data in the buffer has been cleared by the application layer.

This value is dynamic and can be updated with each ACK message exchange between sender and receiver.

This way Flow Control makes sure that both sides of a TCP connection do not overwhelm each other.
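The receive-window behaviour can be sketched with a toy model. Everything here (the `Receiver` class, the `sendable` helper, the byte counts) is hypothetical and only meant to show how a shrinking window, down to zero, throttles the sender.

```python
class Receiver:
    """Toy receiver with a fixed-size buffer (all sizes in bytes)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0  # bytes received but not yet read by the application

    def rwnd(self):
        # The advertised window is the free space left in the buffer.
        return self.buffer_size - self.buffered

    def receive(self, nbytes):
        # Accept only what fits, then return the window carried in the ACK.
        accepted = min(nbytes, self.rwnd())
        self.buffered += accepted
        return self.rwnd()

    def application_read(self, nbytes):
        # The application drains the buffer, which reopens the window.
        self.buffered -= min(nbytes, self.buffered)
        return self.rwnd()


def sendable(pending, advertised_rwnd):
    """Bytes the sender may transmit given the last advertised window."""
    return min(pending, advertised_rwnd)


if __name__ == "__main__":
    rx = Receiver(buffer_size=4096)
    window = rx.receive(sendable(10_000, rx.rwnd()))
    print("window after first burst:", window)
    window = rx.application_read(1000)
    print("window after the application reads 1000 bytes:", window)
```

In this run the first burst fills the buffer, the advertised window drops to zero, and the sender has to wait until the application reads some data.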

Problem: Flow control prevents the sender from overwhelming the receiver, but there is no mechanism to prevent either side from overwhelming the underlying network. Neither the sender nor the receiver knows the available bandwidth at the beginning of a new connection, so they need a mechanism to estimate it and to adapt their speeds to the continuously changing conditions within the network.

Slow Start 👨‍🦯

The only way to estimate the available capacity between the client and the server is to measure it by exchanging data, and this is precisely what slow-start is designed to do. To start, the server initializes a new congestion window (cwnd) variable per TCP connection and sets its initial value to a conservative value.

The congestion window (cwnd) is a sender-side limit on the amount of data the sender can have in flight before receiving an acknowledgment (ACK) from the client. The cwnd variable is not advertised or exchanged between the sender and receiver; in this case, it is a private variable maintained by the server.

In simple terms, the congestion window puts a limit on the amount of data in flight (data that has not been acknowledged yet).
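To make the in-flight limit concrete, here is a small sketch. It relies on standard TCP behaviour where the sender is bound by the smaller of its own cwnd and the receiver's advertised rwnd. The helper name `can_send` is hypothetical, and measuring both windows in segments is a simplification (rwnd is advertised in bytes).

```python
def can_send(cwnd, rwnd, in_flight):
    """Segments the sender may put on the wire right now.

    cwnd: the sender's private congestion window
    rwnd: the window most recently advertised by the receiver
    in_flight: segments sent but not yet acknowledged
    """
    window = min(cwnd, rwnd)  # the tighter of the two limits applies
    return max(0, window - in_flight)


if __name__ == "__main__":
    print(can_send(cwnd=10, rwnd=64, in_flight=0))   # limited by cwnd
    print(can_send(cwnd=10, rwnd=4, in_flight=3))    # limited by rwnd
    print(can_send(cwnd=10, rwnd=64, in_flight=12))  # nothing may be sent
```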

At present, the congestion window starts at 10 segments and grows by one segment for each ACK received. As a result, the time required to reach a throughput target is a function of both the round-trip time between the client and server and the initial congestion window size.

Because every ACK adds a segment, the window roughly doubles each round trip, which is why this phase is called the "exponential growth algorithm". Even so, because we start slow with a limit of 10 segments, it takes several round trips for the connection to utilize the maximum bandwidth.
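The dependence on round-trip time and initial window size can be estimated with a short sketch. The function names and the example numbers (a 56 ms round trip, a 45-segment target) are assumptions chosen for illustration, and the model ignores loss and delayed ACKs.

```python
def slow_start_rounds(initial_cwnd, target_cwnd):
    """Round trips needed for cwnd to reach target_cwnd when it doubles
    every round trip (idealized: no loss, one ACK per segment)."""
    rounds, cwnd = 0, initial_cwnd
    while cwnd < target_cwnd:
        cwnd *= 2
        rounds += 1
    return rounds


def time_to_target(rtt_ms, initial_cwnd, target_cwnd):
    """Approximate time in milliseconds to reach the target window."""
    return rtt_ms * slow_start_rounds(initial_cwnd, target_cwnd)


def cwnd_per_round(initial_cwnd, rounds):
    """cwnd at the start of each round trip."""
    cwnd, history = initial_cwnd, []
    for _ in range(rounds):
        history.append(cwnd)
        # Each of the cwnd ACKs in a round adds one segment, so cwnd doubles.
        cwnd += cwnd
    return history


if __name__ == "__main__":
    print(cwnd_per_round(10, 4))
    print(time_to_target(rtt_ms=56, initial_cwnd=10, target_cwnd=45), "ms")
```

Under these assumptions the window goes 10, 20, 40, 80, so reaching 45 segments takes three round trips, or about 168 ms at a 56 ms round-trip time.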

When a connection has been idle for a defined period of time, the congestion window is reset, which causes a performance penalty. This is called Slow-Start Restart. For long-lived connections that send data in bursts, it is often recommended to turn this off to improve performance. On Linux, the setting can be checked and disabled with:

$> sysctl net.ipv4.tcp_slow_start_after_idle
$> sysctl -w net.ipv4.tcp_slow_start_after_idle=0
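If you would rather check the setting from application code, the same value is exposed through procfs on Linux. The sketch below is one possible way to read it; the function name `slow_start_after_idle` is hypothetical, and it returns None when the file is unavailable (for example on a non-Linux system).

```python
from pathlib import Path

# procfs location that corresponds to net.ipv4.tcp_slow_start_after_idle
SSR_PATH = Path("/proc/sys/net/ipv4/tcp_slow_start_after_idle")


def slow_start_after_idle(path=SSR_PATH):
    """Return True if Slow-Start Restart is enabled, False if disabled,
    or None if the setting cannot be read."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    print("slow-start after idle:", slow_start_after_idle())
```

Note that a value set with `sysctl -w` applies to the running system and requires root privileges; it does not survive a reboot unless it is also added to the sysctl configuration.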

Congestion Avoidance

It is important to recognize that TCP is specifically designed to use packet loss as a feedback mechanism to help regulate its performance. In other words, it is not a question of if, but rather of when packet loss will occur. Slow-start initializes the connection with a conservative window and, for every round trip, doubles the amount of data in flight until it exceeds the receiver's flow-control window or a system-configured congestion threshold (ssthresh) window, or until a packet is lost, at which point the congestion avoidance algorithm takes over.

The implicit assumption in congestion avoidance is that packet loss is indicative of network congestion: somewhere along the path we have encountered a congested link or router that was forced to drop the packet, and hence we need to adjust our window to avoid inducing more packet loss and overwhelming the network.

Once the congestion window is reset, congestion avoidance specifies its own algorithms for how to grow the window to minimize further loss. At a certain point, another packet loss event will occur, and the process will repeat once again. If you have ever looked at a throughput trace of a TCP connection and observed a sawtooth pattern within it, now you know why it looks that way: it is the congestion control and avoidance algorithms adjusting the congestion window size to account for packet loss in the network.
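The sawtooth can be reproduced with a toy simulation. This sketch assumes a classic Reno-style scheme (halve the window on loss, then add one segment per round trip), which is only one possible congestion avoidance algorithm; Linux, for instance, defaults to CUBIC. The `capacity` parameter is a hypothetical stand-in for the point at which the path starts dropping packets.

```python
def simulate_cwnd(rounds, capacity, initial_cwnd=10, ssthresh=64):
    """Return cwnd (in segments) at the start of each round trip.

    capacity: hypothetical path limit; a window above it counts as a loss.
    """
    cwnd, history = initial_cwnd, []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity:
            # Loss detected: multiplicative decrease (halve the window).
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double per round trip, capped at the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase of one segment.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(simulate_cwnd(rounds=16, capacity=20, initial_cwnd=10, ssthresh=16))
```

With these made-up parameters the window climbs to 21 segments, is cut back to 10 on the simulated loss, and then climbs linearly again, which gives the sawtooth shape.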

If you have read this far, here are some topics for further study:

Bandwidth Delay Product
Head-of-Line Blocking
Proportional Rate Reduction For TCP
Receive Window Scaling