Proposal to enable TCP_NODELAY as the default setting

Marc Brooker, an engineer at Amazon Web Services (AWS), has debunked the misconceptions around Nagle's algorithm, which is enabled by default in the TCP/IP stack to make the transmission of small messages more efficient. His recommendation boils down to disabling Nagle's algorithm by default by setting the TCP_NODELAY option on network sockets via the setsockopt call, something projects such as Node.js and curl have done for a long time.
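For reference, disabling Nagle's algorithm on an individual socket is a single setsockopt call. A minimal sketch for Linux/POSIX systems might look like this (sock_fd is assumed to be an already created TCP socket):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_NODELAY */
#include <sys/socket.h>
#include <stdio.h>

/* Disable Nagle's algorithm on an existing TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int enable_tcp_nodelay(int sock_fd)
{
    int one = 1;
    if (setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}
```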

Nagle's algorithm aggregates small messages to reduce traffic: it holds back new small TCP segments until acknowledgement of previously sent data arrives. For example, without aggregation, sending 1 byte of payload costs an additional 40 bytes of TCP and IP headers. Under modern conditions, however, Nagle's algorithm causes a noticeable increase in latency that is unacceptable for interactive and distributed applications.
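The sending rule itself is simple. The following simplified C sketch (not real kernel code; it ignores window checks and other details of a real TCP stack) illustrates the decision made for each small write under RFC 896:

```c
#include <stddef.h>

/* Simplified sketch of Nagle's send decision (RFC 896).
 * mss            - maximum segment size of the connection
 * unacked_bytes  - data already sent but not yet acknowledged
 * buffered_bytes - data queued by the application but not yet sent */
static int nagle_may_send_now(size_t mss, size_t unacked_bytes, size_t buffered_bytes)
{
    if (buffered_bytes >= mss)
        return 1;   /* a full segment is ready: send it */
    if (unacked_bytes == 0)
        return 1;   /* nothing in flight: a small segment may go out immediately */
    return 0;       /* otherwise wait for an ACK (or a timeout) before sending */
}
```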


There are three main reasons to make TCP_NODELAY, which disables Nagle's algorithm, the default:

  • Nagle's algorithm interacts badly with the "delayed ACK" optimization, in which the receiver does not send an ACK immediately but waits in the hope of piggybacking it on response data. The problem is that for Nagle's algorithm the arrival of an ACK is the signal to send the buffered data, and if no ACK arrives, the data is sent only after a timeout. This creates a vicious circle: the receiver withholds its ACK because it is still waiting for the rest of the data, while the sender withholds the data because it is still waiting for the ACK, so nothing moves until a timer expires (illustrated in the sketch after this list).
  • The RFC describing Nagle's algorithm was published in 1984 and was not designed for the characteristics of modern high-speed networks and data-center servers, which leads to responsiveness problems. The round-trip time (RTT) between sending a request and receiving a response is around 0.5 ms within a data center, a few milliseconds between data centers in the same region, and up to hundreds of milliseconds when crossing the globe; in that time a modern server can do an enormous amount of work.
  • Modern distributed applications no longer send single bytes of data, and batching of small writes is usually implemented at the application level. Even when the payload is only a few bytes, the actual amount of data on the wire grows significantly after serialization, JSON API wrapping, and TLS encryption, so saving 40 bytes of headers is no longer relevant.
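The first point is easiest to see with the classic "write-write-read" pattern: a client sends a request in two small writes and then waits for the reply. A hedged sketch of the client side (the message layout and helper name are hypothetical, chosen only for illustration):

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical client that triggers the Nagle + delayed-ACK stall.
 * sock_fd is a connected TCP socket with Nagle's algorithm enabled. */
static void send_request(int sock_fd)
{
    const char header[] = "GET /item ";  /* first small write: goes out immediately */
    const char body[]   = "id=42\n";     /* second small write: held back by Nagle  */
    char reply[512];

    send(sock_fd, header, strlen(header), 0);
    /* The second segment is small and the first is still unacknowledged, so the
     * sender buffers it. The server, having seen only half the request, has no
     * response data to piggyback an ACK on, so its delayed-ACK timer (typically
     * tens of milliseconds) must expire before the rest of the request can flow. */
    send(sock_fd, body, strlen(body), 0);

    recv(sock_fd, reply, sizeof(reply), 0);  /* the reply arrives tens of ms late */
}
```

With TCP_NODELAY set on the socket, both writes go out back to back and the stall disappears.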
