As reported in the news about the crash of the Click Frenzy website, they expected to deal with 1 million connections, but it turned out to be more than 2 million.
When we talk about an IT infrastructure reliable enough to support that many users at peak hours, does anyone mention NGINX acting as the front-end server to deal with it?
An article from Taobao.com described an experiment that used NGINX to sustain 2 million concurrent connections on a single server. I was surprised to see people sharing it on Twitter even though it wasn't written in English. Here's the translation:
Tuning and optimization of NGINX for 2 million concurrent connections
One of the vital indicators of server performance is the maximum number of queries per second, i.e., qps. For certain types of applications, however, we care more about the maximum number of concurrent connections than about qps, even though qps remains one of the performance indicators. Twitter's Comet server belongs to this species, and online chat rooms and instant messaging applications are similar in nature. For an introduction to Comet-style applications, please refer to the previous posts. In this kind of system, a large number of messages are produced and delivered to the clients, and client connections are held open even while idle. When a huge number of clients connect, the system ends up holding a very long list of concurrent connections.
First of all, we need to analyse what resources this kind of service consumes: mainly CPU, network bandwidth and memory. To optimize the system, we need to find out where the bottleneck is. Among the concurrent connections, some may not be sending any data at a given moment and can be considered idle. These idle connections consume hardly any CPU or network resources; they merely occupy some space in memory.
Based on the observations above, the system should be able to support a much higher number of concurrent connections than usual, provided it has an adequate amount of memory. Can this happen in the real world? If so, it would also be a challenge for the CPU cores to serve such a huge group of clients.
To put the theory above to the test, we need a running server and a huge number of clients, as well as a server program and a client program to carry out the tasks.
Here’s the scenario:
- Each client initiates a connection and sends out a request to the server.
- The connection is on hold at the server side with no actual response.
- This state is maintained until the target number of concurrent connections is reached, i.e., 2 million.
1. Preparation of the server side
As per the assumptions above, we need a server with a large amount of memory on which to deploy the NGINX-based Comet application. Here's the specification of the server:
- Summary: Dell R710, 2 x Xeon E5520 2.27GHz, 23.5GB / 24GB 1333MHz
- System: Dell PowerEdge R710 (Dell 0VWN1R)
- Processors: 2 x Xeon E5520 2.27GHz 5860MHz FSB (16 cores)
- Memory: 23.5GB / 24GB 1333MHz == 6 x 4GB, 12 x empty
- Disk-Control: megaraid_sas0: Dell/LSILogic PERC 6/i, Package 6.2.0-0013, FW 1.22.02-0612,
- Network: eth0 (bnx2):Broadcom NetXtreme II BCM5709 Gigabit Ethernet,1000Mb/s
- OS: RHEL Server 5.4 (Tikanga), Linux 2.6.18-164.el5 x86_64, 64-bit
The program on the server side is quite simple: an NGINX-based Comet module can be written that accepts a user request and then holds the connection open without sending a response. In addition, the NGINX status (stub_status) module can be used to monitor the number of concurrent connections in real time.
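For illustration, here is a minimal sketch of what the relevant part of nginx.conf might look like. The /comet location and its directive are purely hypothetical placeholders for the custom module described above; only the stub_status part is a standard NGINX feature:
worker_processes     16;
worker_rlimit_nofile 2000000;

events {
    worker_connections 200000;
}

http {
    server {
        listen 80;

        # Hypothetical location handled by the custom comet module:
        # accept the request, then hold the connection without replying.
        location /comet {
            comet;
        }

        # Standard stub_status module, used to watch the number of
        # active connections in real time.
        location /status {
            stub_status on;
        }
    }
}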
Let’s tweak some system parameters on the server, within the
file /etc/sysctl.conf:
net.core.somaxconn = 2048
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_mem = 786432 2097152 3145728
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 20000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_orphans = 131072
Then, issue the following command to make the new settings
effective:
/sbin/sysctl -p
Here are a couple of things to notice:
net.ipv4.tcp_rmem controls the size of the TCP read buffer: the first value is the minimum, the second the default and the third the maximum. To keep the per-socket memory consumption to a minimum, the minimum value is set to 4096.
net.ipv4.tcp_wmem controls the size of the TCP write buffer. Both the read and write buffers affect how much kernel memory each socket consumes.
net.ipv4.tcp_mem sets the memory budget for TCP as a whole, measured in pages rather than bytes; with the usual 4 KB page size, the third value of 3145728 pages corresponds to roughly 12 GB. When TCP's total memory usage exceeds the second value, TCP enters "memory pressure" mode and tries to stabilize its memory usage; it leaves pressure mode once usage drops below the first value. If usage exceeds the third value, TCP refuses to allocate buffers for further sockets, and a long list of log entries like "TCP: too many of orphaned sockets" shows up in the dmesg output on the server.
The net.ipv4.tcp_max_orphans parameter should also be set; it limits the number of sockets that are held by no process. This needs to be considered carefully when we create a huge number of connections.
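While the test is running, the kernel's own socket statistics give a quick view of how many orphaned sockets exist and how much memory TCP is currently using (a standard /proc interface, not specific to this setup):
cat /proc/net/sockstat
The "TCP:" line of the output reports, among other things, the orphan count and the TCP memory usage in pages.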
Apart from this, the server needs to be able to open a very large number of file descriptors, i.e., 2 million. There is a catch when setting such a large file descriptor limit, but no worries; we will come back to it later.
2. Preparation of the client side
In this scenario we need to initiate a large number of client connections. However, the number of local ports on a single machine is limited: of the port range 0 to 65535, ports 0 to 1023 are reserved, so only the roughly 64500 ports between 1024 and 65535 can be allocated. To reach 2 million concurrent connections, about 34 client computers are required.
Of course, we could use virtual IP addresses to reach this number of clients: each virtual IP provides around 64500 ports for binding, so 34 virtual IPs would get the job done. Fortunately, we had enough physical machines within the company to carry out the experiment.
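For reference, if virtual IPs were used instead, extra addresses could be added on a client roughly like this (the interface name and addresses are only examples):
/sbin/ip addr add 10.0.0.101/24 dev eth0
/sbin/ip addr add 10.0.0.102/24 dev eth0
The client program would then have to bind each socket to one of these source addresses explicitly, so that every virtual IP contributes its own 64500-odd local ports.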
The default range for automatically allocated local ports is rather narrow, i.e., 32768 to 61000, so we need to widen it on the client computers in /etc/sysctl.conf:
net.ipv4.ip_local_port_range = 1024 65535
Then run the following command to make it effective:
/sbin/sysctl -p
The client-side program is written on top of the libevent library and continuously initiates new connection requests to the server.
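A simple way to check how many connections a client machine has actually established at any point is to count them with standard tools, for example:
netstat -ant | grep ESTABLISHED | wc -l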
3. Adjustment of File Descriptors
Because a large number of sockets is required on both the client and the server side, we need to raise the maximum number of file descriptors.
On the client side, where a bit more than 60,000 sockets are needed, it's fine to set the limit to 100,000.
Within the file /etc/security/limits.conf, please add these
two lines:
admin soft nofile 100000
admin hard nofile 100000
On the server side, we need more than 2,000,000 sockets. Simply writing something like "nofile 2000000" is a problem: with that value in place, the server can no longer be logged into. After several attempts, it turns out that the maximum value that can be set this way is about 1 million. Checking the source code of kernel v2.6.25 reveals a globally defined limit of 1024*1024, i.e., roughly 1 million. However, from kernel 2.6.25 onwards this limit can be raised through /proc/sys/fs/nr_open, so I couldn't wait to upgrade the kernel to 2.6.32.
(For more background on "ulimit", please see the separate blog post on that topic.)
After the kernel upgrade, we can raise the limit with the following command:
sudo bash -c 'echo 2000000 > /proc/sys/fs/nr_open'
Now we can set the nofile parameters again in /etc/security/limits.conf:
admin soft nofile 2000000
admin hard nofile 2000000
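After raising nr_open and editing limits.conf, the new limit can be verified in a fresh login session of the "admin" user:
cat /proc/sys/fs/nr_open
ulimit -n
Both should now report 2000000.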
4. Final test
Throughout the test, "dmesg" kept showing messages related to the server's port and socket allocation, and the kernel parameters were tuned via /sbin/sysctl according to the settings above. In the end, the test of 2 million concurrent connections was completed.
To minimize NGINX's memory consumption, the default value of "request_pool_size" was reduced from 4k to 1k. The default values of net.ipv4.tcp_wmem and net.ipv4.tcp_rmem were also lowered to 4k.
With 2 million connections established, connection statistics were collected from NGINX, and the state of the system's memory and cache was captured at the same moment (both shown as screenshots in the original post, not reproduced here).
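The post does not show the exact commands used, but with a stub_status location like the /status one sketched earlier, the figures could be pulled at any time with something along these lines:
curl http://127.0.0.1/status
free -m
The first command returns NGINX's active/reading/writing/waiting connection counts; the second shows how much memory is used for buffers and cache.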
5. Conclusion
In practice, parameters such as "request_pool_size" need to be tuned according to the actual workload, and the default sizes of "net.ipv4.tcp_rmem" and "net.ipv4.tcp_wmem" should be adjusted as well.