News reports about the crash of the Click Frenzy website said the site had planned for 1 million connections, but it turned out to face more than 2 million.
When we talk about an IT infrastructure reliable enough to support that level of traffic at peak hours, does anyone mention NGINX as the front-end server for the job?
An article from Taobao.com described an experiment in which NGINX handled 2 million concurrent connections on a single server. I was surprised to see people sharing it on Twitter even though it wasn't written in English. Here's the translation:
Tuning and optimization of NGINX for 2 million concurrent connections
For the server performance, one of the vital indicators is
the maximum number of queries per second, i.e., qps. There are exceptions for
certain types of applications that we do care more about the maximum number of
concurrent connections rather than qps, although we still consider qps as one
of the system performance indicators. The comet server for Twitter belongs to
this kind of species. Other examples like online chat room and instant
messaging applications are also similar in nature. For the introduction of
Comet-typed applications, please refer to the previous posts. For this kind of
system, there would be a lot of messages produced and delivered to the clients.
Those client connections are on hold even during the idle time. When a huge
number of clients connect to the system, a long queue of concurrent connections
would be made and held by the system.
First of all, we need to analyse what resources this kind of service consumes: mainly CPU, network bandwidth and memory. To optimize the system, we need to find out where the bottleneck lies. Among the concurrent connections, many are not sending data most of the time and can be considered idle. Idle connections barely consume any CPU or network resources; they mostly just occupy some memory.
Based on the assumptions above, the system should be able to support a much higher number of concurrent connections, given an adequate amount of memory. Can this happen in the real world? If so, supporting such a huge client population would also be a challenge for the CPU.
To put the theory above to the test, we need a running server, a huge number of clients, and server and client programs to carry out the tasks.
Here’s the scenario:
- Each client initiates a connection and sends out a request to the server.
- The connection is on hold at the server side with no actual response.
- This state is maintained until the target number of concurrent connections is reached, i.e., 2 million.
1. Preparation of the server side
As per the assumption above, we need a server with a large amount of memory on which to deploy the Comet application using NGINX. Here's the specification of the server:
- Summary: Dell R710, 2 x Xeon E5520 2.27GHz, 23.5GB / 24GB 1333MHz
- System: Dell PowerEdge R710 (Dell 0VWN1R)
- Processors: 2 x Xeon E5520 2.27GHz 5860MHz FSB (16 cores)
- Memory: 23.5GB / 24GB 1333MHz == 6 x 4GB, 12 x empty
- Disk-Control: megaraid_sas0: Dell/LSILogic PERC 6/i, Package 6.2.0-0013, FW 1.22.02-0612,
- Network: eth0 (bnx2):Broadcom NetXtreme II BCM5709 Gigabit Ethernet,1000Mb/s
- OS: RHEL Server 5.4 (Tikanga), Linux 2.6.18-164.el5 x86_64, 64-bit
The server-side program is quite simple: an NGINX-based comet module can be written to accept a user request and put the connection on hold without sending a response. Apart from this, the NGINX Status module can be used to monitor the number of concurrent connections in real time.
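As a rough illustration of the monitoring side, the status counters can be polled over HTTP while the test runs. The location name /nginx_status and the build flag below are assumptions rather than part of the original setup; they just show one common way to expose the Status (stub_status) module:
# Assumes NGINX was built with --with-http_stub_status_module and that a
# location such as /nginx_status (hypothetical name) has "stub_status on;".
watch -n 1 'curl -s http://127.0.0.1/nginx_status'
# stub_status prints its counters in this format:
#   Active connections: N
#   server accepts handled requests
#    A H R
#   Reading: r Writing: w Waiting: q
# "Active connections" is the figure being watched during this experiment.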
Let’s tweak some system parameters on the server, within the
file /etc/sysctl.conf:
net.core.somaxconn = 2048
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 4096 16777216
net.ipv4.tcp_wmem = 4096 4096 16777216
net.ipv4.tcp_mem = 786432 2097152 3145728
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 20000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_orphans = 131072
Then, issue the following command to make the new settings
effective:
/sbin/sysctl -p
Here are a couple of things to notice:
net.ipv4.tcp_rmem controls the size of the read buffer: the first value is the minimum, the second the default and the third the maximum. To keep per-socket memory consumption to a minimum, the minimum value is set to 4096.
net.ipv4.tcp_wmem controls the size of the write buffer in the same way. Both the read and write buffers affect how much kernel memory each socket consumes.
net.ipv4.tcp_mem sets the memory budget for TCP as a whole, measured in pages, not bytes. When usage exceeds the second value, TCP enters pressure mode and tries to stabilize its memory usage; it leaves pressure mode once usage drops below the first value. If usage exceeds the third value, TCP refuses to allocate sockets for further connections, and a stream of log entries like "TCP: too many of orphaned sockets" shows up in the dmesg output on the server.
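To put the page-based values above into perspective, here is a quick back-of-the-envelope conversion, assuming the usual 4 KiB page size (getconf reports the real value on the host):
getconf PAGESIZE                          # typically 4096 on x86_64
echo $(( 786432  * 4096 / 1024 / 1024 ))  # low threshold   -> 3072 MB
echo $(( 2097152 * 4096 / 1024 / 1024 ))  # pressure mode   -> 8192 MB
echo $(( 3145728 * 4096 / 1024 / 1024 ))  # hard limit      -> 12288 MB
In other words, TCP is allowed at most about 12 GB here, half of the machine's 24 GB of RAM.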
Also, net.ipv4.tcp_max_orphans should be set to cap the number of sockets that are held by no process. This needs careful consideration when we are creating a huge number of connections.
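While the test runs, the kernel's own counters are the easiest way to see whether these limits are being approached; two read-only checks (nothing here changes any setting):
cat /proc/net/sockstat               # the TCP line reports inuse, orphan, tw and mem (in pages)
dmesg | grep -i "orphaned sockets"   # non-empty output means the limits above were hit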
Apart from this, the server needs to be able to open a huge number of file descriptors, i.e., 2 million. Setting such a large file-descriptor limit is a problem of its own, but no worries; we'll come back to it later.
2. Preparation of the client side
In this scenario we need to initiate a huge number of client connections. However, the number of local ports on a single machine is limited. Ports range from 0 to 65535, with 0 to 1023 reserved, so only the 64511 ports between 1024 and 65534 can be allocated per local IP address. To reach 2 million concurrent connections, 34 client computers will be required.
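The arithmetic behind the machine count, assuming (as section 3 below implies) that each client box is pushed to roughly 60,000 connections rather than its theoretical maximum:
echo $(( 65534 - 1024 + 1 ))      # 64511 usable ports per source IP
echo $(( 2000000 / 60000 + 1 ))   # ~34 machines at ~60,000 connections each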
Of course, we could use virtual IP addresses to reach this number of clients: each virtual IP provides around 64,500 ports for binding, so 34 virtual IPs would get the job done. Fortunately, we had the company resources to carry out this experiment on physical machines.
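For reference, had the virtual IP route been taken, it would look roughly like the sketch below; the addresses and interface are made up and were not part of the original experiment:
ip addr add 10.0.0.11/24 dev eth0   # hypothetical alias addresses
ip addr add 10.0.0.12/24 dev eth0
# ... one alias per ~64,500 connections; the client program then bind()s each
# socket to one of these source addresses before connect(), multiplying the port space.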
The default range for automatically allocated local ports is rather narrow, i.e., 32768 to 61000, so we need to widen it on the client computers in the file /etc/sysctl.conf:
net.ipv4.ip_local_port_range = 1024 65535
Then make it effective with the same command:
/sbin/sysctl -p
The client-side program is written on top of libevent and continuously initiates new connection requests.
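The actual client was a libevent-based C program that the article does not reproduce; the shell sketch below is only a crude stand-in to show the idea of opening many connections, sending a request and leaving each socket open (server address, port, URI and connection count are all hypothetical, and the {fd} syntax needs bash 4.1 or later):
#!/bin/bash
SERVER=192.168.1.10   # hypothetical comet server
PORT=80
COUNT=1000            # connections to open from this one process
for ((i = 0; i < COUNT; i++)); do
    # open a TCP connection on a freshly allocated file descriptor
    exec {fd}<>"/dev/tcp/${SERVER}/${PORT}" || break
    # send a request; the comet server never answers, so the socket stays held
    printf 'GET /comet HTTP/1.1\r\nHost: %s\r\n\r\n' "$SERVER" >&"$fd"
done
sleep 86400           # keep the process, and all its open sockets, alive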
3. Adjustment of file descriptors
Because both the client side and the server side require a large number of sockets, we need to raise the maximum number of file descriptors.
On the client side, where we need to create more than 60,000 sockets per machine, setting the limit to 100,000 is fine.
Within the file /etc/security/limits.conf, please add these
two lines:
admin soft nofile 100000
admin hard nofile 100000
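limits.conf is only applied at login, so a quick sanity check in a fresh session for the user running the client (the "admin" account from the lines above) confirms the new limit:
su - admin -c 'ulimit -n'    # should print 100000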
On the server side, we need more than 2,000,000 sockets. Simply writing something like "nofile 2000000" is a problem: the server can no longer be logged into. After several attempts, it turned out that the value could only be raised to about 1 million. A look at the source code of kernel 2.6.25 showed a globally defined value of 1024*1024 for this limit, i.e., roughly 1 million; from 2.6.25 onwards, however, it can be raised through /proc/sys/fs/nr_open. So I couldn't wait to upgrade the kernel to 2.6.32.
For more details about "ulimit", please refer to the dedicated blog post.
After the kernel upgrade, we can raise the limit with the following command:
sudo bash -c 'echo 2000000 > /proc/sys/fs/nr_open'
Now we can modify the nofile parameters again in the file /etc/security/limits.conf:
admin soft nofile 2000000
admin hard nofile 2000000
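One point worth spelling out, since it explains the lock-out described above: nr_open must already be larger than the nofile value in limits.conf, or logins fail. A sketch for making the change survive a reboot and confirming it (fs.nr_open is the sysctl key backing /proc/sys/fs/nr_open):
sudo bash -c 'echo "fs.nr_open = 2000000" >> /etc/sysctl.conf'   # persist across reboots
/sbin/sysctl -p
su - admin -c 'ulimit -Hn'                                       # should now print 2000000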
4. Final test
Throughout the test, the dmesg console kept showing messages related to the server's port allocation, which were addressed by the newly tuned /sbin/sysctl settings above. In the end, we completed a test of 2 million concurrent connections.
To minimize NGINX's memory consumption, the default value of "request_pool_size" was reduced from 4k to 1k. The default values of net.ipv4.tcp_wmem and net.ipv4.tcp_rmem were also lowered to 4k.
At the point of 2 million connections, data was collected via NGINX, and the condition of the system cache was checked at the same time.
5. Conclusion
In a real deployment, parameters like "request_pool_size" need to be adjusted according to the actual workload, and the default sizes of "net.ipv4.tcp_rmem" and "net.ipv4.tcp_wmem" should be changed as well.