From: iLLer
Newsgroups: sybase.public.sqlanywhere.web
Subject: ASA9 http listener, FIN_WAIT2 socket state
Date: 12 Apr 2010 01:47:12 -0700
The problem appears as an accumulation of sockets stuck in the FIN_WAIT2 state. ASA 9.0.2 (x86_64) is running on a RedHat 5 EL server with an HTTP listener on port 80. I captured the moment at which a socket enters the FIN_WAIT2 state, as well as an example of normal operation; a tcp dump is attached.
In the files, X.X.X.X is the first client, Y.Y.Y.Y is the second client, and S.S.S.S is the server. The only difference I found is the absence of the R (RST) packet in the first case. It is quite likely the packet was lost on the heavily loaded link, but surely that is no reason to leave a socket in a half-closed (FIN_WAIT2) state?
What is the cause, and how can it be overcome?
16:52:06.135974 IP (tos 0x20, ttl 116, id 39126, offset 0, flags [DF], proto: TCP (6), length: 48) X.X.X.X.13280 > S.S.S.S.80: S, cksum 0xd63c (correct), 4233340922:4233340922(0) win 16384
16:52:06.135980 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto: TCP (6), length: 48) S.S.S.S.80 > X.X.X.X.13280: S, cksum 0x3e97 (correct), 2809928008:2809928008(0) ack 4233340923 win 5840
16:52:06.149353 IP (tos 0x20, ttl 116, id 39127, offset 0, flags [DF], proto: TCP (6), length: 40) X.X.X.X.13280 > S.S.S.S.80: ., cksum 0x3dbb (correct), ack 1 win 17520
16:52:06.156742 IP (tos 0x20, ttl 116, id 39128, offset 0, flags [DF], proto: TCP (6), length: 560) X.X.X.X.13280 > S.S.S.S.80: P 1:521(520) ack 1 win 17520
16:52:06.156746 IP (tos 0x0, ttl 64, id 54242, offset 0, flags [DF], proto: TCP (6), length: 40) S.S.S.S.80 > X.X.X.X.13280: ., cksum 0x6703 (correct), ack 521 win 6432
16:52:06.157288 IP (tos 0x0, ttl 64, id 54243, offset 0, flags [DF], proto: TCP (6), length: 238) S.S.S.S.80 > X.X.X.X.13280: P 1:199(198) ack 521 win 6432
16:52:06.157301 IP (tos 0x0, ttl 64, id 54244, offset 0, flags [DF], proto: TCP (6), length: 1500) S.S.S.S.80 > X.X.X.X.13280: . 199:1659(1460) ack 521 win 6432
16:52:06.157358 IP (tos 0x0, ttl 64, id 54245, offset 0, flags [DF], proto: TCP (6), length: 1500) S.S.S.S.80 > X.X.X.X.13280: . 1659:3119(1460) ack 521 win 6432
16:52:06.157362 IP (tos 0x0, ttl 64, id 54246, offset 0, flags [DF], proto: TCP (6), length: 1500) S.S.S.S.80 > X.X.X.X.13280: . 3119:4579(1460) ack 521 win 6432
16:52:06.157365 IP (tos 0x0, ttl 64, id 54247, offset 0, flags [DF], proto: TCP (6), length: 1500) S.S.S.S.80 > X.X.X.X.13280: . 4579:6039(1460) ack 521 win 6432
16:52:06.157367 IP (tos 0x0, ttl 64, id 54248, offset 0, flags [DF], proto: TCP (6), length: 249) S.S.S.S.80 > X.X.X.X.13280: FP 6039:6248(209) ack 521 win 6432
16:52:06.174613 IP (tos 0x20, ttl 116, id 39129, offset 0, flags [DF], proto: TCP (6), length: 40) X.X.X.X.13280 > S.S.S.S.80: ., cksum 0x3539 (correct), ack 1659 win 17520
16:52:06.177759 IP (tos 0x20, ttl 116, id 39130, offset 0, flags [DF], proto: TCP (6), length: 40) X.X.X.X.13280 > S.S.S.S.80: ., cksum 0x29d1 (correct), ack 4579 win 17520
16:52:06.181102 IP (tos 0x20, ttl 116, id 39131, offset 0, flags [DF], proto: TCP (6), length: 40) X.X.X.X.13280 > S.S.S.S.80: ., cksum 0x234b (correct), ack 6249 win 17520
According to RFC 793, FIN-WAIT-2 means that the local socket has been closed and the system is waiting for the remote system to close its side of the connection as well.
So you are correct, if the remote FIN packet is being lost (for any reason), the local socket will remain in the FIN-WAIT-2 state until the socket time-out period expires, at which point the local system will delete the socket.
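The half-close semantics behind FIN-WAIT-2 can be illustrated with plain TCP sockets on loopback (a generic Python sketch, not specific to ASA):

```python
# After one side calls shutdown(SHUT_WR) it has sent its FIN; once that
# FIN is ACKed, that side sits in the state RFC 793 calls FIN-WAIT-2
# until the peer sends its own FIN. The peer can still send data.
import socket

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

cli.shutdown(socket.SHUT_WR)        # client sends its FIN (half-close)
assert conn.recv(1024) == b""       # server sees end-of-stream from client
conn.sendall(b"reply")              # server -> client direction still open
assert cli.recv(1024) == b"reply"

conn.close()                        # server's FIN finally ends the wait
cli.close()
srv.close()
```

If the server's FIN (or the client's) is lost and never retransmitted successfully, the waiting side stays half-closed, which is exactly the stuck state described in the question.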
ASA 9 only supports HTTP/1.0 and therefore closes the HTTP socket after each request is processed. For a system that is seeing a heavy HTTP request load and uses a network that is dropping packets, this could cause the system to get overloaded and overwhelm the socket pool.
There are three possible solutions:
1. Determine the cause of the dropped packets and fix that problem.
2. Reduce the TCP/IP socket timeout so that local sockets in the CLOSE-WAIT or FIN-WAIT-2 state expire faster. How this is done is OS-dependent. For Windows, I found a Microsoft article that describes how to change the TcpTimedWaitDelay; similar articles can be found for other OSes by googling "change tcp/ip socket timeout". Note, however, that twiddling with OS parameters should be done with care and well tested before going into a production system.
3. Upgrade to SQL Anywhere 10, 11, or 12 (currently in Beta) - these versions support HTTP/1.1 pipelining of requests, which reduces the number of sockets that are opened and closed.
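On Linux, the timeout relevant to option 2 is the kernel's net.ipv4.tcp_fin_timeout (seconds an orphaned socket may linger in FIN-WAIT-2, default 60 on most kernels). A minimal sketch to inspect it, assuming the usual /proc path as on RHEL:

```python
# Read the kernel's FIN-WAIT-2 timeout for orphaned sockets on Linux.
# Assumption: the sysctl is exposed at /proc/sys/net/ipv4/tcp_fin_timeout.
# Lowering it requires root, e.g.:  sysctl -w net.ipv4.tcp_fin_timeout=30
from pathlib import Path

def fin_wait2_timeout(proc: str = "/proc/sys/net/ipv4/tcp_fin_timeout"):
    """Return the timeout in seconds, or None on a non-Linux system."""
    p = Path(proc)
    return int(p.read_text()) if p.exists() else None

print(fin_wait2_timeout())
```

Note the caveat: this timeout applies only to *orphaned* sockets (ones no process still holds open), so it will not help if the application keeps a file descriptor on the connection.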
Many thanks for the answers!
Certainly, the OS TCP/IP layer should delete timed-out sockets, but for some reason that does not happen. Sockets stay in this state for minutes, hours, even days, which eventually hits the limit and stops the ASA HTTP listener. I started digging into the problem and compared the behaviour of the Apache web server and the ASA web server. Here it is! After the supposed close of a socket, the ASA process still holds an open file descriptor for it, which is why ASA sockets in the FIN_WAIT2 state are not dropped. Apache's sockets have no references from worker processes; they become orphans and disappear within 60 seconds. Here is the output of "netstat -aenpt | grep FIN_WAIT2":
tcp 0 0 S.S.S.S:20080 x.x.177.40:2713 FIN_WAIT2 0 7501069 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.177.40:3243 FIN_WAIT2 0 7502643 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.177.40:4779 FIN_WAIT2 0 7505568 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.177.40:3515 FIN_WAIT2 0 7502854 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.177.40:2923 FIN_WAIT2 0 7502353 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.177.40:2919 FIN_WAIT2 0 7502351 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.113.210:39772 FIN_WAIT2 0 7927402 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.113.210:39774 FIN_WAIT2 0 7927403 30545/dbsrv9
tcp 0 0 S.S.S.S:20080 x.x.177.40:1836 FIN_WAIT2 0 7488153 30545/dbsrv9
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.71.168:32895 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.17.232:49585 FIN_WAIT2 48 14143506 15723/httpd
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.240.180:50752 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.229.249:53128 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.110.86:4359 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.228.36:2125 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.97.137:3776 FIN_WAIT2 48 14143507 17424/httpd
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.97.137:3781 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.17.173:59643 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.162.186:53165 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.7.205:50912 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.7.205:50909 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.7.205:50910 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.7.205:50911 FIN_WAIT2 0 0 -
tcp 0 0 ::ffff:S.S.S.S:80 ::ffff:x.x.234.218:56799 FIN_WAIT2 0 0 -
Here S.S.S.S:80 is the Apache listener and S.S.S.S:20080 is the ASA HTTP listener; the last column is the PID. Note that Apache drops its reference to sockets in this state (the entries with "-" in the PID column); within seconds they leave this list, while the ones from ASA hang around...
I suspect the socket is being terminated incorrectly on the ASA side. Is shutdown() performed on the socket before it is closed? Are the buffers released? It seems to me there is a hole somewhere there.
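For reference, the orderly shutdown-then-close sequence being asked about looks roughly like this on the server side (a generic Python sketch under my own assumptions, not ASA's actual code):

```python
# Graceful server-side close: half-close our sending direction first,
# drain whatever the peer still sends until its FIN arrives, then
# close() to release the file descriptor. Holding the descriptor open
# after the FIN exchange is what keeps a socket pinned to the process.
import socket

def graceful_close(sock: socket.socket) -> None:
    try:
        sock.shutdown(socket.SHUT_WR)   # send our FIN
        sock.settimeout(5.0)            # don't wait forever for peer's FIN
        while sock.recv(4096):          # drain until peer EOF
            pass
    except OSError:
        pass                            # peer may already be gone
    finally:
        sock.close()                    # release the descriptor

# Demo on loopback: the client disappears, the server closes cleanly.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()
cli.close()
graceful_close(conn)
srv.close()
```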
What build of 9.0.2 are you using? If you are not using the latest EBF, I would recommend trying it, because some problems were found and fixed in 9.0.2 that could result in sockets not being closed (e.g. QTS 421320, "Server leaks handles after incomplete HTTP request") - although I don't know whether those issues resulted in sockets being left in a FIN-WAIT-2 state. Note that 9.0.2 was EOLed as of the end of January 2010.
The httpd sockets above have no reference to a process because Apache uses processes for its multitasking: the processes to which those sockets belong actually go away. That doesn't happen with dbsrv9 because we use threads... and we don't destroy a thread when its work has completed.
I suspect the OS closes sockets more quickly when the owning process exits, but it doesn't do that instantly either. The fact that Apache also has FIN_WAIT2 sockets sitting around for some length of time tells me that the ASA behaviour is reasonable; in the end it is the OS that must close the socket.
Strange. I use the prefork model, and in any case the server's list of processes is constant. As for socket closing, I have concluded that this is probably an operating-system peculiarity in how sockets are closed. Please have a look at these: http://www.linuxforums.org/forum/linux-kernel/150100-tcp-connections-stuck-fin_wait2-state.html and http://linux.derkeiler.com/Newsgroups/comp.os.linux.networking/2008-11/msg00281.html - very similar situations. The suggested solutions are either to recompile the kernel or to perform special handling when closing the socket.
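One form of that "special handling when closing the socket" is SO_LINGER with a zero linger time, which makes close() send an RST instead of a FIN, so the connection is torn down immediately and never enters FIN_WAIT2. It is an aggressive workaround: the peer sees "connection reset" and any unsent data is discarded. A minimal Python sketch:

```python
# Abortive close via SO_LINGER (l_onoff=1, l_linger=0): close() sends
# RST rather than FIN, so no half-closed state is ever entered.
# Trade-off: the peer gets ECONNRESET and buffered data is lost.
import socket
import struct

def set_abortive_close(sock: socket.socket) -> None:
    linger = struct.pack("ii", 1, 0)   # l_onoff=1, l_linger=0
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)

s = socket.socket()
set_abortive_close(s)
onoff, secs = struct.unpack(
    "ii", s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, 8))
print(onoff, secs)
s.close()
```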