cancel
Showing results for 
Search instead for 
Did you mean: 

Connections that cannot be dropped

justin_willey
Participant
5,398

This has been reported a few times but no resolutions reported and I wondered if anyone had got to the bottom of it?

We occasionally have database connections where the client application has been disconnected but the connection is still active on the server and cannot be dropped. Attempting to drop the connection either through Sybase Central or using DROP CONNECTION has no effect (and no error is reported). It makes no difference when the client machine is rebooted.

The connection appears in the list of active connection in Sybase Central and is consuming CPU time (slightly less than a full processor's worth - though it varies). The statement reported as the last statement is a fairly straightforward and is being executed by the system for other connections thousands of times per hour without problem. The connection is not shown as being blocked.

The server version is 10.0.1.4239

Related questions:

http://sqlanywhere-forum.sap.com/questions/1772

http://sqlanywhere-forum.sap.com/questions/1368

Accepted Solutions (1)

Accepted Solutions (1)

jeff_albion
Advisor
Advisor

We have seen customer reports in the past (known to exist 11.0.1) of the bug CR #594568 leaving 'dead connections' around that can no longer be dropped. (e.g. We started a HashFilter/parallelized intra-query task and 'accidentally' left around some worker threads afterwards).

Notably, CR #594568 was only the first fix for a class of 'HashFilter' problems - there were additional issues (in particular, CR #620095 ) fixed on later builds/versions (including 10.0.1/12.0.1). I would recommend seeing if this 'zombie connection' behaviour still occurs after these fixes/builds:


Update (2012/03/30):

We have now diagnosed more instances of a 'hung connection' problem from other customer reports. The updated list for parallelized HashFilter problems (that may leave Exchange worker tasks 'abandoned' and 'undroppable') now includes the following issues/builds:

  1. CR #594568 - 11.0.1.2401
  2. CR #620095 - 10.0.1.4182, 11.0.1.2544, 12.0.1.3126
  3. CR #693560 - 10.0.1.4309, 11.0.1.2786, 12.0.1.3713
  4. CR #702733 - 11.0.1.2786, 12.0.1.3713

The list of associated parallelized 'HashFilter' issues (server hang/crashes/incorrect results) include:

  1. CR #663991 - 10.0.1.4205, 11.0.1.2584, 12.0.1.3320
  2. CR #674917 - 11.0.1.2631, 12.0.1.3388

Note: CR #702733 was identified after the End-of-Life date for SQL Anywhere 10.0.x. As such, this fix was not backported to this codeline and it remains a possibility for 10.0.x builds to encounter undroppable internal connections in very rare circumstances. If this is a concern for you, please upgrade to a later version of SQL Anywhere (11.0.1 / 12.0.1), which includes this fix. Setting the connection option 'MAX_QUERY_TASKS' to '1' or restarting the database server is a workaround for the issue in 10.0.1.

VolkerBarth
Contributor
0 Kudos

Nevertheless, a great summary!

FWIW, the last CR notes a version 4.0 - is Watcom SQL (as product, not as the great SQL dialect) still under active development?

jeff_albion
Advisor
Advisor
0 Kudos

The last reference was a recently fixed issue; the builds I have posted should be accurate. I believe the public facing system is still 'catching up' with the versions. Thanks for the report!

justin_willey
Participant
0 Kudos

It's not directly reproducible but is occurring regularly. Potentially more helpfully: the server in question crashed with this issue on-going and as part of the investigation of that (probably unrelated) issue (case express #116801982) a full core dump and a request level log have been uploaded to support today.

justin_willey
Participant
0 Kudos

That second sentence could have been clearer - I meant the investigation of the crashing problem!

jeff_albion
Advisor
Advisor

If the server is also crashing, it sounds like the environment is possibly 'unstable'. There is likely another issue involved (that is responsible for the crash specifically) that may be contributing to the hanging connections (e.g. potentially something 'like' CR #674917). It could be the case that getting a connection into this state precipitates the crash, or...

I would be interested to see if this behaviour still happens after we figure out the cause of the crash internally - it looks like we're still currently investigating that issue.

justin_willey
Participant
0 Kudos

Thanks for the updated info!

Breck_Carter
Participant
0 Kudos

Let me add my thanks, too, for taking the time to update two three other discussions as well as this one

jeff_albion
Advisor
Advisor

You're quite welcome - I've been trying to keep track of the reports for this issue and I think I've found all of the questions now. Feel free to link back to this thread if you find any more around.

If I hear of anything else regarding this situation, I will try to keep this answer updated. Please feel free to report future instances of any 'undroppable connections' as a new question after these builds have been released. (These builds have also been currently requested in QA).

jeff_albion
Advisor
Advisor
0 Kudos

The 11.0.1 Windows / Linux EBFs have now been released for the CR #702733 issue:

11.0.1.2789 - Windows x86/x64 - http://downloads.sybase.com/swd/detail.do?baseprod=144&client=ianywhere&relid=15647

11.0.1.2794 - Linux x86/x64 - http://downloads.sybase.com/swd/detail.do?baseprod=144&client=ianywhere&relid=15643

The 12.0.1 EBFs are still progressing through QA.

Answers (1)

Answers (1)

johnsmirnios
Participant

My guess would be that the connection has a huge transaction to rollback. The communications link may be gone but the server must actually rollback all uncommitted operations performed by that connection before the connection will disappear from the list reported by the server.

VolkerBarth
Contributor
0 Kudos

...so this would be a "delayed" disconnect and not a failed one?

justin_willey
Participant
0 Kudos

That would make sense John - thanks. There shouldn't be any really big transactions but .... If this is what is happening, what would you expect the ReqType (Type of active request) to show - these connections are showing 'FETCH'? In case it helps ReqStatus is 'Executing'.

johnsmirnios
Participant
0 Kudos

As far as the server is concerned, the communication link and the internal representation of a database connection are distinct (though clearly related) entities. The internal representation of a database connection cannot go away until the rollback completes. I don't know for sure what information is returned by the connection info procedures when the communication link has terminated but the internal representation of a connection is still being torn down but I wouldn't be surprised if it was 'active'.

One thing that could be done while in the state described by Volker is to try to execute a checkpoint on another connection. Checkpoints cannot be performed while a rollback is in progress. If you can execute a checkpoint, my guess is wrong and perhaps there is a bug. If you cannot execute a checkpoint, you can't really tell for sure if there's a rollback in progress since other things can prevent checkpoints too but it's a good indicator.

johnsmirnios
Participant
0 Kudos

Without looking to be certain, I would imagine that the last active request indicates the last operation performed over the communications link. The rollback on a disconnect is generated internally as part of the tear-down process.

justin_willey
Participant
0 Kudos

Thank for that tip - it hadn't occurred to me. Checkpoints are happening regularly (about every 10 mins) as normal for the server, taking a few seconds (the connection has been in this state for a few hours now) - so does that suggest that there could be an issue? It's the only odd thing we've been able to establish so far about this server that is crashing regularly for no apparent reason. I'm just wondering if we could be running into some sort of resource leak.

johnsmirnios
Participant
0 Kudos

If you are doing checkpoints, the connection in question is not performing a rollback. There must be something else going on.