cancel
Showing results for 
Search instead for 
Did you mean: 
Read only

Is there a way to avoid or reduce the "thread deadlock"?

8,647

I'm receiving a lot "Thread deadlocked occurred" in the messages log. I'm trying to figure out what could be the best combination of Multiprogramming level (MPL) configuration and max_query_task for my database servers. There are several DBs running on Windows servers with 8 CPU. The MPL varies up and down indicating the server is unable to correctly determine the best value.

The problem is when the "thread deadlock" arrive, the server adjust incrementing the MPL but in some points this situation turn worst, causing a bottleneck and receiving the error "Connection terminated abnormally; error code 121". I even think that some threads could be being orphaned or hanged that we have to setup the option "Disconnects inactive connections" (-ti) in 5 minutes.

These are the actual values:

SQL Anywhere Network Server Version 12.0.1.4436,
Processors detected: 8,
Processor limit (licensed processors): 7,
Multiprogramming level: minimum:7, current:20, maximum:80,
AutoMultiProgrammingLevel = 1 (Automatic tuning of multiprogramming level is enabled),
max_query_tasks = 0 (I'm thinking setup in 4 following best practice with 8 processors)

Thanks for your comments.

Accepted Solutions (0)

Answers (2)

Answers (2)

Breck_Carter
Participant

There is a big difference between "not enough threads to satisfy the load" and "all threads are blocked".

For example, it is possible to have dreadfully slow performance because the multiprogramming level is so low that hundreds of connections are waiting for threads, without getting "all threads are blocked":

alt text

The reason you don't get "all threads are blocked" is because there are a few threads actually doing work... "all threads are blocked" means just that, all the threads are blocked.

Glenn Paulley's article What exactly is thread deadlock? is worth reading.

Important points to note: "All threads are blocked" is not easy to get, it isn't really "deadlock" in the sense of "deadly embrace", and chances are good that isolation level is involved.

VolkerBarth
Contributor
0 Kudos

Isolation level - or external environments...

Breck_Carter
Participant

AFAIK there is no such message text as "Thread deadlock occurred".

Please tell us exactly what the message says, and the SQLCODE or SQLSTATE.

There are two likely culprits, both vastly different in their usual causes (the second one usually isn't a "deadlock", just a "running out of resources")...

Deadlock detected
Error constant  SQLE_DEADLOCK
ODBC 2 state    40001
ODBC 3 state    40001
Severity    13
SQLCODE -306
SQLSTATE    40001
Sybase error code   1205

You attempted to read or write a row and it is locked by another user. Also, the other user is blocked directly or indirectly on your own transaction. This is a deadlock situation and your transaction has been chosen as the one to rollback.

All threads are blocked
Error constant  SQLE_THREAD_DEADLOCK
ODBC 2 state    40001
ODBC 3 state    40001
Severity    13
SQLCODE -307
SQLSTATE    40W06
Sybase error code   1205

You attempted to read or write a row and it is locked by another user. Also, all other threads (see the -gn server option) are blocked waiting for a lock to be released. This is a deadlock situation and your transaction has been chosen as the one to rollback.

If it's the SQLCODE -307 then first try increasing the -gn. Here's a tip from the upcoming Foxhound 5 Help:

Performance Tip: If Alert #1 and All Clear #1 messages appear repeatedly, along with up-and-down changes in the multiprogramming level (Max Req), that may indicate the server is thrashing while it tries to determine the best value of Max Req.

If the computer is dedicated to running this SQL Anywhere server and no other process of importance, consider disabling the dynamic tuning of the multiprogramming level by specifying the dbsrv -gna 0 and -gn options to permanently set a fixed value. With a dedicated computer there may be no need for SQL Anywhere to perform extra work changing the multiprogramming level up and down.

You can temporarily disable the dynamic tuning of the multiprogramming level as follows:

-- Set fixed multiprogramming level.

CALL sa_server_option ( 'AutoMultiProgrammingLevel',    'NO' );  
CALL sa_server_option ( 'MinMultiProgrammingLevel',     '100' );
CALL sa_server_option ( 'MaxMultiProgrammingLevel',     '100' );
CALL sa_server_option ( 'CurrentMultiProgrammingLevel', '100' ); 

SELECT @@VERSION,
       PROPERTY ( 'AutoMultiProgrammingLevel'    ) AS "-gna",
       PROPERTY ( 'MultiProgrammingLevel'        ) AS "-gn",
       PROPERTY ( 'MinMultiProgrammingLevel'     ) AS "-gnl",
       PROPERTY ( 'MaxMultiProgrammingLevel'     ) AS "-gnh",
       PROPERTY ( 'CurrentMultiProgrammingLevel' ) AS "current";

@@VERSION,     -gna, -gn,   -gnl,  -gnh,  current
'16.0.0.2512', '0',  '100', '100', '100', '100'
Here's how to re-enable dynamic tuning of the multiprogramming level:

-- Enable dynamic tuning of multiprogramming level.

CALL sa_server_option ( 'AutoMultiProgrammingLevel',    'YES' );
CALL sa_server_option ( 'MinMultiProgrammingLevel',     '8' );
CALL sa_server_option ( 'MaxMultiProgrammingLevel',     '100' );
CALL sa_server_option ( 'CurrentMultiProgrammingLevel', '100' ); 

SELECT @@VERSION,
       PROPERTY ( 'AutoMultiProgrammingLevel'    ) AS "-gna", 
       PROPERTY ( 'MultiProgrammingLevel'        ) AS "-gn",
       PROPERTY ( 'MinMultiProgrammingLevel'     ) AS "-gnl",
       PROPERTY ( 'MaxMultiProgrammingLevel'     ) AS "-gnh",
       PROPERTY ( 'CurrentMultiProgrammingLevel' ) AS "current";

@@VERSION,     -gna, -gn,   -gnl,  -gnh,  current
'16.0.0.2512', '1',  '100', '8',   '100', '100'
0 Kudos

Breck, thanks for the information, the user did not inform me of any errors, I'm just seeing these messages in the message log and, for my opinion, it is a coincidence that error 121 occurs after a thread deadlock. I'm trying to minimize these abnormal disconnections by preventing processes from handling much intra-query parallelism and the MPL going higher than the server can handle.

Breck_Carter
Participant
0 Kudos

> these messages in the message log

Please show us an example... thanks!

This topic is very important because it is closely related to the worst performance problem ever: an unresponsive server.

0 Kudos

Here is an extract of the messagelog (-o messagelog):

Command: findstr "parallelism error Adjusting deadlocked Multiprogramming adjusted warning caching" messagelog

I. 03/19 15:41:45. Adjusting the multiprogramming level to 7
I. 03/19 15:45:42. Multiprogramming level increased by 1 to handle intra-query parallelism
I. 03/19 15:45:42. Adjusting the multiprogramming level to 18
I. 03/19 15:45:42. Thread deadlocked occurred. Adjusted the multiprogramming level to 18
I. 03/19 15:47:23. Adjusting the multiprogramming level to 7
I. 03/19 15:48:37. Multiprogramming level increased by 1 to handle intra-query parallelism
I. 03/19 15:48:37. Adjusting the multiprogramming level to 18
I. 03/19 15:48:37. Thread deadlocked occurred. Adjusted the multiprogramming level to 18
I. 03/20 02:00:45. Cache size adjusted to 825212K
I. 03/20 02:01:45. Cache size adjusted to 830216K
I. 03/20 02:02:45. Cache size adjusted to 831700K
I. 03/20 02:03:45. Cache size adjusted to 831844K
I. 03/20 14:27:40. Adjusting the multiprogramming level to 7
I. 03/20 14:28:10. Multiprogramming level increased by 2 to handle intra-query parallelism
I. 03/20 14:28:10. Multiprogramming level increased by 1 to handle intra-query parallelism
I. 03/20 14:28:10. Adjusting the multiprogramming level to 20
I. 03/20 14:28:10. Thread deadlocked occurred. Adjusted the multiprogramming level to 20
I. 03/20 14:28:21. Connection terminated abnormally; error code 121
I. 03/20 15:13:00. Adjusting the multiprogramming level to 7
I. 03/20 15:13:04. Adjusting the multiprogramming level to 17
I. 03/20 15:13:04. Thread deadlocked occurred. Adjusted the multiprogramming level to 17
I. 03/20 15:14:11. Adjusting the multiprogramming level to 10
I. 03/20 15:14:45. Adjusting the multiprogramming level to 8
I. 03/20 15:14:46. Adjusting the multiprogramming level to 18
I. 03/20 15:14:46. Thread deadlocked occurred. Adjusted the multiprogramming level to 18
I. 03/20 15:15:22. Connection terminated abnormally; error code 121
I. 03/20 15:15:31. Connection terminated abnormally; error code 121
I. 03/21 02:04:29. Adjusting the multiprogramming level to 7
I. 03/21 07:36:43. Multiprogramming level increased by 1 to handle intra-query parallelism
I. 03/21 07:36:43. Adjusting the multiprogramming level to 18
I. 03/21 07:36:43. Thread deadlocked occurred. Adjusted the multiprogramming level to 18
I. 03/21 09:28:34. Adjusting the multiprogramming level to 7
I. 03/21 09:29:57. Multiprogramming level increased by 1 to handle intra-query parallelism
I. 03/21 09:29:57. Adjusting the multiprogramming level to 18
I. 03/21 09:29:57. Thread deadlocked occurred. Adjusted the multiprogramming level to 18
I. 03/21 13:56:32. Cache size adjusted to 832748K
I. 03/21 13:57:32. Cache size adjusted to 832976K
I. 03/21 15:42:38. Cache size adjusted to 832908K
I. 03/21 16:00:39. Cache size adjusted to 832824K
I. 03/21 16:10:39. Cache size adjusted to 832752K
I. 03/21 17:10:42. Cache size adjusted to 832632K
I. 03/21 17:12:42. Cache size adjusted to 832548K
I. 03/21 17:54:44. Cache size adjusted to 832476K
I. 03/22 02:04:49. Adjusting the multiprogramming level to 7
I. 03/22 08:01:45. Multiprogramming level increased by 1 to handle intra-query parallelism
I. 03/22 08:01:45. Adjusting the multiprogramming level to 18
I. 03/22 08:01:45. Thread deadlocked occurred. Adjusted the multiprogramming level to 18

I didn't setup all DBs monitoring with Foxhound but in the DBs that I have didn't receive the Alert#1 "unresponsive server", some times we get samples interval with "unscheduled request" but not arrive the Alert#14.

VolkerBarth
Contributor
0 Kudos

What happens when you increase the minimum MPL value to 18 - does that also lead to thread deadlocks? Or otherwise, when you limit the maximum MPL value to below 18 or altogether use a fixed MPL?

VolkerBarth
Contributor

Have you tried to find out what connections make use of intra-query parallelism and to prevent that by using max_query_tasks = 1?

Breck_Carter
Participant
0 Kudos

Where are these messages coming from?

I suspect they are coming from MESSAGE statements in application programs or user-written stored procedures.

I am fairly certain they are NOT coming from the SQL Anywhere server, especially the one with the spelling mistake "Thread deadlocked occurred".

A Hexplorer binary search of dbserv16.exe and an empty V16 *.db file did NOT find any occurrence of "deadlocked". The closest match was "Thread deadlock detected, see -gn option".

The following SELECTs do a search on all stored procedure code...

BEGIN
SELECT * FROM SYSTABLE     WHERE view_def     LIKE '%deadlocked%';
SELECT * FROM SYSPROCEDURE WHERE proc_defn    LIKE '%deadlocked%';
SELECT * FROM SYSTRIGGER   WHERE trigger_defn LIKE '%deadlocked%';
SELECT * FROM SYSEVENT     WHERE source       LIKE '%deadlocked%';
END;

Breck_Carter
Participant
0 Kudos

> error code 121

Note that SQLCODE 121 is Cursor option values changed, and -121 is Permission denied.

VolkerBarth
Contributor
0 Kudos

"thread deadlock occurred" is contained in the dbserv16.dll, FWIW. Note the correct grammar:)

Besides that, I share your suspicion that those are not built-in messages...

Breck_Carter
Participant
0 Kudos

Arrggh! I missed that string entirely... Hexplorer searches are case-sensitive (of course) and I was searching for "Thread" (stupid)...

...BUT actually the string in dbserv16.dll is this: "thread deadlock occured. Current num="

The grammer, er, grammar may be OK but the spelling isn't: occured needs two r's 🙂

0 Kudos

Ok, Volker: I'm preparing a test environment to play with the MPL. The connections that use intra-query parallelism are randomly, it depend of the workload. I have a dedicated server to 1 DB in production, I configured the max_query_tasks option in 4 and I think that is helping to lead less thread deadlock but at the end everything depends on the concurrence and throughput.

Breck: I have setup this parameters in the start up (-zp -zl -zt -gns -cs -o d:\\PATH\\messagelog) so I get details in the messagelog file, so them coming from the SQL Anywhere (Ver 12.0.1). I don't know why the spelling is wrong but definitely are coming from SQL Anywhere, did you check the dbsrv12.exe?

Breck_Carter
Participant
0 Kudos

I searched dbserv12.dll for "hread dead" in both the Bin32 and Bin64 folders, with only the following hits (in different order in the two binaries)...

Bin32
Thread deadlock detected, see -gn option
thread deadlock occured. Current num=

Bin64
thread deadlock occured. Current num=
Thread deadlock detected, see -gn option

Google gives zero hits for "Thread deadlocked occurred"... which would be impossible IMO, if SQL Anywhere produces this message.

Did you search the SYSPROCEDURE and other tables?