pyodbc runs "select" queries very slowly when providing parameter

jtdawso1
Explorer

Hello,

For a bit of context, I'm using ASE 16.0 SP03, an up-to-date copy of libsybdrvodb.so, Python 3.8.10, and an up-to-date copy of pyodbc ( https://github.com/mkleehammer/pyodbc )

I found a strange issue when running a simple "select" query using pyodbc's "?" placeholder syntax. When I run this query directly in a sqsh console, it executes *very* quickly:

select * from files where filename='pgb.197901.bz2';

Adaptive Server cpu time: 0 ms. Adaptive Server elapsed time: 0 ms.

However, when I run that exact same query in a python script, it takes over a minute! The exact syntax I'm using is this:

cursor.execute("select * from files where filename=?", ['pgb.197901.bz2'])
print(cursor.fetchall())
 
I could've sworn that this wasn't an issue two months ago, which has me quite confused!
 
The above query takes over a minute, but this syntax takes just 18 ms:
cursor.execute("select * from files where filename='pgb.197901.bz2'", [])
print(cursor.fetchall())
This does work, but it's not a great solution--we want to use pyodbc's placeholder "?" syntax, because it helps to prevent SQL injection attacks.
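To make the injection concern concrete, here's a plain-Python sketch (no database involved; the helper name is mine) of why building the SQL text by interpolation is dangerous while the "?" placeholder is not:

```python
def build_literal_query(filename: str) -> str:
    # Naive interpolation -- what "insert params directly into the query
    # string" amounts to.
    return "select * from files where filename='%s'" % filename

safe = build_literal_query("pgb.197901.bz2")
hostile = build_literal_query("x' or '1'='1")

print(safe)     # select * from files where filename='pgb.197901.bz2'
print(hostile)  # select * from files where filename='x' or '1'='1'
# The hostile value escapes the quoted literal and the predicate now matches
# every row. With cursor.execute("... where filename=?", [value]) the driver
# transmits the value separately from the SQL text, so no such escape exists.
```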
 

I have found one other workaround, though it doesn't work perfectly for me: https://community.sap.com/t5/technology-q-a/ase-15-5-poor-performance-from-odbc/qaq-p/12120748 suggests adding "DYNAMICPREPARE=0;" to the connection string. Toggling that setting does indeed speed up performance! When I use this connection string:

"DRIVER={};SERVER={};UID={};PWD={};PORT=5000;DYNAMICPREPARE=1;APP=SelectTest;"
My "cursor.execute("select * from files where filename=?", ['pgb.197901.bz2'])" query takes just 20 milliseconds (including the time taken to create the connection).
The problem with this approach is that it appears to break stored procedures! When I run this:
cursor.execute("exec get_files @filename=?", ['pgb.197901.bz2'])
print(cursor.fetchall())
I receive this error message:
"[42000] [SAP][ASE ODBC Driver][Adaptive Server Enterprise]Incorrect syntax near '?'.\n (102) (SQLExecDirectW)"
So that solution doesn't work very well for me either.
 
 
To summarize, it seems like I have three bad options:
  1. Keep everything the same, but swap all my queries to only use stored procedures (not great because we want to be able to test queries out on-the-fly)
  2. Break all stored procedures, but speed up our other queries (by adding DYNAMICPREPARE). Would require large changes to our code base, so this isn't viable
  3. Make ourselves vulnerable to sql injection (insert params directly into the query string)

I'm not sure what to do--am I missing something here? Is there some way to speed up our queries so they don't run orders of magnitude slower, one that *doesn't* break our existing stored procedures? Am I doing something strange in my connection information? Is there some argument I'm missing?

Thank you for your help!

Accepted Solutions (1)

ryan_hansen
Product and Topic Expert

Hi,

I tried to reproduce with newer versions, but I didn't see any performance issues.

$ python3 --version
Python 3.11.2

$ strings $SYBASE/DataAccess64/ODBC/lib/libsybdrvodb.so | grep "SAP ASE"
SAP ASE ODBC Driver/16.1.00.00/16.1 GA/P/x86_64/Linux/drv161/Build 1/64-bit/Normal/Fri Dec 20 UTC 15:11:35 2024/SQLLEN8

$ setenv PYTHONPATH /work/pyodbc-5.1.0/build/lib.linux-x86_64-cpython-311

$ vi odbc.ini
[sampledsn]
Driver = /my/sybase/DataAccess64/ODBC/lib/libsybdrvodb.so
Server = asehost
Port = 5000
UserID = sa
Password = password

$ setenv ODBCINI /my/sybase/DataAccess64/ODBC/odbc.ini

$ cat test.py
import pyodbc
import time

conn = pyodbc.connect("DSN=sampledsn;", dynamicprepare=1)
print("Successfully established connection")
cur = conn.cursor()
start_time = time.perf_counter()
cur.execute("select * from sysusers where name = ?", ['guest'])
end_time = time.perf_counter()
duration = end_time - start_time
while True:
    row = cur.fetchall()
    if not row:
        break
    print("%s" % (row[0]))
print("Duration: %f" % duration)
cur.close()
conn.close()

dynamicprepare=1
$ vi test.py
$ python3 test.py
Successfully established connection
(-1, 2, 0, 'guest', None, 0)
Duration: 0.002088

dynamicprepare=0
$ vi test.py
$ python3 test.py
Successfully established connection
(-1, 2, 0, 'guest', None, 0)
Duration: 0.001405


I was, however, able to reproduce the dynamicprepare issue.
1> create proc test @V1 varchar(10) as select * from sysusers where name = @V1
2> go
1> test guest
2> go
suid  uid  gid  name   environ  user_status
----  ---  ---  -----  -------  -----------
-1    2    0    guest  NULL     0

(1 row affected)
(return status = 0)


change query
$ vi test.py
#cur.execute("select * from sysusers where name = ?", ['guest'])
cur.execute("exec test @V1=?", ['guest'])

$ python3 test.py
Successfully established connection
Traceback (most recent call last):
  File "/my/sybase/OCS-16_1/sample/python/test.py", line 9, in <module>
    cur.execute("exec test @V1=?", ['guest'])
pyodbc.ProgrammingError: ('42000', "[42000] [SAP][ASE ODBC Driver][Adaptive Server Enterprise]Incorrect syntax near '?'.\n (102) (SQLExecDirectW)")

dynamicprepare=1
The driver doesn't rewrite the statement into a stored-proc call; it sends it as a LANGUAGE token with the literal ? still in the text:
LANGUAGE Token (0x21); variable length.
Length [4]: 16
Status [1]: PARAMETERIZED (0x01)
Text Length [0]: [15]
Text [15]: "exec test @V1=?"

PARAMFMT Token (0xEC); variable length.
Length [2]: 14
Number of Params [2]: 1
Param 1
Name Length [1]: 0
Status [1]: PARAM_NULLALLOWED (0x20)
User Type [4]: 35
Data Type [1]: LONGBINARY
Length [4]: 255
Locale Length [1]: 0

PARAMS Token (0xD7); variable length.
Param 1
Length [4]: 10
Param data [10]: "guest"
(as bytes) [10]: 0x67007500650073007400

Capture Record Header
Source [4]: RESPONSE (0x00000002)
Length [4]: 72

PDU Header
TDS Packet Type [1]: BUF_RESPONSE (0x04)
Status [1]: BUFSTAT_EOM (0x01)
Length [2]: 72
Channel [2]: 0
Packet No. [1]: 0
Window [1]: 0

EED Token (0xE5); variable length.
Length [2]: 52
Message Number [4]: 102
Message State [1]: 181
Message Class [1]: 15
SQL State Length [1]: 5
SQL State [5]: "42000"
Status [1]: NO_EED (0x00)
Transaction State [2]: TDS_NOT_IN_TRAN (0x0000)
Message Length [2]: 27
Message Text [27]: "Incorrect syntax near '?'.
"
Server Name Length [1]: 4
Server Name [4]: "asename"
Stored Proc. Name Length [1]: 0
Line Number [2]: 1


Interestingly enough, when you use dynamicprepare=0 the driver substitutes a named parameter:
LANGUAGE Token (0x21); variable length.
Length [4]: 22
Status [1]: PARAMETERIZED (0x01)
Text Length [0]: [21]
Text [21]: "exec test @V1=@dr_ta0"

PARAMFMT Token (0xEC); variable length.
Length [2]: 21
Number of Params [2]: 1
Param 1
Name Length [1]: 7
Name [7]: "@dr_ta0"
Status [1]: PARAM_NULLALLOWED (0x20)
User Type [4]: 35
Data Type [1]: LONGBINARY
Length [4]: 255
Locale Length [1]: 0

PARAMS Token (0xD7); variable length.
Param 1
Length [4]: 10
Param data [10]: "guest"
(as bytes) [10]: 0x67007500650073007400

I did get an error about chained mode with my proc so I had to add "autocommit=1" to the connection string, but it succeeds if you use command:
cur.execute("test @V1=?", ['guest'])
Note: we remove the exec part of the query.
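If you need to apply that workaround across a code base, a small helper can strip the leading keyword before each call. This is a sketch of my own (the helper and regex are not part of pyodbc or the driver), and it naively assumes the keyword only appears at the start of the statement:

```python
import re

def strip_exec(sql: str) -> str:
    """Remove a leading 'exec'/'execute' keyword so stored-procedure
    calls work with the dynamicprepare workaround described above."""
    return re.sub(r"^\s*exec(ute)?\s+", "", sql, flags=re.IGNORECASE)

print(strip_exec("exec get_files @filename=?"))  # get_files @filename=?
print(strip_exec("EXECUTE test @V1=?"))          # test @V1=?
print(strip_exec("select 1"))                    # unchanged: select 1
```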

Let me know if you see performance issues with the example above and whether this workaround resolves the dynamicprepare issue for you.

Hope this helps,
Ryan

ryan_hansen
Product and Topic Expert
I do not believe this is a bug.
ryan_hansen
Product and Topic Expert
I believe the issue is that you are using a named parameter and a ? in the same statement. Mixing named parameters and ? is not supported. You can remove the stored procedure parameter name and just give a ?:

name = 'guest'
cur.execute("exec test ?", (name,))

The other option is to build the value into the statement:

name = 'guest'
cur.execute("exec test @V1='" + name + "'")
jtdawso1
Explorer
Thank you for the extensive comment! A lot to chew on there, I'm playing around with things now, but that last bit you said about "we remove the exec part of the query" is very interesting! It appears that, for a simple solution, one can just add DYNAMICPREPARE=1, autocommit=1, and remove 'exec' from their stored procedure calls, and everything works correctly (at least, in my tests so far). If none of the other ideas in this thread pan out for me, I can at least use that. Thank you!
jtdawso1
Explorer

One other note for investigation purposes: you noticed that the "SELECT" statement didn't perform differently between your two DYNAMICPREPARE tests. Well, I've found in some extra testing that this difference only exists when querying large tables (mine has 83 million rows), which might make a difference.

My table *has* an index on the column I'm querying, and running "set showplan on" before my query reveals that the reason it runs more slowly is that the plan says "Positioning at index start", whereas running the same query in sqsh says "Positioning by key". To be clear, the slow-running "SELECT" query *does* use the index, which really surprised me! I'm not too familiar with what that "positioning" statement means, but the plans are otherwise identical. I don't really know what to make of this, but it could be useful for further investigation.

ryan_hansen
Product and Topic Expert
My guess is there is a difference in the TDS sent to the ASE database. Maybe one uses named parameters and one uses ?. It could also be that the TDS token itself is Dynamic or RPC for one and not the other. Not sure if that is a framework thing, but ASE may be picking a different path based on it. Looking it up, sqsh appears to be an Open Client application, while pyodbc goes through the ODBC driver.

I showed the TDS trace above for the other example. You can capture one yourself by adding this connection property for ODBC:

protocolcapture=/some/location/cap

where /some/location is the directory the capture files will be written to and cap is the prefix for the file generated per connection. Run the test, then translate the capture file to text with Ribo:

$ cd $SYBASE/jutils-3_0/ribo
$ setenv RIBO_HOME $SYBASE/jutils-3_0/ribo
$ setenv JAVA_HOME $SYBASE/jre64
$ mv cap* $RIBO_HOME
$ ./Ribo capname capname.txt

Then vi the text file and search for your query.

For Open Client (sqsh or isql), set this before running and it should generate a capture file in your execution directory:

$ cd $SYBASE
$ source SYBASE.csh
$ setenv SYBOCS_DEBUG_FLAGS CS_DBG_PROTOCOL
$ isql -Usa -Psybase -Sccus
1> exit
$ ls cap*
captureWnc9fM
jtdawso1
Explorer
Hey Ryan, thanks for that comment. That's given me a little bit of a lead--the file that Ribo translated says that my filename has been auto-translated to a "LONGBINARY" type. Indeed, when I run "select * from files where filename=convert(varchar(100),?)" as my query, the query runs near-immediately! So, the problem seems to be that pyodbc translates Python strings to that "LONGBINARY" type for whatever reason. That's a great lead!
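For anyone else landing here, that convert() workaround can be applied mechanically. A sketch of my own (the helper name and the varchar(100) width are assumptions; size the width to your column, and note the naive replace assumes no '?' appears inside string literals in the SQL text):

```python
def wrap_placeholders(sql: str, width: int = 100) -> str:
    """Wrap each ? placeholder in convert(varchar(N), ?) so the driver's
    LONGBINARY binding doesn't prevent efficient index positioning."""
    return sql.replace("?", "convert(varchar(%d),?)" % width)

print(wrap_placeholders("select * from files where filename=?"))
# select * from files where filename=convert(varchar(100),?)
```

pyodbc also exposes cursor.setinputsizes(), which may let you force a varchar binding without editing the SQL text, though I haven't verified that against this driver.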

Answers (3)

sladebe
Active Participant

Whoops, entered a bad answer here by mistake.  No way to delete it.

 

sladebe
Active Participant
Yo Mark, thanks for the "kudo" for this answer. It made me snicker. Especially since I got the notification email saying, "Mark_A_Parsons thinks your post is awesome and just gave you kudos!"
Mark_A_Parsons
Contributor

Hilarious email. I have email notifications disabled so didn't realize that's a thing. Thanks for the EOW chuckle!

Mark_A_Parsons
Contributor

runs more slowly is because the plan says "Positioning at index start", whereas running the same query in sqsh says "Positioning by key".

Positioning by key means the index was used to quickly zero in on the desired row(s) based on a distinct search value (eg, key='abc').

Positioning at index start implies an index scan (as opposed to a table scan) and will occur when (in this case) the only search clause is on the index key but the search value is not a distinct value (eg, key like '%abc%', or key=substring('abc',1,3)).  In this situation ASE is left with scanning the entire table and applying the search clause against every row; if there's an index on the search column and ASE determines the index resides on fewer pages than the raw data then it will opt for an index scan.  In your case ASE is opting to scan the index (fewer pages than scanning the entire table) while applying the search clause against every key value in the index ... this will (usually) be faster than a full table scan but it will still be quite a bit slower than a query that can benefit from Positioning by key.

Why is ASE opting for an index scan? This would require knowing what exactly ASE is seeing as the search value (submitted by your python script).  This is where the previous comment about the DBA and MDA tables comes into play ... have the DBA capture the actual SQL being submitted by your application.  [Or use the trace method provided by Ryan.] Having the actual (likely mangled) search value in hand should get you a step closer to determining if there's an issue with the way you're constructing the query or an issue with the library you're using.

Does the 'slow' python query eventually return the same results as the sqsh query? or does it return no results?

sladebe
Active Participant

Here's a SAP Note about a similar behavior, but with cached statements (automatically precompiled SQL statements) instead of prepared statements: https://me.sap.com/notes/2077161/E

Query takes a long time to complete due to bad table scan when statement cache or literal autoparam is on

ASE query showplan shows that optimizer is choosing a table scan instead of more efficient index scan when statement cache and literal autoparam are on.
Optimizer chooses the more efficient index scan when literal autoparam is off
Query contains multiple copies of the same literal value

Environment
SAP Adaptive Server Enterprise (ASE) 15.7
SAP Adaptive Server Enterprise (ASE) 16.0

Cause
When literal autoparam is used, the optimizer cannot assume that the literal values are all the same, as different values could be substituted for the next execution and the cached plan will need to be reasonable for both circumstances. When literal autoparam is not active, the optimizer can generate a better plan based on knowing the values of the literals.

The issue has been identified and logged under SAP CR 644531. Its description reads "When statement cache or literal autoparam is on, the query chooses a bad table scan that takes a long time to complete. With statement cache off, or literal autoparam off, the query chooses the index."

Resolution
There is currently no target Fix Pack or Service Pack scheduled for SAP CR 644531. It is being reviewed by our Product Group. Subscribe to this KBA to be automatically notified when SAP CR 644531 is fixed.
Workarounds are:

set statement_cache off
or
sp_configure "literal autoparam", 0

Mark_A_Parsons
Contributor

Ahhh, you updated your comment while I was typing mine.

-------------

In OP's case I'd opt for set statement_cache off; select ...; set statement_cache on.

Disabling configs (eg, literal autoparam) at the server level should be thoroughly tested as said changes could adversely affect other queries, users and/or applications. ("Duh, Mark!")
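If you go that route from Python, a context manager keeps the off/on pairing from being forgotten. This is a sketch of my own, not a pyodbc feature, exercised here against a stand-in cursor rather than a live ASE connection:

```python
from contextlib import contextmanager

@contextmanager
def statement_cache_off(cursor):
    """Disable ASE's statement cache for the enclosed queries, restoring it
    afterward even if a query raises."""
    cursor.execute("set statement_cache off")
    try:
        yield cursor
    finally:
        cursor.execute("set statement_cache on")

# Stand-in cursor so the sketch can be exercised without a database.
class RecordingCursor:
    def __init__(self):
        self.statements = []
    def execute(self, sql, params=None):
        self.statements.append(sql)

cur = RecordingCursor()
with statement_cache_off(cur):
    cur.execute("select * from files where filename=?", ['pgb.197901.bz2'])
print(cur.statements[0])   # set statement_cache off
print(cur.statements[-1])  # set statement_cache on
```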

sladebe
Active Participant

Maybe it's the thing where a precompiled SQL statement (either via a "prepare" call or with ASE statement caching) has to use a different query plan because it can't optimize for a specific value of the filename.

Since DYNAMICPREPARE=0 in your connection string shouldn't affect calling existing stored procs (according to Chris), it sounds like that should be safe to use.

Also, try running the good query with "set showplan on" to see what index it's using, then add an index hint to the cursor.execute() call. E.g.:

cursor.execute("select * from files (index <myindex>) where filename=?", ['pgb.197901.bz2'])

Based on https://github.com/mkleehammer/pyodbc/issues/214 it sounds like a statement is automatically prepared when cursor.execute() is called, but then subsequent cursor.execute() with similar queries will automatically use the same prepared statement.   If this is true, and if auto-prepare on cursor.execute() is on, and if you execute a large number of queries that can't be recognized as similar, you stand the chance creating a huge number of unused prepared statement memory structures in the ASE server.  I had a large ASE server overload once because of this.