
pyodbc runs "select" queries very slowly when providing parameters

jtdawso1
Explorer

Hello,

For a bit of context, I'm using ASE 16.0 SP03, an up-to-date copy of libsybdrvodb.so, Python 3.8.10, and an up-to-date copy of pyodbc (https://github.com/mkleehammer/pyodbc).

I found a strange issue when running a simple "select" query using pyodbc's "?" parameter syntax. When I run this query directly in a sqsh console, it executes *very* quickly:

select * from files where filename='pgb.197901.bz2';

Adaptive Server cpu time: 0 ms. Adaptive Server elapsed time: 0 ms.

However, when I run that exact same query in a Python script, it takes over a minute! The exact syntax I'm using is this:

cursor.execute("select * from files where filename=?", ['pgb.197901.bz2'])
print(cursor.fetchall())
 
I could've sworn that this wasn't an issue two months ago, which has me quite confused!
 
The above query takes over a minute, but this syntax takes just 18 ms:
cursor.execute("select * from files where filename='pgb.197901.bz2'", [])
print(cursor.fetchall())
This does work, but it's not a great solution--we want to use pyodbc's placeholder "?" syntax, because it helps to prevent SQL injection attacks.
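(For anyone skimming, here's a minimal sketch of the two styles side by side. The connection details are placeholders rather than my real setup; the point is only that with "?" the value never becomes part of the SQL text.)

import pyodbc

# Placeholder connection details -- substitute your own driver/server values.
conn = pyodbc.connect("DSN=mydsn;", autocommit=True)
cursor = conn.cursor()

filename = "pgb.197901.bz2"

# Parameterized form (what we want to keep): the value travels as a bound
# parameter, so a malicious filename can't alter the SQL statement itself.
cursor.execute("select * from files where filename = ?", [filename])
print(cursor.fetchall())

# Interpolated form (what we want to avoid): the value is spliced into the
# SQL string, so anything inside filename becomes part of the statement.
cursor.execute("select * from files where filename = '%s'" % filename)
print(cursor.fetchall())

conn.close()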
 

I have found one other workaround here: https://community.sap.com/t5/technology-q-a/ase-15-5-poor-performance-from-odbc/qaq-p/12120748 , which suggests adding a "DYNAMICPREPARE" setting to my connection string, but it doesn't work perfectly for me. It does indeed speed up performance! When I use this connection string:

"DRIVER={};SERVER={};UID={};PWD={};PORT=5000;DYNAMICPREPARE=1;APP=SelectTest;"
My "cursor.execute("select * from files where filename=?", ['pgb.197901.bz2'])" query takes just 20 milliseconds (including the time taken to create the connection).
The problem with this approach is that it appears to break stored procedures! When I run this:
cursor.execute("exec get_files @filename=?", ['pgb.197901.bz2'])
print(cursor.fetchall())
I receive this error message:
"[42000] [SAP][ASE ODBC Driver][Adaptive Server Enterprise]Incorrect syntax near '?'.\n (102) (SQLExecDirectW)"
So that solution doesn't work very well for me either.
 
 
To summarize, it seems like I have three bad options:
  1. Keep everything the same, but swap all my queries to only use stored procedures (not great because we want to be able to test queries out on-the-fly)
  2. Break all stored procedures, but speed up our other queries (by adding DYNAMICPREPARE). Would require large changes to our code base, so this isn't viable
  3. Make ourselves vulnerable to sql injection (insert params directly into the query string)

I'm not sure what to do--am I missing something here? Is there some way to speed up our parameterized queries, so they don't run orders of magnitude more slowly, that *doesn't* break our existing stored procedures? Am I doing something strange in my connection setup? Is there some argument that I'm missing?

Thank you for your help!

EDIT/TL;DR/SOLUTION:

Thanks to everyone who commented; things are much clearer now. It appears that the issue is caused by pyodbc encoding Python strings incorrectly (as BINARY data rather than CHAR data), which forces the database to use an inefficient plan to execute the query. I've got two stable solutions, but I recommend the first:

  1. Set the connection encoding explicitly:

     conn = pyodbc.connect(
         "DRIVER={};SERVER={};UID={};PWD={};PORT=5000;APP=SelectTest;".format(
             db_config["driver"],
             db_config["server"],
             db_config["uid"],
             db_config["pwd"],
         ),
         autocommit=True,
     )
     conn.setencoding(encoding='iso-8859-1')  # this is the most important line!
     cursor = conn.cursor()
     cursor.execute("select * from files where filename = ?", [filename_to_query])
     print(cursor.fetchall())
  2. Keep the default encoding, but add DYNAMICPREPARE=1 to the connection string:

     conn = pyodbc.connect(
         # this DYNAMICPREPARE setting is the most important part here
         "DRIVER={};SERVER={};UID={};PWD={};PORT=5000;APP=SelectTest;DYNAMICPREPARE=1;".format(
             db_config["driver"],
             db_config["server"],
             db_config["uid"],
             db_config["pwd"],
         ),
         autocommit=True,
     )
     cursor = conn.cursor()
     # note: if the query is a stored proc, make sure that you *do not* use the
     # "exec" keyword, otherwise the query appears to break. Unsure why.
     cursor.execute("select * from files where filename = ?", [filename_to_query])
     print(cursor.fetchall())
ryan_hansen
Product and Topic Expert

Hi,

I tried to reproduce with newer versions, but I didn't see any performance issues.

$ python3 --version
Python 3.11.2

$ strings $SYBASE/DataAccess64/ODBC/lib/libsybdrvodb.so | grep "SAP ASE"
SAP ASE ODBC Driver/16.1.00.00/16.1 GA/P/x86_64/Linux/drv161/Build 1/64-bit/Normal/Fri Dec 20 UTC 15:11:35 2024/SQLLEN8

$ setenv PYTHONPATH /work/pyodbc-5.1.0/build/lib.linux-x86_64-cpython-311

$ vi odbc.ini
[sampledsn]
Driver = /my/sybase/DataAccess64/ODBC/lib/libsybdrvodb.so
Server = asehost
Port = 5000
UserID = sa
Password = password

$ setenv ODBCINI /my/sybase/DataAccess64/ODBC/odbc.ini

$ cat test.py
import pyodbc
import time

conn = pyodbc.connect("DSN=sampledsn;", dynamicprepare=1)
print("Successfully established connection")
cur = conn.cursor()
start_time = time.perf_counter()
cur.execute("select * from sysusers where name = ?", ['guest'])
end_time = time.perf_counter()
duration = end_time - start_time
while True:
    row = cur.fetchall()
    if not row:
        break
    print("%s" % (row[0]))
print ("Duration: %f" % duration)
cur.close()
conn.close()

dynamicprepare=1
$ vi test.py
$ python3 test.py
Successfully established connection
(-1, 2, 0, 'guest', None, 0)
Duration: 0.002088

dynamicprepare=0
$ vi test.py
$ python3 test.py
Successfully established connection
(-1, 2, 0, 'guest', None, 0)
Duration: 0.001405


I was, however, able to reproduce the dynamicprepare issue.
1> create proc test @V1 varchar(10) as select * from sysusers where name = @V1
2> go
1> test guest
2> go
suid uid gid name
environ
user_status
----------- ----------- ----------- ------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------
-1 2 0 guest
NULL
0

(1 row affected)
(return status = 0)


change query
$ vi test.py
#cur.execute("select * from sysusers where name = ?", ['guest'])
cur.execute("exec test @V1=?", ['guest'])

$ python3 test.py
Successfully established connection
Traceback (most recent call last):
File "/my/sybase/OCS-16_1/sample/python/test.py", line 9, in <module>
cur.execute("exec test @V1=?", ['guest'])
pyodbc.ProgrammingError: ('42000', "[42000] [SAP][ASE ODBC Driver][Adaptive Server Enterprise]Incorrect syntax near '?'.\n (102) (SQLExecDirectW)")

dynamicprepare=1
The driver doesn't actually turn the query into a prepared statement (stored proc); it sends the language command with the literal ? left in it:
LANGUAGE Token (0x21); variable length.
Length [4]: 16
Status [1]: PARAMETERIZED (0x01)
Text Length [0]: [15]
Text [15]: "exec test @V1=?"

PARAMFMT Token (0xEC); variable length.
Length [2]: 14
Number of Params [2]: 1
Param 1
Name Length [1]: 0
Status [1]: PARAM_NULLALLOWED (0x20)
User Type [4]: 35
Data Type [1]: LONGBINARY
Length [4]: 255
Locale Length [1]: 0

PARAMS Token (0xD7); variable length.
Param 1
Length [4]: 10
Param data [10]: "guest"
(as bytes) [10]: 0x67007500650073007400

Capture Record Header
Source [4]: RESPONSE (0x00000002)
Length [4]: 72

PDU Header
TDS Packet Type [1]: BUF_RESPONSE (0x04)
Status [1]: BUFSTAT_EOM (0x01)
Length [2]: 72
Channel [2]: 0
Packet No. [1]: 0
Window [1]: 0

EED Token (0xE5); variable length.
Length [2]: 52
Message Number [4]: 102
Message State [1]: 181
Message Class [1]: 15
SQL State Length [1]: 5
SQL State [5]: "42000"
Status [1]: NO_EED (0x00)
Transaction State [2]: TDS_NOT_IN_TRAN (0x0000)
Message Length [2]: 27
Message Text [27]: "Incorrect syntax near '?'.
"
Server Name Length [1]: 4
Server Name [4]: "asename"
Stored Proc. Name Length [1]: 0
Line Number [2]: 1


Interestingly enough, when you use dynamicprepare=0 the driver substitutes a named parameter instead:
LANGUAGE Token (0x21); variable length.
Length [4]: 22
Status [1]: PARAMETERIZED (0x01)
Text Length [0]: [21]
Text [21]: "exec test @V1=@dr_ta0"

PARAMFMT Token (0xEC); variable length.
Length [2]: 21
Number of Params [2]: 1
Param 1
Name Length [1]: 7
Name [7]: "@dr_ta0"
Status [1]: PARAM_NULLALLOWED (0x20)
User Type [4]: 35
Data Type [1]: LONGBINARY
Length [4]: 255
Locale Length [1]: 0

PARAMS Token (0xD7); variable length.
Param 1
Length [4]: 10
Param data [10]: "guest"
(as bytes) [10]: 0x67007500650073007400

I did get an error about chained mode with my proc, so I had to add "autocommit=1" to the connection, but it succeeds if you use this command:
cur.execute("test @V1=?", ['guest'])
Note: the "exec" part of the query is removed.
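To make that concrete, here is a minimal sketch of the whole call (the DSN, the "test" proc, and the 'guest' value are the ones from my test above; treat the DynamicPrepare=1 setting as illustrative rather than the only working combination):

import pyodbc

# Sketch of the workaround: autocommit avoids the chained-mode error, and the
# proc is called without the "exec" keyword so the ? placeholder is accepted.
conn = pyodbc.connect("DSN=sampledsn;DynamicPrepare=1;", autocommit=True)
cur = conn.cursor()
cur.execute("test @V1=?", ['guest'])
print(cur.fetchall())
cur.close()
conn.close()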

Let me know if you see performance issues with the example above, and whether this workaround is successful for dynamicprepare=0.

Hope this helps,
Ryan

ryan_hansen
Product and Topic Expert
I do not believe this is a bug.
ryan_hansen
Product and Topic Expert
I believe the issue is that you are using a named parameter and a ? in the same statement. Mixing named parameters and ? is not supported. You can remove the stored procedure parameter name and just give a ?:
name = "guest"
cur.execute("exec test ?", (name,))
The other option is to skip the placeholder and build the call yourself:
name = "guest"
cur.execute("exec test @V1=" + name)
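As a self-contained sketch of the first option (the DSN and the "test" proc are the names from my earlier test):

import pyodbc

conn = pyodbc.connect("DSN=sampledsn;", autocommit=True)
cur = conn.cursor()

name = "guest"
# Only a positional placeholder appears in the statement -- the parameter
# name is left out, so named parameters and ? are not mixed.
cur.execute("exec test ?", (name,))
print(cur.fetchall())

cur.close()
conn.close()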
jtdawso1
Explorer
Thank you for the extensive comment! A lot to chew on there, I'm playing around with things now, but that last bit you said about "we remove the exec part of the query" is very interesting! It appears that, for a simple solution, one can just add DYNAMICPREPARE=1, autocommit=1, and remove 'exec' from their stored procedure calls, and everything works correctly (at least, in my tests so far). If none of the other ideas in this thread pan out for me, I can at least use that. Thank you!
jtdawso1
Explorer

One other note for investigation purposes: you noticed that the "SELECT" statement didn't perform differently between your two DYNAMICPREPARE tests. In some extra testing I've found that the difference only shows up when querying large tables (mine has 83 million rows), which might explain it. My table *does* have an index on the column I'm querying, and running "set showplan on" before my query reveals why it runs more slowly: the plan says "Positioning at index start", whereas the same query in sqsh says "Positioning by key". To be clear, the slow-running "SELECT" query *does* use the index, which really surprised me! I'm not too familiar with what that "positioning" line means, and the plans are otherwise identical. I don't really know what to make of this, but it could be useful for further investigation.

ryan_hansen
Product and Topic Expert
My guess is there is a difference in the TDS sent to the ASE database. Maybe one does named parameters and one does ?. Could be that the TDS token itself is Dynamic or RPC for one and not the other. Not sure if that is a framework thing, but ASE may be picking a different path based on this. Looking up sqsh, it is an Open Client application, while pyodbc goes through the ODBC driver. I showed the TDS trace above for the other example.

You can capture one yourself by adding this connection property for ODBC:

protocolcapture=/some/location/cap

where /some/location is the directory this will be written to and cap is the prefix for the file generated per connection. Run the test and then translate the capture file to text:

cd $SYBASE/jutils-3_0/ribo
setenv RIBO_HOME $SYBASE/jutils-3_0/ribo
setenv JAVA_HOME $SYBASE/jre64
mv cap* $RIBO_HOME
./Ribo capname capname.txt

Then vi the text file and search for your query.

For Open Client:

setenv SYBOCS_DEBUG_FLAGS CS_DBG_PROTOCOL

then run sqsh or isql. It should generate a file in your execution directory:

$ cd $SYBASE
$ source SYBASE.csh
$ setenv SYBOCS_DEBUG_FLAGS CS_DBG_PROTOCOL
$ isql -Usa -Psybase -Sccus
1> exit
$ ls cap*
captureWnc9fM
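From the pyodbc side, turning the capture on is just that connection property in the connection string; a minimal sketch (the /some/location/cap path is a placeholder for a writable directory plus file prefix, and the DSN is the one from my test above):

import pyodbc

# Sketch: ask the ASE ODBC driver to write a TDS capture file per connection.
# Replace /some/location/cap with a writable directory plus file prefix.
conn = pyodbc.connect("DSN=sampledsn;ProtocolCapture=/some/location/cap;")
cur = conn.cursor()
cur.execute("select * from sysusers where name = ?", ['guest'])
print(cur.fetchall())
conn.close()
# Then translate the generated cap* file with Ribo as described above.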
jtdawso1
Explorer
Hey Ryan, thanks for that comment. That's given me a little bit of a lead--the file that Ribo translated says that my filename has been auto-translated to a "LONGBINARY" type. Indeed, when I run "select * from files where filename=convert(varchar(100),?)" as my query, the query runs near-immediately! So, the problem seems to be that pyodbc translates Python strings to that "LONGBINARY" type for whatever reason. That's a great lead!
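For reference, here's a minimal sketch of that convert() workaround in pyodbc (the table/column names are from my original post, the DSN is a placeholder, and the varchar length is an arbitrary choice on my part):

import pyodbc

conn = pyodbc.connect("DSN=mydsn;", autocommit=True)  # placeholder DSN
cursor = conn.cursor()

# Casting the bound parameter back to varchar on the server side keeps the ?
# placeholder but avoids the LONGBINARY parameter type, so the index lookup
# is efficient again.
cursor.execute(
    "select * from files where filename = convert(varchar(100), ?)",
    ['pgb.197901.bz2'],
)
print(cursor.fetchall())
conn.close()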
jtdawso1
Explorer

I just had a breakthrough: if you add the line "conn.setencoding(encoding='iso-8859-1')", the "LONGBINARY" type issue appears to disappear, and the query executes near-instantly. I'm investigating further now; I'll edit this comment once I know more.

EDIT: I'm also going to add this as a TL;DR, but this code now works for me:

conn = pyodbc.connect(
    "DRIVER={};SERVER={};UID={};PWD={};PORT=5000;APP=SelectTest;".format(
        db_config["driver"],
        db_config["server"],
        db_config["uid"],
        db_config["pwd"],
    ),
    autocommit=True,
)
conn.setencoding(encoding='iso-8859-1')  # this is the most important line!
cursor = conn.cursor()
cursor.execute("select * from files where filename = ?", [filename_to_query])
print(cursor.fetchall())

ryan_hansen
Product and Topic Expert
I don't think we got into this, but if you have an iso_1 database you shouldn't really have multibyte characters in it. You should have the database changed to a character set that supports the characters you need, so they can be converted in and out based on the client's request. The client side is usually set by the environment variables LC_ALL or LANG, and there are connection properties for the client settings like charset=xxx. Example:

$ pwd
/my/sybase/locales
$ echo $LANG
en_US.UTF-8
$ grep "en_US.UTF-8" *
locales.dat: locale = en_US.UTF-8, us_english, utf8

My client would request the utf8 charset and my database is iso_1 by default, so there will be a conversion between client and ASE because they are not the same.

Encoding or code page is different; that is more about display, or how the app handles the characters/hex. On Windows you have chcp ("change code page"), which displays the characters differently in the cmd prompt, window, or app.

Just make sure you can display the character and that the hex in the database is what you want:

select convert(varbinary(255), columnname) from tablename

Compare the output with the hex that it should be.

Hope that helps,
Ryan
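P.S. A small pyodbc sketch of that hex check, borrowing the files/filename names from earlier in this thread (the DSN is a placeholder and the varbinary length is arbitrary):

import pyodbc

conn = pyodbc.connect("DSN=mydsn;", autocommit=True)  # placeholder DSN
cur = conn.cursor()

# Pull back both the character form and the raw stored bytes so the hex can
# be compared against what the client thinks it inserted.
cur.execute(
    "select filename, convert(varbinary(255), filename) "
    "from files where filename = 'pgb.197901.bz2'"
)
for name, raw in cur.fetchall():
    print(name, raw.hex() if raw is not None else None)
conn.close()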