on 2014 Jul 31 11:18 AM
Hi,
I have a table with over 500k rows and when I execute a query that supposed to return me a count for a subset of the data (28k) the invocation of ResultSet.next() takes over 5 seconds to perform. I don't see any delay when calling executeQuery(), only when calling ResultSet.next() to get the actual count value. What's more interesting is that when I actually need to fetch the values and use
select top x start at y ...the performance of ResultSet.next is much better though I'm fetching multiple columns.
I'm running UltraliteJ 12.0.1.3726 on Android OS using Galaxy Nexus device. The schema for the table in question is as follows:
CREATE TABLE "dba"."LIST_DATA_FLAT" ( "LIST_DATA_ID" integer NOT NULL , "LANGUAGE_ID" integer NOT NULL , "VALUE" varchar(512) NOT NULL , "DESCRIPTION" varchar(1024) NULL , "ATTR_0" varchar(512) NULL , "ATTR_1" varchar(512) NULL , "ATTR_2" varchar(512) NULL , "ATTR_3" varchar(512) NULL , "ATTR_4" varchar(512) NULL , "ATTR_5" varchar(512) NULL , "ATTR_6" varchar(512) NULL , "ATTR_7" varchar(512) NULL , "ATTR_8" varchar(512) NULL , "ATTR_9" varchar(512) NULL , "ATTR_OBJECT" long binary NULL , "TIMESTAMP_CREATED" timestamp DEFAULT CURRENT UTC TIMESTAMP NOT NULL , "TIMESTAMP_UPDATED" timestamp NOT NULL , "LISTS_ID" integer NOT NULL , "BACKEND_ID" varchar(128) NULL , "PARENT_ID" integer NOT NULL , "SORT_ORDER" integer NOT NULL , "LEVEL" integer NOT NULL , PRIMARY KEY ( "LIST_DATA_ID" asc , "LANGUAGE_ID" asc ) )
FYI LISTS_ID is a foreign key.
The query that I'm running is this:
select count(distinct(ld.list_data_id)) from list_data_flat ld where ld.lists_id = 384 AND ld.level = 0
Am I reaching the boundaries of Ultralite/UltraliteJ performance with that much data or is there something wrong with my schema?
Any hints/advice is much appreciated,
Ross
Please confirm you posted the correct plan.
I'm guessing it is because of the distinct it has to build a temp table which, given the volume of rows, just beyond what the device can do.
If you wish to investigate this further, I suggest opening a support case.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
An index on lists_id would help.
Execute query just builds the query. The first next() is when it goes through the entire table and does the count. The first next on your top x query just has to get to the first row, which may or may not be real easy to find depending on your data.
You can use dbisql to play with your queries and view the resulting plans. Index scans are good - full table scans not good.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I'm not familiar with Galaxy Nexus device so I can't comment on its performance. How much memory is in use?
The plan does look very good. Perhaps an index on lists_id, level if there are many different levels. Really depends on the distribution of your data. Each extra index slows down insert/update/delete operations and requires more space. Are you using the default max hash size for indexes when you created your database?
The device has 1GB of memory with 300MB free, the app itself uses about 40MB. The hash size is default of 4 - haven't touched that as I'm not sure how it would affect the rest of the db performance. I've tried running the same scenario using Ultralite db on PC (same exact schema) using custom JNI and had the same result so I doubt it has something to do with the device. The table is mostly used for querying with very little updates done to it so I'll try to add the new index for (lists_id,level) and see if it does anything.
I've tried adding the index (lists_id,level) but it did not affect the query performance one bit. The data for lists_id = 384 has level = 0 so there are no extra entries that have any other level. Does it make sense to open a case with SAP/Sybase for this or is it just that my number of rows is a bit much for UlraliteJ?
The page size of the database is 4k. I've run some tests using different types of count queries and got pretty different values depending on the query:
select count(*) from list_data_flat ld where ld.lists_id = 384 and ld.level = 0 >>> 1632 ms select count(ld.list_data_id) from list_data_flat ld where ld.lists_id = 384 and ld.level = 0 >>> 707 ms select count(distinct(ld.list_data_id)) from list_data_flat ld where ld.lists_id = 384 AND ld.level = 0 >>> 7202 msselect count(*) from list_data_flat ld where ld.lists_id = 384 and ld.level = 0 >>> 1592 ms select count(ld.list_data_id) from list_data_flat ld where ld.lists_id = 384 and ld.level = 0 >>> 761 ms select count(distinct(ld.list_data_id)) from list_data_flat ld where ld.lists_id = 384 AND ld.level = 0 >>> 10155 ms
The third query is what being executed right now by the app. Looks like second query is the fastest, I'll have to figure out how to account for non-distinct rows if I were to use it.
Wow I feel like an idiot. My apologies, I've managed to post the wrong schema AND SQL which of course from your context changes things dramatically. I've updated the schema with the proper one, the pk is LIST_DATA_ID and LANGUAGE_ID, that's why I'm using distinct(list_data_id) in my count query.
User | Count |
---|---|
57 | |
10 | |
7 | |
7 | |
6 | |
6 | |
5 | |
5 | |
5 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.