on 2010 Jan 22 9:00 AM
How do I import data from a file while skipping existing rows?
Once in a while I have to import data from a file into a table, where some rows may already exists (row with same primekey and same or diffent contents).
For an INSERT statement I can specify ON EXISTING {SKIP | UPDATE}.
Using temporary tables I can combine INPUT INTO or LOAD TABLE with an INSERT statement, but that's an additional overhead I'd like to avoid.
So I'd like to see LOAD TABLE and / or INPUT INTO to be enhanced with the same option.
Thank you for your suggestion. We have previously discussed this issue within engineering - that is, handling of errors while loading data - and have this request listed as an enhancement to be consider for a future release.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
Just for the record:
v16 has introduced according options for LOAD TABLE to handle errors, from the docs:
ALLOW ( integer | ALL | NO ) ERRORS clause This clause can only be specified once for the statement. The default value for this clause is 0, which means that a violation generates an error and the statement is rolled back. If you specify an integer, n, then on error n+1 the database server rolls back the statement. The values ALLOW NO ERRORS and ALLOW 0 ERRORS are equivalent. This clause allows the database server to set problematic data aside and progress with the load operation.
The database server reports the last error that was encountered to the user, and this error is also logged to the MESSAGE log. Rows that are written to the ROW log can be changed and used as input to a subsequent LOAD TABLE statement.
If a ROW LOG is written to a database server or client file, its contents are written in the same character set as the original input file. If a MESSAGE LOG is written to a server or client file, its contents are written in the client's language and in the client connection's CHAR character set. If a ROW or MESSAGE LOG is written to a CHAR or NCHAR variable, it is written in the CHAR or NCHAR (respectively) character set.
So these three new options ALLOW n ERRORS, MESSAGE LOG and ROW LOG should help to identify (and correct) invalid rows. Note, that's not the same as an ON EXISTING SKIP but should do as well for any kind of errors.
AFAIK, no similar change has been made for the INPUT statement.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
I think that a good way to achieve what you ask for in existing versions of the server would be to use the OPENSTRING construct in an INSERT statement:
INSERT INTO T_INS
ON EXISTING UPDATE
SELECT *
FROM OPENSTRING( VALUE '1,1\\n2,3')
WITH( TABLE T_INS ) O;
The OPENSTRING construct is the same code used to scan files for the LOAD TABLE statement and the INSERT will operate mostly the same way that LOAD TABLE does. There are differences, such as the automatic commit used for LOAD TABLE, the way that rows are logged, the checkpoint before and optionally after the statement, and the way that computed columns, check constraints, or triggers are handled. There may be some other differences I have not recalled. Using the OPENSTRING in an INSERT or even MERGE statement gives you the full flexibility of inserting and also allows you to use conditions in the WHERE clause to identify problematic rows (for example, not matching the data type or not matching foreign key declarations).
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
As addition to Reimer's proposal:
Besides ON EXISTING, there might be other reasons for failure:
I found it sometimes difficult to track down such errors, particularly when they only show up in some rows of a huge file.
One (uncomfortable) way I have used is to load into a local temp table with just varchar data to find out the failing values, and then to check the column values.
LOAD TABLE is fine and fast IMHO, but looking for unfitting values seems hard.
So a "LOAD TABLE ... [STOP | SKIP] ON ERROR" (instead of a rollback as it does IIRC) would be useful. I'm not sure if it currently lists the according line number - that would otherwise be of help, too.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
cf. the response with John Smirnios in the NG: news://forums.sybase.com:119/4b59e12f$1@forums-1-dub
User | Count |
---|---|
71 | |
10 | |
8 | |
7 | |
7 | |
6 | |
6 | |
6 | |
6 | |
5 |
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.