First line of text file is column names

It's common for the first line of a comma-separated or tab-separated text file to be the column names for the data that follows, so it's natural to want to take advantage of this in OPENSTRING or LOAD TABLE.

I'm sure that this would have already been considered by the SQLA development folks, so I'm wondering:
- Have you found a reason why this isn't a good idea or not feasible, so don't hold my breath?
OR
- Is this on a long list of "good ideas", but you've been adding more important stuff and you can't do everything all at once, so maybe someday down the road, no promises?

Thanks!

Loyal fan,
Dan K.

Accepted Solutions (0)

Answers (2)

Former Member

Maybe I am completely off the mark here, but assuming your server is on Windows, you could create a remote data access server that uses the Microsoft Access Text Driver as the underlying ODBC driver. I believe the driver will then interpret the first line as the column names and make an appropriate guess at each column's datatype. You can then create a proxy table over the csv file and use select * into ... to create the base table with the proper column names, datatypes and data.

Try the following:

1) create a file called foo.csv with the following contents

col1,col2,col3,col4,col5
1,abc,3,4,5
6,def,8,9,10

2) create a DSN called mycsvdsn using the Microsoft Access Text Driver and set its directory to the one that foo.csv resides in.

3) connect to SA and create a remote server

CREATE SERVER rem CLASS 'odbc' USING 'mycsvdsn'

4) create the proxy table

CREATE EXISTING TABLE foo AT 'rem;;;foo.csv'

5) query the proxy table to make sure the column names and data look correct

SELECT * FROM foo

6) create and load the base table

SELECT * INTO mybasetab FROM foo

7) verify that mybasetab has the correct schema and data
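
For example (a sketch, not tested; the catalog query assumes the SYSTAB and SYSTABCOL system views found in recent SQL Anywhere versions):

SELECT * FROM mybasetab

-- list the column names and datatypes the driver guessed
SELECT c.column_name, c.base_type_str
  FROM SYSTABCOL c JOIN SYSTAB t ON t.table_id = c.table_id
 WHERE t.table_name = 'mybasetab'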

HTH

The OPENSTRING WITH clause and the LOAD TABLE column list serve multiple roles:

  1. identify the columns in the input,
  2. map the columns to the destination, and
  3. provide the schema for the input (implicitly or explicitly).

I'm suggesting that you want to sever the first role from them when the input can self-identify its columns.
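
For contrast, here is roughly what you write today (from memory, so treat it as a sketch, not verified syntax): the WITH clause supplies both the names and the schema, purely by position, and the header row can only be thrown away.

SELECT *
  FROM OPENSTRING ( FILE 'input.csv' )
       WITH ( InA CHAR(30), InB CHAR(30), InC CHAR(30), InD CHAR(30) )  -- roles 1 and 3, by position only
       OPTION ( SKIP 1 )  -- role 1 is wasted: the header row is simply discarded
       AS T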

Case 1:

Suppose the input's first row says it has columns InA, InB, InC, InD (not necessarily in that order), and you don't need InC. You might have

OPENSTRING WITH ( InA CHAR(30), InB CHAR(30), InD CHAR(30)) 
           OPTION(COLUMNNAMES 1 SKIP 1) 

and not care that there's an extra column or if the file actually has the columns in a different order.
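
Spelled out as a complete query (hypothetical, of course, since COLUMNNAMES is made up), that might look like

SELECT *
  FROM OPENSTRING ( FILE 'input.csv' )
       WITH ( InA CHAR(30), InB CHAR(30), InD CHAR(30) )
       OPTION ( COLUMNNAMES 1 SKIP 1 )  -- match the WITH columns to the header row by name
       AS T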

Or, your destination table has columns DestP, DestQ, DestR, where InA -> DestQ, InB -> DestP, InD -> DestR. Making up a silly syntax just as an example, you might have

LOAD TABLE DestTbl (DestP COMEFROM InB, 
                    DestQ COMEFROM InA, 
                    DestR COMEFROM InD) COLUMNNAMES 1 SKIP 1
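
For contrast, today's LOAD TABLE can already do the positional half of this with a column list and filler(), but only if you know the file's layout in advance. Assuming the file happens to hold InA, InB, InC, InD in exactly that order (another sketch, not tested):

-- DestQ gets InA, DestP gets InB, filler() discards InC, DestR gets InD
LOAD TABLE DestTbl ( DestQ, DestP, filler(), DestR )
     FROM 'input.csv'
     SKIP 1

Nothing checks the header row, though: rearrange the file's columns and the load silently puts the wrong data in the wrong columns.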

Case 2:

The input file's first row has the same column names as your destination table, but not necessarily in the same order, and it might have extra columns or missing columns. Then, you might want something like a WITH AUTO NAME option, e.g.

LOAD TABLE DestTbl (DestP, DestQ, DestR) COLUMNNAMES 1 SKIP 1 
           WITH AUTO NAME

which won't care about extra columns in the input and will set missing columns to their default.
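
FWIW the INSERT statement already has WITH AUTO NAME, so you can approximate this today with a staging table, at the cost of typing the header row's names yourself (a sketch, not tested):

-- the staging table mirrors the file's header row, typed by hand
CREATE TABLE staging ( DestQ CHAR(30), DestP CHAR(30), DestR CHAR(30) )

LOAD TABLE staging FROM 'input.csv' SKIP 1

-- WITH AUTO NAME matches select-list names to DestTbl's columns;
-- DestTbl columns missing from staging get their defaults
INSERT INTO DestTbl WITH AUTO NAME SELECT * FROM staging

DROP TABLE staging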

A capability like this would make external unload/reload much more robust and flexible. The old I-SQL INPUT had some self-defining formats, and though those formats are obsolete, the concept was good. A column-name header row is so common in CSV and TSV files that (humbly, and with great respect) I'd rather not have to go through Microsoft Desktop Driver ODBC support just to use one.

[All syntax examples are only to illustrate the point - not actual syntax proposals.]