SQLA 12 for datamart / warehouse. Is this method going to cost us?


We're experimenting with building our own 'data warehouse' with SQL Anywhere (v12 right now), using SQL Anywhere's native SQL for our ETL process. We looked into using the InfoMaker pipeline, but we're nervous about the future of that product, so we wrote our own "pipeline". Our database is growing quite large, and I'm not sure whether that's entirely because we keep adding new data to the db as we develop, or whether part of it comes from the method we use each night to refresh. Basically, this is our "pipeline". We loop through a table on the warehouse containing the names and locations of the tables to be refreshed (a SQL sketch of one iteration follows the list):

  1. Drop table if exists [proxy table or view name] from remote database.
  2. Create existing table [proxy table or view name] at [remote name].
  3. Truncate 'local' warehouse table.
  4. Insert into warehouse table from proxy table.
  5. Drop proxy table.
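
Roughly, each iteration runs statements like these. The table and server names (stage_orders, wh_orders, remote_srv) are anonymized placeholders, and the catalog check stands in for DROP TABLE IF EXISTS:

    -- 1. Drop the proxy table, if it exists, via a catalog check
    IF EXISTS ( SELECT 1 FROM SYS.SYSTAB WHERE table_name = 'stage_orders' ) THEN
        DROP TABLE stage_orders;
    END IF;

    -- 2. Create the proxy table pointing at the remote database
    CREATE EXISTING TABLE stage_orders
        AT 'remote_srv..dbo.orders';

    -- 3. Truncate the local warehouse table
    TRUNCATE TABLE wh_orders;

    -- 4. Re-populate the warehouse table from the proxy
    INSERT INTO wh_orders SELECT * FROM stage_orders;

    -- 5. Drop the proxy table
    DROP TABLE stage_orders;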

We assumed that, more often than not, the empty space from the truncated table would be reused, but is it possible that it's allocating new space with each run? We're up to 10 GB, but that covers data from 6 other databases, so I'm not totally shocked at the size. The process runs quite fast, so we're happy with the performance, but we don't want to fill up the server unnecessarily. Of course we will continue to fine-tune what data we bring over, but if this approach makes poor use of space, we may need to rethink our pipeline process. TIA.

Becky Snyder, Bradley University

Answers (1)

VolkerBarth
Contributor

So basically the question is whether a daily repetition of TRUNCATE TABLE/INSERT statements will lead to undesired fragmentation? If so, I would expect SQL Anywhere to reclaim most of the free space from deleted rows for new ones. Of course you can monitor that with built-in tools like the sa_table_page_usage system procedure and/or properties like "FreeSpace" and the like.
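
For example, something like this would show how well the pages are used (the procedure is documented; the FreePages property name is from memory, so verify it for your version):

    -- Per-table page usage, including percent-used figures; this can
    -- take a while on a large database:
    CALL sa_table_page_usage();

    -- Overall free pages in the database file (verify the property
    -- name in the docs for your version):
    SELECT DB_PROPERTY( 'FreePages' );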


FWIW: You might also be able to store the remote data in a local NOT TRANSACTIONAL temporary table and then use a MERGE statement to update the "real" warehouse table. That way you might have far fewer modifications on the real table (preventing fragmentation in the system dbspace), at the cost of possibly more intense usage of the temporary dbspace...
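
A rough sketch of that idea, with made-up names (proxy_orders as the proxy table, wh_orders as the warehouse table, order_id/amount as sample columns):

    -- Stage the remote rows locally; NOT TRANSACTIONAL avoids rollback
    -- log overhead for the temporary data:
    DECLARE LOCAL TEMPORARY TABLE stage_orders (
        order_id  INTEGER NOT NULL PRIMARY KEY,
        amount    DECIMAL( 12, 2 )
    ) NOT TRANSACTIONAL;

    INSERT INTO stage_orders
        SELECT order_id, amount FROM proxy_orders;

    -- Apply only the differences to the real warehouse table; the
    -- extra AND condition skips rows that have not changed.
    -- (Rows deleted on the remote side would still need separate handling.)
    MERGE INTO wh_orders t
        USING stage_orders s ON ( t.order_id = s.order_id )
        WHEN MATCHED AND t.amount <> s.amount THEN
            UPDATE SET t.amount = s.amount
        WHEN NOT MATCHED THEN
            INSERT ( order_id, amount ) VALUES ( s.order_id, s.amount );

Since unchanged rows are never touched, the base table's pages should stay largely stable between runs.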

IMHO, it's a common approach to store remote data in a temporary table before it is applied to or joined with local data, simply because joins etc. on remote data are often very slow and inefficient. If that is the main reason you have chosen to always truncate and re-insert the remote data, I'd guess a "temporary local copy" would be more efficient.


In case the remote data is coming from different database systems, you might also consider the "data import from MS SQL Server" test series that Breck published on his great blog a while ago:

In-Memory Downloading from SQL Server (5)

Yes, I should have specified that some data is coming from different database systems. Thanks for your comments and suggestions. Off to find Breck's blog entry!

VolkerBarth
Contributor

FWIW, in case comparing "old" and "new" data from remote systems is part of your ETL steps, the following questions might be helpful:

VolkerBarth
Contributor

Off to find Breck's blog entry!

Just to prevent a (probably unlikely) misunderstanding: the link to Breck's blog series is contained in the answer above, and the linked article leads to its prequels...