on 2012 Apr 28 10:01 AM
Fairly often, a customer will ask me to refresh their Test database with Production. Sometimes, if the customer is performing certain tests using data stored only in the Test database, I will need to manually re-enter that data through the Test application after the database has been refreshed with Production. It can be difficult to know for certain all of the data which needs to be contained within the newly refreshed Test database.

Does anyone know a method that I might use to quickly compare the data in a certain subset of tables? I am thinking about a procedure which will output the data of approximately a dozen tables to one text file. Prior to refreshing Test, I would run this procedure against both Test and Production and use a "vdiff" type utility to compare the text in the two output files. Currently, I have UNLOAD statements which output data from the target tables to individual files, roughly like the sketch below. Does anyone know of a similar process which would result in a single file for each database? Thanks.
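For illustration, my current per-table statements look something like this (the table and file names are placeholders for the dozen or so tables involved):

```sql
-- One output file per table: easy to produce, but it means
-- diffing a separate pair of files for every table
UNLOAD TABLE customers TO 'c:\\compare\\customers.txt';
UNLOAD TABLE orders    TO 'c:\\compare\\orders.txt';
-- ...and so on for the remaining tables
```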
We have to do this kind of comparison on a regular basis to test the results of complex algorithms in our application software. We also have recovery situations where we selectively restore data.
There are two methodologies that we use.
We unload one database's tables using the UNLOAD statement from Sybase Central, with the unload method that creates the tables on reload. We then edit the reload script so the tables are recreated under new names (e.g. tablename becomes tablename_test) and execute the script, which populates the test data in the database. We can then do various NOT EXISTS queries, or compare column values for matching keys, between tablename and tablename_test; a sketch of both appears below. Note that you can use the MERGE statement to insert rows from one table into the other based on conditional tests (MERGE is a fantastic tool for this stuff, and modifying the data in the source (insert) tables gives you a lot of control over the final result). I note that in your case you might be going from Production to Test - we have had to do a similar process when we needed to selectively restore information from a historical backup.
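Here is a minimal sketch of those comparisons, assuming a hypothetical pair of tables keyed on id with a single data column val (substitute your own keys and columns):

```sql
-- Rows present in the reloaded copy but missing from the live table
SELECT t.*
FROM tablename_test t
WHERE NOT EXISTS ( SELECT 1 FROM tablename o WHERE o.id = t.id );

-- Rows whose keys match but whose column values differ
SELECT o.id, o.val AS live_val, t.val AS test_val
FROM tablename o JOIN tablename_test t ON o.id = t.id
WHERE o.val <> t.val;

-- MERGE: insert the missing rows and overwrite the changed ones
MERGE INTO tablename o
USING tablename_test t ON o.id = t.id
WHEN NOT MATCHED THEN INSERT ( id, val ) VALUES ( t.id, t.val )
WHEN MATCHED AND o.val <> t.val THEN UPDATE SET o.val = t.val;
```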
For large, complicated data sets with lots of columns (some containing text), we export the tables into two Excel spreadsheets (use a newer version of Excel that can handle a large number of rows). We then use an Excel comparison tool that does a fantastic job of identifying and highlighting differences, including inserts and deletes.
You can use the UNLOAD SELECT statement to output the result of a query which (presumably) will join all of the necessary tables so that the output is a single file.
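A minimal sketch, assuming two hypothetical tables joined on a customer key; the ORDER BY matters, since the two unloaded files must list rows in the same order for a text diff to be meaningful:

```sql
-- Unload one joined, ordered result set into a single text file
UNLOAD
SELECT c.id, c.name, o.order_no, o.order_date, o.amount
FROM customers c JOIN orders o ON o.customer_id = c.id
ORDER BY c.id, o.order_no
TO 'c:\\compare\\snapshot.txt';
```

Run the same statement against Test and Production (with different file names) and diff the two files.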
This approach will work well only if you have a star schema (or something close) that limits the size of the output file and, more importantly, makes it easier to determine where the missing/additional data is coming from. My point is that if you join ten tables and then compare the results from two databases, it can be considerably harder to pinpoint the missing/extra data when the output files differ by tens or hundreds of thousands of rows - all because 5 rows were missing from a single table.
Another way of doing the comparison is to set up proxy tables to the other database, and then compose a FULL OUTER JOIN between the like tables in each database. This avoids the need to export the database contents to a flat file, but will not perform as well for large datasets.
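A minimal sketch of the proxy-table approach, assuming a hypothetical ODBC data source ProdDSN for the Production server and a customers table keyed on id (names are placeholders):

```sql
-- Define the remote server and a proxy table pointing at Production
CREATE SERVER prod_srv CLASS 'SAODBC' USING 'DSN=ProdDSN';
CREATE EXISTING TABLE customers_prod AT 'prod_srv..dba.customers';

-- Rows that exist in only one database, or match on key but differ
SELECT COALESCE( t.id, p.id ) AS id,
       t.name AS test_name,
       p.name AS prod_name
FROM customers t
FULL OUTER JOIN customers_prod p ON t.id = p.id
WHERE t.id IS NULL          -- only in Production
   OR p.id IS NULL          -- only in Test
   OR t.name <> p.name;     -- in both, but different
```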