cancel
Showing results for 
Search instead for 
Did you mean: 

Separating out BLOB data from main DB

thelmacottage
Participant
2,487

We currently have a replicating system. Some of the tables contain columns that store images,PDF's,email messages etc. Subsequently the databases, especially the consolidated DB have become very large (currently 250GB for the consolidated). We are looking at separating out all the BLOB fields into separate areas. This would make tasks such as backing up off-site easier to schedule.

Because we have a replicating system across the WAN storing them as file locations is not really an option - yes we could do this but administration could be a nightmare.

An option is to create a completely new database dedicated to storing the images - so the main DB would have a field 'ImageIndex', this would point to the relevant field in the new IMAGE database . This to would be a replicated DB. The only down side I can see here is that both DB's might not replicate back to their corresponding consolidated DB's at the same time and there could be a data sync issue.

Another option is to use DBSPACES on the current database. I'm not sure how I would do this on the current replicating DB. Would I have to unload the consolidated AND the remote DB's ?

I suppose I could also just create a new table in the current DB just for images - but it would mean the database remains the same size - quite large for backing up (especially to the off-site backup) .

Any suggestions/comments would be welcome.

Accepted Solutions (0)

Answers (2)

Answers (2)

regdomaratzki
Product and Topic Expert
Product and Topic Expert

I agree with the previous responses that a separate DBSapce to hold the BLOB data doesn't really help you, and neither does moving all the blobs into a separate table in the same database.

One possible solution would involve taking all of your BLOBs and putting them in a single table in a completely separate database. Your main database would then define a proxy table to the table in the blob database that holds all the BLOB data. As you point out, this does lead to the potential that your main database and the blob database could be out of synch. You'd need to add code to your application to handle this situation, particularly when the main DB is ahead of the blob DB. After breaking out the BLOBs to another database you then have two options :

1) Each remote location would now have two databases, that bother replicated separately.

2) Because your replication is over a WAN, you could consider having a single BLOB database running as a network server that all your remote sites have access to (still via a proxy table). Whether this is a viable option depends on how often you are accessing the blobs, the size of the blobs, and what the performance is like over the WAN.

How you would migrate to either of these environments from your current single database with as little disruption as possible is also an interesting project. I can give my thought on how that could be done if it's of interest.

Reg

thelmacottage
Participant
0 Kudos

Thanks Reg - This separate DB for the BLOB's could be the answer - I would also consider using the compressed option for the BLOB field as mentioned by Breck. I have lot's of different types of BLOB to store including images (JPEG), Email messages (.EML/.MSG) , Wordprocessing documents, and possibly some video (future development). Our WAN has quite good performance but I would need to do some tests on the single (non-replicated) DB option.

Yes, I think the migration might need some thought (I'd be interested to hear yours) - I would first just move everything to a new table in the current DB - when all this is working with our application then move the table into the new DB ? I could then remove all the 'old' BLOB fields and re-build the DB, this should shrink it down a bit.

regdomaratzki
Product and Topic Expert
Product and Topic Expert
0 Kudos

How you do the migration depends on the answer to this question :

Can you pick a certain date and say the old system no longer works you have to do X to get things to work now? If the answer to that is yes, migration is easy. If the answer to that is no (it usually is), then there is more work involved, as we need to make sure that both the single-DB and multi-DB system will work at the same time.

Reg

regdomaratzki
Product and Topic Expert
Product and Topic Expert

In general, here's how I would do the migration to minimize the effect on existing users and allow you to have both the old and new schema active at the same time.

  1. Stop dbremote from running on the consolidated and ensure that you are the only connected user to the consolidated database during steps 2-10. I'm not 100% sure it's necessary, but I'm very paranoid, and being the only connected user will definitely avoid possible concurrency issues while you change the schema of the consolidated.
  2. For each table that includes a blob column, alter the publication such that the table definition includes all the columns of the table that you are replicating, not the entire table.
  3. Create the new blob table in the consolidated database. Do NOT add it to any publication at this time.
  4. For each table that includes a blob column, add a new column to the table that will eventually have a foreign key relationship to the new blob table. Because you've altered the table definition in the publication to a list of columns that does not include this new column in step 1, data in this new column will not be sent to existing remote database.
  5. For each table that includes a blob column, populate the new blob table with the blobs that currently exist in the table, change the new column on the table to reference the primary key of new row you just inserted into the blob table, and then create a foreign key on the table to the new blob table when all the rows have been added.
  6. For each table that includes a blob column, write triggers so that changes to the blob column in the table (insert, update + delete) are also reflected in the new blob table
  7. Write triggers on the new blob table (insert, update + delete) so that changes to the new blob table are reflected back in the original table. Aside : I know what you're thinking. I've just doubles the size of my consolidated and created a whole pile of triggers to maintain. This is harder to maintain now than when we started this whole process. Patience, there will be short term pain for long term gain.
  8. Create two new publications in the consolidated, one called mainPUB and one called blobPUB. These two publications will define the schema of the two databases for you new schema. Ensure that the mainPUB schema does NOT include the blob columns one the base tables, and blobPUB will likely just include the new blob table.
  9. Create two remote users called mainUSER and blobUSER and subscribe them to the two new publications.
  10. Extract remote database for these two new remote users. You now have two databases will your new schema and data, and only one copy of the blobs in these new database, in the new blob database. Changes made to the tables in these two new databases will be sent up the the original consolidated database, and changes made in the original consolidated (either directly or from all your remote users) will be sent to the new databases. These two new databases are now your NEW consolidated databases for your new schema. Create new remote users on these databases and extract new database for users as they migrate to the new schema. You now have both the old and new schema active at the same time.
  11. Once all the old schema databases have been migrated to the new schema, you can dispose of the original consolidated database. Ensure that all the changes in the old consolidated have been sent to the new consolidated databases, and then take it offline (forever). Remove references to the consolidated users in the new consolidated databases that reference the old consolidated database.
  12. Celebrate a successful migration with a beverage of your choosing.

Reg

regdomaratzki
Product and Topic Expert
Product and Topic Expert
0 Kudos

PS: Sorry for the late reply. I'm only working part time remotely right now, and my hours vary greatly. Hopefully those interested are still following this thread.

VolkerBarth
Contributor
0 Kudos

While this forum has been rather quiet (and boring IMHO...) the last weeks, thanks for sharing those very interesting, detailed and smart steps, Reg.

VolkerBarth
Contributor
0 Kudos

IMHO using dbspaces won't reduce backup time as the separate dbspaces must still be backed up with the main database. Unless there's some advantage because of location on different physical drives (rare nowadays in the times of SAN, RAIDs and VMs), I see more disadvantages than advantages. However, to do so, you would usually need to rebuild the cons, which should work fine with remotes as long as you care for the desired precautions like keeping log offsets accordingly... AFAIK there are some FAQs here dealing with adding and dropping dbspaces.

Breck_Carter
Participant
0 Kudos

I agree... to put it bluntly, dbspaces aren't a solution to anything, not in the year 2017 🙂