At one of our long-term client sites, the production data warehouse server was running out of space, and for various technical and political reasons the disk space could not be extended quickly or easily.
The IT manager had been looking into all the options, but they were either too expensive or would not be completed before the production server ran out of space, and he wasn't sure what other quick options were available.
I suggested I could extract the data older than two years from the 15 largest tables, store it on an external USB drive, and copy it to a second USB drive once all the historic data had been extracted.
The plan was given the go-ahead, and I went out and purchased two 1TB external USB drives for around $100 each.
The process I used was:
1. Use a database utility (BCP for SQL Server, in this case) to extract the historic data only from the fifteen largest tables into text files.
2. Take a copy of the DDL of the tables as they stand now, in case the data ever needs to be reloaded.
3. Take a copy of this DDL and edit it to create copies of all the tables with an "_ARCHIVE" suffix.
4. Use the same utility (BCP again) to reload the extracted data files into the "_ARCHIVE" tables. This proves the extracted data will reload, highlights any problems with embedded commas, pipes etc., and confirms your delimiter is correct.
5. Once all the extracted data files have loaded successfully, map a drive to the production server from your local machine and copy the files to the first external USB drive, with each table's DDL, unload script and load script in separate folders.
6. Once all these have copied successfully, copy all the files again from the first external USB drive to the second.
7. Once that copy has completed successfully, carefully delete the archived data from the production tables and shrink the database if required.
8. Store the external USB drives in separate locations, such as client site machine rooms and/or off-site safe locations.
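The BCP extract and reload steps above can be sketched roughly as follows. The server name (PRODSQL), database (DWH), table (FactSales) and date column (LoadDate) are all placeholder names for illustration; your cut-off predicate and delimiter choice will vary.

```shell
# Extract rows older than two years to a pipe-delimited text file.
# A pipe delimiter reduces (but does not remove) the risk of clashing
# with commas inside the data -- the reload test catches the rest.
bcp "SELECT * FROM DWH.dbo.FactSales WHERE LoadDate < DATEADD(year, -2, GETDATE())" \
    queryout FactSales_archive.dat -c -t"|" -S PRODSQL -T

# Reload the extracted file into the _ARCHIVE copy of the table to
# prove the round trip works before anything is deleted.
bcp DWH.dbo.FactSales_ARCHIVE in FactSales_archive.dat -c -t"|" -S PRODSQL -T
```

The `-c` flag uses character mode, `-t` sets the field terminator, `-S` names the server and `-T` uses a trusted (Windows) connection; these commands obviously need a live SQL Server instance to run against.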
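One lightweight way to create the "_ARCHIVE" landing tables from the existing DDL is a `SELECT ... INTO` with a false predicate, which copies the column structure (though not constraints or indexes, which an archive landing table usually doesn't need). Again, the server, database and table names below are hypothetical.

```shell
# Create an empty structural copy of the table with an _ARCHIVE suffix.
# WHERE 1 = 0 means no rows are copied, only the column definitions.
sqlcmd -S PRODSQL -d DWH \
    -Q "SELECT * INTO dbo.FactSales_ARCHIVE FROM dbo.FactSales WHERE 1 = 0;"
```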
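The copy-to-USB steps above are worth verifying with checksums before anything is deleted from production. A minimal sketch, using placeholder paths for the source folder and the two drive mount points (the setup lines stand in for the real BCP extracts and scripts):

```shell
set -e

# Stand-in setup so the sketch is self-contained; in practice this folder
# holds the real BCP extract files, DDL scripts and load/unload scripts.
SRC=archive_files
USB1=usb1/archive      # first external USB drive (placeholder mount point)
USB2=usb2/archive      # second external USB drive
mkdir -p "$SRC" "$USB1" "$USB2"
printf 'sample|row|1\n' > "$SRC/FactSales_archive.dat"

# Copy to the first drive, then checksum both sides before trusting it.
cp -r "$SRC"/. "$USB1"/
( cd "$SRC"  && find . -type f -exec sha256sum {} + | sort ) > src.sums
( cd "$USB1" && find . -type f -exec sha256sum {} + | sort ) > usb1.sums
diff src.sums usb1.sums && echo "first copy verified"

# Only mirror to the second drive from the verified first copy.
cp -r "$USB1"/. "$USB2"/
```

Copying the second drive from the verified first copy, rather than from the server again, means both drives are known to hold identical files.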
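The final delete-and-shrink step deserves care on a production warehouse: deleting millions of rows in one statement can blow out the transaction log. A hedged sketch, deleting in batches (names are placeholders, and this should only run after both USB copies verify):

```shell
# Delete the archived rows in batches to keep the transaction log
# manageable, then shrink the database to release the freed space.
sqlcmd -S PRODSQL -d DWH -Q "
WHILE 1 = 1
BEGIN
    DELETE TOP (50000) FROM dbo.FactSales
    WHERE LoadDate < DATEADD(year, -2, GETDATE());
    IF @@ROWCOUNT = 0 BREAK;
END;
DBCC SHRINKDATABASE (DWH);"
```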
The whole process only took a couple of days and can be run as a background task. A year later I repeated the exercise and was able to reuse all the load/unload scripts with a change to the data values and output filenames.
So, for only $200 and a couple of days of my time, the client ended up with a rapid data warehouse archive process that is mirrored, and the data can be reloaded with confidence at any time if required.