
Shrinking really big files

Newbie
 
Join Date: Sep 2008
Posts: 2
#1: Sep 2 '08
Hi Everyone,
When I use streams, it is always possible to append data, so a file grows. But is it possible to shrink a file by just cutting off its tail? I have to work with really big files (~2 GB), so re-creating a smaller file takes time. A C or C++ solution will do :)
Thank you in advance.
Moderator
 
Join Date: Mar 2007
Location: North Bend Washington USA
Posts: 5,366
#2: Sep 2 '08

re: Shrinking really big files


You may need some database design here.

Suppose your data records are 10K. You could create a file of 10MB to hold 1000 records. Then another file of 10MB for another 1000 records. Call these files segments.

Next, create a file that holds the name, path, and disc volume identifier of each segment file. Let's say the full path to a segment is 1K. A 10MB file would then hold the names of 10000 segments, where each segment is 10MB.

You can now manage 10000 10MB files and never have to create or delete any file larger than 10 MB.

Alternatively, the segments are 10K and could be identified with a key, like a number inside the segment itself. Now create the segment files with pre-numbered segments. Next, create a file that has the segment number, the filename, and the location of the segment in that file. Finally, create a tree (C++ map) that has the segment number as the key and the association record as the value.

Then your application data files would just be files of segment numbers.

To read the application files, you use the segment number to get the associated filename and the location in that segment file. Then you just read/write that location.

You end up with two trees. One has the available segment numbers and the other has the used segment numbers. When the available tree is empty, your database is full and all of the segment numbers are in the used tree.
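The two-tree bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the original 1975 code: the names SegmentRecord and SegmentPool are hypothetical, the trees are std::map instances keyed by segment number, and C++17 is assumed for std::optional and map::extract.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical association record: where a segment lives on disk.
struct SegmentRecord {
    std::string fileName;   // segment file that holds this segment
    std::uint64_t offset;   // seek location of the segment inside that file
};

// Hypothetical pool that keeps one tree of available segments and one of used segments.
class SegmentPool {
public:
    // Register a pre-numbered segment as available.
    void addAvailable(std::uint64_t number, SegmentRecord record) {
        available_.emplace(number, std::move(record));
    }

    // Move the lowest-numbered available segment to the used tree.
    // Returns an empty optional when the database is full.
    std::optional<std::uint64_t> allocate() {
        if (available_.empty()) return std::nullopt;
        auto node = available_.extract(available_.begin());
        const std::uint64_t number = node.key();
        used_.insert(std::move(node));
        return number;
    }

    // Move a used segment back to the available tree; false if it was not in use.
    bool release(std::uint64_t number) {
        auto node = used_.extract(number);
        if (node.empty()) return false;
        available_.insert(std::move(node));
        return true;
    }

    // Look up the file name and offset of a used segment; nullptr if not in use.
    const SegmentRecord* find(std::uint64_t number) const {
        auto it = used_.find(number);
        return it == used_.end() ? nullptr : &it->second;
    }

    // The database is full when no available segments remain.
    bool full() const { return available_.empty(); }

private:
    std::map<std::uint64_t, SegmentRecord> available_;
    std::map<std::uint64_t, SegmentRecord> used_;
};
```

Moving map nodes with extract and insert avoids copying the record, and no segment file is ever created or deleted while segments are handed out or released.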

Note that this scheme means your database can span multiple hard drives and you get a truly large database working without ever deleting or creating a file.

I actually used this in a financial institution to manage thousands of mortgage loan applications. Once I got the database interface functions working it was pretty quick.

Lastly, is there a reason you can't use a SQL database? It would save all the above trouble.
myusernotyours's Avatar
Familiar Sight
 
Join Date: Nov 2007
Posts: 168
#3: Sep 2 '08

re: Shrinking really big files


Quote:

Originally Posted by weaknessforcats


I actually used this in a financial institution to manage thousands of mortgage loan applications. Once I got the database interface functions working it was pretty quick.

Lastly, is there a reason you can't use a SQL database? It would save all the above trouble.

It would be interesting to know why you couldn't use an SQL database yourself.
Newbie
 
Join Date: Sep 2008
Posts: 2
#4: Sep 2 '08

re: Shrinking really big files


weaknessforcats:
Thank you very much for the answer. I am doing b-tree based indexing for a database (although not sql) so your approach looks good.
Did you use b-trees? If so, how did you store pages?
I see two possibilities:
- each page as separate file
- whole tree in one large file
gpraghuram's Avatar
Expert
 
Join Date: Mar 2007
Location: Chennai
Posts: 1,256
#5: Sep 3 '08

re: Shrinking really big files


Quote:

Originally Posted by kudlatykot

weaknessforcats:
Thank you very much for the answer. I am doing b-tree based indexing for a database (although not sql) so your approach looks good.
Did you use b-trees? If so, how did you store pages?
I see two possibilities:
- each page as separate file
- whole tree in one large file

Since you have trouble creating a huge file, it is better to have a separate file for each page.


Raghu
Moderator
 
Join Date: Mar 2007
Location: North Bend Washington USA
Posts: 5,366
#6: Sep 3 '08

re: Shrinking really big files


Quote:

Originally Posted by myusernotyours

It would be interesting to know why you couldn't use an SQL database yourself.

It was 1975 and SQL was not available on the client mainframe.
Moderator
 
Join Date: Mar 2007
Location: North Bend Washington USA
Posts: 5,366
#7: Sep 3 '08

re: Shrinking really big files


Quote:

Originally Posted by kudlatykot

weaknessforcats:
Thank you very much for the answer. I am doing b-tree based indexing for a database (although not sql) so your approach looks good.
Did you use b-trees? If so, how did you store pages?
I see two possibilities:
- each page as separate file
- whole tree in one large file

Pages were stored in text mode in the segment files.

An application record was just segment numbers:

2747945
8392783
3993204

To read data, you took the segment number and looked it up in the tree of used segments. Maybe 2747945 was on volume SEGS123 along the path servername\root\segments\segments2747000.txt. The first segment in that file (segment 2747000) is at seek location 0. Therefore, with 10K segments, segment 2747945 is at seek location 945 * 10K.
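As a rough sketch of that arithmetic in C++: the code below assumes 1000 segments per file, "10K" read as 10 * 1024 bytes, and a file named after the first segment number it holds (inferred from the segments2747000.txt example). The names locateSegment and SegmentLocation are made up for illustration. In the scheme described above, the filename and location came from the tree lookup rather than being recomputed like this.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed layout constants; adjust to the real segment size and file capacity.
constexpr std::uint64_t kSegmentSize = 10 * 1024;   // "10K" taken as 10 KiB
constexpr std::uint64_t kSegmentsPerFile = 1000;    // 1000 segments per segment file

// Hypothetical result type: which file to open and where to seek.
struct SegmentLocation {
    std::string fileName;
    std::uint64_t offset;
};

// Map a segment number to its segment file and the seek location inside it.
SegmentLocation locateSegment(std::uint64_t segmentNumber) {
    const std::uint64_t indexInFile = segmentNumber % kSegmentsPerFile;  // e.g. 945
    const std::uint64_t firstInFile = segmentNumber - indexInFile;       // e.g. 2747000
    return {"segments" + std::to_string(firstInFile) + ".txt",
            indexInFile * kSegmentSize};
}
```

With these assumptions, locateSegment(2747945) names the file segments2747000.txt and gives a seek location of 945 * 10240 bytes. Once the file is open, a seek to that offset followed by a read or write of kSegmentSize bytes touches exactly one segment.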

The format of the data in the segment was application independent.

Once the database was generated, segment files were never created or deleted. The trees were re-organized daily in the wee hours.

As I say, once the file handlers were working, I was off to the races and could devote the rest of my time to the client's application.

I'm sure there are better methods today but this is what I did in 1975. I had to manage a 500MB master file when the largest disc drive I had held 7.5MB. Each server supported 8 disc volumes so about 65 servers were used for the application.