  #1  
Old September 2nd, 2008, 04:54 PM
Newbie
 
Join Date: Sep 2008
Posts: 2
Shrinking really big files

Hi Everyone,
When I use streams it is always possible to append data, so the file grows. But is it possible to shrink a file by just cutting off its tail? I have to work with really big files (~2 GB), so re-creating a smaller file takes time. A C or C++ solution will do :)
Thank you in advance.
  #2  
Old September 2nd, 2008, 05:40 PM
Moderator
 
Join Date: Mar 2007
Location: North Bend Washington USA
Age: 68
Posts: 4,928

You may need some database design here.

Suppose your data records are 10K. You could create a file of 10MB to hold 1000 records. Then another file of 10MB for another 1000 records. Call these files segments.

Next, create a file that holds the name, path, and disc volume identifier of each segment file. Let's say the full path to a segment is 1K. A 10MB file of these entries would then hold the names of 10000 segments, where each segment is 10MB.

You can now manage 10000 10MB files and never have to create or delete any file larger than 10 MB.
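
A rough sketch of the arithmetic behind this layout, in C++. It reads the sizes as decimal units (10K = 10,000 bytes, 10MB = 10,000,000 bytes) so that one file holds exactly 1000 records as described; the names locate_record and files_in_use are made up for illustration and are not from the original system.

```cpp
#include <cassert>
#include <cstdint>

// Sizes from the post, read as decimal units (an assumption).
constexpr std::uint64_t kRecordBytes = 10'000;            // one 10K record
constexpr std::uint64_t kSegmentFileBytes = 10'000'000;   // one 10MB segment file
constexpr std::uint64_t kRecordsPerFile = kSegmentFileBytes / kRecordBytes;  // 1000
constexpr std::uint64_t kSegmentFileCount = 10'000;       // files listed in the name file

struct RecordLocation {
    std::uint64_t file_index;   // which segment file holds the record
    std::uint64_t byte_offset;  // where the record starts inside that file
};

// Hypothetical helper: map a logical record number to a segment file and offset.
inline RecordLocation locate_record(std::uint64_t record_number) {
    assert(record_number < kSegmentFileCount * kRecordsPerFile);
    return {record_number / kRecordsPerFile,
            (record_number % kRecordsPerFile) * kRecordBytes};
}

// Hypothetical helper: how many segment files a given record count occupies.
// Dropping records from the tail only lowers this number; no big file is rewritten.
inline std::uint64_t files_in_use(std::uint64_t record_count) {
    return (record_count + kRecordsPerFile - 1) / kRecordsPerFile;
}
```

With these numbers, record 2345 lands in file 2 at byte offset 3,450,000, and 1001 records occupy two segment files.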

Alternatively, the segments are 10K each and are identified with a key, like a number inside the segment itself. Create the segment files with pre-numbered segments. Next, create a file that records, for each segment, the segment number, the filename, and the location of the segment in that file. Finally, create a tree (C++ map) that has the segment number as the key and the association record as the value.

Then your application data files would just be files of segment numbers.

To read the application files, you use the segment number to get the associated filename and the location in that segment file. Then you just read/write that location.

You end up with two trees. One has the available segment numbers and the other has the used segment numbers. When the available tree is empty, your database is full and all of the segment numbers are in the used tree.
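
The two trees could look something like the following C++17 sketch. The class name SegmentDirectory and its member functions are invented for illustration, and handing out the lowest-numbered free segment first is an arbitrary choice, not something the description above requires.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Association record: where a numbered segment physically lives.
struct SegmentLocation {
    std::string filename;       // segment file holding this segment
    std::uint64_t byte_offset;  // position of the segment inside that file
};

// Hypothetical wrapper around the two trees (available and used).
class SegmentDirectory {
public:
    // Register a pre-numbered segment as available.
    void add_segment(std::uint64_t number, SegmentLocation where) {
        available_.emplace(number, std::move(where));
    }

    // Move the lowest-numbered available segment into the used tree.
    // Returns an empty optional when the available tree is empty (database full).
    std::optional<std::uint64_t> allocate() {
        if (available_.empty()) return std::nullopt;
        auto node = available_.extract(available_.begin());
        const std::uint64_t number = node.key();
        used_.insert(std::move(node));
        return number;
    }

    // Hand a used segment back to the available tree.
    void release(std::uint64_t number) {
        auto node = used_.extract(number);
        assert(!node.empty() && "segment was not in use");
        available_.insert(std::move(node));
    }

    // Look up where a used segment lives; nullptr if it is not in use.
    const SegmentLocation* find_used(std::uint64_t number) const {
        auto it = used_.find(number);
        return it == used_.end() ? nullptr : &it->second;
    }

    bool full() const { return available_.empty(); }

private:
    std::map<std::uint64_t, SegmentLocation> available_;
    std::map<std::uint64_t, SegmentLocation> used_;
};
```

Allocating and releasing only moves a node between the two maps, so the segment files themselves are never created, deleted, or resized.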

Note that this scheme means your database can span multiple hard drives and you get a truly large database working without ever deleting or creating a file.

I actually used this in a financial institution to manage thousands of mortgage loan applications. Once I got the database interface functions working it was pretty quick.

Lastly, is there a reason you can't use a SQL database? It would save all the above trouble.
  #3  
Old September 2nd, 2008, 06:07 PM
myusernotyours
Member
 
Join Date: Nov 2007
Posts: 114

Quote:
Originally Posted by weaknessforcats

I actually used this in a financial institution to manage thousands of mortgage loan applications. Once I got the database interface functions working it was pretty quick.

Lastly, is there a reason you can't use a SQL database? It would save all the above trouble.
It would be interesting to know why you couldn't use an SQL database yourself.
  #4  
Old September 2nd, 2008, 08:58 PM
Newbie
 
Join Date: Sep 2008
Posts: 2

weaknessforcats:
Thank you very much for the answer. I am doing b-tree based indexing for a database (although not sql) so your approach looks good.
Did you use b-trees? If so, how did you store pages?
I see two possibilities:
- each page as separate file
- whole tree in one large file
  #5  
Old September 3rd, 2008, 07:26 AM
gpraghuram
Expert
 
Join Date: Mar 2007
Location: Chennai
Age: 29
Posts: 1,169

Quote:
Originally Posted by kudlatykot
weaknessforcats:
Thank you very much for the answer. I am doing b-tree based indexing for a database (although not sql) so your approach looks good.
Did you use b-trees? If so, how did you store pages?
I see two possibilities:
- each page as separate file
- whole tree in one large file
Since you have trouble creating a huge file, it is better to have a separate file for each page.


Raghu
  #6  
Old September 3rd, 2008, 09:46 PM
Moderator
 
Join Date: Mar 2007
Location: North Bend Washington USA
Age: 68
Posts: 4,928

Quote:
Originally Posted by myusernotyours
It would be interesting to know why you couldn't use an SQL database yourself.
It was 1975 and SQL was not available on the client mainframe.
  #7  
Old September 3rd, 2008, 10:09 PM
Moderator
 
Join Date: Mar 2007
Location: North Bend Washington USA
Age: 68
Posts: 4,928

Quote:
Originally Posted by kudlatykot
weaknessforcats:
Thank you very much for the answer. I am doing b-tree based indexing for a database (although not sql) so your approach looks good.
Did you use b-trees? If so, how did you store pages?
I see two possibilities:
- each page as separate file
- whole tree in one large file
Pages were stored in text mode in the segment files.

An application record was just segment numbers:

2747945
8392783
3993204

To read data, you took the segment number and looked it up in the tree of used segments. Maybe 2747945 was on volume SEGS123 along the path servername\root\segments\segments2747000.txt. With 10K segments, the first segment in that file (segment 2747000) is at seek location 0. Therefore, segment 2747945 should be at seek location 945 * 10K.
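
For illustration, the seek arithmetic in that example might be written like this in C++. It assumes 1000 segments per file (inferred from the segments2747000.txt name) and reads 10K as 10,000 bytes; the function names are hypothetical, and in the real scheme the file and volume came from the used-segment tree rather than from a formula.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr std::uint64_t kSegmentBytes = 10'000;    // "10K", read as decimal (assumption)
constexpr std::uint64_t kSegmentsPerFile = 1'000;  // inferred from the file name (assumption)

// Number of the first segment stored in the file that holds segment_number.
inline std::uint64_t first_segment_in_file(std::uint64_t segment_number) {
    return segment_number - (segment_number % kSegmentsPerFile);
}

// File name in the style of the example, without server, volume, or directory.
inline std::string segment_filename(std::uint64_t segment_number) {
    return "segments" + std::to_string(first_segment_in_file(segment_number)) + ".txt";
}

// Byte position of a segment inside its file, given the file's first segment number.
inline std::uint64_t seek_offset(std::uint64_t segment_number, std::uint64_t first_in_file) {
    assert(segment_number >= first_in_file &&
           segment_number - first_in_file < kSegmentsPerFile);
    return (segment_number - first_in_file) * kSegmentBytes;
}
```

For segment 2747945 this gives the file name segments2747000.txt and a seek offset of 945 * 10,000 = 9,450,000 bytes.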

The format of the data in the segment was application independent.

Once the database was generated, segment files were never created or deleted. The trees were re-organized daily in the wee hours.

As I say, once the file handlers were working, I was off to the races and could devote the rest of my time to the client's application.

I'm sure there are better methods today but this is what I did in 1975. I had to manage a 500MB master file when the largest disc drive I had held 7.5MB. Each server supported 8 disc volumes so about 65 servers were used for the application.