473,386 Members | 1,867 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Is there any standard way to sort a disk file?

I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?

The only thing that I could think of would be to std::sort portions of this file
in memory, and then later manually merge these sorted portions together using a
home brewed merge sort . Does anyone have any better ideas?
Jul 19 '06 #1
5 2349
* Peter Olcott:
[bad cross-posting]
Please don't cross-post to [clc++] and Microsoft-specific groups.

I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?
Yes, COBOL does that.

That does not mean you should cross-post to [clc++] and [comp.lang.cobol].

The only thing that I could think of would be to std::sort portions of this file
in memory, and then later manually merge these sorted portions together using a
home brewed merge sort . Does anyone have any better ideas?
Follow-ups set to the two Microsoft groups.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Jul 19 '06 #2
Peter Olcott schrieb:
I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?
Load the key and some index as pair into memory and sort it by key.

After that, you can build a new file from the old file in the right
order using the index.

--
Thomas
Jul 19 '06 #3
On Wed, 19 Jul 2006 11:42:02 -0500, "Peter Olcott" <ol****@att.net>
wrote:
>I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?
http://www.informatik.hs-bremen.de/~brey/stlbe.html

Chapter 10 'External sorting' may be helpful.

Best wishes,
Roland Pibinger
Jul 19 '06 #4

"Thomas J. Gritzan" <Ph*************@gmx.dewrote in message
news:e9**********@newsreader2.netcologne.de...
Peter Olcott schrieb:
>I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a
specifically
defined structure?

Load the key and some index as pair into memory and sort it by key.

After that, you can build a new file from the old file in the right
order using the index.

--
Thomas
That won't work in my case because all the data is used as the key.
Jul 19 '06 #5
In article <rhtvg.272$5H.39@dukeread06>, ol****@att.net says...
I have used the standard template library quite a bit, and std::sort has been
very useful. Now I have reached the point where my memory requirements are
exceeded. Is there some standard way to sort a binary file with a specifically
defined structure?

The only thing that I could think of would be to std::sort portions of this file
in memory, and then later manually merge these sorted portions together using a
home brewed merge sort . Does anyone have any better ideas?
Yup, that seems pretty much correct. A few points: first of all,
std::merge may simplify writing the merge portion of the sort. It'll
handle most of the details of an individual merge run, so from there
it's mostly a matter of deciding what merge pattern you want.
Traditionally, the polyphase merge was generally considered the best,
but I can't say I've done any recent testing to see how well it does
for hard-drive based sorting. My immediate reaction is that it'll
probably do quite well, but may not improve efficiency enough to be
worth the trouble.

To create your sorted runs, you're probably better off using a
priority_queue than reading data into something like a vector,
sorting it, and then writing it out. In particular, a priority_queue
will typically allow you to create longer runs in your initial sorted
files -- as you write out each record, you can read in another from
the original input. As long as it doesn't precede those that have
already been written out, you can add it to the priority queue, and
from there it'll end up in the same file. On average this can nearly
double the length of the initial runs, which generally helps quite a
bit.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jul 20 '06 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: Lad | last post by:
What is the best( easiest)way how to sort a file? I have a file where each record consists of 3 fields( 3 words) and I would like to sort records by the first field( word)in each record. Any idea?...
0
by: Reuben Pearse | last post by:
Hi all, I ran the myisamchk tool with the options --sort-index --sort-records=1 against an MYI file. Is there something I can use to confirm what this command did? I would like to see what the...
5
by: sathyashrayan | last post by:
Friends, As I was going through the standard draft of C (C99) I come across the header file <time.h>. I wonder why the standard has included the header file since it depends on the hardware that...
6
by: Alan Wang | last post by:
Hi All, My application puts standard output from command line(using process.start()) into a file on the hard disk then reads the info from that file after command has finished. The problem is...
22
by: David T. Ashley | last post by:
I'm writing code which will compile on multiple Windows and multiple *nix platforms. Some will have 64-bit integers, I believe. What is my best option to define platform-independent types such...
22
by: Chris Thomasson | last post by:
I am thinking about using this technique for all the "local" memory pools in a paticular multi-threaded allocator algorithm I invented. Some more info on that can be found here: ...
270
by: jacob navia | last post by:
In my "Happy Christmas" message, I proposed a function to read a file into a RAM buffer and return that buffer or NULL if the file doesn't exist or some other error is found. It is interesting...
20
by: Peter Olcott | last post by:
Is there any standard C++ way to determine the size of a file before it is read?
5
by: Timothy Madden | last post by:
Hello I see there is now why to truncate a file (in C or C++) and that I have to use platform-specific functions for truncating files. Anyone knows why ? I mean C/C++ evolved over many years...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.