473,769 Members | 1,674 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Sorting records using sort()

I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.
Thanks in advance for your comments,
--Elijah

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.m oderated. First time posters: Do this! ]
Jul 22 '05
40 4325
In message <3f************ **********@read ing.news.pipex. net>, Stephen
Howe <NO**********@d ial.pipex.com> writes
If the idea is to sort more data than fits in memory, it has been done. In
such a situation, you read as much as possible in chunks, sort, and write to
temporary files.
Later you perform an n-way merge of the temporary files, writing the final
result o some output file. All in Knuth.


Even here index files are often used and when the data has been
completely processed the data file is re-written in sorted order. In
many systems writing is much more expensive than reading and so we gain
by keeping it to a minimum.

Analysing performance on systems with mixed memory speeds (all the way
from level 1 cache through to disc or even tape) does not respond well
to big 'O' formulae. Worse, it is very sensitive to the resources
currently available (I can still remember the problem I had that
resulted from a client having used a single directory on a MSDOS machine
for all his word-processing files for three years -- he had never
deleted anything and he had set the application to keep two back-ups. It
took six hours just to remove the second back-ups, and almost 24-hours
for a batch file to reorganise the directory into 26 sub-directories on
the basis of the first letter of the file names)

Even though all your memory-devices may be random access the difference
in speed may still favour treating the slowest as if it were a serial
access device.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.m oderated. First time posters: Do this! ]
Jul 22 '05 #31
Elijah Bailey wrote:
http://home.comcast.net/~jeffrey.schwab/code/records/

I tried compiling it with g++ 2.96 and 3.1...both dont compile the code...
Thanks for the code thou...wud be nice if I could test it on my linux box.
--Elijah


What's the error? It works fine on Mandrake with gcc 3.3.

Jul 22 '05 #32
Stephen Howe wrote:
It is tough. sort() is templatised which means it _has_ to determine the
size of its elements at compile-time rather than run-time.


std::sort is templatized on an iterator type. "All" you need to do is
provide a random access iterator that knows how to deal with the array
of bytes in question. The value type is going to need to be a proxy
class for dealing with that underlying memory. And that size of an
element controlled by such an iterator certainly does not have to be
part of its type.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.m oderated. First time posters: Do this! ]
Jul 22 '05 #33
In message <10************ **@master.nyc.k bcfp.com>, Hyman Rosen
<hy*****@mail.c om> writes
Stephen Howe wrote:
It is tough. sort() is templatised which means it _has_ to determine the
size of its elements at compile-time rather than run-time.


std::sort is templatized on an iterator type. "All" you need to do is
provide a random access iterator that knows how to deal with the array
of bytes in question. The value type is going to need to be a proxy
class for dealing with that underlying memory. And that size of an
element controlled by such an iterator certainly does not have to be
part of its type.


And the real benefit from using this approach is that it allows you to
use many other algorithms (and in the context of sorting it allows you
to use stable_sort and partial_sort).

I doubt that this kind of iterator is worth it for the Standard Library,
but if you use flat (probably fixed record width) databases and mmap it
could be worth the effort. However if you are concerned with performance
(in terms of speed) I suspect you would be better off looking at the
specialised sorts for data held in serial access storage.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.m oderated. First time posters: Do this! ]
Jul 22 '05 #34
On Tue, 06 Jan 2004 15:46:26 -0500, Jeff Schwab wrote:
http://home.comcast.net/~jeffrey.schwab/code/records/


Very nice! Shows the power of the STL with user defined iterators.

So in the end, it was not that difficult to do. Nobody really commented if
this would be 100% complient, but I doubt it isn't. I retract my earlier
comment on it being a difficult technique, even though I described what
you coded.

M4

Jul 22 '05 #35
Martijn Lievaart wrote:
On Tue, 06 Jan 2004 15:46:26 -0500, Jeff Schwab wrote:

http://home.comcast.net/~jeffrey.schwab/code/records/

Very nice! Shows the power of the STL with user defined iterators.


The STL has been very good to me.
So in the end, it was not that difficult to do. Nobody really commented if
this would be 100% complient, but I doubt it isn't. I retract my earlier
comment on it being a difficult technique, even though I described what
you coded.


It should be compliant, but it ain't standard-library quality. :) At
any rate, there's a lot of potential for misuse. I think the fact that
you and I had about the same idea of how this should be done does
suggest that it's a reasonable approach, though.

Jul 22 '05 #36
On Sun, 11 Jan 2004 08:26:32 -0500, Jeff Schwab wrote:
So in the end, it was not that difficult to do. Nobody really commented if
this would be 100% complient, but I doubt it isn't. I retract my earlier
comment on it being a difficult technique, even though I described what
you coded.


It should be compliant, but it ain't standard-library quality. :) At
any rate, there's a lot of potential for misuse. I think the fact that
you and I had about the same idea of how this should be done does
suggest that it's a reasonable approach, though.


There has been a lot of talk about iterators and proxy objects in the
past. std::vector<boo l> showed that it is impossible to write a
STL-conforming container that uses proxy objects. Unfortunately, I'm a
little unclear on the details and my Meyers books seem to have gone of to
outer space somewhere.

Now what was it again why vector<bool> is not compliant? Does it apply to
this situation? Anyone knows?

M4

Jul 22 '05 #37
ge******@hotmai l.com (Elijah Bailey) wrote
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.


I really do think that qsort() is the way to go.
However, here's what you asked for.

First, you need a strucure to sort.

// Structure needed for sort.
template <int size>struct Record {
char dat[size];
bool operator<(const Record<size>&rh s) const
{ return less_than(dat,r hs.dat); }
};

On some systems (or with some compiler settings!), this might only work
correctly if size is a multiple of some padding value. I leave you to
figure out how to deal with this.

The sort itself is trivially easy.

// The sort itself.
template<int size>void dosort() {
Record<size> *dat = reinterpret_cas t<Record<size>* >(data);
std::sort(dat, dat+n);
}

Unfortunately, you can't just call
dosort<m>
because m isn't a compile-time const. This makes sense if you think
about it -- your comments about speed are correct, but what you're doing
is trading speed against code size. The code is expanded in-line, which
means it needs to understand the recordsize before you begin. So, in
order to call it you're going to need a constant. Here's the only way I
can think of to convert a variable into a compile-time const.

// The only way to call it!
void callsort() {
switch (m) {
case 4: dosort< 4>(); break;
case 8: dosort< 8>(); break;
case 12: dosort<12>(); break;
case 16: dosort<16>(); break;
case 20: dosort<20>(); break;
// ... etc...
default: throw "Missing case!";
}
}

You're going to need a case for every possible value of m.

If you test the above code with Microsoft Visual C++ 6.0, you're going to
get an error. This is caused by a compiler bug; if the only place we
instantiate Record<size> is within dosort<size>, the compiler gets it
very wrong (in this case, it will silently always use Record<20>). The
solution is to instantiate Record<size> in callsort, and pass it to
dosort:

template<int size>void dosort(Record<s ize>unused) {
Record<size> *dat = reinterpret_cas t<Record<size>* >(data);
std::sort(dat, dat+n);
}

// The only way to call it!
void callsort() {
switch (m) {
case 4: dosort(Record< 4>()); break;
case 8: dosort(Record< 8>()); break;
case 12: dosort(Record<1 2>()); break;
case 16: dosort(Record<1 6>()); break;
case 20: dosort(Record<2 0>()); break;
// ... etc...
default: throw "Missing case!";
}
}

By explicitly instantiating every Record<n> size, we bypass the Microsoft
compiler bug. Note: I only tested this with values that were a multiple of
4. I don't know if it would work for odd lengths or not (this would be
compiler-dependant anyway).

I think that by now you can see the problem with this approach: code bloat.
(Not talking about source code, but generated code!) All of the in-line
logic to do the sort is being instantiated 5 times. If m can have 200
possible values, the problem is 40 times worse. If m can have 5000 possible
values, you may find that your program is MUCH bigger than the database
you're trying to sort. You mentioned that you were low on space. Even if
you do have enough RAM for this, depending on how your system works,
loading all of this code into memory might actually make your program
seem sluggish.

Have you considered the many virtues of qsort()? It *is* part of C++, and
the specs match your situation so closely, it's as if they custom wrote it
for you.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.m oderated. First time posters: Do this! ]
Jul 22 '05 #38
ka***@gabi-soft.fr wrote in message news:<d6******* *************** ****@posting.go ogle.com>...
ge******@hotmai l.com (Elijah Bailey) wrote in message
news:<e0******* *************** ****@posting.go ogle.com>...
> Here's the real problem that I need it for (Which incidentally I never
> mentioned) Consider a database of fixed record size (Because doing
> variable record size is already pain, even if you had everything in
> memory). Now if you had a "STL" way of doing this, then you could run
> all "STL" functions on this database compared to reading the file,
> doing the same thing and writing it. (Is that faster? Yes, indeed it
> is. Benchmark an mmap compared to fopen/read on ur machine, and u wud
> hopefully agree)


Do you know the structure of this data at compile time? If so, there is
no problem creating the necessary PODs and iterators, and mapping them
to the data. Theoretically, you can't guarantee the lack of padding,
but it is a technique I've used once or twice, it generally works, and
if you are using mmap, you aren't very portable anyway.


I know I'm coming in a late here, but I managed to implement something
which sounds somewhat similar to the problem being described:

- First, I implemented a mmapped file array class after reading an
article by Matt Austern on the subject (the link I have for it,
http://www.cuj.com/experts/1907/aust...?topic=experts, is now
broken, apparently), rendering the mmapped implementation at least
somewhat portable by hiding it behind an interface.

- Then I created a Table class to parse out the fields and records
from a particular flat-file relational database table format which
uses the mmapped file under the hood.

- Then I created a templated Iterator class that uses an adapter
function to convert each record in a Table to the desired view, as it
were.

- Voila, now I can use mmapped files with the STL.

Granted, I'm using mostly lower_bound() and equal_range(), as the data
I deal with is read-only; but I imagine crafting an iterator to write
to fixed-length records to use with sort() and other mutating
algorithms wouldn't be such a big leap.

And, incidentally, I wonder if my Iterator doesn't just replicate
functionality in Boost, but I've yet to play with Boost that much yet
(though I've been reading up on it) and I doubt it would work in our
baseline, as we're bound to use Sun CC 5.3 for some time.

Hope this helps,

Mike

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.m oderated. First time posters: Do this! ]
Jul 22 '05 #39
On Mon, 12 Jan 2004 15:05:13 -0500, Mike Bland wrote:
Granted, I'm using mostly lower_bound() and equal_range(), as the data
I deal with is read-only; but I imagine crafting an iterator to write
to fixed-length records to use with sort() and other mutating
algorithms wouldn't be such a big leap.


Remember that std::sort stores additional copies, so you'll have to create
proxy objects. Someone posted an implementation, but I cannot find it
right now (this thread has jumped over multiple groups, so it might be in
acllc-c++ or clc++m).

HTH,
M4

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.m oderated. First time posters: Do this! ]
Jul 22 '05 #40

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
1422
by: Lad | last post by:
Hi, I have a file of records of 4 fields each. Each field is separated by a semicolon. That is Filed1;Ffield2;Field3;Field4 But there may be also empty records such as ;;;; (only semicolons).
9
1654
by: SASRS | last post by:
Here is my code. I am trying to see if I can put a %@ LANGUAGE="VBSCRIPT" %> <% FieldA = Request.QueryString("A") FieldB = Request.QueryString("B") FieldC = Request.QueryString("C") response.write("<TABLE ALIGN=CENTER>")
8
5264
by: Mike | last post by:
Hello, I have a few rather urgent questions that I hope someone can help with (I need to figure this out prior to a meeting tomorrow.) First, a bit of background: The company I work for is developing a web-based application, one part of which involves allowing the user the ability to page through transaction "history" information. The _summary_ history table will have the following fields: ServiceName, Date, User-Ref1, User-Ref2,...
2
3870
by: Todd | last post by:
Hi. I want to sort the records on my form (using either a continuous form or a datasheet) by the unbound "description" column in a combo box on the form (or in the datasheet.) Here's a rough text representation of what I'm talking about FORM Item Number Description Category (text box) (text box) (combo box - 2 columns)
3
4037
by: Neil Hindry | last post by:
I wonder if you can help me. I have setup an address-book database in Access XP. I have the first name & surname as separate fields. As I wanted to sort my database by surname and then by first name I had surname before first name when I created the fields of my database.. To do the sort (in table view) I highlighted the two columns (fields), in this case surname and first name, and selected sort. Access then sorted the database by...
8
4150
by: Matthew Curiale | last post by:
I am creating an app that lists clients of a company for management of different attributes for that company. The first page is a listing of the companies currently in the database. I have my repeater working, and paging/sorting works, but there is a small bug that I can't seem to figure out. If, for example, I display 5 records per page, the paging function executes fine, and only shows 5 records per page. This is no problem. When I...
0
2580
by: SvenMathijssen | last post by:
Hi, I've been wrestling with a problem for some time that ought to be fairly simple, but turns out to be very difficult for me to solve. Maybe someone here knows the answer. What I try to do is sort the records in a plain-text index file based on certain columns. The index file consists of records and fields within the records. The individual fields are separated by semicolons, the records by newlines. The index file is loaded into memory...
3
7346
KevinADC
by: KevinADC | last post by:
If you are entirely unfamiliar with using Perl to sort data, read the "Sorting Data with Perl - Part One and Two" articles before reading this article. Beginning Perl coders may find this article uses unfamiliar terms and syntax. Intermediate and advanced Perl coders should find this article useful. The object of the article is to inform the reader, it is not about how to code Perl or how to write good Perl code, but to teach the Schwartzian...
1
2967
by: dorandoran | last post by:
The sort on the childgrid is not working; nothing happens when I click on the each column header for sort. (I followed Satay's sample: http://www.codeproject.com/KB/aspnet/EditNestedGridView.aspx) and i am using object for datasource. please suggest. = = = default.aspx = = <%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"...
0
9589
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
1
9994
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8870
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7408
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6673
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5447
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3958
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3561
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2815
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.