473,405 Members | 2,261 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,405 software developers and data experts.

Sorting records using sort()

I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.
Thanks in advance for your comments,
--Elijah

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #1
40 4208
"Elijah Bailey" <ge******@hotmail.com> skrev i en meddelelse
news:e0**************************@posting.google.c om...
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.
Thanks in advance for your comments,
--Elijah

Make your own iterator class, which really should not be so difficult. Then
you can use all the C++ algorithms on your container, including std::sort.

Kind regards
Peter

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #2

"Elijah Bailey" <ge******@hotmail.com> skrev i meddelandet
news:e0**************************@posting.google.c om...
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...


Well, if the data is BIG, the standard method IS of course to sort an array
of pointers...

It will be MUCH faster than actually moving all the data around!

Bo Persson

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #3

"Elijah Bailey" <ge******@hotmail.com> wrote in message news:e0**************************@posting.google.c om...

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...


You're confusing me. How is it this data is stored if you don't have
pointers to it. Are you telling me it's all in contiguous memory and you
want it sorted in place without using any additional storage? The standard
C routine qsort will do this.

qsort(ptr_to_beginning, m, n, compare_func);

The compare func has to return positive if a > b, 0 if a==b and negative if a<b.
(It takes two void*'s as args).
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #4
Elijah Bailey wrote:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.

Do all records in "data" have the same size m, or do different records
have different values of m?
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #5
On 31 Dec 2003 11:07:38 -0500, "Peter Koch Larsen" <pk*@mailme.dk>
wrote:
"Elijah Bailey" <ge******@hotmail.com> skrev i en meddelelse
news:e0**************************@posting.google.c om...
I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.

Make your own iterator class, which really should not be so difficult. Then
you can use all the C++ algorithms on your container, including std::sort.


Note his thoughts. Now tell us what the value type of that iterator
will be. Sort will likely want to create a temporary of that type.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #6
Is there no easy way to solve this problem??
I dont want to use qsort from C...! I cant figure
out how to even write a random access iterator
for this problem. What will be its value type?

I would like to swap the records to sort them,
not copy keys and sort them separately...

Thanks in advance for your help,

Thanks ,
--Elijah

ge******@hotmail.com (Elijah Bailey) wrote in message news:<e0**************************@posting.google. com>...
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.
Thanks in advance for your comments,
--Elijah

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Jul 22 '05 #7

"Elijah Bailey" <ge******@hotmail.com> wrote in message news:e0**************************@posting.google.c om...
Is there no easy way to solve this problem??
I dont want to use qsort from C...


Why not? The std::sort isn't guaranteed to work any better.

Jul 22 '05 #8
Elijah Bailey wrote:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;

which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work. I dont want to use qsort from C...! I cant figure
out how to even write a random access iterator
for this problem. What will be its value type?

I would like to swap the records to sort them,
not copy keys and sort them separately...

If you want to use an existing implementation of a sorting algorithm,
you have to provide a way for the implementation to know where records
begin and end, and how to compare them. This issue is fundamental.

Writing your own iterators would not be trivial, but probably would be a
educational. Having such iterators available would make your data
structure compatible with most of the standard library, and might avoid
countless future headaches.

If you just want to sort your records in place, why not use qsort? It's
part of the C++ standard library. The fact that the function was
inherited from C is no reason to avoid qsort. In fact, it sounds like
almost the perfect fit to your problem.

-Jeff

Jul 22 '05 #9

"Jeff Schwab" <je******@comcast.net> wrote in message news:Wo********************@comcast.com...
Writing your own iterators would not be trivial, but probably would be a
educational. Having such iterators available would make your data
structure compatible with most of the standard library, and might avoid
countless future headaches.


The trouble with writing iterators for this problem is that he needs it to be
variable size at run time and that the iterators have to be derferenceable
and the derefenced types assignable. Add his requirement to not allocate
even enough memory to hold pointers to each object, I can't figure out how
you could do it.

Jul 22 '05 #10
Thanks Ron, I think that is definitely a problem.
But I still "think" it can be done, dont know how to thou...:)

"Ron Natalie" <ro*@sensor.com> wrote in message news:<3f***********************@news.newshosting.c om>...
"Jeff Schwab" <je******@comcast.net> wrote in message news:Wo********************@comcast.com...
Writing your own iterators would not be trivial, but probably would be a
educational. Having such iterators available would make your data
structure compatible with most of the standard library, and might avoid
countless future headaches.


The trouble with writing iterators for this problem is that he needs it to be
variable size at run time and that the iterators have to be derferenceable
and the derefenced types assignable. Add his requirement to not allocate
even enough memory to hold pointers to each object, I can't figure out how
you could do it.

Jul 22 '05 #11
On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.
"Ron Natalie" <ro*@sensor.com> wrote in message
news:<3f***********************@news.newshosting.c om>...
"Jeff Schwab" <je******@comcast.net> wrote in message
news:Wo********************@comcast.com...
> Writing your own iterators would not be trivial, but probably would
> be a educational. Having such iterators available would make your
> data structure compatible with most of the standard library, and
> might avoid countless future headaches.


The trouble with writing iterators for this problem is that he needs it
to be variable size at run time and that the iterators have to be
derferenceable and the derefenced types assignable. Add his
requirement to not allocate even enough memory to hold pointers to each
object, I can't figure out how you could do it.


Thanks Ron, I think that is definitely a problem. But I still "think" it
can be done, dont know how to thou...:)


I also think it can be done. However, it requires so much trickery that it
is not interesting to do so. Any of the other options quickly become very
appealing. My approach uses some dynamic memory, but probably less than an
array of pointers would take. The actual amount depends on the number of
copies std::sort makes internally, there is no way around this. I guess
for a typical implementation this would be around log(n) copies, unless
the implementation takes special care to dispose of temporaries. You
cannot count on any of this though.

OK, how can we do it? Obviously every iterator must know the size of the
object it is supposed to handle. The main problems we are facing:

- We cannot create real objects (directly), the size of the real object is
only known at runtime.
- std::sort is allowed to (and most frequently does) make extra copies of
objects.

Asusmptions:
- There are functions to copy these objects and to compare these objects.
- The objects to be sorted don't need special construction or destruction
and can be copied by the previously mentioned routine. They can be
constructed by copying into a new memory area.
- std::sort never uses pointers to objects, only iterators. I think the
standard mandates this, but couldn't find it.

We create a proxy class and let the iterator return proxy objects. These
proxy objects store a pointer to some master descriptor that stores the
location and size of the array and the size of the individual objects. We
could store this in the proxy object itself, but that seems wasteful.
Also, the proxy object stores a void* to store the data for the real
object.

This proxy class is what our iterators value_type is.

Now the proxy object implements some intelligent constructors. I doubt we
need a default constructor, the value type needs to be assignable only
(this follows from the iterator requirements). So we create a constructor
that is used only inside the iterator, it sets the data pointer to point
inside the array at the correct location. If the copy constructor is
called (which would be by std::sort) we dynamically allocate memory to
hold the object and use the copying routine to fill it.

The assignment operator must use the copying routine if the destination is
inside the array. If not, it can either use copy-construct-and-swap or the
copying routine.

The destructor would check if the pointer points inside the array and if
not deletes the object.

Obviously, we need to define an operator< for the proxy objects, this
calls the comparison function above. If we template the proxy objects, we
can use a functor for the comparison, potentially inlining the comaparison
(this would be the only reason to use std::sort anyhow, so it makes
sense).

Looking at the standard, this would fullfill all requirements for
std::sort I could find. Somehow I think I did overlook something here, so
scrutiny by others is appreciated. I also left obvious details out,
obviously :-)

A note on exceptions. This would give the basic guarentee, if the copying
and comparison routines give at least the basic guarentee.

A note on memory usage. Assuming that std::sort holds log(n) copies of
objects internally, this aproach uses less memory than the
sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
greatly depends on these variables, but quickly holds for large n and gets
less interesting for large m. Assume sizeof(void*)==4, we get m <
4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
less than 400. These are gross approximations but should give you a feel.

I hope anyone sees that you must have good reasons to do this. It is ugly,
errorprone, prone to assumptions your implementation of std::sort makes
(in this implementation &*it does not return a pointer to the real object,
any pointer arithmetic will fail), difficult to maintain. There is no real
reason why you should not call qsort in the first place, except as an
accademic exercise.

HTH,
M4

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Jul 22 '05 #12
In message <e0**************************@posting.google.com >, Elijah
Bailey <ge******@hotmail.com> writes
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.


Sorry to be so slow off the mark.

But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.
The following trivial program demonstrates that.

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main(){
int n, m;
cout << "n and m ";
cin >> n >> m;
vector< vector<char> > a(n, m);
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
a[i][j] = 'a' - i - j;
}
}

for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[i][j] << " ";
}
cout << '\n';
}
sort(a.begin(), a.end());
cout << "\n\n";
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[i][j] << " ";
}
cout << '\n';
}
}

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #13
On 3 Jan 2004 09:59:24 -0500, Francis Glassborow
<fr*****@robinton.demon.co.uk> wrote:
In message <e0**************************@posting.google.com >, Elijah
Bailey <ge******@hotmail.com> writes
Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...


Requirement for no added space proportional to size.
But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.


Just use an array of size structs containing three pointers each.

I think you missed the requirements.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #14
Francis Glassborow wrote:
In message <e0**************************@posting.google.com >, Elijah
Bailey <ge******@hotmail.com> writes
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.

Sorry to be so slow off the mark.

But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.


This problem was posed by someone who gets handed a char*, and does not
want to copy the data.

I came to almost the same conclusions as Martijn. I don't think
std::sort needs (or uses, in the version shipped with GCC) log_n copies
of data; a few simultaneous copies are, however, needed. I naively
assumed that *no* copies would be needed, and that implementing "swap"
would be sufficient for avoiding the need of any temporaries. Silly me.

Anyway, the code I've got now seems to work *except* that all of the
data get overwritten by one of the values, due to the fact that all
records (even copies made by std::sort) alias the actual data. I'm sure
this could be fixed with a little bit of effort. Please let me know if
there is sufficient interest for me to post the code I've got (546
lines, lots of white space).

-Jeff
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #15
In article <pa****************************@remove.this.part.r tij.nl>,
Martijn Lievaart <m@remove.this.part.rtij.nl> writes
On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.


But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.
The following trivial program demonstrates that.

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main(){
int n, m;
cout << "N and m";
cin >> n >> m;
vector< vector<char> > a(n, m);
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
a[i][j] = 'a' - i - j;
}
}

for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[i][j] << " ";
}
cout << '\n';
}
sort(a.begin(), a.end());
cout << "\n\n";
for(int i(0); i != n; ++i){
for(int j(0); j != m; ++j){
cout << a[i][j] << " ";
}
cout << '\n';
}
}


--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Jul 22 '05 #16
m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message news:<pa****************************@remove.this.p art.rtij.nl>...
On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Thanks Martijn. This is a much better description than i came up with.
Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.
Constraints: O(n) space is not allowed. So you can not create a
pointer
to each object and sort them. O(log(n)) space is perfectly ok.
O(log(n)*m) is also allowed if that simplifies things.

My data has n >> m, i.e. n is much greater than m.
"Ron Natalie" <ro*@sensor.com> wrote in message
news:<3f***********************@news.newshosting.c om>...
"Jeff Schwab" <je******@comcast.net> wrote in message
news:Wo********************@comcast.com...

> Writing your own iterators would not be trivial, but probably would
> be a educational. Having such iterators available would make your
> data structure compatible with most of the standard library, and
> might avoid countless future headaches.

The trouble with writing iterators for this problem is that he needs it
to be variable size at run time and that the iterators have to be
derferenceable and the derefenced types assignable. Add his
requirement to not allocate even enough memory to hold pointers to each
object, I can't figure out how you could do it.
Thanks Ron, I think that is definitely a problem. But I still "think" it
can be done, dont know how to thou...:)


I also think it can be done. However, it requires so much trickery that it
is not interesting to do so. Any of the other options quickly become very
appealing. My approach uses some dynamic memory, but probably less than an
array of pointers would take. The actual amount depends on the number of
copies std::sort makes internally, there is no way around this. I guess
for a typical implementation this would be around log(n) copies, unless
the implementation takes special care to dispose of temporaries. You
cannot count on any of this though.


Hopefully, by the end of this thread, we'll have something simple
enough
to implement...:)
OK, how can we do it? Obviously every iterator must know the size of the
object it is supposed to handle. The main problems we are facing:

- We cannot create real objects (directly), the size of the real object is
only known at runtime.
- std::sort is allowed to (and most frequently does) make extra copies of
objects.

Asusmptions:
- There are functions to copy these objects and to compare these objects.
- The objects to be sorted don't need special construction or destruction
and can be copied by the previously mentioned routine. They can be
constructed by copying into a new memory area.
- std::sort never uses pointers to objects, only iterators. I think the
standard mandates this, but couldn't find it.

Would be interesting to know, what std::sort() is required to have
from
the standard. I think you are perfectly right that sort doesnt use
pointers
to the objects, only iterators and derefernces(*) the iterators...
We create a proxy class and let the iterator return proxy objects. These
proxy objects store a pointer to some master descriptor that stores the
location and size of the array and the size of the individual objects. We
could store this in the proxy object itself, but that seems wasteful.
Also, the proxy object stores a void* to store the data for the real
object.

This proxy class is what our iterators value_type is.

Now the proxy object implements some intelligent constructors. I doubt we
need a default constructor, the value type needs to be assignable only
(this follows from the iterator requirements). So we create a constructor
that is used only inside the iterator, it sets the data pointer to point
inside the array at the correct location. If the copy constructor is
called (which would be by std::sort) we dynamically allocate memory to
hold the object and use the copying routine to fill it.

The assignment operator must use the copying routine if the destination is
inside the array. If not, it can either use copy-construct-and-swap or the
copying routine.

The destructor would check if the pointer points inside the array and if
not deletes the object.

Obviously, we need to define an operator< for the proxy objects, this
calls the comparison function above. If we template the proxy objects, we
can use a functor for the comparison, potentially inlining the comaparison
(this would be the only reason to use std::sort anyhow, so it makes
sense).

Looking at the standard, this would fullfill all requirements for
std::sort I could find. Somehow I think I did overlook something here, so
scrutiny by others is appreciated. I also left obvious details out,
obviously :-)

A note on exceptions. This would give the basic guarentee, if the copying
and comparison routines give at least the basic guarentee.

A note on memory usage. Assuming that std::sort holds log(n) copies of
objects internally, this aproach uses less memory than the
sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
greatly depends on these variables, but quickly holds for large n and gets
less interesting for large m. Assume sizeof(void*)==4, we get m <
4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
less than 400. These are gross approximations but should give you a feel.

Usually this is certainly the case for my data.
I hope anyone sees that you must have good reasons to do this. It is ugly,
errorprone, prone to assumptions your implementation of std::sort makes
(in this implementation &*it does not return a pointer to the real object,
any pointer arithmetic will fail), difficult to maintain. There is no real
reason why you should not call qsort in the first place, except as an
accademic exercise.


First problem to qsort: It's slower than using sort()
Second problem to qsort: Later I might want to use other STL
algorithms on this data!
like for_each()? find()? ...?

Thanks a lot for your comments,
--Elijah

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Jul 22 '05 #17
On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote:

[ csc++ removed, not relevant for what I want to say ]
First problem to qsort: It's slower than using sort()


Qsort is slower only if std::sort can inline the comparison. In this case
this seems to me that with all the other overhead qsort will actually be
faster (let alone easier).

M4

Jul 22 '05 #18
Martijn Lievaart <m@remove.this.part.rtij.nl> wrote in message news:<pa****************************@remove.this.p art.rtij.nl>...
On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote:

[ csc++ removed, not relevant for what I want to say ]
First problem to qsort: It's slower than using sort()


Qsort is slower only if std::sort can inline the comparison. In this case
this seems to me that with all the other overhead qsort will actually be
faster (let alone easier).

M4


Yes, IF there is no 'simple' way to make std::sort() work.

And IF that is the case, then shouldnt there be another version
of std::sort() to do what I want to do. Doesnt it seem natural to
have an interface in c++ that does the same thing that the c
function qsort CAN do, without actually going the "c" way??

Thanks,
--Elijah

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #19
Sounds like some sort of intermediate class is required in order to spoof a
RandomAccessIterator as required by the STL sort() function.

i.e. you can create these to adapt the dynamic size of the elements to be
sorted to the static requirements of an iterator.

Though whether this is technically possible I'm not sure.

david m.

"Elijah Bailey" <ge******@hotmail.com> wrote in message
news:e0**************************@posting.google.c om...
m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message

news:<pa****************************@remove.this.p art.rtij.nl>...
On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.


Thanks Martijn. This is a much better description than i came up with.
Constraints: Not clear, but memory seems to be an issue, creating an array of pointers and sort that is out.


Constraints: O(n) space is not allowed. So you can not create a
pointer
to each object and sort them. O(log(n)) space is perfectly ok.
O(log(n)*m) is also allowed if that simplifies things.

My data has n >> m, i.e. n is much greater than m.
"Ron Natalie" <ro*@sensor.com> wrote in message
news:<3f***********************@news.newshosting.c om>...
> "Jeff Schwab" <je******@comcast.net> wrote in message
> news:Wo********************@comcast.com...
>
> > Writing your own iterators would not be trivial, but probably would
> > be a educational. Having such iterators available would make your
> > data structure compatible with most of the standard library, and
> > might avoid countless future headaches.
>
> The trouble with writing iterators for this problem is that he needs it> to be variable size at run time and that the iterators have to be
> derferenceable and the derefenced types assignable. Add his
> requirement to not allocate even enough memory to hold pointers to each> object, I can't figure out how you could do it.

Thanks Ron, I think that is definitely a problem. But I still "think" it can be done, dont know how to thou...:)


I also think it can be done. However, it requires so much trickery that it is not interesting to do so. Any of the other options quickly become very appealing. My approach uses some dynamic memory, but probably less than an array of pointers would take. The actual amount depends on the number of
copies std::sort makes internally, there is no way around this. I guess
for a typical implementation this would be around log(n) copies, unless
the implementation takes special care to dispose of temporaries. You
cannot count on any of this though.


Hopefully, by the end of this thread, we'll have something simple
enough
to implement...:)
OK, how can we do it? Obviously every iterator must know the size of the
object it is supposed to handle. The main problems we are facing:

- We cannot create real objects (directly), the size of the real object is only known at runtime.
- std::sort is allowed to (and most frequently does) make extra copies of objects.

Asusmptions:
- There are functions to copy these objects and to compare these objects. - The objects to be sorted don't need special construction or destruction and can be copied by the previously mentioned routine. They can be
constructed by copying into a new memory area.
- std::sort never uses pointers to objects, only iterators. I think the
standard mandates this, but couldn't find it.


Would be interesting to know, what std::sort() is required to have
from
the standard. I think you are perfectly right that sort doesnt use
pointers
to the objects, only iterators and derefernces(*) the iterators...
We create a proxy class and let the iterator return proxy objects. These
proxy objects store a pointer to some master descriptor that stores the
location and size of the array and the size of the individual objects. We could store this in the proxy object itself, but that seems wasteful.
Also, the proxy object stores a void* to store the data for the real
object.

This proxy class is what our iterators value_type is.

Now the proxy object implements some intelligent constructors. I doubt we need a default constructor, the value type needs to be assignable only
(this follows from the iterator requirements). So we create a constructor that is used only inside the iterator, it sets the data pointer to point
inside the array at the correct location. If the copy constructor is
called (which would be by std::sort) we dynamically allocate memory to
hold the object and use the copying routine to fill it.

The assignment operator must use the copying routine if the destination is inside the array. If not, it can either use copy-construct-and-swap or the copying routine.

The destructor would check if the pointer points inside the array and if
not deletes the object.

Obviously, we need to define an operator< for the proxy objects, this
calls the comparison function above. If we template the proxy objects, we can use a functor for the comparison, potentially inlining the comaparison (this would be the only reason to use std::sort anyhow, so it makes
sense).

Looking at the standard, this would fullfill all requirements for
std::sort I could find. Somehow I think I did overlook something here, so scrutiny by others is appreciated. I also left obvious details out,
obviously :-)

A note on exceptions. This would give the basic guarentee, if the copying and comparison routines give at least the basic guarentee.

A note on memory usage. Assuming that std::sort holds log(n) copies of
objects internally, this aproach uses less memory than the
sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
greatly depends on these variables, but quickly holds for large n and gets less interesting for large m. Assume sizeof(void*)==4, we get m <
4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
less than 400. These are gross approximations but should give you a feel.


Usually this is certainly the case for my data.
I hope anyone sees that you must have good reasons to do this. It is ugly, errorprone, prone to assumptions your implementation of std::sort makes
(in this implementation &*it does not return a pointer to the real object, any pointer arithmetic will fail), difficult to maintain. There is no real reason why you should not call qsort in the first place, except as an
accademic exercise.


First problem to qsort: It's slower than using sort()
Second problem to qsort: Later I might want to use other STL
algorithms on this data!
like for_each()? find()? ...?

Thanks a lot for your comments,
--Elijah

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]


---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:st*****@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Jul 22 '05 #20
On Sun, 04 Jan 2004 18:42:58 -0500, Elijah Bailey wrote:
Martijn Lievaart <m@remove.this.part.rtij.nl> wrote in message news:<pa****************************@remove.this.p art.rtij.nl>...
On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote:
> First problem to qsort: It's slower than using sort()


Qsort is slower only if std::sort can inline the comparison. In this case
this seems to me that with all the other overhead qsort will actually be
faster (let alone easier).

M4


Yes, IF there is no 'simple' way to make std::sort() work.

And IF that is the case, then shouldnt there be another version
of std::sort() to do what I want to do. Doesnt it seem natural to
have an interface in c++ that does the same thing that the c
function qsort CAN do, without actually going the "c" way??


I don't think so. Qsort is not type safe, there is no way to do so in C.
Std::sort improves on this, but you don't have a type known compile time.
Your problem is exotic in my eyes, I don't think std::sort should take it
into account.

HTH,
M4
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #21
"Elijah Bailey" <ge******@hotmail.com> wrote in message
news:e0**************************@posting.google.c om...
And IF that is the case, then shouldnt there be another version
of std::sort() to do what I want to do. Doesnt it seem natural to
have an interface in c++ that does the same thing that the c
function qsort CAN do, without actually going the "c" way??


Yes, it's called qsort. Why is it so hard for you to accept that
it's a part of Standard C++, not to mention the best fit to the
problem you describe?

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #22
In message <e0**************************@posting.google.com >, Elijah
Bailey <ge******@hotmail.com> writes
Martijn Lievaart <m@remove.this.part.rtij.nl> wrote in message news:<pa****************************@remove.this.p art.rtij.nl>...
On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote:

[ csc++ removed, not relevant for what I want to say ]
> First problem to qsort: It's slower than using sort()


Qsort is slower only if std::sort can inline the comparison. In this case
this seems to me that with all the other overhead qsort will actually be
faster (let alone easier).

M4


Yes, IF there is no 'simple' way to make std::sort() work.

And IF that is the case, then shouldnt there be another version
of std::sort() to do what I want to do. Doesnt it seem natural to
have an interface in c++ that does the same thing that the c
function qsort CAN do, without actually going the "c" way??


Do you think that the OP's requirements are particularly common? Rightly
or wrongly, I do not. We have a lot of work to do on the Standard C++
Library without spending time on something such as this. If someone
really needs it they can write it for themselves (std::sort() does not
rely on some special magic) If they think it likely to be of more
general use they can submit their work either to Boost or directly to
the Library Working Group of WG21.

Actually we have a perfectly good interface that does what C's qsort
does, it is called qsort. What you are asking for is something else
which seems to be of very limited utility.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #23
ge******@hotmail.com (Elijah Bailey) wrote in message
news:<e0**************************@posting.google. com>...
Martijn Lievaart <m@remove.this.part.rtij.nl> wrote in message
news:<pa****************************@remove.this.p art.rtij.nl>...
On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote:
First problem to qsort: It's slower than using sort()
Qsort is slower only if std::sort can inline the comparison. In this
case this seems to me that with all the other overhead qsort will
actually be faster (let alone easier).
Yes, IF there is no 'simple' way to make std::sort() work.
I've not followed the discussions completely, but if I understand
correctly, you want to invoke *one* sort function for different types of
data. (This isn't the way you present it, but it comes out to the same
thing. You have data in raw memory, represented by a char[]; sometimes,
it is one type, with one size, and sometimes it is another type, with
another size.)

This is not the way std::sort works, or for that matter, not the way the
standard library in general works. The whole point of templating is
that you have a separate instance of the function for each type of data.
And IF that is the case, then shouldnt there be another version of
std::sort() to do what I want to do. Doesnt it seem natural to have
an interface in c++ that does the same thing that the c function qsort
CAN do, without actually going the "c" way??


Well, the interface in C++ would be pretty close to the interface in C
anyway -- about the only difference might be to allow a functional
object to be used instead of only a pointer to a function.

--
James Kanze GABI Software mailto:ka***@gabi-soft.fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #24
"P.J. Plauger" <pj*@dinkumware.com> wrote in message
news:<3N*******************@nwrddc02.gnilink.net>. ..
"Elijah Bailey" <ge******@hotmail.com> wrote in message
news:e0**************************@posting.google.c om...
> And IF that is the case, then shouldnt there be another version
> of std::sort() to do what I want to do. Doesnt it seem natural to
> have an interface in c++ that does the same thing that the c
> function qsort CAN do, without actually going the "c" way??


Yes, it's called qsort. Why is it so hard for you to accept that
it's a part of Standard C++, not to mention the best fit to the
problem you describe?


Here's the real problem that I need it for (Which incidentally I never
mentioned)
Consider a database of fixed record size (Because doing variable record size
is already pain, even if you had everything in memory). Now if you had
a "STL" way of doing this, then you could
run all "STL" functions on this database compared to reading the file, doing
the same thing and writing it. (Is that faster? Yes, indeed it is. Benchmark
an mmap compared to fopen/read on ur machine, and u wud hopefully agree)

Yes, its true that maybe STL was only written for In-Memory data,
and STL implementations did not care about out of memory data.
But I believe that both for inmemory and outmemory data, now or
later, locality will be imporatant. And if STL takes into account that
fact, then not only could we use it for data in memory, but also on
data that resides on disk.

I dont want the Standard people plagued with the burden of doing it
for just my case. But I believe that handling fixed size record files is
fairly common, if not as common as using sort() for instance...

And anyway, I have an implementation right now which uses qsort
to do it. I just was amazed that it was so tough to modify sort() to
do the same thing...

Thanks for your time and comments,
--Elijah

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #25
ge******@hotmail.com (Elijah Bailey) wrote in message
news:<e0**************************@posting.google. com>...
"P.J. Plauger" <pj*@dinkumware.com> wrote in message
news:<3N*******************@nwrddc02.gnilink.net>. ..
"Elijah Bailey" <ge******@hotmail.com> wrote in message
news:e0**************************@posting.google.c om...
> And IF that is the case, then shouldnt there be another version
> of std::sort() to do what I want to do. Doesnt it seem natural to
> have an interface in c++ that does the same thing that the c
> function qsort CAN do, without actually going the "c" way??
Yes, it's called qsort. Why is it so hard for you to accept that
it's a part of Standard C++, not to mention the best fit to the
problem you describe?
Here's the real problem that I need it for (Which incidentally I never
mentioned) Consider a database of fixed record size (Because doing
variable record size is already pain, even if you had everything in
memory). Now if you had a "STL" way of doing this, then you could run
all "STL" functions on this database compared to reading the file,
doing the same thing and writing it. (Is that faster? Yes, indeed it
is. Benchmark an mmap compared to fopen/read on ur machine, and u wud
hopefully agree)


Do you know the structure of this data at compile time? If so, there is
no problem creating the necessary PODs and iterators, and mapping them
to the data. Theoretically, you can't guarantee the lack of padding,
but it is a technique I've used once or twice, it generally works, and
if you are using mmap, you aren't very portable anyway.

My understanding was that the size and structure of the elements was not
known until runtime. If this is the case, I really don't see any
general answer -- the STL is based on compile time type resolution.

--
James Kanze GABI Software mailto:ka***@gabi-soft.fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #26
Francis Glassborow <fr*****@robinton.demon.co.uk> wrote in message
news:<Dm**************@robinton.demon.co.uk>...
In message <e0**************************@posting.google.com >, Elijah
Bailey <ge******@hotmail.com> writes
Martijn Lievaart <m@remove.this.part.rtij.nl> wrote in message
news:<pa****************************@remove.this. part.rtij.nl>...
On Sun, 04 Jan 2004 04:15:33 +0000, Elijah Bailey wrote: > First problem to qsort: It's slower than using sort() Qsort is slower only if std::sort can inline the comparison. In
this case this seems to me that with all the other overhead qsort
will actually be faster (let alone easier).
Yes, IF there is no 'simple' way to make std::sort() work. And IF that is the case, then shouldnt there be another version of
std::sort() to do what I want to do. Doesnt it seem natural to have
an interface in c++ that does the same thing that the c function
qsort CAN do, without actually going the "c" way??
Do you think that the OP's requirements are particularly common?


Yes and no. For the moment, I'm not completely sure what his
requirements are.

There are certainly various reasons why one might want to overlay a data
structure over pure memory. To date, I've had very good success in
doing this, provided that the struct was a POD type, and that all
alignment requirements were met in the actual data set. (Most of the
time, the struct will only contain arrays of char, so there are no
alignment requirements.)

Technically, the standard doesn't make any guarantees, but in practice,
I've had no trouble with doing things like:

struct Record
{
char state ;
char id[ 20 ] ;
char value[ 11 ] ;
// ...
} ;

std::vector< char > buffer ;
buffer.resize( recordCount * sizeof( Record ) ) ;
Record* record = reinterpret_cast< Record* >( &buffer[ 0 ] ) ;

// ...

char* buffer = mmap( ... ) ;
Record* record = reinterpret_cast< Record* >( buffer ) ;

etc. Obviously, if this works, it is trivial to create an STL
compatible iterator for the buffer, and use it with the STL algorithms.

As I said, this is useful, and it is not guaranteed by the standard. On
the other hand, I'm not convinced that it is worth the effort that might
be necessary -- while the intent seems rather straightforward, I'd hate
to have to come up with the words to express it rigorously without
limiting implementations too much otherwise.

--
James Kanze GABI Software mailto:ka***@gabi-soft.fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #27
Elijah Bailey wrote:
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.

I have a

char *data;
which has size mn bytes where m is size of the record and
n is the number of records. Both these numbers are known
only dynamically. I have a function less_than that can compare
two records of size m given the pointers to the two records.

Is there an easy way to call STL sort() on this data and sort it.
The data is big and I do NOT want to allocate a list of pointers
of size n or anything linear in size. Assume that except the data,
we do not have much space...

I thought of tricking sort() using a dummy Record class that is
templated using the size of the record...But since m can change
dynamically this doesnt work.
Thanks in advance for your comments,
--Elijah

http://home.comcast.net/~jeffrey.schwab/code/records/

Jul 22 '05 #28
>
http://home.comcast.net/~jeffrey.schwab/code/records/


I tried compiling it with g++ 2.96 and 3.1...both dont compile the code...
Thanks for the code thou...wud be nice if I could test it on my linux box.
--Elijah
Jul 22 '05 #29
> Here's the real problem that I need it for (Which incidentally I never
mentioned)
You should have mentioned it. Nobody likes putting sweat and effort into
solving artificial problems and later being confronted with the "real"
problem.
I dont want the Standard people plagued with the burden of doing it
for just my case. But I believe that handling fixed size record files is
fairly common, if not as common as using sort() for instance...

And anyway, I have an implementation right now which uses qsort
to do it. I just was amazed that it was so tough to modify sort() to
do the same thing...
It is tough. sort() is templatised which means it _has_ to determine the
size of its elements at compile-time rather than run-time.
And there is no way that is likely to ever change.
Anyone who proposes adding additional dynamic run-time ideas to C++ will
normally be shot down.
Anyone who proposes adding additional static compile-time ideas to C++ will
be welcome with open arms.

To get sort() working, one of your constraints has to go.

I have sort() working with arbitrary size elements. But the price I paid was
in your original message as I have a vector of pointers to the elements. And
in the enviroment I am in, pointers are 4-bytes, the records are
considerably larger, sorting the pointers seems a good option. But with the
messages posted here, I am wondering whether I would have been better off
with qsort(). The code would be simpler. If the element size is 4 or less
bytes, I can make the additional optimisation of not using a vectors of
pointer at all but using qsort() directly on the records.

Yes, its true that maybe STL was only written for In-Memory data,
and STL implementations did not care about out of memory data.
But I believe that both for inmemory and outmemory data, now or
later, locality will be imporatant. And if STL takes into account that
fact, then not only could we use it for data in memory, but also on
data that resides on disk.


If the idea is to sort more data than fits in memory, it has been done. In
such a situation, you read as much as possible in chunks, sort, and write to
temporary files.
Later you perform an n-way merge of the temporary files, writing the final
result o some output file. All in Knuth.

Stephen Howe

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #30
In message <3f**********************@reading.news.pipex.net >, Stephen
Howe <NO**********@dial.pipex.com> writes
If the idea is to sort more data than fits in memory, it has been done. In
such a situation, you read as much as possible in chunks, sort, and write to
temporary files.
Later you perform an n-way merge of the temporary files, writing the final
result o some output file. All in Knuth.


Even here index files are often used and when the data has been
completely processed the data file is re-written in sorted order. In
many systems writing is much more expensive than reading and so we gain
by keeping it to a minimum.

Analysing performance on systems with mixed memory speeds (all the way
from level 1 cache through to disc or even tape) does not respond well
to big 'O' formulae. Worse, it is very sensitive to the resources
currently available (I can still remember the problem I had that
resulted from a client having used a single directory on a MSDOS machine
for all his word-processing files for three years -- he had never
deleted anything and he had set the application to keep two back-ups. It
took six hours just to remove the second back-ups, and almost 24-hours
for a batch file to reorganise the directory into 26 sub-directories on
the basis of the first letter of the file names)

Even though all your memory-devices may be random access the difference
in speed may still favour treating the slowest as if it were a serial
access device.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #31
Elijah Bailey wrote:
http://home.comcast.net/~jeffrey.schwab/code/records/

I tried compiling it with g++ 2.96 and 3.1...both dont compile the code...
Thanks for the code thou...wud be nice if I could test it on my linux box.
--Elijah


What's the error? It works fine on Mandrake with gcc 3.3.

Jul 22 '05 #32
Stephen Howe wrote:
It is tough. sort() is templatised which means it _has_ to determine the
size of its elements at compile-time rather than run-time.


std::sort is templatized on an iterator type. "All" you need to do is
provide a random access iterator that knows how to deal with the array
of bytes in question. The value type is going to need to be a proxy
class for dealing with that underlying memory. And that size of an
element controlled by such an iterator certainly does not have to be
part of its type.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #33
In message <10**************@master.nyc.kbcfp.com>, Hyman Rosen
<hy*****@mail.com> writes
Stephen Howe wrote:
It is tough. sort() is templatised which means it _has_ to determine the
size of its elements at compile-time rather than run-time.


std::sort is templatized on an iterator type. "All" you need to do is
provide a random access iterator that knows how to deal with the array
of bytes in question. The value type is going to need to be a proxy
class for dealing with that underlying memory. And that size of an
element controlled by such an iterator certainly does not have to be
part of its type.


And the real benefit from using this approach is that it allows you to
use many other algorithms (and in the context of sorting it allows you
to use stable_sort and partial_sort).

I doubt that this kind of iterator is worth it for the Standard Library,
but if you use flat (probably fixed record width) databases and mmap it
could be worth the effort. However if you are concerned with performance
(in terms of speed) I suspect you would be better off looking at the
specialised sorts for data held in serial access storage.
--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #34
On Tue, 06 Jan 2004 15:46:26 -0500, Jeff Schwab wrote:
http://home.comcast.net/~jeffrey.schwab/code/records/


Very nice! Shows the power of the STL with user defined iterators.

So in the end, it was not that difficult to do. Nobody really commented if
this would be 100% complient, but I doubt it isn't. I retract my earlier
comment on it being a difficult technique, even though I described what
you coded.

M4

Jul 22 '05 #35
Martijn Lievaart wrote:
On Tue, 06 Jan 2004 15:46:26 -0500, Jeff Schwab wrote:

http://home.comcast.net/~jeffrey.schwab/code/records/

Very nice! Shows the power of the STL with user defined iterators.


The STL has been very good to me.
So in the end, it was not that difficult to do. Nobody really commented if
this would be 100% complient, but I doubt it isn't. I retract my earlier
comment on it being a difficult technique, even though I described what
you coded.


It should be compliant, but it ain't standard-library quality. :) At
any rate, there's a lot of potential for misuse. I think the fact that
you and I had about the same idea of how this should be done does
suggest that it's a reasonable approach, though.

Jul 22 '05 #36
On Sun, 11 Jan 2004 08:26:32 -0500, Jeff Schwab wrote:
So in the end, it was not that difficult to do. Nobody really commented if
this would be 100% complient, but I doubt it isn't. I retract my earlier
comment on it being a difficult technique, even though I described what
you coded.


It should be compliant, but it ain't standard-library quality. :) At
any rate, there's a lot of potential for misuse. I think the fact that
you and I had about the same idea of how this should be done does
suggest that it's a reasonable approach, though.


There has been a lot of talk about iterators and proxy objects in the
past. std::vector<bool> showed that it is impossible to write a
STL-conforming container that uses proxy objects. Unfortunately, I'm a
little unclear on the details and my Meyers books seem to have gone of to
outer space somewhere.

Now what was it again why vector<bool> is not compliant? Does it apply to
this situation? Anyone knows?

M4

Jul 22 '05 #37
ge******@hotmail.com (Elijah Bailey) wrote
I want to sort a set of records using STL's sort() function,
but dont see an easy way to do it.


I really do think that qsort() is the way to go.
However, here's what you asked for.

First, you need a strucure to sort.

// Structure needed for sort.
template <int size>struct Record {
char dat[size];
bool operator<(const Record<size>&rhs) const
{ return less_than(dat,rhs.dat); }
};

On some systems (or with some compiler settings!), this might only work
correctly if size is a multiple of some padding value. I leave you to
figure out how to deal with this.

The sort itself is trivially easy.

// The sort itself.
template<int size>void dosort() {
Record<size> *dat = reinterpret_cast<Record<size>*>(data);
std::sort(dat, dat+n);
}

Unfortunately, you can't just call
dosort<m>
because m isn't a compile-time const. This makes sense if you think
about it -- your comments about speed are correct, but what you're doing
is trading speed against code size. The code is expanded in-line, which
means it needs to understand the recordsize before you begin. So, in
order to call it you're going to need a constant. Here's the only way I
can think of to convert a variable into a compile-time const.

// The only way to call it!
void callsort() {
switch (m) {
case 4: dosort< 4>(); break;
case 8: dosort< 8>(); break;
case 12: dosort<12>(); break;
case 16: dosort<16>(); break;
case 20: dosort<20>(); break;
// ... etc...
default: throw "Missing case!";
}
}

You're going to need a case for every possible value of m.

If you test the above code with Microsoft Visual C++ 6.0, you're going to
get an error. This is caused by a compiler bug; if the only place we
instantiate Record<size> is within dosort<size>, the compiler gets it
very wrong (in this case, it will silently always use Record<20>). The
solution is to instantiate Record<size> in callsort, and pass it to
dosort:

template<int size>void dosort(Record<size>unused) {
Record<size> *dat = reinterpret_cast<Record<size>*>(data);
std::sort(dat, dat+n);
}

// The only way to call it!
void callsort() {
switch (m) {
case 4: dosort(Record< 4>()); break;
case 8: dosort(Record< 8>()); break;
case 12: dosort(Record<12>()); break;
case 16: dosort(Record<16>()); break;
case 20: dosort(Record<20>()); break;
// ... etc...
default: throw "Missing case!";
}
}

By explicitly instantiating every Record<n> size, we bypass the Microsoft
compiler bug. Note: I only tested this with values that were a multiple of
4. I don't know if it would work for odd lengths or not (this would be
compiler-dependant anyway).

I think that by now you can see the problem with this approach: code bloat.
(Not talking about source code, but generated code!) All of the in-line
logic to do the sort is being instantiated 5 times. If m can have 200
possible values, the problem is 40 times worse. If m can have 5000 possible
values, you may find that your program is MUCH bigger than the database
you're trying to sort. You mentioned that you were low on space. Even if
you do have enough RAM for this, depending on how your system works,
loading all of this code into memory might actually make your program
seem sluggish.

Have you considered the many virtues of qsort()? It *is* part of C++, and
the specs match your situation so closely, it's as if they custom wrote it
for you.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #38
ka***@gabi-soft.fr wrote in message news:<d6**************************@posting.google. com>...
ge******@hotmail.com (Elijah Bailey) wrote in message
news:<e0**************************@posting.google. com>...
> Here's the real problem that I need it for (Which incidentally I never
> mentioned) Consider a database of fixed record size (Because doing
> variable record size is already pain, even if you had everything in
> memory). Now if you had a "STL" way of doing this, then you could run
> all "STL" functions on this database compared to reading the file,
> doing the same thing and writing it. (Is that faster? Yes, indeed it
> is. Benchmark an mmap compared to fopen/read on ur machine, and u wud
> hopefully agree)


Do you know the structure of this data at compile time? If so, there is
no problem creating the necessary PODs and iterators, and mapping them
to the data. Theoretically, you can't guarantee the lack of padding,
but it is a technique I've used once or twice, it generally works, and
if you are using mmap, you aren't very portable anyway.


I know I'm coming in a late here, but I managed to implement something
which sounds somewhat similar to the problem being described:

- First, I implemented a mmapped file array class after reading an
article by Matt Austern on the subject (the link I have for it,
http://www.cuj.com/experts/1907/aust...?topic=experts, is now
broken, apparently), rendering the mmapped implementation at least
somewhat portable by hiding it behind an interface.

- Then I created a Table class to parse out the fields and records
from a particular flat-file relational database table format which
uses the mmapped file under the hood.

- Then I created a templated Iterator class that uses an adapter
function to convert each record in a Table to the desired view, as it
were.

- Voila, now I can use mmapped files with the STL.

Granted, I'm using mostly lower_bound() and equal_range(), as the data
I deal with is read-only; but I imagine crafting an iterator to write
to fixed-length records to use with sort() and other mutating
algorithms wouldn't be such a big leap.

And, incidentally, I wonder if my Iterator doesn't just replicate
functionality in Boost, but I've yet to play with Boost that much yet
(though I've been reading up on it) and I doubt it would work in our
baseline, as we're bound to use Sun CC 5.3 for some time.

Hope this helps,

Mike

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #39
On Mon, 12 Jan 2004 15:05:13 -0500, Mike Bland wrote:
Granted, I'm using mostly lower_bound() and equal_range(), as the data
I deal with is read-only; but I imagine crafting an iterator to write
to fixed-length records to use with sort() and other mutating
algorithms wouldn't be such a big leap.


Remember that std::sort stores additional copies, so you'll have to create
proxy objects. Someone posted an implementation, but I cannot find it
right now (this thread has jumped over multiple groups, so it might be in
acllc-c++ or clc++m).

HTH,
M4

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #40
Martijn Lievaart <m@remove.this.part.rtij.nl> wrote in message news:<pa****************************@remove.this.p art.rtij.nl>...
On Mon, 12 Jan 2004 15:05:13 -0500, Mike Bland wrote:
> Granted, I'm using mostly lower_bound() and equal_range(), as the data
> I deal with is read-only; but I imagine crafting an iterator to write
> to fixed-length records to use with sort() and other mutating
> algorithms wouldn't be such a big leap.


Remember that std::sort stores additional copies, so you'll have to create
proxy objects. Someone posted an implementation, but I cannot find it
right now (this thread has jumped over multiple groups, so it might be in
acllc-c++ or clc++m).


D'oh! Right, I momentarily forgot that the iterator I developed is
templated by an adapter function *and* a proxy object; the adapter
converts the raw data to the proxy type upon dereferencing. And, now
that you mention it, I'd imagine I'd have to make the proxy
assignable, not the iterator implementation.

Mike

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Jul 22 '05 #41

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

5
by: Lad | last post by:
Hi, I have a file of records of 4 fields each. Each field is separated by a semicolon. That is Filed1;Ffield2;Field3;Field4 But there may be also empty records such as ;;;; (only...
9
by: SASRS | last post by:
Here is my code. I am trying to see if I can put a %@ LANGUAGE="VBSCRIPT" %> <% FieldA = Request.QueryString("A") FieldB = Request.QueryString("B") FieldC = Request.QueryString("C") ...
8
by: Mike | last post by:
Hello, I have a few rather urgent questions that I hope someone can help with (I need to figure this out prior to a meeting tomorrow.) First, a bit of background: The company I work for is...
2
by: Todd | last post by:
Hi. I want to sort the records on my form (using either a continuous form or a datasheet) by the unbound "description" column in a combo box on the form (or in the datasheet.) Here's a rough...
3
by: Neil Hindry | last post by:
I wonder if you can help me. I have setup an address-book database in Access XP. I have the first name & surname as separate fields. As I wanted to sort my database by surname and then by first...
8
by: Matthew Curiale | last post by:
I am creating an app that lists clients of a company for management of different attributes for that company. The first page is a listing of the companies currently in the database. I have my...
0
by: SvenMathijssen | last post by:
Hi, I've been wrestling with a problem for some time that ought to be fairly simple, but turns out to be very difficult for me to solve. Maybe someone here knows the answer. What I try to do is...
3
KevinADC
by: KevinADC | last post by:
If you are entirely unfamiliar with using Perl to sort data, read the "Sorting Data with Perl - Part One and Two" articles before reading this article. Beginning Perl coders may find this article...
1
by: dorandoran | last post by:
The sort on the childgrid is not working; nothing happens when I click on the each column header for sort. (I followed Satay's sample: http://www.codeproject.com/KB/aspnet/EditNestedGridView.aspx)...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.