Fast STL data structure

tim.lino

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Thank You.

May 1 '07 #1

Subscribe Post Reply

4188

Victor Bazarov

ti******@gmail.com wrote:

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Collect them in a set, then copy the set into a vector.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

May 1 '07 #2

cpunerd

their are sorts that are better than O(n lg n), but they are general,
and hence not easily templatized. try implementing radix/counting sort
for your own data stored within a vector.
tim.l...@gmail.com wrote:

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Thank You.

May 1 '07 #3

cpunerd

collecting in a set is still O(n lg n)
Victor Bazarov wrote:

ti******@gmail.com wrote:
I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Collect them in a set, then copy the set into a vector.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

May 1 '07 #4

cpunerd

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Thank You.

May 1 '07 #5

peter koch

On 1 Maj, 05:09, tim.l...@gmail.com wrote:

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......

}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?

Thank You.

Vector is probably the correct choice. Sorting is inherently n log n
if you sort by comparing, and the specialised sorts are only useful in
special cases.

/Peter

May 1 '07 #6

Andrew Koenig

As I know, vector is very efficient in random access. However, sorting

vector of size n takes O(nlog n) time which is not good.

What is not good about it?

May 1 '07 #7

Dave Steffen

ti******@gmail.com writes:

Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

A point: _all_ sorting operations are O(n log n). You just can't do
any better. What you _can_ do is reduce the constants in front of
the n log n, by using well-implemented libraries, and being clever.
For example, in Effective STL, Scott Meyers points out an
alternative to using sets (and maps) that, given certain usage
patterns, can provide significant run-time advantages. (If you
haven't read Effective STL, you should.)
----------------------------------------------------------------------
Dave Steffen, Ph.D. Disobey this command!
Software Engineer IV - Douglas Hofstadter
Numerica Corporation
dg@steffen a@t numerica d@ot us (remove @'s to email me)

May 1 '07 #8

Dave Steffen

[Don't top-post. Reordered.]

cp*****@gmail.com writes:

Victor Bazarov wrote:
ti******@gmail.com wrote:
I would like to use C++ STL to store a set of Object's which is as
follows:
>
class Object
{
public:
int value;
......
}
>
I need to perform the following actions:
>
1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.
>
As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.
>
Is there a good data structure (or a combination of some) such that it
is efficient in random access and sorting?
Collect them in a set, then copy the set into a vector.

collecting in a set is still O(n lg n)

Indeed. Again, you can't beat n log n for sorting; IIRC that's been
definitively proven.

While I wouldn't do exactly what Victor suggests, he's got a point,
which is effectively the same as Scott Meyer's point in Effective
STL. Sets (and maps) are really optimized for the use case "insert,
read, add or delete, read, add or delete..." In other words, the data
in the container is frequently changed.

Putting your data in a vector (which provides fast access) and sorting
it _once_ up front can provide substantial speed increases; we've used
that approach several times.

My only quibble with Victor's solution is the extra memory management
and copying that will go on behind the scenes. I'd say put your data
in a vector (or maybe deque?), sort, and then use. This is probably
the fastest "general purpose" solution there is.

----------------------------------------------------------------------
Dave Steffen, Ph.D. Disobey this command!
Software Engineer IV - Douglas Hofstadter
Numerica Corporation
dg@steffen a@t numerica d@ot us (remove @'s to email me)

May 1 '07 #9

Alf P. Steinbach

* Dave Steffen:

ti******@gmail.com writes:

>Hello,

I would like to use C++ STL to store a set of Object's which is as
follows:

class Object
{
public:
int value;
......
}

I need to perform the following actions:

1. Sort Objects in the increasing order of their "value".
2. After sorting, I need to randomly access some of the Objects.

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.

A point: _all_ sorting operations are O(n log n). You just can't do
any better.

Although off-topic in clc++m, this urban myth (an over-generalization of
a basic result) should not be propagated, so:

The O(n log n) limit is for random data, with enough of it, on a
sequential machine. Theoretically it stems from the fact that n items
can be arranged in n! ways, and n^n <= n!^2 <= (n^2)^n. Now just take
logs to find the number of bits needed to represent a permutation, which
is the same as the number of binary choices needed to select one.

As soon as you can make assumptions about the data, or can establish
properties of the data, you can do better (let's not discuss advanced
computer architectures). The most trivial example is when you know the
data is already sorted, yielding O(1) sorting time. As a more
practically useful example, a least significant digit radix sort has
sorting time O(nk), where k is the length of a key.

What you _can_ do is reduce the constants in front of
the n log n, by using well-implemented libraries, and being clever.

Being clever is in general not very clever...

For example, in Effective STL, Scott Meyers points out an
alternative to using sets (and maps) that, given certain usage
patterns, can provide significant run-time advantages. (If you
haven't read Effective STL, you should.)

The OP's problem is trivially solved by using std::map. He just needs
to provide a strict weak ordering, operator<, for his Object class.
Before optimizing or even thinking about it, measure.

----------------------------------------------------------------------
Dave Steffen, Ph.D. Disobey this command!
Software Engineer IV - Douglas Hofstadter
Numerica Corporation
dg@steffen a@t numerica d@ot us (remove @'s to email me)

Please use a proper signature delimiter (two dashes followed by space,
with nothing else on that line).

Oh well, I'll never get a PhD, nor a job with Numerica Corporation, I'm
sure. :-)
--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

May 1 '07 #10

Dave Steffen

"Alf P. Steinbach" <al***@start.nowrites:

* Dave Steffen:

[...]

As I know, vector is very efficient in random access. However, sorting
vector of size n takes O(nlog n) time which is not good.
A point: _all_ sorting operations are O(n log n). You just can't
do
any better.

Although off-topic in clc++m, this urban myth (an over-generalization
of a basic result) should not be propagated, so:

The O(n log n) limit is for random data, with enough of it, on a
sequential machine. Theoretically it stems from the fact that n items
can be arranged in n! ways, and n^n <= n!^2 <= (n^2)^n. Now just take
logs to find the number of bits needed to represent a permutation,
which is the same as the number of binary choices needed to select one.

As soon as you can make assumptions about the data, or can establish
properties of the data, you can do better (let's not discuss advanced
computer architectures). The most trivial example is when you know
the data is already sorted, yielding O(1) sorting time. As a more
practically useful example, a least significant digit radix sort has
sorting time O(nk), where k is the length of a key.

Of course. O(n log n) isn't so much an urban myth as the best
answer you can give if you _don't_ know anything more about the
input data. Or, put it another way, it's really an _upper bound_ on
how long it takes to sort.

The OP wanted better than O(n log n) but didn't specify any
assumptions about the data. Arguably the correct answer should have
been not "n log n is as good as you can do" but "your question is
underspecified, no answer possible". :-)

What you _can_ do is reduce the constants in front of
the n log n, by using well-implemented libraries, and being clever.

Being clever is in general not very clever...

Well, being clever is clever only if you can do it in general. :-)

What I meant by that, but didn't elaborate on, is what you said
above; you can do better, but only under specific circumstances
(e.g. when assumptions about the nature of the data are valid).

For example, in Effective STL, Scott Meyers points out an
alternative to using sets (and maps) that, given certain usage
patterns, can provide significant run-time advantages. (If you
haven't read Effective STL, you should.)

The OP's problem is trivially solved by using std::map. He just needs
to provide a strict weak ordering, operator<, for his Object
class.

Well, that's still O(n log n) which he wanted to improve on. My
advice was more about usage patterns; if you're stuck with n log n,
you can still (sometimes) do a lot to file down the constants involved.

Before optimizing or even thinking about it, measure.

Absolutely, and always.

Dave Steffen, Ph.D. Disobey this command!
Software Engineer IV - Douglas Hofstadter
Numerica Corporation

[...]

Oh well, I'll never get a PhD, nor a job with Numerica Corporation,
I'm sure. :-)

Dunno, we tend to be chronically short on C++ people. :-) (Sig
fixed.)

--
Dave Steffen, Ph.D. Disobey this command!
Software Engineer IV - Douglas Hofstadter
Numerica Corporation
dg@steffen a@t numerica d@ot us (remove @'s to email me)

May 1 '07 #11

Juha Nieminen

Dave Steffen wrote:

Again, you can't beat n log n for sorting; IIRC that's been
definitively proven.

You can't sort faster than nlogn when using *comparison* sorting.
You can achieve linear sorting time by other means, but those are
not based on comparisons between elements and thus don't work on
all possible element types. If the element is an int, a faster
sorting algorithm can be used (such as radix sort).

OTOH one has to question if it's worth the efforts. nlogn is not
all that more slower than linear, after all.

May 1 '07 #12

Juha Nieminen

Alf P. Steinbach wrote:

Although off-topic in clc++m, this urban myth (an over-generalization of
a basic result) should not be propagated, so:

It's not an urban myth. It's the upper limit for generic sorting when
only less-than comparison between elements is available. Any algorithm
faster than that (in the general case) needs something more than just
a less-than comparison.

May 1 '07 #13

Alf P. Steinbach

* Juha Nieminen:

Alf P. Steinbach wrote:
>Although off-topic in clc++m, this urban myth (an over-generalization of
a basic result) should not be propagated, so:

It's not an urban myth.

It is. Note: you selected to snip the "it" referred to here. Please be
a bit more conscientious about quoting.

It's the upper limit for generic sorting when
only less-than comparison between elements is available. Any algorithm
faster than that (in the general case) needs something more than just
a less-than comparison.

Presumably you're here referring to a different "it" than above, namely
just the expression O(n log n).

Even so, your statement, as it is formulated, is incorrect or
meaningless, depending on how it's interpreted. My earlier posting in
this thread may provide some guidance, but I think I wrote what you
intended to write above. However, that is better discussed in e.g.
[comp.programming], if at all. Follow-ups set.

Cheers,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

May 1 '07 #14

by: Thomas R. Hummel | last post by:

Hello, I know that I've seen this question asked on here before, but I can't find an answer that gives me the performance that I need. I have a table that stores events for users: CREATE...

Microsoft SQL Server

How fast can we add 1000 or other number of records?

by: Lyle Fairfield | last post by:

This takes about 2 seconds on my rather obsolete machine: Option Explicit ' Test is a simple JET Table with four fields ' ID -> autonumber primary key ' Field1 -> Integer (maps to VBA long)...

Microsoft Access / VBA

Why in stdint.h have both least and fast integer types?

by: GS | last post by:

The stdint.h header definition mentions five integer categories, 1) exact width, eg., int32_t 2) at least as wide as, eg., int_least32_t 3) as fast as possible but at least as wide as, eg.,...

C / C++

Help with fast tree-like structure

by: David Hirschfield | last post by:

I've written a tree-like data structure that stores arbitrary python objects. The objective was for the tree structure to allow any number of children per node, and any number of root nodes...and...

Python

fast kd-tree or octree implementations for python

by: Neilen Marais | last post by:

Hi. I'm doing a FEM (Finite Elements) code in python. It uses a tetrahedral mesh to represent the geometry. For post-processing one specifies a list of 3D coordinates to calculate field values...

Python

Text-Based Windows Library

by: hstagni | last post by:

Where can I find a library to created text-based windows applications? Im looking for a library that can make windows and buttons inside console.. Many old apps were make like this, i guess ...

C / C++

FIFO buffer with fast lookup

by: Spoon | last post by:

Hello, I'm wondering whether the STL defines a data structure with the following features: o provides push_front() and pop_back() (like std::list) to implement a FIFO buffer. o allows fast...

C / C++

Re: which datastructure for fast sorted insert?

by: I V | last post by:

On Sun, 25 May 2008 13:05:31 -0300, Gabriel Genellina wrote: That's worth doing if you need the data to be sorted after each insert. If the OP just needs the data to be sorted at the end, using a...

Python

Fast list traversal

by: dineshv | last post by:

I want to see if there is an alternative method for fast list traversal. The code is very simple: dict_long_lists = defaultdict(list) for long_list in dict_long_lists.itervalues() for element...

Python

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Batch import of multiple excel files into the database

by: ryjfgjl | last post by:

If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...

Data Management

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

Fast STL data structure

Similar topics