473,399 Members | 2,478 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,399 software developers and data experts.

simple list efficiency question

Hi,

I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in the
list. Any performance tips so that my program runs ok?

I have checked out the STL and have thought of using vector, set, list,
deque and map. I dont know how these things are implemented so not sure
which is best to use or if i should implement another data structure.
Perhaps any or all of them can be used with no big problem.

I am just worried in case I waste effort coding with them and then finding i
need to implement something else because it runs so slow. I'm a novice
using c++ so it would be nice to know up front before I start learning how
to write it.

Thanks for your help.
jason
Jul 22 '05 #1
10 3021
Hello,

"Jason" <ja***********@btinternet.com> writes:
I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in the
list. Any performance tips so that my program runs ok? I have checked out the STL and have thought of using vector, set, list,
deque and map. I dont know how these things are implemented so not sure
which is best to use or if i should implement another data structure.
Perhaps any or all of them can be used with no big problem.


If I understand correctly you want to store these objects in a sorted
fashion. For a number like a few million my guess would be that you can
get away with storing them in a vector and after they are all read in
sorting them. Because sorting is (if the order in which the objects are
coming in is not too bad) an N*log(N) operation this should not take
prohibitively long. Furthermore, it is difficult to imagine that you
could be faster than N*log(N).

Good luck,
Chris Dams
Jul 22 '05 #2
ch****@gamow.sci.kun.nl (Chris Dams) writes:
If I understand correctly you want to store these objects in a sorted
fashion. For a number like a few million my guess would be that you can
get away with storing them in a vector and after they are all read in
sorting them. Because sorting is (if the order in which the objects are
coming in is not too bad) an N*log(N) operation this should not take
prohibitively long. Furthermore, it is difficult to imagine that you
could be faster than N*log(N).


Forgot to mention that if you use a vector you should not forget to
call the .reserve() method with a sensible number.
Jul 22 '05 #3
Chris Dams wrote:

Hello,

"Jason" <ja***********@btinternet.com> writes:
I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in the
list. Any performance tips so that my program runs ok?

I have checked out the STL and have thought of using vector, set, list,
deque and map. I dont know how these things are implemented so not sure
which is best to use or if i should implement another data structure.
Perhaps any or all of them can be used with no big problem.


If I understand correctly you want to store these objects in a sorted
fashion. For a number like a few million my guess would be that you can
get away with storing them in a vector and after they are all read in
sorting them. Because sorting is (if the order in which the objects are
coming in is not too bad) an N*log(N) operation this should not take
prohibitively long. Furthermore, it is difficult to imagine that you
could be faster than N*log(N).


I would cross check with a map also.

--
Karl Heinz Buchegger
kb******@gascad.at
Jul 22 '05 #4
On Thu, 4 Dec 2003 15:42:22 +0000 (UTC), "Jason"
<ja***********@btinternet.com> wrote:
Hi,

I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in the
list. Any performance tips so that my program runs ok?


How do you decide where to insert? If it is based on some kind of
ordering, then set or map would be ideal.

If you are inserting in the middle, then vector and deque are not
good.

If you are inserting at a particular "index", then list isn't great,
since finding that index is O(n), but it will probably be better than
vector or deque.

So, basically, it depends on the details of the algorithm...

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
Jul 22 '05 #5
> I have a few million data items being received by my program and I wish
to
store them in a list, with each item being inserted in any position in the list. Any performance tips so that my program runs ok?

I have checked out the STL and have thought of using vector, set, list,
deque and map.

I dont know how these things are implemented so not sure
which is best to use or if i should implement another data structure.
Perhaps any or all of them can be used with no big problem.

I am just worried in case I waste effort coding with them and then finding i need to implement something else because it runs so slow. I'm a novice
using c++ so it would be nice to know up front before I start learning how to write it.


When selecting a container the choice should be based on:
1. How are elements accessed (sequentially, random, by key...);
2. Which operations are to be performed on the container;
3. How often will each of those operations be performed;
4. The number of elements will be stored in the container (if the number is
low just pick the most convenient one).

The C++ standard doesn't directly dictate how the standard containers
should be implemented, but it does specify the complexity of the operations
on the container (which limits the possible choices for a container
implementation). These complexity guarantees can be used as guideline for
selecting the right container for the job.

Given that the container should hold a few million elements and that
elements will be inserted at arbitrary positions makes the std::vector
class a poor choice; if an element needs to be inserted at the first
position all those million elements need to be moved in memory. OTOH the
advantage of the std::vector class is that random access of elements is
very fast.

The std::list class (typically implemented as doubly linked list) may be a
good choice if insert performance is very important for your application.
However the std::list class doesn't support random access to individual
elements, they can only be accessed sequentially. If you need to access
element 1923402, std::list is not a good choice. The std::deque class would
have been a choice given that it does support random access, but its insert
performance is only good when inserting elements at the beginning or the
end.

If you need to access elements by key the std::map class may be a good
choice (this class is often based on red-black tree algorithm). With the
std::map you don't have control over the ordering of elements, as the
ording of the elements will be such that a key lookup can be done in O(log
N). An alternative for std::map may be the non-standard (but common)
hash_map class; with proper hashing it can do key lookups in O(1).

You may just start with one of the standard containers. If none of those
containers meet your performance requirements you may eventually replace
the standard container with your own container later. Note that one of the
nice features of standard containers is that they can be easilly exchanged
for each other.
--
Peter van Merkerk
peter.van.merkerk(at)dse.nl


Jul 22 '05 #6
Sorry, I didnt make myself too clear. I mean when I receive items they need
to be inserted into a vector in any position, not sorted once they are all
there. For example, item 1 is processed and placed at position 0. Item 2
is processed and placed at position 0 causing the item already there to be
shifted up to position 1 and so on, each new item causing a new insert
operation as opposed to being added at the end of the list.

so if I had a million data items already in and the next one needed to go in
somewhere in the middle or even at the beginning, thinking about it as
shifting everything up one then leads to a terrible situation of gross
inefficiency.

I was wondering for example if I could get away with using
vector.insert(nextitem, positionRequired) on such a huge list without
causing the program to be massively slow? Or would a set, list or map be
any better?

Thanks in advance

"Jason" <ja***********@btinternet.com> wrote in message
news:bq**********@sparta.btinternet.com...
Hi,

I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in the
list. Any performance tips so that my program runs ok?

I have checked out the STL and have thought of using vector, set, list,
deque and map. I dont know how these things are implemented so not sure
which is best to use or if i should implement another data structure.
Perhaps any or all of them can be used with no big problem.

I am just worried in case I waste effort coding with them and then finding i need to implement something else because it runs so slow. I'm a novice
using c++ so it would be nice to know up front before I start learning how
to write it.

Thanks for your help.
jason

Jul 22 '05 #7
Thanks, reading your answer, because accessing elements is not such a worry
and insertion performance is needed I should use List. thank god it's that
easy :) phew

"Peter van Merkerk" <me*****@deadspam.com> wrote in message
news:bq*************@ID-133164.news.uni-berlin.de...
I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in

the
list. Any performance tips so that my program runs ok?

I have checked out the STL and have thought of using vector, set, list,
deque and map.

I dont know how these things are implemented so not sure
which is best to use or if i should implement another data structure.
Perhaps any or all of them can be used with no big problem.

I am just worried in case I waste effort coding with them and then

finding i
need to implement something else because it runs so slow. I'm a novice
using c++ so it would be nice to know up front before I start learning

how
to write it.


When selecting a container the choice should be based on:
1. How are elements accessed (sequentially, random, by key...);
2. Which operations are to be performed on the container;
3. How often will each of those operations be performed;
4. The number of elements will be stored in the container (if the number

is low just pick the most convenient one).

The C++ standard doesn't directly dictate how the standard containers
should be implemented, but it does specify the complexity of the operations on the container (which limits the possible choices for a container
implementation). These complexity guarantees can be used as guideline for
selecting the right container for the job.

Given that the container should hold a few million elements and that
elements will be inserted at arbitrary positions makes the std::vector
class a poor choice; if an element needs to be inserted at the first
position all those million elements need to be moved in memory. OTOH the
advantage of the std::vector class is that random access of elements is
very fast.

The std::list class (typically implemented as doubly linked list) may be a
good choice if insert performance is very important for your application.
However the std::list class doesn't support random access to individual
elements, they can only be accessed sequentially. If you need to access
element 1923402, std::list is not a good choice. The std::deque class would have been a choice given that it does support random access, but its insert performance is only good when inserting elements at the beginning or the
end.

If you need to access elements by key the std::map class may be a good
choice (this class is often based on red-black tree algorithm). With the
std::map you don't have control over the ordering of elements, as the
ording of the elements will be such that a key lookup can be done in O(log
N). An alternative for std::map may be the non-standard (but common)
hash_map class; with proper hashing it can do key lookups in O(1).

You may just start with one of the standard containers. If none of those
containers meet your performance requirements you may eventually replace
the standard container with your own container later. Note that one of the
nice features of standard containers is that they can be easilly exchanged
for each other.
--
Peter van Merkerk
peter.van.merkerk(at)dse.nl


Jul 22 '05 #8
On Thu, 4 Dec 2003 17:19:39 +0000 (UTC), "Jason"
<ja***********@btinternet.com> wrote:
Sorry, I didnt make myself too clear. I mean when I receive items they need
to be inserted into a vector in any position, not sorted once they are all
there. For example, item 1 is processed and placed at position 0. Item 2
is processed and placed at position 0 causing the item already there to be
shifted up to position 1 and so on, each new item causing a new insert
operation as opposed to being added at the end of the list.

so if I had a million data items already in and the next one needed to go in
somewhere in the middle or even at the beginning, thinking about it as
shifting everything up one then leads to a terrible situation of gross
inefficiency.

I was wondering for example if I could get away with using
vector.insert(nextitem, positionRequired) on such a huge list without
causing the program to be massively slow? Or would a set, list or map be
any better?


( People around here are going to hang me by the balls for saying
this, yet, someone has to say it ;-)

What you should *really* use is std::hashmap

It is NOT part of the Standard Template Library, but if you have such
large quantities of objects that need to be inserted in sorted order,
and especially if you need to find them quickly, nothing beats it.

One of the good things about hashmap is that, as with list, elements
are never moved around in memory, and your iterators are not
invalidated when elements are inserted or removed.

Hashmap takes a bit of work to setup: you need to give it functors for
magnitude and equality comparisons, as well as a hash function.
A word of advice, though: use a rather large bucket table, if you
have so many objects!, because after identifying a bucket by the hash
function, searching is slower across the elements in that bucket.
So, if you have "millions" of elements, I'd use a bucket count as
large as tens of thousands or more.

Cheers!
Jul 22 '05 #9
Jason wrote:

Hi,

I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in the
list. Any performance tips so that my program runs ok?


It depends on some more details of how this container is going to be
used.

If the container is going to be initialized all at once and nothing
added later, then appending to a list or a vector and sorting after all
the elements have been added is faster. If there will be lookups
interspersed with additions (i.e. the container has to be in sorted
order while the elements are being added) then a set or map will be
slightly faster (and much simpler to use).

In both cases, though, having millions of elements raises the issue of
memory consumption. If the elements are large it doesn't matter much --
the container's overhead will be dwarfed by the size of the elements.
But if they're down near 100 bytes or less then memory overhead becomes
significant. A set or map typically holds three pointers plus a couple
of bits in each node, in addition to the actual data. A list holds two
pointers in each node. A vector doesn't have any per node overhead (but
it ties up a large contiguous block of memory).

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
Jul 22 '05 #10
> I have a few million data items being received by my program and I wish to
store them in a list, with each item being inserted in any position in the
list. Any performance tips so that my program runs ok?


Since you need to insert data at any position, vector and deque will
give very poor performance.

If you need random access to any entry and the order of entries
is not important, use map.

If the order of entries is important and you do not need random
access to any entry, use list.

Sandeep
--
http://www.EventHelix.com/EventStudio
EventStudio 2.0 - Generate Sequence Diagrams and Use Case Diagrams in PDF
Jul 22 '05 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
by: Wellu Mäkinen | last post by:
Hi all, let's say I have a need to store 4 int values in somekind of a structure. I have atleast two options: class A { public: int a, b, c, b;
11
by: velthuijsen | last post by:
I tried taking a list and pass it through std::sort like the following: sort(Unsorted.begin(), Unsorted.end()); I got an error back stating that the list iterator doesn't have a binary...
9
by: mkdunaway3 | last post by:
I'm just starting out here and I keep running into a basic problem. I build a set of tables: First Names, Last Names, Member Status. I bring these together in a table, Persons' Name & Status,...
8
by: Ross A. Finlayson | last post by:
I'm trying to write some C code, but I want to use C++'s std::vector. Indeed, if the code is compiled as C++, I want the container to actually be std::vector, in this case of a collection of value...
4
by: jgabbai | last post by:
Hi, What is the best way to white list a set of allowable characters using regex or replace? I understand it is safer to whitelist than to blacklist, but am not sure how to go about it. Many...
1
by: Tomás | last post by:
dynamic_cast can be used to obtain a pointer or to obtain a reference. If the pointer form fails, then you're left with a null pointer. If the reference form fails, then an exception is thrown....
9
by: Rick | last post by:
I have a large list of objects where each object has a unique (non-overlapping) date range. The list is sorted chronologically. What is the most efficient way to search this list for a single...
12
by: Mark S. | last post by:
Hello, The app in question is lives on a Windows 2003 server with .NET 2.0 running IIS 6. The page of the app in question processes 2000 get requests a second during peak loads. The app uses...
1
by: Itanium | last post by:
Hi all! I'm new to .NET Platform and got some simple questions about efficiency... To put you in situation, to say that I'm involved in the writing of a complex regex based lexer for use over...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.