Which is more efficient:STL:Set or array?

CodeCracker

Problem details below:
I have few items(simple values or objects) to be put into an array and
I implement it through a set rather than just an array of items. This
is because every time I get a new item I have to look into the Set
whether it is there or not. Depending on whether it is there or not I
have to respond differntly.
Since it is easy to search into a Set rather than a number of if
statement which will compare each of the element in the array with the
given new item I chose Set but worrying about the efficiency
comparision of them.

Thanks,
CodeCracker

Jul 23 '05 #1

Subscribe Post Reply

2883

Larry I Smith

CodeCracker wrote:

Problem details below:
I have few items(simple values or objects) to be put into an array and
I implement it through a set rather than just an array of items. This
is because every time I get a new item I have to look into the Set
whether it is there or not. Depending on whether it is there or not I
have to respond differntly.
Since it is easy to search into a Set rather than a number of if
statement which will compare each of the element in the array with the
given new item I chose Set but worrying about the efficiency
comparision of them.

Thanks,
CodeCracker

Don't worry about it.

--
Anti-spam address, change each 'X' to '.' to reply directly.

Jul 23 '05 #2

Rapscallion

Larry I Smith wrote:

Don't worry about it.

I like your answer. May I quote you?

Jul 23 '05 #3

Larry I Smith

Rapscallion wrote:

Larry I Smith wrote:
Don't worry about it.

I like your answer. May I quote you?

Sure. I get blamed for a lot... (:

Lary

--
Anti-spam address, change each 'X' to '.' to reply directly.

Jul 23 '05 #4

Larry I Smith

CodeCracker wrote:

Problem details below:
I have few items(simple values or objects) to be put into an array and
I implement it through a set rather than just an array of items. This
is because every time I get a new item I have to look into the Set
whether it is there or not. Depending on whether it is there or not I
have to respond differntly.
Since it is easy to search into a Set rather than a number of if
statement which will compare each of the element in the array with the
given new item I chose Set but worrying about the efficiency
comparision of them.

Thanks,
CodeCracker

Besides the QT manuals, you might get some good help in the
KDE newsgroup. Some smart QT developers hang out there.

comp.windows.x.kde

Larry

--
Anti-spam address, change each 'X' to '.' to reply directly.

Jul 23 '05 #5

CodeCracker

Say in two situations :
1. I have 10 items
2. I have 50 or more items.
Still I should not worry about efficiency in either of them. I can go
ahead with either STL:set or array implementation.

Jul 23 '05 #6

E. Robert Tisdale

CodeCracker wrote:

Problem details below:
I have few items(simple values or objects) to be put into an array
and I implement it through a set rather than just an array of items.
This is because every time I get a new item,
I have to look into the Set whether it is there or not.
Depending on whether it is there or not I have to respond differntly.
Since it is easy to search into a Set
rather than a number of if statement[s]
which will compare each of the element in the array with the given new item,
I chose Set but worrying about the efficiency comparision of them.

It sounds like the objects that you are describing *are* sets
so a set class template should be the most efficient representation.
These templates are part of the standard library for a reason.
The reason is to give the compiler developer an opportunity
to provide a highly optimized implementation.
These implementations are usually difficult to beat
with any "hand rolled" implementation.

As a C++ programmer, your job is to match the containers
required by your program with standard container class templates.
All of the classic Abstract Data Types (ADTs) are implemented
in the standard library so this shouldn't be a problem.
If you can't find a perfect match,
you should try to adapt one of the standard class templates
(by deriving an appropriate class [template] from it for example.)

Only rarely is it necessary
to design a new container class from scratch.
If you find yourself doing this,
you probably need to re-examine your requirements
and convince yourself that you haven't just made a big mistake.

Jul 23 '05 #7

CodeCracker

I think I wasn't clear. Let me rehearse what I was telling.
I have some items( either simple values as int, float etc or objects (
some other classes of mine say FileClass).
Now every time somebody calls my function DoSomething() I do a search
inside my array to find that item and if it is found I call DoThis()
and if not found I call DoThat().
But the problem here is that if I use an array implementation I have to
write a whole lot of If statement or write a loop where it goes in
search of the current item in the array, for which it will make a whole
lot of comparison. Instead of doing all this if I use the STL:set
containers and insert my items once there and then search for my
current item every time in my set will it not be more efficient than
my array implementation.
Remember I am not creating my container from scratch and using the
existing containers only.

Jul 23 '05 #8

Pete Becker

CodeCracker wrote:

I think I wasn't clear. Let me rehearse what I was telling.
I have some items( either simple values as int, float etc or objects (
some other classes of mine say FileClass).
Now every time somebody calls my function DoSomething() I do a search
inside my array to find that item and if it is found I call DoThis()
and if not found I call DoThat().
But the problem here is that if I use an array implementation I have to
write a whole lot of If statement or write a loop where it goes in
search of the current item in the array, for which it will make a whole
lot of comparison. Instead of doing all this if I use the STL:set
containers and insert my items once there and then search for my
current item every time in my set will it not be more efficient than
my array implementation.
Remember I am not creating my container from scratch and using the
existing containers only.

If you're going to populate your array once and then not change it
you're probably better off with a vector, because it uses less memory.

vector<whatever> vec(first, last);
sort(vec.begin(), vec.end());

when you need to find something, use

find(vec.begin(), vec.end(), X)

Even if you're going to add or remove things occasionally, this is
probably better. set wins when you add and remove things fairly often.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Jul 23 '05 #9

Larry I Smith

CodeCracker wrote:

Say in two situations :
1. I have 10 items
2. I have 50 or more items.
Still I should not worry about efficiency in either of them. I can go
ahead with either STL:set or array implementation.

yes

--
Anti-spam address, change each 'X' to '.' to reply directly.

Jul 23 '05 #10

CodeCracker

But if while adding I want to be sure that the item in the containers
are all unique then still vector be preferred.

Jul 23 '05 #11

Kai-Uwe Bux

CodeCracker wrote:

But if while adding I want to be sure that the item in the containers
are all unique then still vector be preferred.

You could use a set while you are inserting items. Once that is finished,
you copy the set into a vector, preserving the order of the items.
Afterwards looking up items in the vector will be fast using binary search.
The costs for the copy operation are very small compared to the savings
provided you have a lot of lookups. Also, this is only applicable if no
further changes to the data structure need to be done in the lookup-phase.
Best

Kai-Uwe Bux

ps.: Do not worry about efficiency before you profile your code. Quite
possibly the real timesink is somewhere else.

Jul 23 '05 #12

Vyacheslav Kononenko

CodeCracker wrote:

But if while adding I want to be sure that the item in the containers
are all unique then still vector be preferred.

Just use std::set so your code will be simplier and most probably it
will be faster than you need. And for amount of ~50 elements you would
hardly measure the difference of execution time btw std::vector and
std::set (maybe even std::list). So use std::set and if profiler will
show that there is a problem with std::set then try something else.

Regards,
Vyacheslav

Jul 23 '05 #13

E. Robert Tisdale

CodeCracker wrote:

But if while adding I want to be sure that
the item in the containers are all unique
then still vector be preferred.

No.
You are confused.
The object, as you have described it so far, is a set.
You should implement it with a set template class.
This will be the most efficient and portable implementation.
Using any other class template will almost certainly
result in a sub optimal implementation.

Study this object carefully.
It is either a set or it isn't.
If you can discover some reason why it isn't a set,
tell us and, perhaps, we can help you find a better match.

Jul 23 '05 #14

Richard Herring

In message <d6**********@nntp1.jpl.nasa.gov>, E. Robert Tisdale
<E.**************@jpl.nasa.gov> writes

CodeCracker wrote:
But if while adding I want to be sure that
the item in the containers are all unique
then still vector be preferred.
No.
You are confused.
The object, as you have described it so far, is a set.
You should implement it with a set template class.
This will be the most efficient and portable implementation.
Using any other class template will almost certainly
result in a sub optimal implementation.

Study this object carefully.
It is either a set or it isn't.

Wrong level of abstraction. It's not what it "is", but what it _does_,
that matters. The C++ library containers and algorithms support various
operations - insert, erase, search - with different efficiency
tradeoffs. Patterns of use also vary - do you have separate phases which
build then search, or are the operations intermixed?

What you want is the container which most efficiently supports your
pattern of use. That might very well be std::set, but it doesn't have to
be.
If you can discover some reason why it isn't a set,
tell us and, perhaps, we can help you find a better match.

Note that the STL set-oriented algorithms includes, set_union,
set_intersection, set_difference, set_symmetric_difference are designed
to work on any sorted sequence, not just a std::set.

--
Richard Herring

Jul 23 '05 #15

Richard Herring

In message <_q********************@rcn.net>, Pete Becker
<pe********@acm.org> writes

CodeCracker wrote:
I think I wasn't clear. Let me rehearse what I was telling.
I have some items( either simple values as int, float etc or objects (
some other classes of mine say FileClass).
Now every time somebody calls my function DoSomething() I do a search
inside my array to find that item and if it is found I call DoThis()
and if not found I call DoThat().
But the problem here is that if I use an array implementation I have to
write a whole lot of If statement or write a loop where it goes in
search of the current item in the array, for which it will make a whole
lot of comparison. Instead of doing all this if I use the STL:set
containers and insert my items once there and then search for my
current item every time in my set will it not be more efficient than
my array implementation.
Remember I am not creating my container from scratch and using the
existing containers only.

If you're going to populate your array once and then not change it
you're probably better off with a vector, because it uses less memory.

vector<whatever> vec(first, last);
sort(vec.begin(), vec.end());

And maybe vec.erase(unique(vec.begin(), vec.end()), vec.end()) to remove
duplicates?
when you need to find something, use

find(vec.begin(), vec.end(), X)
Since it's sorted, binary_search (if you just want to know if X is
there) or lower_bound (if you need an iterator to it) might be better.
Even if you're going to add or remove things occasionally, this is
probably better. set wins when you add and remove things fairly often.

--
Richard Herring

Jul 23 '05 #16

Pete Becker

E. Robert Tisdale wrote:

The object, as you have described it so far, is a set.
You should implement it with a set template class.
This will be the most efficient and portable implementation.
Using any other class template will almost certainly
result in a sub optimal implementation.

It depends on what's important. As I said earlier, a set will use more
memory than a vector of the same size. In addition, its implementation
requires more code.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Jul 23 '05 #17

Pete Becker

Richard Herring wrote:

And maybe vec.erase(unique(vec.begin(), vec.end()), vec.end()) to remove
duplicates?

Sigh. Maybe. But designing from incomplete information is usually a
waste of time. I see no point in trying to fine-tune this example.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Jul 23 '05 #18

Pete Becker

Richard Herring wrote:

In message <_q********************@rcn.net>, Pete Becker
<pe********@acm.org> writes

when you need to find something, use

find(vec.begin(), vec.end(), X)

Since it's sorted, binary_search (if you just want to know if X is
there) or lower_bound (if you need an iterator to it) might be better.

Good point. That was the idea, but I used the wrong algorithm.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Jul 23 '05 #19

E. Robert Tisdale

Richard Herring wrote:

It's not what it "is", but what it _does_, that matters.

You are confused.

An object *is* what it *does*.

Jul 23 '05 #20

CodeCracker

I conclude that I can continue with set for now with a small set of
elements in it but if I am going to have a >50 elements in the set I
must consider vector and use the unique algo while deleting the
duplicates with erase.
I took the path of set for the ease of use and did not consider any
efficiency of it. I want to delay the effciency topic until my testing
shows I need to optimize my code here.
Is my conclusion correct?

Jul 23 '05 #21

Pete Becker

CodeCracker wrote:

I conclude that I can continue with set for now with a small set of
elements in it but if I am going to have a >50 elements in the set I
must consider vector and use the unique algo while deleting the
duplicates with erase.
I took the path of set for the ease of use and did not consider any
efficiency of it. I want to delay the effciency topic until my testing
shows I need to optimize my code here.
Is my conclusion correct?

Generally speaking, yes. On the other hand, if you're writing CAD
applications, every byte is important, and profiling is irrelevant:
always make things small. <g>

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Jul 23 '05 #22

sam

Pete Becker wrote:

CodeCracker wrote:
I think I wasn't clear. Let me rehearse what I was telling.
I have some items( either simple values as int, float etc or objects (
some other classes of mine say FileClass).
Now every time somebody calls my function DoSomething() I do a search
inside my array to find that item and if it is found I call DoThis()
and if not found I call DoThat().
But the problem here is that if I use an array implementation I have to
write a whole lot of If statement or write a loop where it goes in
search of the current item in the array, for which it will make a whole
lot of comparison. Instead of doing all this if I use the STL:set
containers and insert my items once there and then search for my
current item every time in my set will it not be more efficient than
my array implementation.
Remember I am not creating my container from scratch and using the
existing containers only.

If you're going to populate your array once and then not change it
you're probably better off with a vector, because it uses less memory.

vector<whatever> vec(first, last);
sort(vec.begin(), vec.end());

when you need to find something, use

find(vec.begin(), vec.end(), X)

It is sure this using find is very elegant and efficent. But I've been
feeling pluzzl about how to create X for find. Is there any complex
example for defining X? eg. the comparison is not a primitive type like
int, float, etc...

Sam.
Even if you're going to add or remove things occasionally, this is
probably better. set wins when you add and remove things fairly often.

Jul 23 '05 #23

sam

Pete Becker wrote:

Richard Herring wrote:

And maybe vec.erase(unique(vec.begin(), vec.end()), vec.end()) to
remove duplicates?

how about vec.clear()?

Sam

Sigh. Maybe. But designing from incomplete information is usually a
waste of time. I see no point in trying to fine-tune this example.

Jul 23 '05 #24

Pete Becker

sam wrote:

It is sure this using find is very elegant and efficent. But I've been
feeling pluzzl about how to create X for find. Is there any complex
example for defining X? eg. the comparison is not a primitive type like
int, float, etc...

The three-argument version of find (and of binary_search which, as was
pointed out, is the right one to use <g>) takes an object of the type
that the iterator points to. The four-argument version takes an object
of an arbitrary type and a predicate that compares the object that the
iterator points to with the passed object.

struct data
{
int value;
// other stuff
};

struct predicate
{
bool operator()(const data& obj, int val)
{ return obj.value == val; }

binary_search(vec.begin(), vec.end(), 3, predicate());

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Jul 23 '05 #25

Richard Herring

In message <d6**********@nntp1.jpl.nasa.gov>, E. Robert Tisdale
<E.**************@jpl.nasa.gov> writes

Richard Herring wrote:
It's not what it "is", but what it _does_, that matters.
You are confused.

Not in the least. I consider whether a class matches my requirements,
not what its name happens to be.
An object *is* what it *does*.
I see you're getting the point. What it does, not what it's named. "is"
was in quotes above for a reason, referring to someone who said:
It is either a set or it isn't.

when they probably meant something more like

It is either being used for its set-like behaviour or it isn't.

--
Richard Herring

Jul 23 '05 #26

Which is more efficient:STL:Set or array?

Similar topics