Bytes | Software Development & Data Engineering Community

Efficient data structures

Hi,

I'm looking for a data structure where I can store
arbitrary elements and then efficiently check whether an
element (of the same type as the stored elements) is contained
in it. The latter operation is performed pretty
frequently, so the data structure I require must handle
it with low complexity.

My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers, the "find" function
has logarithmic complexity. Are there any approaches
with linear complexity?
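
For concreteness, a minimal self-contained version of that idea (the
sample contents are invented):

#include <iostream>
#include <set>
#include <string>

int main()
{
    std::set<std::string> myset;
    myset.insert("alpha");
    myset.insert("beta");
    myset.insert("gamma");

    const std::string element2Search = "beta";

    // std::set::find runs in O(log n)
    if (myset.find(element2Search) != myset.end())
        std::cout << "found\n";
    else
        std::cout << "not found\n";
    return 0;
}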

Regards,
Chris
Aug 15 '06 #1
16 replies, 1901 views
[changed followup to c.l.c++]

Christian Christmann wrote:
Hi,

I'm looking for a data structure where I can store
arbitrary elements and then efficiently check whether an
element (of the same type as the stored elements) is contained
in it. The latter operation is performed pretty
frequently, so the data structure I require must handle
it with low complexity.

My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers, the "find" function
has logarithmic complexity. Are there any approaches
with linear complexity?
A heterogeneous container calls for boost::any. You would have to write
your own compare function, however, for ordering in the set (do you
really want uniqueness and if so, in what sense -- per type or per
value? if not, you might consider a sorted std::vector<boost::any> and
then the standard search functions).
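
To make the sorted-vector alternative concrete with a homogeneous
element type (a sorted std::vector<boost::any> would additionally need
the user-supplied comparison mentioned above), a minimal sketch:

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    v.push_back(42);
    v.push_back(7);
    v.push_back(19);

    std::sort(v.begin(), v.end());  // sort once up front

    // std::binary_search on a sorted range is O(log n), like set::find
    if (std::binary_search(v.begin(), v.end(), 19))
        std::cout << "found\n";
    return 0;
}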

Cheers! --M

Aug 15 '06 #2
On Tue, 15 Aug 2006 07:24:13 -0700, mlimber wrote:
>>
My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers, the "find" function
has logarithmic complexity. Are there any approaches
with linear complexity?

A heterogeneous container calls for boost::any. You would have to write
your own compare function, however, for ordering in the set (do you
really want uniqueness and if so, in what sense -- per type or per
value? if not, you might consider a sorted std::vector<boost::any> and
then the standard search functions).
Maybe I didn't specify my requirements precisely enough. What I need is
a "template-based" data structure where all stored elements are of the
same type. The order of the elements in the data structure is
irrelevant.

Aug 15 '06 #3
Christian Christmann wrote:
On Tue, 15 Aug 2006 07:24:13 -0700, mlimber wrote:
>>>
My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers, the "find" function
has logarithmic complexity. Are there any approaches
with linear complexity?

A heterogeneous container calls for boost::any. You would have to
write your own compare function, however, for ordering in the set
(do you really want uniqueness and if so, in what sense -- per type
or per value? if not, you might consider a sorted
std::vector<boost::any> and then the standard search functions).

Maybe I didn't specify my requirements precisely enough. What I need is
a "template-based" data structure where all stored elements are of the
same type. The order of the elements in the data structure is
irrelevant.
If you don't use the member notation (.find), you can use 'std::find'
with any container:

std::blah<mytype> mycontainer; // replace 'blah' with 'vector'
// or 'list' or 'set' or 'deque' or ...
std::find(mycontainer.begin(), mycontainer.end(), myvalue);
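
A compilable version of that snippet, with 'blah' instantiated as
std::list and invented sample values (note std::find is a linear scan):

#include <algorithm>
#include <iostream>
#include <list>

int main()
{
    std::list<int> mycontainer;
    mycontainer.push_back(1);
    mycontainer.push_back(2);
    mycontainer.push_back(3);

    // std::find scans the range linearly, so this is O(n)
    std::list<int>::iterator it =
        std::find(mycontainer.begin(), mycontainer.end(), 2);
    if (it != mycontainer.end())
        std::cout << "found\n";
    return 0;
}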

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask
Aug 15 '06 #4
Christian Christmann wrote:
Maybe I didn't specify my requirements precisely enough. What I need is
a "template-based" data structure where all stored elements are of the
same type. The order of the elements in the data structure is
irrelevant.
You could use any of the containers in the STL. Which one you choose
will depend on how you will use it (e.g., will there be a lot of
insertions and deletions, or just a one-time population at startup). If
uniqueness is necessary, you probably want std::set. If not, you might
consider std::multiset or std::vector (which you'd have to keep sorted
yourself; see the sketch below). You can't beat logarithmic complexity
in searching (unless you already know the index into the data array,
but then it's not really a "search").
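
To make "keep sorted yourself" concrete, a hedged sketch of an
insertion helper for a sorted std::vector (sorted_insert is an
invented name):

#include <algorithm>
#include <vector>

// Insert while keeping the vector sorted, so binary searches stay valid.
void sorted_insert(std::vector<int>& v, int value)
{
    std::vector<int>::iterator pos =
        std::lower_bound(v.begin(), v.end(), value);
    v.insert(pos, value);  // O(n) for the shift, but lookups stay O(log n)
}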

Cheers! --M

Aug 15 '06 #5
In article <11**********************@b28g2000cwb.googlegroups.com>,
ml*****@gmail.com says...

[ ... ]
You could use any of the containers in the STL. Which one you choose
will depend on how you will use it (e.g., will there be a lot of
insertions and deletions, or just a one-time population at startup). If
uniqueness is necessary, you probably want std::set. If not, you might
consider std::multiset or std::vector (which you'd have to keep sorted
yourself). You can't beat logarithmic complexity in searching (unless
you already know the index into the data array, but then it's not
really a "search").
Actually, you often can beat logarithmic. Hash tables have constant
expected complexity. You can also use an interpolating search, which
typically has substantially lower complexity than logarithmic as well.

A binary search ignores much of the information that's really available
-- it blindly assumes that the best guess it can make at the location is
the middle of the available range.

An interpolating search is much closer to the way most people would (for
example) look something up in a dictionary. If you're looking up 'cat',
you know you can start fairly close to the beginning. If you're looking
up 'victory', you know you can start fairly close to the end. Your first
attempt usually won't be the right page, but you can (again) usually
make quite a bit better of a guess than simply the middle of the range.

Obviously, this _can_ backfire -- its worst-case complexity is quite
poor. You can, however, do something like an Introsort, and switch to a
plain binary search if you find that it's not working well for a
particular search.

Note that an interpolating search requires a random access iterator -- it
generally doesn't work in something like a tree structure.
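
A hedged sketch of the idea on a sorted int array (it assumes the keys
are roughly uniformly distributed and that value differences don't
overflow an int):

#include <cstddef>

// Interpolation search on a sorted array; returns the index of 'key'
// or -1 if absent. Expected O(log log n) on uniform data, O(n) worst case.
int interpolation_search(const int* a, std::size_t n, int key)
{
    std::size_t lo = 0, hi = n;  // search the half-open range [lo, hi)
    while (lo < hi && a[lo] <= key && key <= a[hi - 1]) {
        if (a[lo] == a[hi - 1])  // all remaining values equal; avoid /0
            return a[lo] == key ? static_cast<int>(lo) : -1;

        // Guess a position proportional to where key falls in the value range
        std::size_t mid = lo + static_cast<std::size_t>(
            (static_cast<double>(key) - a[lo]) / (a[hi - 1] - a[lo])
            * (hi - 1 - lo));

        if (a[mid] == key)
            return static_cast<int>(mid);
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}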

--
Later,
Jerry.

The universe is a figment of its own imagination.
Aug 15 '06 #6

Besides hashes, you can beat logarithmic complexity in
searches depending on specifics of your data.
Radix sort, bucket sort, counting sort are all linear.

Tolga Ceylan

Aug 15 '06 #7

<to***********@yahoo.com> wrote in message
news:11**********************@m73g2000cwd.googlegroups.com...
>
Besides hashes, you can beat logaritmic complexity in
searches depending on specifics of your data.
radix sort, bucket sort, counting sort are all linear.
Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
mean constant time (O(1)) when you say "linear"?

-Howard
Aug 15 '06 #8
Jerry Coffin wrote:
In article <11**********************@b28g2000cwb.googlegroups.com>,
ml*****@gmail.com says...

[ ... ]
You could use any of the containers in the STL. Which one you choose
will depend on how you will use it (e.g., will there be a lot of
insertions and deletions, or just a one-time population at startup). If
uniqueness is necessary, you probably want std::set. If not, you might
consider std::multiset or std::vector (which you'd have to keep sorted
yourself). You can't beat logarithmic complexity in searching (unless
you already know the index into the data array, but then it's not
really a "search").

Actually, you often can beat logarithmic. Hash tables have constant
expected complexity.
Ok, I should have said: "You can't beat logarithmic complexity with the
current standard library." (Hashes are part of TR1, however.) In any
case, while hashes can certainly be helpful, their O(n) worst-case
guarantee is obviously worse than logarithmic performance. In addition,
creating a good hash function isn't a trivial task (a bad one can
severely degrade performance), and the computations required to evaluate
the hash function can be slow. Less theoretically, hashes can decrease
locality of reference, which may degrade performance on particular
systems. Suffice it to say, there are trade-offs involved in choosing
data structures and algorithms.
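
For illustration, a minimal sketch using the TR1 hash set (the header
location varies by implementation; <tr1/unordered_set> is where GCC
puts it):

// TR1 header location varies by implementation; GCC uses <tr1/unordered_set>
#include <tr1/unordered_set>
#include <iostream>
#include <string>

int main()
{
    std::tr1::unordered_set<std::string> s;
    s.insert("cat");
    s.insert("victory");

    // Expected O(1) lookup; O(n) worst case if many keys collide
    if (s.find("cat") != s.end())
        std::cout << "found\n";
    return 0;
}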
You can also use an interpolating search, which
typically has substantially lower complexity than logarithmic as well.

A binary search ignores much of the information that's really available
-- it blindly assumes that the best guess it can make at the location is
the middle of the available range.

An interpolating search is much closer to the way most people would (for
example) look something up in a dictionary. If you're looking up 'cat',
you know you can start fairly close to the beginning. If you're looking
up 'victory', you know you can start fairly close to the end. Your first
attempt usually won't be the right page, but you can (again) usually
make quite a bit better of a guess than simply the middle of the range.

Obviously, this _can_ backfire -- its worst-case complexity is quite
poor. You can, however, do something like an Introsort, and switch to a
plain binary search if you find that it's not working well for a
particular search.

Note that an interpolating search requires a random access iterator -- it
generally doesn't work in something like a tree structure.
You can beat logarithmic complexity on average but, as far as I know,
not in the worst case or with standard library functions.

Cheers! --M

Aug 15 '06 #9


Oops... "logarithmic" is the wrong word here. It should have been
N log2 N

By linear, I mean O(N).

Howard wrote:
>
Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
mean constant time (O(1)) when you say "linear"?

-Howard
Thanks for the fix.

Tolga Ceylan

Aug 15 '06 #10

Also, I meant "in sorting" not "in searching". My posting is not very
related to the original question. I guess in this case hashes' O(1) can
beat the O(log N) complexity of the sets for the 'lookup' operation.

(assuming sets are typically implemented with red-black trees.)

(Shouldn't have posted at all.. :-}} )

Tolga Ceylan

Howard wrote:
>
Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
mean constant time (O(1)) when you say "linear"?

-Howard
Aug 15 '06 #11
Christian Christmann wrote:
Hi,

I'm looking for a data structure where I can store
arbitrary elements and then efficiently check whether an
element (of the same type as the stored elements) is contained
in it. The latter operation is performed pretty
frequently, so the data structure I require must handle
it with low complexity.

My idea was to use an STL set and then do something like

myset.find( element2Search ) != myset.end()

For sorted associative containers, the "find" function
has logarithmic complexity. Are there any approaches
with linear complexity?
Why would you want linear complexity when log is clearly faster? Or did
you mean constant-time complexity? The canonical data structure for
this sort of problem is a hash table. Unfortunately, this is not (yet)
part of standard C++, but it is a commonly provided extension.
Aug 15 '06 #12
to***********@yahoo.com wrote:
Howard wrote:
>Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you
mean constant time (O(1)) when you say "linear"?

Oops... "logarithmic" is the wrong word here. It should have been
N log2 N
Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
does not care about constant factors.

N log2 N == N (log N / log 2) == (1/log 2) * N log N

1/log 2 is a constant so it can be dropped.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Aug 15 '06 #13
In article <11********************@m73g2000cwd.googlegroups.com>,
ml*****@gmail.com says...

[ ... ]
Ok, I should have said: "You can't beat logarithmic complexity with the
current standard library." (Hashes are part of TR1, however.) In any
case, while hashes can certainly be helpful, their worst-case guarantee
O(n) is obviously worse than logarithmic performance.
While the hash tables included in TR1 may have O(N) complexity in the
worst case, a hash table can be designed to provide logarithmic
worst-case complexity.
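
One hedged way to get that guarantee is to back each bucket with a
balanced tree instead of a list. A sketch with invented names
(HashedSet, with std::set buckets of std::string):

#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hash table whose buckets are balanced trees (std::set): lookups are
// expected O(1) and still O(log n) even if every key lands in one bucket.
class HashedSet
{
public:
    explicit HashedSet(std::size_t buckets = 101) : buckets_(buckets) {}

    void insert(const std::string& key)
    {
        buckets_[index(key)].insert(key);
    }

    bool contains(const std::string& key) const
    {
        const std::set<std::string>& b = buckets_[index(key)];
        return b.find(key) != b.end();
    }

private:
    std::size_t index(const std::string& key) const
    {
        std::size_t h = 0;  // simple multiplicative string hash
        for (std::size_t i = 0; i < key.size(); ++i)
            h = h * 31 + static_cast<unsigned char>(key[i]);
        return h % buckets_.size();
    }

    std::vector<std::set<std::string> > buckets_;
};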

[ ... ]
You can beat logarithmic complexity on average but, as far as I know,
not in the worst case or with standard library functions.
That sounds reasonable to me.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Aug 16 '06 #14
Marcus Kwok wrote:
>
Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
does not care about constant factors.

N log2 N == N (log N / log 2) == (1/log 2) * N log N

1/log 2 is a constant so it can be dropped.
I suspect that O(N log2 N) actually means O(N log^2 N)
Aug 16 '06 #15
Jerry Coffin wrote:
While the hash tables included in TR1 may have O(N) complexity in the
worst case, a hash table can be designed to provide logarithmic
worst-case complexity.
Right you are, though of course it comes as another trade-off.

Cheers! --M

Aug 16 '06 #16
red floyd <no*****@here.dude> wrote:
Marcus Kwok wrote:
>Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
does not care about constant factors.

N log2 N == N (log N / log 2) == (1/log 2) * N log N

1/log 2 is a constant so it can be dropped.

I suspect that O(N log2 N) actually means O(N log^2 N)
You could be right, I was just using the common convention that people
use to indicate the base. E.g., log10 is base-10 logarithm, so log2
would be base-2 logarithm.

--
Marcus Kwok
Replace 'invalid' with 'net' to reply
Aug 16 '06 #17
