
# Efficient data structures

Hi, I'm looking for a data structure where I can store arbitrary elements and then efficiently check whether an element (of the same type as the stored elements) is contained in it. The latter operation is performed quite frequently, so the data structure I require must handle it with low complexity.

My idea was to use an STL set and then do something like

    myset.find( element2Search ) != myset.end()

For sorted associative containers the "find" function has logarithmic complexity. Are there any approaches with linear complexity?

Regards, Chris

Aug 15 '06 #1
16 Replies

[changed followup to c.l.c++]

Christian Christmann wrote:
> I'm looking for a data structure where I can store arbitrary elements
> and then efficiently check if an element is contained in the list. The
> latter operation is performed pretty frequently, so the data structure
> must handle it with low complexity. [...] For sorted associative
> containers the "find" function has logarithmic complexity. Are there
> any approaches with linear complexity?

A heterogeneous container calls for boost::any. You would have to write your own compare function, however, for ordering in the set. (Do you really want uniqueness, and if so, in what sense: per type or per value?) If not, you might consider a sorted std::vector.

On Tue, 15 Aug 2006 07:24:13 -0700, mlimber wrote:
> A heterogeneous container calls for boost::any. You would have to write
> your own compare function, however, for ordering in the set. (Do you
> really want uniqueness, and if so, in what sense: per type or per
> value?) If not, you might consider a sorted std::vector.

Maybe I didn't specify my requirements precisely enough. What I need is a "template-based" data structure where all stored elements are of the same type. The order of the elements in the data structure is irrelevant.

Christian Christmann wrote:
> Maybe I didn't specify my requirements precisely enough. What I need is
> a "template-based" data structure where all stored elements are of the
> same type. The order of the elements in the data structure is
> irrelevant.

You could use any of the containers in the STL. Which one you choose will depend on how you will use it (e.g., will there be a lot of insertions or deletions, or just one populating on startup). If uniqueness is necessary, you probably want std::set. If not, you might consider std::multiset or std::vector (which you'd have to keep sorted yourself). You can't beat logarithmic complexity in searching (unless you already know the index into the data array, but then it's not really a "search").

Cheers! --M

Aug 15 '06 #5

mlimber wrote:
> [...] You can't beat logarithmic complexity in searching (unless you
> already know the index into the data array, but then it's not really a
> "search").

Actually, you often can beat logarithmic. Hash tables have constant expected complexity. You can also use an interpolating search, which typically has substantially lower complexity than logarithmic as well.

A binary search ignores much of the information that's really available -- it blindly assumes that the best guess it can make at the location is the middle of the available range. An interpolating search is much closer to the way most people would (for example) look something up in a dictionary. If you're looking up 'cat', you know you can start fairly close to the beginning. If you're looking up 'victory', you know you can start fairly close to the end. Your first attempt usually won't be the right page, but you can (again) usually make quite a bit better guess than simply the middle of the range.

Obviously, this _can_ backfire -- its worst-case complexity is quite poor. You can, however, do something like an introsort does, and switch to a plain binary search if you find that interpolation isn't working well for a particular search. Note that an interpolating search requires a random access iterator -- it generally doesn't work in something like a tree structure.

-- Later, Jerry. The universe is a figment of its own imagination.

Aug 15 '06 #6

Besides hashes, you can beat logarithmic complexity in searches depending on specifics of your data: radix sort, bucket sort, and counting sort are all linear.

Tolga Ceylan

Aug 15 '06 #7

Tolga Ceylan wrote:
> Besides hashes, you can beat logarithmic complexity in searches
> depending on specifics of your data: radix sort, bucket sort, and
> counting sort are all linear.

Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps you're thinking of O(N log N) when you say "logarithmic"? Or perhaps you mean constant time (O(1)) when you say "linear"?

-Howard

Aug 15 '06 #8

Jerry Coffin wrote:
> Actually, you often can beat logarithmic. Hash tables have constant
> expected complexity.

Ok, I should have said: "You can't beat logarithmic complexity with the current standard library." (Hashes are part of TR1, however.) In any case, while hashes can certainly be helpful, their worst-case guarantee of O(N) is obviously worse than logarithmic performance. In addition, creating a good hash function isn't a trivial task (a bad one can severely degrade performance), and the computations required to evaluate the hash function can be slow. Less theoretically, hashes can decrease locality of reference, which may degrade performance on particular systems. Suffice it to say, there are trade-offs involved in choosing data structures and algorithms.

> You can also use an interpolating search, which typically has
> substantially lower complexity than logarithmic as well. [...]

You can beat logarithmic complexity in average complexity but, as far as I know, not in worst-case complexity or with standard library functions.

Cheers! --M

Aug 15 '06 #9

Howard wrote:
> Linear (O(N)) is worse than logarithmic (O(log N)) on average. Perhaps
> you're thinking of O(N log N) when you say "logarithmic"? Or perhaps
> you mean constant time (O(1)) when you say "linear"?

Oops... "logarithmic" is the wrong word here. It should have been N log2 N. By linear, I mean O(N). Thanks for the fix.

Tolga Ceylan

Aug 15 '06 #10

Also, I meant "in sorting", not "in searching"; my posting is not very related to the original question. I guess in this case hashes' O(1) can beat the O(log N) complexity of sets for the 'lookup' operation (assuming sets are typically implemented with red-black trees). (Shouldn't have posted at all.. :-}} )

Tolga Ceylan

Aug 15 '06 #11

Christian Christmann wrote:
> My idea was to use an STL set and then do something like
> myset.find( element2Search ) != myset.end(). For sorted associative
> containers the "find" function has logarithmic complexity. Are there
> any approaches with linear complexity?

Why would you want linear complexity when log is clearly faster? Or did you mean constant time complexity? The canonical data structure for this sort of problem is a hash table. Unfortunately this is not (yet) part of standard C++, but it is a commonly provided extension.

Aug 15 '06 #12

to***********@yahoo.com wrote:
> Oops... "logarithmic" is the wrong word here. It should have been
> N log2 N.

Just a small thing: O(N log N) == O(N log2 N). This is because Big-O does not care about constant factors:

    N log2 N == N (log N / log 2) == (1/log 2) * N log N

1/log 2 is a constant, so it can be dropped.

-- Marcus Kwok Replace 'invalid' with 'net' to reply

Aug 15 '06 #13

mlimber wrote:
> Ok, I should have said: "You can't beat logarithmic complexity with the
> current standard library." (Hashes are part of TR1, however.) In any
> case, while hashes can certainly be helpful, their worst-case guarantee
> of O(N) is obviously worse than logarithmic performance.

While the hash tables included in TR1 may have O(N) complexity in the worst case, a hash table can be designed to provide logarithmic worst-case complexity.

> You can beat logarithmic complexity in average complexity but, as far
> as I know, not in worst-case complexity or with standard library
> functions.

That sounds reasonable to me.

-- Later, Jerry. The universe is a figment of its own imagination.

Aug 16 '06 #14

Marcus Kwok wrote:
> Just a small thing: O(N log N) == O(N log2 N). This is because Big-O
> does not care about constant factors. N log2 N == N (log N / log 2) ==
> (1/log 2) * N log N. 1/log 2 is a constant, so it can be dropped.

I suspect that O(N log2 N) actually means O(N log^2 N).

Aug 16 '06 #15

Jerry Coffin wrote:
> While the hash tables included in TR1 may have O(N) complexity in the
> worst case, a hash table can be designed to provide logarithmic
> worst-case complexity.

Right you are, though of course it comes as another trade-off.

Cheers! --M

Aug 16 '06 #16

red floyd wrote:
> I suspect that O(N log2 N) actually means O(N log^2 N).

You could be right; I was just using the common convention that people use to indicate the base. E.g., log10 is the base-10 logarithm, so log2 would be the base-2 logarithm.

-- Marcus Kwok Replace 'invalid' with 'net' to reply

Aug 16 '06 #17
