473,785 Members | 2,458 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Search a list/array of objects for specified criteria?

I have a large list of objects where each object has a unique
(non-overlapping) date range. The list is sorted chronologically . What is
the most efficient way to search this list for a single object that spans a
specified date?
Mar 10 '07 #1
9 3141
Hello Rick,
>I have a large list of objects where each object has a unique
(non-overlapping) date range. The list is sorted chronologically . What is
the most efficient way to search this list for a single object that spans
a specified date?
My suggestion below... you mention that the list is sorted
chronologically . I don't think that information is of any use here, as
long as you don't have any good information about the distribution of the
elements in the list, or the lengths of the spans covered by a typical
element. Of course you could implement an algorithm that looks at a first
element and tries to heuristically guess where the element you're looking
for may be in the list - but without knowing the total range of elements,
the number, the distribution and the lengths of the spans, I find it very
hard to say whether the algorithm would be any more efficient than the
basic one below.

If you want efficiency without having good estimates of these details,
maybe you should organize your data in a different way - using
dictionaries for certain ranges of data, for instance. Once more it's hard
to say what that structure should best look like, without knowing details
about the data you're expecting.

And finally, don't forget: it's usually better to write a reasonably good
algorithm and to optimize it only when it proves to be a problem.

[TestFixture]
public class Tests {
[Test]
public void Test( ) {
List<ClassValid WithinRangelist = new List<ClassValid WithinRange>(ne w
ClassValidWithi nRange[] {
new ClassValidWithi nRange(new DateTime(2007, 1, 1), new DateTime(2007,
1,15)),
new ClassValidWithi nRange(new DateTime(2007, 1, 16), new
DateTime(2007, 1,31)),
new ClassValidWithi nRange(new DateTime(2007, 2, 1), new DateTime(2007,
2,28))
});
ClassValidWithi nRange checkVal = new ClassValidWithi nRange(new
DateTime(2007, 3, 1), new DateTime(2007, 3, 31));
list.Add(checkV al);

ClassValidWithi nRange foundVal =
list.Find(deleg ate(ClassValidW ithinRange entry) {
return entry.IsValidOn (new DateTime(2007, 3, 15));
});

Assert.AreEqual (checkVal, foundVal);
}
}

public class ClassValidWithi nRange {
public ClassValidWithi nRange(DateTime from, DateTime to) {
this.from = from;
this.to = to;
}
public bool IsValidOn(DateT ime date) {
// Implement with your choice of <, <=, and >=
return from < date && to date;
}
private DateTime from;
public DateTime From {
get {
return from;
}
set {
from = value;
}
}
private DateTime to;
public DateTime To {
get {
return to;
}
set {
to = value;
}
}
}
Oliver Sturm
--
http://www.sturmnet.org/blog
Mar 10 '07 #2
Thanks Oliver.

In short, you recommend using the Find method of the List<Tclass. This
performs a linear search, which is basically what I'm doing now. It's
acceptable but hardly optimal. I believe that a binary search could be more
efficient, even where the distribution is uneven. The problem is that the
BinarySearch methods provided by .NET don't seem to support searching using
an IComparer alone.
"Oliver Sturm" <ol****@sturmne t.orgwrote in message
news:xn******** ********@msnews .microsoft.com. ..
Hello Rick,
>>I have a large list of objects where each object has a unique
(non-overlapping) date range. The list is sorted chronologically . What is
the most efficient way to search this list for a single object that spans
a specified date?

My suggestion below... you mention that the list is sorted
chronologically . I don't think that information is of any use here, as
long as you don't have any good information about the distribution of the
elements in the list, or the lengths of the spans covered by a typical
element. Of course you could implement an algorithm that looks at a first
element and tries to heuristically guess where the element you're looking
for may be in the list - but without knowing the total range of elements,
the number, the distribution and the lengths of the spans, I find it very
hard to say whether the algorithm would be any more efficient than the
basic one below.

If you want efficiency without having good estimates of these details,
maybe you should organize your data in a different way - using
dictionaries for certain ranges of data, for instance. Once more it's hard
to say what that structure should best look like, without knowing details
about the data you're expecting.

And finally, don't forget: it's usually better to write a reasonably good
algorithm and to optimize it only when it proves to be a problem.

[TestFixture]
public class Tests {
[Test]
public void Test( ) {
List<ClassValid WithinRangelist = new List<ClassValid WithinRange>(ne w
ClassValidWithi nRange[] {
new ClassValidWithi nRange(new DateTime(2007, 1, 1), new DateTime(2007,
1,15)),
new ClassValidWithi nRange(new DateTime(2007, 1, 16), new DateTime(2007,
1,31)),
new ClassValidWithi nRange(new DateTime(2007, 2, 1), new DateTime(2007,
2,28))
});
ClassValidWithi nRange checkVal = new ClassValidWithi nRange(new
DateTime(2007, 3, 1), new DateTime(2007, 3, 31));
list.Add(checkV al);

ClassValidWithi nRange foundVal = list.Find(deleg ate(ClassValidW ithinRange
entry) {
return entry.IsValidOn (new DateTime(2007, 3, 15));
});

Assert.AreEqual (checkVal, foundVal);
}
}

public class ClassValidWithi nRange {
public ClassValidWithi nRange(DateTime from, DateTime to) {
this.from = from;
this.to = to;
}
public bool IsValidOn(DateT ime date) {
// Implement with your choice of <, <=, and >=
return from < date && to date;
}
private DateTime from;
public DateTime From {
get {
return from;
}
set {
from = value;
}
}
private DateTime to;
public DateTime To {
get {
return to;
}
set {
to = value;
}
}
}
Oliver Sturm
--
http://www.sturmnet.org/blog

Mar 10 '07 #3
I think I found an answer. I could create an IComparer that compares the
upperbound of each date range. I can then use a BinarySearch to look for an
object with a specific upperbound (equal to my desired date). It will return
either the index of an exact match, or the bitwise complement of the index
next highest object, or the bitwise complement of the list count.

"Rick" <ri**@nospam.co mwrote in message
news:u6******** ******@TK2MSFTN GP06.phx.gbl...
Thanks Oliver.

In short, you recommend using the Find method of the List<Tclass. This
performs a linear search, which is basically what I'm doing now. It's
acceptable but hardly optimal. I believe that a binary search could be
more efficient, even where the distribution is uneven. The problem is that
the BinarySearch methods provided by .NET don't seem to support searching
using an IComparer alone.
"Oliver Sturm" <ol****@sturmne t.orgwrote in message
news:xn******** ********@msnews .microsoft.com. ..
>Hello Rick,
>>>I have a large list of objects where each object has a unique
(non-overlapping) date range. The list is sorted chronologically . What is
the most efficient way to search this list for a single object that spans
a specified date?

My suggestion below... you mention that the list is sorted
chronologicall y. I don't think that information is of any use here, as
long as you don't have any good information about the distribution of the
elements in the list, or the lengths of the spans covered by a typical
element. Of course you could implement an algorithm that looks at a first
element and tries to heuristically guess where the element you're looking
for may be in the list - but without knowing the total range of elements,
the number, the distribution and the lengths of the spans, I find it very
hard to say whether the algorithm would be any more efficient than the
basic one below.

If you want efficiency without having good estimates of these details,
maybe you should organize your data in a different way - using
dictionaries for certain ranges of data, for instance. Once more it's
hard to say what that structure should best look like, without knowing
details about the data you're expecting.

And finally, don't forget: it's usually better to write a reasonably good
algorithm and to optimize it only when it proves to be a problem.

[TestFixture]
public class Tests {
[Test]
public void Test( ) {
List<ClassVali dWithinRangelis t = new List<ClassValid WithinRange>(ne w
ClassValidWith inRange[] {
new ClassValidWithi nRange(new DateTime(2007, 1, 1), new DateTime(2007,
1,15)),
new ClassValidWithi nRange(new DateTime(2007, 1, 16), new DateTime(2007,
1,31)),
new ClassValidWithi nRange(new DateTime(2007, 2, 1), new DateTime(2007,
2,28))
});
ClassValidWith inRange checkVal = new ClassValidWithi nRange(new
DateTime(200 7, 3, 1), new DateTime(2007, 3, 31));
list.Add(check Val);

ClassValidWith inRange foundVal = list.Find(deleg ate(ClassValidW ithinRange
entry) {
return entry.IsValidOn (new DateTime(2007, 3, 15));
});

Assert.AreEqua l(checkVal, foundVal);
}
}

public class ClassValidWithi nRange {
public ClassValidWithi nRange(DateTime from, DateTime to) {
this.from = from;
this.to = to;
}
public bool IsValidOn(DateT ime date) {
// Implement with your choice of <, <=, and >=
return from < date && to date;
}
private DateTime from;
public DateTime From {
get {
return from;
}
set {
from = value;
}
}
private DateTime to;
public DateTime To {
get {
return to;
}
set {
to = value;
}
}
}
Oliver Sturm
--
http://www.sturmnet.org/blog


Mar 10 '07 #4
Hello Rick,
>In short, you recommend using the Find method of the List<Tclass. This
performs a linear search, which is basically what I'm doing now. It's
acceptable but hardly optimal. I believe that a binary search could be
more efficient, even where the distribution is uneven.
Hm... I'm not sure really - it's a mathematical exercise to prove whether
it would be more efficient under the circumstances or not. There are two
problems:

* the distribution may be seriously uneven, we (I at least) don't have any information on that
* the decision of whether or not a date falls in a given range is not only dependent on one distributed value, but on two different ones.

The second problem brings the length of ranges into the equation - if
there is some regularity to this, the task of estimating efficiency may
become easier.

In the end I think that theoretically a binary search could improve
efficiency, if done right. But as we seem to know practically nothing
about the content of the data list, it would be rather hard to implement
the binary search in a way that is more efficient on average than a linear
search. And of course there's always the storage to think about... linear
search could be made more efficient very easily by splitting the list
based on years, or something similar.

Other things I would think about... I think a linear search algorithm
should probably be efficient enough for a large majority of use cases. In
those cases where the number of objects would be too large to use linear
searching efficiently, it seems like a good question to ask why these huge
numbers of objects are being handled in-memory in the first place - a
relational database could make the search operation much easier and a lot
faster in these particular cases.
Oliver Sturm
--
http://www.sturmnet.org/blog
Mar 10 '07 #5
Hello Rick,
>I think I found an answer. I could create an IComparer that compares the
upperbound of each date range. I can then use a BinarySearch to look for
an object with a specific upperbound (equal to my desired date). It will
return either the index of an exact match, or the bitwise complement of
the index next highest object, or the bitwise complement of the list count.
Have fun :-)
Oliver Sturm
--
http://www.sturmnet.org/blog
Mar 10 '07 #6
"Oliver Sturm" <ol****@sturmne t.orgwrote in message
news:xn******** ********@msnews .microsoft.com. ..
Hello Rick,
>>In short, you recommend using the Find method of the List<Tclass.
This performs a linear search, which is basically what I'm doing now.
It's acceptable but hardly optimal. I believe that a binary search
could be more efficient, even where the distribution is uneven.

Hm... I'm not sure really - it's a mathematical exercise to prove
whether it would be more efficient under the circumstances or not.
There are two problems:

* the distribution may be seriously uneven, we (I at least) don't
have any information on that
* the decision of whether or not a date falls in a given range is not
only dependent on one distributed value, but on two different ones.
Neither of these points will effect the speed of the search in the
slightest.
Remember, we are not talking about a (possibly imbalanced) binary tree.
We are talking about using a binary search algorithm over a sorted list.
That algorithm will give you the same performance as a perfectly
balanced tree. Uneven distributions of the data will not impact the
search in the slightest.

Also remember that Rick said:
<quote>
I have a large list of objects where each object has a unique
(non-overlapping) date range. The list is sorted chronologically
</quote>

Since the date ranges do NOT overlap there is a clear sequence of data
points
obj1.Start
obj1.End
obj2.Start
obj2.End
....

So, You perform a Binary search using either Start or end (the choice is
irrelevant). This seach will go as O(logN). If you were lucky enough to
hit the date on the money, then you are done. If you did not find the
exact date (far more likely), the algorithm will return a negative
number, which is the bitwise complement of the index of the next
element. In other words, the algorithm has in effect told you which item
in the Array you need to examine (off by one is more like it). Then it
is a simple matter of seeing if your date falls within the Start and End
dates for the object in question.

Of course, unless the number of data points is substantial, the total
time would be negligable in either case.

Hope this helps
Bill


<snip>

Mar 10 '07 #7
Hello Bill,

I understand I was probably mixing up some things here - my ideas were
targeted at a more efficient algorithm that would take the structure into
account, based on the thought that the amount of data in question is so
vast that we start running into problems with the linear search to begin
with.
Oliver Sturm
--
http://www.sturmnet.org/blog
Mar 10 '07 #8

"Oliver Sturm" <ol****@sturmne t.orgwrote in message
news:xn******** ********@msnews .microsoft.com. ..
Hello Bill,

I understand I was probably mixing up some things here - my ideas were
targeted at a more efficient algorithm that would take the structure
into account, based on the thought that the amount of data in question
is so vast that we start running into problems with the linear search
to begin with.

You do bring up a good point.
Knowledge of your data, and how it is likely to be searched can go a
LONG way towards picking the best approach.

If most of the time you are accessing data near the ends of the range
the linear search may in fact outperform the binary search which does
better for random access.

If the number of data points is not large, it doesn't matter how you
search, since the time is negligible in either case. Unless you are
sitting in a tight loop processing requests continuously that is.

Bill
Mar 10 '07 #9
Hello Bill,
>Knowledge of your data, and how it is likely to be searched can go a LONG
way towards picking the best approach.
That's what I had in mind - maybe too much so, granted.
>If the number of data points is not large, it doesn't matter how you
search, since the time is negligible in either case. Unless you are
sitting in a tight loop processing requests continuously that is.
Oh, of course. I didn't mean my comments somewhere on the thread to say
that it's meaningless to search for efficiency in this regard. But I find
it significant to think about the best way to achieve efficiency - for
instance by changing the storage structure instead of optimizing searches
-, and this depends on the use case a lot.
Oliver Sturm
--
http://www.sturmnet.org/blog
Mar 10 '07 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

38
5236
by: VK | last post by:
Hello, In my object I have getDirectory() method which returns 2-dimentional array (or an imitation of 2-dimentional array using two JavaScript objects with auto-handled length property - please let's us do not go into an "each dot over i" clarification discussion now - however you want to call - you call it ;-) array contains records of all files in the current dir. array contains records of all subs in the current dir
1
2623
by: decrypted | last post by:
I am having trouble with creating / finding effective searching algorithms. I constantly run into a situation where I have a list of objects and need to find out if a property of one of these objects matches some criteria...especialy if the criteria is a list itself. Example... class Part has private PartID A listview of items with each tag having a Part object. I have an array of PartID's denoting the parts I want to remove from the...
4
7113
by: Ben Fidge | last post by:
Hi What is the most efficient way to gather a list of files under a given folder recursively with the ability to specify exclusion file masks? For example, say I wanted to get a list of all files under \Program Files and it's subdirectories that meet the following criteria:
3
2027
by: Russell | last post by:
Hey, ok i have numerous tables to search through for a 'site search'. some of the searchble fields have html embeded within so after some quick referencing, saw I can use the regExp function to strip out all the HTML leaving only the raw text. (done and works a treat) My issue is:
3
9559
by: Chung Leong | last post by:
Here's the rest of the tutorial I started earlier: Aside from text within a document, Indexing Service let you search on meta information stored in the files. For example, MusicArtist and MusicAlbum let you find MP3 and other music files based on the singer and album name; DocAuthor let you find Office documents created by a certain user; DocAppName let you find files of a particular program, and so on. Indexing Service uses plug-ins...
6
4020
by: Amit Bhatia | last post by:
Hi, I am not sure if this belongs to this group. Anyway, my question is as follows: I have a list (STL list) whose elements are pairs of integers (STL pairs, say objects of class T). When I create a new object of class T, I would like to check if this object already exists in the list: meaning one having same integers. This can be done in linear time in a list, and probably faster if I use STL Set instead of list. I am wondering however if...
16
2861
by: Computer geek | last post by:
Hello, I am new to VB.NET and programming in general. I have taught myself a lot of the basics with vb.net but am still quite the novice. I am working on a little application now and I need some help with one part of the code. When a button is clicked I need to have it go out to a network drive location and count how many files are present with a certain file extension. Then store that number in a declared variable. Is this possible? Can...
3
17278
by: =?Utf-8?B?YWJjZA==?= | last post by:
I have a generic list of objects. Each object contains multiple objects and one of the object has some properties. Now I need to filter the list based on those properties. Whats the best way for me. I tried foreach loop but I am unable to write the filter criteria. All the proeperties I need to filter must have AND operation. I need to consider conditions like blank etc. To use list.Find I need to write predicate...any samples anywhere... ...
10
3970
by: Eugenio | last post by:
Hi there, I'm returning to this forum for the second time and I would like to say thanks for the great help provided. I've encountered a new problem now and hope that you will be able to help me again. I'm currently implementing a container monitoring system in MSAccess07 and I'm using a multiple search function to sort containers. I use several unbound text boxes and a list box.The user type's in partial search criteria in one or several...
0
9645
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9480
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10147
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
8972
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
1
7499
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6739
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5381
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5511
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
3
2879
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.