hi,
I have the following problem which is turning out to be nontrivial. I
realize that this is not
exactly a python problem but more of an algorithm problem  but I
post it here because
I want to implement this in python.
I want to write a code that given an interval (integer tuple:
start,stop) will find which other
interval it matches to. Here is an example,
suppose I have the following NONOVERLAPPING intervals
10  21 =='a1'
34  51 =='a2'
77  101 =='a3'
etc
So, suppose I'm give an interval such as (42,50), I should go through
the list and map it to "a2" etc
if the region is a subset of an exiting interval. If the query
interval does not exist in the table or
maps to two or more intervals (eg, (45, 80)) the program return a
None.
One naive way to solve this problem is to create an array such as
follows:
[None, None, None, ...., None, a1, a1, a1, ..., a1, None, None ...,
None, a2, ... etc] at indicies
1 2 3 9 10 11 12 21
22 23 33 34, ...
now with this in place I can easily solve the problem. However, this
is not a feasable solution
because the initial list has intervals whose range can go to billions!
So I need a smarter
idea. So what I can do is sort the initial list and then go through
each element from the start
and test if the a X[i][0] and b < X[i][1]
where (a,b) is my query start and stop
and X is a n x 2 array with the known intervals.
I don't like this solution because it can be really really slow as for
each query I'm doing a
linear search and on average I'll be searching almost half the list
before I find the right
interval.
Is there a smarter way to do this?
I've my problem is not clear please let me know and I'll try to
explain the unclear parts again.
Many thanks
Lee 3 1288
On Jun 13, 2:32 pm, Lee Sander <lesa...@gmail.comwrote:
hi,
I have the following problem which is turning out to be nontrivial. I
realize that this is not
exactly a python problem but more of an algorithm problem  but I
post it here because
I want to implement this in python.
I want to write a code that given an interval (integer tuple:
start,stop) will find which other
interval it matches to. Here is an example,
suppose I have the following NONOVERLAPPING intervals
10  21 =='a1'
34  51 =='a2'
77  101 =='a3'
etc
So, suppose I'm give an interval such as (42,50), I should go through
the list and map it to "a2" etc
if the region is a subset of an exiting interval. If the query
interval does not exist in the table or
maps to two or more intervals (eg, (45, 80)) the program return a
None.
....snip...
Many thanks
Lee
OK  I'm going to assume your intervals are inclusive (i.e. 3451
contains both 34 and 51).
If your intervals are all really all nonoverlapping, one thing you
can try is to put all the endpoints in a single list, and sort it.
Then, you can use the bisect module to search for intervals, which
will give you a logarithmic time algorithm.
Here, I'm going to assume you just need the index of the containing
interval. If you really need a name (i.e. 'a1' or 'a2'), you can use a
list of names, and index into that.
I hope those assumptions are valid! if so, the following should work:
from bisect import bisect
# assume initial intervals are sorted
intervals=[(10,21),(34,51), (77,101)]
# get a sorted list of endpoints
endpts=sorted([a for a,b in intervals]+[b for a,b in intervals])
def test_interval(ivl,endpts):
# Find where the left endpoint of ivl would lie in the list
# i.e. the index of the first element greater than ivl[0]
l_idx=bisect(endpts,ivl[0])
# If l_idx is even, then it lies between two intervals, and thus
# is not contained in any interval. Otherwise it returns the index
if l_idx % 2 == 0: return None
# if l_idx is out of bounds (i.e. ==len(endpts)), then ivl is
# not contained in any interval (too large)
if l_idx==len(endpts): return None
# Now, all we need to do is check that the right endpoint of ivl is
# less than or equal to the corresponding endpoint in the list.
# Happily, that endpoint is at index l_idx
if ivl[1]<=endpts[l_idx]:
# So return the index of the interval:
return (l_idx1)/2
else:
return None
Then, from a shell:
>>print tst.test_interval((0,13),endpts)
None
>>print tst.test_interval((15,21),endpts)
0
>>print tst.test_interval((35,40),endpts)
1
>>print tst.test_interval((40,80),endpts)
None
>>print tst.test_interval((109,200),endpts)
None
Matteo skrev:
OK  I'm going to assume your intervals are inclusive (i.e. 3451
contains both 34 and 51).
If your intervals are all really all nonoverlapping, one thing you
can try is to put all the endpoints in a single list, and sort it.
Then, you can use the bisect module to search for intervals, which
will give you a logarithmic time algorithm.
Here, I'm going to assume you just need the index of the containing
interval. If you really need a name (i.e. 'a1' or 'a2'), you can use a
list of names, and index into that.
I hope those assumptions are valid! if so, the following should work:
I have taken the liberty of simplifying your code, using the fact that
tuples are sorted lexicographically. Note that this requires all
intervals to be tuples and not lists (since list(a) < tuple(b) is always
True).
from bisect import bisect
def test_interval(ivl,intervals):
# Find where ivl would lie in the list
# i.e. the index of the first interval sorting as larger than ivl
idx=bisect(intervals,ivl)
# Left endpoints equal is a special case  a matching interval will be
# to the right of the insertion point
if idx < len(intervals) and intervals[idx][0] == ivl[0]:
if intervals[idx][1] >= ivl[1]:
return idx
else:
return None
# Otherwise, we need to check to the left of the insertion point
if idx 0 and intervals[idx1][1] >= ivl[1]:
return idx1
else:
return None
>>intervals =[(10, 21), (34, 51), (77, 101)] print test_interval((34,35),intervals)
1
>>print test_interval((34,53),intervals)
None
>>print test_interval((77,53),intervals)
2
>>print test_interval((77,83),intervals)
2
>>print test_interval((77,102),intervals)
None
>>print test_interval((77,101),intervals)
2
u"Nis J\xf8rgensen"
Dear Matteo and Nis,
Thankyou very much for your help. I wasn't aware of the bisect
library but it's really useful.
thank you both once again
Lee
On 13 Jun, 23:21, Nis Jørgensen <n...@superlativ.dkwrote:
Matteo skrev:
OK  I'm going to assume your intervals are inclusive (i.e. 3451
contains both 34 and 51).
If your intervals are all really all nonoverlapping, one thing you
can try is to put all the endpoints in a single list, and sort it.
Then, you can use the bisect module to search for intervals, which
will give you a logarithmic time algorithm.
Here, I'm going to assume you just need the index of the containing
interval. If you really need a name (i.e. 'a1' or 'a2'), you can use a
list of names, and index into that.
I hope those assumptions are valid! if so, the following should work:
I have taken the liberty of simplifying your code, using the fact that
tuples are sorted lexicographically. Note that this requires all
intervals to be tuples and not lists (since list(a) < tuple(b) is always
True).
from bisect import bisect
def test_interval(ivl,intervals):
# Find where ivl would lie in the list
# i.e. the index of the first interval sorting as larger than ivl
idx=bisect(intervals,ivl)
# Left endpoints equal is a special case  a matching interval will be
# to the right of the insertion point
if idx < len(intervals) and intervals[idx][0] == ivl[0]:
if intervals[idx][1] >= ivl[1]:
return idx
else:
return None
# Otherwise, we need to check to the left of the insertion point
if idx 0 and intervals[idx1][1] >= ivl[1]:
return idx1
else:
return None
>intervals =[(10, 21), (34, 51), (77, 101)] print test_interval((34,35),intervals)
1
>print test_interval((34,53),intervals)
None
>print test_interval((77,53),intervals)
2
>print test_interval((77,83),intervals)
2
>print test_interval((77,102),intervals)
None
>print test_interval((77,101),intervals)
2
u"Nis J\xf8rgensen"
This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics
by: Frank 
last post by:
Hi there,
In web.xml, I know it's possible to do path mapping:
<pathmapping urlpattern="/aaa/*" realpath="C:\TEMP\"/>
So a call to myserver.com/mywebapp/aaa will redirect to c:\temp.
...

by: Pierre Fortin 
last post by:
Hi!
"Python Essential Reference"  2nd Ed, on P. 47 states that a string
format can include "*" for a field width (no restrictions noted); yet...
>>> "%*d" % (6,2) # works as expected
' ...

by: naruto 
last post by:
Hi all,
I have the following being defined in a A.cxx file.
// define in source file. Not exported to the outside world (this
cannot be
// moved to the header file )
#define CHANNEL_0 0...

by: mike 
last post by:
regards:
Where to find tag mappingtable of HTML translated to XHTML1.0
Any positive suggestion is welcome.
thank you
May goodness be with you all

by: Tamas Hegedus 
last post by:
Hi!
I am looking for an xmlobject mapping tool ('XML Data Bindingdesign
time product') where I can define the mapping rules in 'binding files'
and the parser is generated automatically.
...

by: Elder Hyde 
last post by:
I was reading this interview with Hejlsberg, when suddenly the
conversation turned to O/R mapping. Hejlsberg talked as if he had had to
design an O/R mapping for .NET (he said ".NET had each one of...

by: BentleyInc 
last post by:
I'm trying to find a way to add a whildcard application mapping to
aspnet_isapi.dll in IIS programmatically.... been looking into IIS
administrator reference but didn't find the right function to...

by: none 
last post by:
Hi,
I'm trying to establish table mappings, and I've hit a snag.
At the point to where I try to fill the schema (DB_adapter.FillSchema),
I get an exception, and the message is as follows:...

by: Ram 
last post by:
Hey,
I'm having a trouble mapping a connecting between 2 of my tables.
We have 2 tables  the simplest "dept", "emp" tables which are mapped
to 2 classes.
Class Dept contains 2 properties for...

by: Jan Kucera 
last post by:
Hi,
does anybody know about wildcard mapping ASP.NET 2 in IIS6?
Any tutorial?
Thanks,
Jan

by: Charles Arthur 
last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone

by: ryjfgjl 
last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

by: emmanuelkatto 
last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud.
Please let me know.
Thanks!
Emmanuel

by: BarryA 
last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

by: Sonnysonu 
last post by:
This is the data of csv file
1 2 3
1 2 3
1 2 3
1 2 3
2 3
2 3
3
the lengths should be different i have to store the data by columnwise with in the specific length.
suppose the i have to...

by: Hystou 
last post by:
There are some requirements for setting up RAID:
1. The motherboard and BIOS support RAID configuration.
2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

by: Hystou 
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

by: tracyyun 
last post by:
Dear forum friends,
With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, ZWave, WiFi, Bluetooth, etc. Each...

by: agi2029 
last post by:
Let's talk about the concept of autonomous AI software engineers and nocode agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
 