mapping subintervals

Lee Sander

hi,
I have the following problem which is turning out to be non-trivial. I
realize that this is not
exactly a python problem but more of an algorithm problem -- but I
post it here because
I want to implement this in python.

I want to write a code that given an interval (integer tuple:
start,stop) will find which other
interval it matches to. Here is an example,
suppose I have the following NON-OVERLAPPING intervals

10 - 21 =='a1'
34 - 51 =='a2'
77 - 101 =='a3'
etc

So, suppose I'm give an interval such as (42,50), I should go through
the list and map it to "a2" etc
if the region is a subset of an exiting interval. If the query
interval does not exist in the table or
maps to two or more intervals (eg, (45, 80)) the program return a
None.

One naive way to solve this problem is to create an array such as
follows:
[None, None, None, ...., None, a1, a1, a1, ..., a1, None, None ...,
None, a2, ... etc] at indicies
1 2 3 9 10 11 12 21
22 23 33 34, ...

now with this in place I can easily solve the problem. However, this
is not a feasable solution
because the initial list has intervals whose range can go to billions!
So I need a smarter
idea. So what I can do is sort the initial list and then go through
each element from the start
and test if the a X[i][0] and b < X[i][1]
where (a,b) is my query start and stop
and X is a n x 2 array with the known intervals.
I don't like this solution because it can be really really slow as for
each query I'm doing a
linear search and on average I'll be searching almost half the list
before I find the right
interval.
Is there a smarter way to do this?
I've my problem is not clear please let me know and I'll try to
explain the unclear parts again.
Many thanks
Lee

Jun 13 '07 #1

Subscribe Post Reply

1289

Matteo

On Jun 13, 2:32 pm, Lee Sander <lesa...@gmail.comwrote:

hi,
I have the following problem which is turning out to be non-trivial. I
realize that this is not
exactly a python problem but more of an algorithm problem -- but I
post it here because
I want to implement this in python.

I want to write a code that given an interval (integer tuple:
start,stop) will find which other
interval it matches to. Here is an example,
suppose I have the following NON-OVERLAPPING intervals

10 - 21 =='a1'
34 - 51 =='a2'
77 - 101 =='a3'
etc

So, suppose I'm give an interval such as (42,50), I should go through
the list and map it to "a2" etc
if the region is a subset of an exiting interval. If the query
interval does not exist in the table or
maps to two or more intervals (eg, (45, 80)) the program return a
None.

....snip...

Many thanks
Lee

OK - I'm going to assume your intervals are inclusive (i.e. 34-51
contains both 34 and 51).

If your intervals are all really all non-overlapping, one thing you
can try is to put all the endpoints in a single list, and sort it.
Then, you can use the bisect module to search for intervals, which
will give you a logarithmic time algorithm.

Here, I'm going to assume you just need the index of the containing
interval. If you really need a name (i.e. 'a1' or 'a2'), you can use a
list of names, and index into that.

I hope those assumptions are valid! if so, the following should work:

from bisect import bisect

# assume initial intervals are sorted
intervals=[(10,21),(34,51), (77,101)]
# get a sorted list of endpoints
endpts=sorted([a for a,b in intervals]+[b for a,b in intervals])

def test_interval(ivl,endpts):
# Find where the left endpoint of ivl would lie in the list
# i.e. the index of the first element greater than ivl[0]
l_idx=bisect(endpts,ivl[0])
# If l_idx is even, then it lies between two intervals, and thus
# is not contained in any interval. Otherwise it returns the index
if l_idx % 2 == 0: return None
# if l_idx is out of bounds (i.e. ==len(endpts)), then ivl is
# not contained in any interval (too large)
if l_idx==len(endpts): return None
# Now, all we need to do is check that the right endpoint of ivl is
# less than or equal to the corresponding endpoint in the list.
# Happily, that endpoint is at index l_idx
if ivl[1]<=endpts[l_idx]:
# So return the index of the interval:
return (l_idx-1)/2
else:
return None
Then, from a shell:

>>print tst.test_interval((0,13),endpts)

None

>>print tst.test_interval((15,21),endpts)

>>print tst.test_interval((35,40),endpts)

>>print tst.test_interval((40,80),endpts)

None

>>print tst.test_interval((109,200),endpts)

None

Jun 13 '07 #2

=?ISO-8859-1?Q?Nis_J=F8rgensen?=

Matteo skrev:

OK - I'm going to assume your intervals are inclusive (i.e. 34-51
contains both 34 and 51).

If your intervals are all really all non-overlapping, one thing you
can try is to put all the endpoints in a single list, and sort it.
Then, you can use the bisect module to search for intervals, which
will give you a logarithmic time algorithm.

Here, I'm going to assume you just need the index of the containing
interval. If you really need a name (i.e. 'a1' or 'a2'), you can use a
list of names, and index into that.

I hope those assumptions are valid! if so, the following should work:

I have taken the liberty of simplifying your code, using the fact that
tuples are sorted lexicographically. Note that this requires all
intervals to be tuples and not lists (since list(a) < tuple(b) is always
True).

from bisect import bisect

def test_interval(ivl,intervals):

# Find where ivl would lie in the list
# i.e. the index of the first interval sorting as larger than ivl
idx=bisect(intervals,ivl)
# Left endpoints equal is a special case - a matching interval will be
# to the right of the insertion point
if idx < len(intervals) and intervals[idx][0] == ivl[0]:
if intervals[idx][1] >= ivl[1]:
return idx
else:
return None
# Otherwise, we need to check to the left of the insertion point
if idx 0 and intervals[idx-1][1] >= ivl[1]:
return idx-1
else:
return None

>>intervals =[(10, 21), (34, 51), (77, 101)]
print test_interval((34,35),intervals)

>>print test_interval((34,53),intervals)

None

>>print test_interval((77,53),intervals)

>>print test_interval((77,83),intervals)

>>print test_interval((77,102),intervals)

None

>>print test_interval((77,101),intervals)

2

u"Nis J\xf8rgensen"

Jun 13 '07 #3

Lee Sander

Dear Matteo and Nis,
Thankyou very much for your help. I wasn't aware of the bisect
library but it's really useful.
thank you both once again
Lee

On 13 Jun, 23:21, Nis Jørgensen <n...@superlativ.dkwrote:

Matteo skrev:

OK - I'm going to assume your intervals are inclusive (i.e. 34-51
contains both 34 and 51).

If your intervals are all really all non-overlapping, one thing you
can try is to put all the endpoints in a single list, and sort it.
Then, you can use the bisect module to search for intervals, which
will give you a logarithmic time algorithm.

Here, I'm going to assume you just need the index of the containing
interval. If you really need a name (i.e. 'a1' or 'a2'), you can use a
list of names, and index into that.

I hope those assumptions are valid! if so, the following should work:

I have taken the liberty of simplifying your code, using the fact that
tuples are sorted lexicographically. Note that this requires all
intervals to be tuples and not lists (since list(a) < tuple(b) is always
True).

from bisect import bisect

def test_interval(ivl,intervals):

# Find where ivl would lie in the list
# i.e. the index of the first interval sorting as larger than ivl
idx=bisect(intervals,ivl)
# Left endpoints equal is a special case - a matching interval will be
# to the right of the insertion point
if idx < len(intervals) and intervals[idx][0] == ivl[0]:
if intervals[idx][1] >= ivl[1]:
return idx
else:
return None
# Otherwise, we need to check to the left of the insertion point
if idx 0 and intervals[idx-1][1] >= ivl[1]:
return idx-1
else:
return None

>intervals =[(10, 21), (34, 51), (77, 101)]
print test_interval((34,35),intervals)

1

>print test_interval((34,53),intervals)

None

>print test_interval((77,53),intervals)

2

>print test_interval((77,83),intervals)

2

>print test_interval((77,102),intervals)

None

>print test_interval((77,101),intervals)

2

u"Nis J\xf8rgensen"

Jun 14 '07 #4

Similar topics

url mapping

by: Frank | last post by:

Hi there, In web.xml, I know it's possible to do path mapping: <path-mapping url-pattern="/aaa/*" real-path="C:\TEMP\"/> So a call to myserver.com/mywebapp/aaa will redirect to c:\temp. ...

Java

string formatting with mapping & '*'... is this a bug?

by: Pierre Fortin | last post by:

Hi! "Python Essential Reference" - 2nd Ed, on P. 47 states that a string format can include "*" for a field width (no restrictions noted); yet... >>> "%*d" % (6,2) # works as expected ' ...

Python

Mapping Function

by: naruto | last post by:

Hi all, I have the following being defined in a A.cxx file. // define in source file. Not exported to the outside world (this cannot be // moved to the header file ) #define CHANNEL_0 0...

C / C++

Where to find tag mapping-table of HTML translated to XHTML1.0

by: mike | last post by:

regards: Where to find tag mapping-table of HTML translated to XHTML1.0 Any positive suggestion is welcome. thank you May goodness be with you all

HTML / CSS

xml-object mapping

by: Tamas Hegedus | last post by:

Hi! I am looking for an xml-object mapping tool ('XML Data Binding-design time product') where I can define the mapping rules in 'binding files' and the parser is generated automatically. ...

Python

Does .NET have O/R Mapping?

by: Elder Hyde | last post by:

I was reading this interview with Hejlsberg, when suddenly the conversation turned to O/R mapping. Hejlsberg talked as if he had had to design an O/R mapping for .NET (he said ".NET had each one of...

C# / C Sharp

how to add a whildcard appliation mapping in IIS programmatically?

by: BentleyInc | last post by:

I'm trying to find a way to add a whildcard application mapping to aspnet_isapi.dll in IIS programmatically.... been looking into IIS administrator reference but didn't find the right function to...

ASP.NET

Mapping problem with ADO.NET

by: none | last post by:

Hi, I'm trying to establish table mappings, and I've hit a snag. At the point to where I try to fill the schema (DB_adapter.FillSchema), I get an exception, and the message is as follows:...

C# / C Sharp

NHibernate Complex Table Mapping

by: Ram | last post by:

Hey, I'm having a trouble mapping a connecting between 2 of my tables. We have 2 tables - the simplest "dept", "emp" tables which are mapped to 2 classes. Class Dept contains 2 properties for...

.NET Framework

wildcard mapping

by: Jan Kucera | last post by:

Hi, does anybody know about wildcard mapping ASP.NET 2 in IIS6? Any tutorial? Thanks, Jan

ASP.NET

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA