fast pythonic algorithm question

hi all,

i have a big list of tuples like this:

[ (host, port, protocol, starttime, endtime), .. ] etc

now i have another big(ger) list of tuples like this:

[(src_host, src_port, dest_host, dest_port, protocol, time), ... ] etc

now i need to find all the items in the second list where either
src_host/src_port or dest_host/dest_port matches, protocol matches and
time is between starttime and endtime.

After trying some stuff out i actually found dictionary lookup pretty
fast. Putting the first list in a dict like this:

dict[(host,port,protocol)] = (starttime, endtime)

then:

if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
    protocol) in dict) and starttime < time < endtime):
    print "we have a winner"
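
A minimal sketch of the dict approach described above, written for Python 3 (the snippet in the post uses the Python 2 print statement). It assumes each (host, port, protocol) key is unique, and it fetches the interval from the dict before comparing, because the test in the post uses starttime and endtime without first looking them up for the matched key. The names build_index and find_matches and the sample data are hypothetical, chosen only for illustration.

```python
def build_index(sessions):
    """Map (host, port, protocol) -> (starttime, endtime).

    Assumes each key appears at most once in `sessions`.
    """
    return {(host, port, proto): (start, end)
            for host, port, proto, start, end in sessions}


def find_matches(index, packets):
    """Return packets whose source or destination endpoint is indexed
    and whose time falls strictly inside that endpoint's interval."""
    matches = []
    for packet in packets:
        src_host, src_port, dest_host, dest_port, proto, time = packet
        for key in ((src_host, src_port, proto), (dest_host, dest_port, proto)):
            interval = index.get(key)
            if interval is not None and interval[0] < time < interval[1]:
                matches.append(packet)
                break  # do not report the same packet twice
    return matches


# Made-up sample data.
sessions = [("10.0.0.1", 80, "tcp", 100, 200),
            ("10.0.0.2", 53, "udp", 150, 300)]
packets = [("10.0.0.9", 4242, "10.0.0.1", 80, "tcp", 150),  # dest matches
           ("10.0.0.2", 53, "10.0.0.9", 999, "udp", 400),   # time too late
           ("10.0.0.9", 4242, "10.0.0.8", 80, "tcp", 150)]  # no endpoint match

index = build_index(sessions)
winners = find_matches(index, packets)
```

Each packet costs at most two dict lookups, so the whole pass is linear in the length of the second list, on top of one linear pass to build the index.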

I have also been looking at the bisect module, but couldn't figure out
if it was what I was looking for...

any ideas?
regards,

Guyon Moree
http://gumuz.looze.net/

Aug 1 '06 #1
"Guyon Morée" <gu*********@gmail.com> writes:
> if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
>     protocol) in dict) and starttime < time < endtime):
>     print "we have a winner"

If you have enough memory to do it that way, what's the problem?
Aug 1 '06 #2
Memory is no problem. It just needs to be as fast as possible, if
that's what this is, fine.

If not, I'd like to find out what is :)
thanx,

Guyon Moree
http://gumuz.looze.net

Paul Rubin wrote:
> "Guyon Morée" <gu*********@gmail.com> writes:
>> if (((src_host,src_port, protocol) in dict or (dest_host, dest_port,
>>     protocol) in dict) and starttime < time < endtime):
>>     print "we have a winner"
>
> If you have enough memory to do it that way, what's the problem?
Aug 1 '06 #3
Guyon Morée wrote:
> Memory is no problem. It just needs to be as fast as possible, if
> that's what this is, fine.
>
> If not, I'd like to find out what is :)

I'd say it is as fast as it can get - using hashing for lookups is O(n) in
most cases, where bisection or other order-based lookups have O(log n)

Additionally, dict lookups are fully written in C.

Diez
Aug 1 '06 #4

On Aug 1, 2006, at 11:13 AM, Diez B. Roggisch wrote:
> Guyon Morée wrote:
>> Memory is no problem. It just needs to be as fast as possible, if
>> that's what this is, fine.
>>
>> If not, I'd like to find out what is :)
>
> I'd say it is as fast as it can get - using hashing for lookups is
> O(n) in

I know you meant O(1) for hash lookups, but just in case anyone is
confused, I figured I'd correct this.

> most cases, where bisection or other order-based lookups have O(log n)
>
> Additionally, dict lookups are fully written in C.
>
> Diez

Dave
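
To make the two complexities concrete, here is a small Python 3 sketch that answers the same membership question both ways: a hash-based set (O(1) per lookup on average) and a sorted list searched with the standard bisect module (O(log n) per lookup). The helper in_sorted and the sample keys are made up for illustration. Both approaches return the same answers; only the cost per lookup differs.

```python
import bisect


def in_sorted(sorted_keys, key):
    """Binary search: O(log n) comparisons on a sorted list."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


# Made-up keys of the same shape as in the question.
keys = [("10.0.0.%d" % n, 80, "tcp") for n in range(1, 201)]
as_set = set(keys)        # hash-based: O(1) per lookup on average
as_sorted = sorted(keys)  # order-based: O(log n) per lookup

probes = [("10.0.0.7", 80, "tcp"),     # present
          ("10.0.0.7", 81, "tcp"),     # wrong port
          ("192.168.0.1", 80, "tcp")]  # unknown host

hash_hits = [p in as_set for p in probes]
bisect_hits = [in_sorted(as_sorted, p) for p in probes]
```

For exact-match keys like these, the hash lookup is the natural fit. Bisection becomes interesting when the question is about ordering, such as finding the nearest start time below a given timestamp.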

Aug 1 '06 #5
>> I'd say it is as fast as it can get - using hashing for lookups is
>> O(n) in
>
> I know you meant O(1) for hash lookups, but just in case anyone is
> confused, I figured I'd correct this.

Ooops. Thanks.

Diez
Aug 1 '06 #6

Guyon Morée wrote:
> i have a big list of tuples like this:
>
> [ (host, port, protocol, starttime, endtime), .. ] etc
>
> now i have another big(ger) list of tuples like this:
>
> [(src_host, src_port, dest_host, dest_port, protocol, time), ... ] etc
>
> now i need to find all the items in the second list where either
> src_host/src_port or dest_host/dest_port matches, protocol matches and
> time is between starttime and endtime.
>
> After trying some stuff out i actually found dictionary lookup pretty
> fast. Putting the first list in a dict like this:
>
> dict[(host,port,protocol)] = (starttime, endtime)

That only works if each (host,port,protocol) can appear with only
one (starttime, endtime) in your first big list. Do the variable
names mean what they look like? There's nothing unusual about
connecting to the same host and port with the same protocol, at
multiple times.

You might want your dict to associate (host,port,protocol) with a
list, or a set, of tuples of the form (starttime, endtime). If the
lists can be long, there are fancier methods for keeping the set
of intervals and searching them for contained times or overlapping
intervals. Google up "interval tree" for more.
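
A rough Python 3 sketch of the dict-of-lists idea, under stated assumptions. It is not an interval tree (the standard library does not ship one). It shows a plain linear scan that works for any intervals, plus a bisect-based check that is only valid if the intervals stored for a key are sorted and do not overlap. All function names and the sample data are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(sessions):
    """Map (host, port, protocol) -> sorted list of (starttime, endtime)."""
    index = defaultdict(list)
    for host, port, proto, start, end in sessions:
        index[(host, port, proto)].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def time_is_covered(intervals, time):
    """Linear scan; works even when intervals overlap."""
    return any(start < time < end for start, end in intervals)


def time_is_covered_sorted(intervals, time):
    """Binary search; assumes `intervals` is sorted and non-overlapping."""
    # Index of the last interval whose start is strictly less than `time`.
    i = bisect.bisect_left(intervals, (time,)) - 1
    return i >= 0 and intervals[i][0] < time < intervals[i][1]


# Made-up sample: the same endpoint is used in two separate time windows.
sessions = [("10.0.0.1", 80, "tcp", 500, 600),
            ("10.0.0.1", 80, "tcp", 100, 200),
            ("10.0.0.2", 53, "udp", 150, 300)]
index = build_index(sessions)
intervals = index.get(("10.0.0.1", 80, "tcp"), [])

times = (50, 150, 200, 550, 700)
scan_results = [time_is_covered(intervals, t) for t in times]
bisect_results = [time_is_covered_sorted(intervals, t) for t in times]
```

The linear scan is usually fine while each key has only a handful of intervals. If intervals for one key can overlap and the lists grow long, that is where an interval tree would come in.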
--
--Bryan

Aug 1 '06 #7
Bryan, you are right, but in my case (host, port, protocol) is unique.
Aug 1 '06 #8
