473,379 Members | 1,220 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,379 software developers and data experts.

fast list search?

Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)
but it is extremely slow... is there a faster way of doing this in python?

Thanks a lot,

Ramon Aragues
Jul 18 '05 #1
6 22161
Am Wed, 09 Jun 2004 11:49:19 +0200 schrieb ramon aragues:
Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)
but it is extremely slow... is there a faster way of doing this in python?


Hi,

Use a dictionary instead of the list:

if not long_dict.has_key(new_int):
long_dict[new_int]=1

Thomas

Jul 18 '05 #2
ramon aragues wrote:
if new_int not in long_list:
long_list.append(new_int) but it is extremely slow... is there a faster way of doing this in python?


import sets
int_set = sets.Set()
for i in [1,2,2,3,4,4,4]: .... int_set.add(i)
.... int_set Set([1, 2, 3, 4])


This requires Python 2.3.

Peter

Jul 18 '05 #3

"Thomas Guettler" <gu*****@thomas-guettler.de> wrote in message
news:pa***************************@thomas-guettler.de...
Am Wed, 09 Jun 2004 11:49:19 +0200 schrieb ramon aragues:
Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)
but it is extremely slow... is there a faster way of doing this in
python?
Hi,

Use a dictionary instead of the list:

if not long_dict.has_key(new_int):
long_dict[new_int]=1


Or use a set, which I believe will be faster (more C-coded) in 2.4. That
eliminates the dummy value.

TJR


Jul 18 '05 #4
I think the better way is sort before the list
thelist.sort()
And then, a binary search
point_of_insertion = bisect.bisect(thelist, item)
is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]
if not is_present:
..... thelist.insert(point_of_insertion, item)

and done!

Fco Javier Zaragozá Arenas

"ramon aragues" <ra***********@gmx.net> escribió en el mensaje
news:ma*************************************@pytho n.org... Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)
but it is extremely slow... is there a faster way of doing this in python?

Thanks a lot,

Ramon Aragues

Jul 18 '05 #5
How about using a dictionary, with the keys being the ints and the
values being the index in the 'list', i.e. len(dictionary). This is
quite fast:
d = {}
for i in xrange(0,500000):
.... if not d.has_key(i): d[i] = len(d)

-Dan

Javier Zaragozá Arenas wrote:
I think the better way is sort before the list

thelist.sort()

And then, a binary search

point_of_insertion = bisect.bisect(thelist, item)
is_present = thelist[point_of_insertion -1:point_of_insertion] ==[item] >> if not is_present:

.... thelist.insert(point_of_insertion, item)

and done!

Fco Javier Zaragozá Arenas



"ramon aragues" <ra***********@gmx.net> escribió en el mensaje
news:ma*************************************@pytho n.org...
Hi,

I´ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn´t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)
but it is extremely slow... is there a faster way of doing this in python?

Thanks a lot,

Ramon Aragues




Jul 18 '05 #6
Perhaps the standard 'sets' module would be of use? It's implemented
with dictionaries, and seems to do the thing that is needed.

On Fri, Jun 11, 2004 at 08:06:17PM -0700, Dan Gunter wrote:
How about using a dictionary, with the keys being the ints and the
values being the index in the 'list', i.e. len(dictionary). This is
quite fast:
d = {}
for i in xrange(0,500000):

... if not d.has_key(i): d[i] = len(d)

-Dan

Javier Zaragoz? Arenas wrote:
I think the better way is sort before the list

thelist.sort()

And then, a binary search

point_of_insertion = bisect.bisect(thelist, item)
is_present = thelist[point_of_insertion -1:point_of_insertion] == [item]

> if not is_present:

.... thelist.insert(point_of_insertion, item)

and done!

Fco Javier Zaragoz? Arenas

"ramon aragues" <ra***********@gmx.net> escribi? en el mensaje
news:ma*************************************@pyth on.org...
Hi,

I?ve got a list with more than 500,000 ints. Before inserting new ints,
I have to check that it doesn?t exist already in the list.

Currently, I am doing the standard:

if new_int not in long_list:
long_list.append(new_int)
but it is extremely slow... is there a faster way of doing this in python?

Thanks a lot,

Ramon Aragues


Jul 18 '05 #7

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
by: Neil | last post by:
I have a very puzzling situation with a database. It's an Access 2000 mdb with a SQL 7 back end, with forms bound using ODBC linked tables. At our remote location (accessed via a T1 line) the time...
7
by: simkn | last post by:
Hello, I'm writing a function that updates an array. That is, given an array, change each element. The trick is this: I can't change any elements until I've processed the entire array. For...
3
by: ali poursadri | last post by:
Hi, I want to fill my list box from a dataset in c#. I use adapter and fill command and bind it to a listbox, but it is very slow for large data about 500000 records. How can I view this list...
6
by: DC | last post by:
Hi, I am programming a search catalogue with 200000 items (and growing). I am currently using the SQL Server 2000 fulltext engine for this task but it does not fit the requirements anymore. ...
4
by: Olaf Baeyens | last post by:
I have to load about 10000 possible file names into a list and must have thems sorted alphabetically, including the file information. At this moment, in conventional C++, I use a sorted file strng...
14
by: Jeff S. | last post by:
In a Windows Forms application I plan to have a collection of structs - each of which contains a bunch of properties describing a person (e.g., LastName, FirstName, EmployeeID, HomeAddress,...
10
by: javuchi | last post by:
I just want to share some code with you, and have some comments and improvements if you want. This header file allocates and add and delete items of any kind of data from a very fast array: ...
2
by: Spoon | last post by:
Hello, I'm wondering whether the STL defines a data structure with the following features: o provides push_front() and pop_back() (like std::list) to implement a FIFO buffer. o allows fast...
7
by: Karch | last post by:
I need to find the fastest way in terms of storage and searching to determine if a given string contains one of a member of a list of strings. So, think of it in terms of this: I have a string such...
0
by: Faith0G | last post by:
I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...
0
by: ryjfgjl | last post by:
In our work, we often need to import Excel data into databases (such as MySQL, SQL Server, Oracle) for data analysis and processing. Usually, we use database tools like Navicat or the Excel import...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.