473,385 Members | 1,464 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

Faster search algorithm for lists

I have a file containing 10,000 words which I import as strings and store in a list. Now, I have another list of strings and have to delete the common words in both lists. When I try to run the program, it takes a lot of time. Please let me know if there's any other way to do it.

For now, I am using:

For item in list_1:
If item in list_2:
{Do the deletion}
Jun 15 '18 #1
4 2221
It's much faster to search a set than a list. Since that list doesn't seem to change during the program run, simply use
Expand|Select|Wrap|Line Numbers
  1.     common_words = set(list_2) # after building list_2 from the file
  2.     ...
  3.     for word in list_1:
  4.          if word in common_words:
  5.              ...do the deletion
  6.  
If you are just making a reduced list, then you can do that without any explicit looping:

Expand|Select|Wrap|Line Numbers
  1. uncommon = [w for w in list_2 if w not in common_words]
  2.  
That's a "list comprehension". It builds a list of every element from the list_2 sequence that is not an element of the common_words set.
Jun 15 '18 #2
Thanks a lot. I'll definitely try it out.
Jun 18 '18 #3
I would use this big list to build sub_lists , one sub_list for each letter , thus , reducing the search time by almost 26 times (in average) , I don't know if python's built-in search functions do that or just loop through every element.
Jun 21 '18 #4
Searching a set is much faster than searching lists--even with the sublists you describe. It's a hash table search, and even if the key-hashing algorithm is a horrible match for the actual key distribution (nearly all "words" happen to hash to the same number, modulo the table size) then that worst-case performance is a sequential list search. Heads anytime you win; tails 30 times in a row you lose a little.
Jul 1 '18 #5

Sign in to post your reply or Sign up for a free account.

Similar topics

28
by: joshc | last post by:
If I have an array of data that I know to be sorted in increasing order, and the array is less than 50 elements, and I want to find the first element greater than a certain value, is a simple...
4
by: Ben Fidge | last post by:
Hi What is the most efficient way to gather a list of files under a given folder recursively with the ability to specify exclusion file masks? For example, say I wanted to get a list of all...
1
by: Rathtap | last post by:
I read claims from a file into a DataTable and DataRow and then use the data adapter's Update method to push those claims into the database. I import a few thousand claims at a time and before...
1
by: Bill Mill | last post by:
Hello all, What data structure would you use to implement something analogous to the iTunes search? I imagine that it must be a tree of some sort, but I can't figure out an easy structure for...
1
by: gihope | last post by:
Hi, I wonder if someone could help me. I'm trying to develop a search algorithm as I need an engine that can determine the shortest route between two locations based on the number of stops. I...
3
by: mintominto82 | last post by:
Hello, I am a very newbie (big emphasis on being a newbie) to c++. I read c++PRIMER but apparently, it did not provide much help. I need a bit of help in codes loading txt file in C++. My...
1
blyxx86
by: blyxx86 | last post by:
Hey everyone, I was wondering how I would go about creating some sort of search algorithm that would find partial matches to a string variable search? (ie. 50123 would be an 80% match with 05123 .....
6
Kelicula
by: Kelicula | last post by:
Why?: One commonly used algorithm is the binary search. If you don't already know it, you should read on. Very helpful. Saves much CPU. Reduces computations exponentially. When searching...
1
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.