469,631 Members | 1,719 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,631 developers. It's quick & easy.

anagram finder / dict mapping question

Hi All,

I wrote a program that takes a string sequence and finds all the words
inside a text file (one word per line) and prints them:

def anagfind(letters): #find anagrams of these letters
fin = open('text.txt') #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox

Now I'd like to modify the code to naturally find all anagrams inside a
wordlist. What would be the best way to do this? Using Hints? Is it
possible to iterate over dict keys? How can I make a dict that maps
from a set of letters to a list of words that are spelled from those
letters? Wouldn't I need to make the set of letters a key in a dict?

As always - Thanks for helping someone trying to learn...

Dave

Jun 27 '08 #1
19 2598
On Thu, 08 May 2008 11:02:12 +1000, dave <sq*************@1ya2hoo3.net
wrote:
Hi All,

I wrote a program that takes a string sequence and finds all the words
inside a text file (one word per line) and prints them:

def anagfind(letters): #find anagrams of these letters
fin = open('text.txt') #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox

Now I'd like to modify the code to naturally find all anagrams inside a
wordlist. What would be the best way to do this? Using Hints? Is it
possible to iterate over dict keys? How can I make a dict that maps
from a set of letters to a list of words that are spelled from those
letters? Wouldn't I need to make the set of letters a key in a dict?

As always - Thanks for helping someone trying to learn...

Dave
Suggestion: for each word, sort their characters and use them as the
dictionary key. If two words have the same combination of characters,
then they are anagrams. For example: "edam" and "made" are anagrams
because they have the letters 'a', 'd', 'e' and 'm'.

Refer "Programming Pearls" by Jon Bentley.

--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</a>
Jun 27 '08 #2
This is what i've came up with. My problem is that I can't get them to
properly evaluate.. when comparewords() runs it finds itself...
Should I have the keys of mapdict iterate over itself? Is that
possible?

def annafind():
fin = open('text.txt') # file has one word per line
mapdic = {} # each word gets sorted & goes in here
for line in fin:
rawword = line.strip()
word = list(rawword)
word.sort()
mapdic[''.join(word)] = 0
return mapdic
def comparewords(): ***not working as intended
fin = open('text.txt')
for line in fin:
line = line.strip()
word = list(line)
word.sort()
sortedword = (''.join(word))
if sortedword in mapdic:
print line


On 2008-05-07 19:25:53 -0600, "Kam-Hung Soh" <ka*********@gmail.comsaid:
On Thu, 08 May 2008 11:02:12 +1000, dave <sq*************@1ya2hoo3.net>
wrote:
>Hi All,

I wrote a program that takes a string sequence and finds all the words
>inside a text file (one word per line) and prints them:

def anagfind(letters): #find anagrams of these letters
fin = open('text.txt') #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox

Now I'd like to modify the code to naturally find all anagrams inside
a
>wordlist. What would be the best way to do this? Using Hints? Is it
>possible to iterate over dict keys? How can I make a dict that maps
>from a set of letters to a list of words that are spelled from those
>letters? Wouldn't I need to make the set of letters a key in a dict?

As always - Thanks for helping someone trying to learn...

Dave

Suggestion: for each word, sort their characters and use them as the
dictionary key. If two words have the same combination of characters,
then they are anagrams. For example: "edam" and "made" are anagrams
because they have the letters 'a', 'd', 'e' and 'm'.

Refer "Programming Pearls" by Jon Bentley.

--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</ a>

Jun 27 '08 #3
On Thu, 08 May 2008 15:42:07 +1000, dave <sq*************@1ya2hoo3.net
wrote:
This is what i've came up with. My problem is that I can't get them to
properly evaluate.. when comparewords() runs it finds itself... Should
I have the keys of mapdict iterate over itself? Is that possible?

def annafind():
fin = open('text.txt') # file has one word per line
mapdic = {} # each word gets sorted & goes in here
for line in fin:
rawword = line.strip()
word = list(rawword)
word.sort()
mapdic[''.join(word)] = 0
return mapdic
def comparewords(): ***not working as intended
fin = open('text.txt')
for line in fin:
line = line.strip()
word = list(line)
word.sort()
sortedword = (''.join(word))
if sortedword in mapdic:
print line


On 2008-05-07 19:25:53 -0600, "Kam-Hung Soh" <ka*********@gmail.com
said:
>On Thu, 08 May 2008 11:02:12 +1000, dave <sq*************@1ya2hoo3.net>
wrote:
>>Hi All,
I wrote a program that takes a string sequence and finds all the words
>>inside a text file (one word per line) and prints them:
def anagfind(letters): #find anagrams of these letters
fin = open('text.txt') #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox
Now I'd like to modify the code to naturally find all anagrams inside
a
>>wordlist. What would be the best way to do this? Using Hints? Is it
>>possible to iterate over dict keys? How can I make a dict that maps
>>from a set of letters to a list of words that are spelled from those
>>letters? Wouldn't I need to make the set of letters a key in a dict?
As always - Thanks for helping someone trying to learn...
Dave
Suggestion: for each word, sort their characters and use them as the
dictionary key. If two words have the same combination of characters,
then they are anagrams. For example: "edam" and "made" are anagrams
because they have the letters 'a', 'd', 'e' and 'm'.
Refer "Programming Pearls" by Jon Bentley.
--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</
a>

Your code is always going to return the same list because every word is an
anagram of itself.

Tip: Create a list for each dictionary key, then add a word to the list if
that word is not in the list. So:

mapdic('adem') --["edam", "made"]

P.S. When replying, the convention is to add your response to the bottom,
not top of the message.

--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</a>
Jun 27 '08 #4
>On 2008-05-07 19:25:53 -0600, "Kam-Hung Soh" <ka*********@gmail.com>
>
>said:
>>On Thu, 08 May 2008 11:02:12 +1000, dave <sq*************@1ya2hoo3.ne
t>
>>wrote:

Hi All,
I wrote a program that takes a string sequence and finds all the wo
rds
>>>
inside a text file (one word per line) and prints them:
def anagfind(letters): #find anagrams of these letters
fin = open('text.txt') #one word per line file
wordbox = [] #this is where the words will go
for line in fin:
word = line.strip()
count = 0
for char in letters:
if char not in word:
break
else:
count += 1
if count == len(word):
wordbox.append(word)
return wordbox
Now I'd like to modify the code to naturally find all anagrams insi
de
>>a
wordlist. What would be the best way to do this? Using Hints? Is
it
>>>
possible to iterate over dict keys? How can I make a dict that maps
>>>
from a set of letters to a list of words that are spelled from those
>>>
letters? Wouldn't I need to make the set of letters a key in a dict
?
>>> As always - Thanks for helping someone trying to learn...
Dave

Suggestion: for each word, sort their characters and use them as the
>>dictionary key. If two words have the same combination of characters
,
>>then they are anagrams. For example: "edam" and "made" are anagrams
because they have the letters 'a', 'd', 'e' and 'm'.
Refer "Programming Pearls" by Jon Bentley.
--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salarima
n</
>>a>


Your code is always going to return the same list because every word is an
anagram of itself.

Tip: Create a list for each dictionary key, then add a word to the list if
that word is not in the list. So:

mapdic('adem') --["edam", "made"]

P.S. When replying, the convention is to add your response to the bottom ,
not top of the message.

--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</ a>
I got it! Thanks for all your help!!! Please tell me what you think:

def anafind():
fin = open('short.txt') #one word per line
mapdic = {} #this dic maps letters to anagrams
for line in fin:
line = line.strip()
templist = sorted(list(line)) #sort letters
newtlist = (''.join(templist)) #join letters
if newtlist not in mapdic:
mapdic[newtlist] = [line]
if line not in mapdic[newtlist]:
mapdic[newtlist].append(line)
for value in mapdic.values():
if len(value) <= 1:
pass
else:
print value
Jun 27 '08 #6
dave <sq*************@1ya2hoo3.netwrites:
if newtlist not in mapdic:
mapdic[newtlist] = [line]
if line not in mapdic[newtlist]:
mapdic[newtlist].append(line)
I'd use the defaultdict module to avoid the above.
for value in mapdic.values():
if len(value) <= 1:
pass
else:
print value
I'd write:

for value in mapdic.values():
if len(value) 1:
print value

I'd actually use itervalues rather than values, but that's a little
bit fancy, you don't have to worry about it til later.
Jun 27 '08 #7
On Fri, 09 May 2008 09:52:53 +1000, dave <sq*************@1ya2hoo3.net
wrote:
I got it! Thanks for all your help!!! Please tell me what you think:

def anafind():
fin = open('short.txt') #one word per line
mapdic = {} #this dic maps letters to anagrams
for line in fin:
line = line.strip()
templist = sorted(list(line)) #sort letters
newtlist = (''.join(templist)) #join letters
if newtlist not in mapdic:
mapdic[newtlist] = [line]
if line not in mapdic[newtlist]:
mapdic[newtlist].append(line)
for value in mapdic.values():
if len(value) <= 1:
pass
else:
print value

I avoid creating a temporary variable if it's only used once immediately,
so I would have written:

newtlist = ''.join(sorted(list(line)))

Looks pretty good, and I found the other tips in parallel responses useful
to me too!

--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</a>
Jun 27 '08 #8
On Thu, May 8, 2008 at 7:52 PM, dave <sq*************@1ya2hoo3.netwrote:
I got it! Thanks for all your help!!! Please tell me what you think:
Here's yet another version of the same thing, using defaultdicts and
sets. This way we don't have to test for membership in either the
dictionary or the collection of anagrams. The code is shorter, but no
less readable in my opinion.

from collections import defaultdict

def anafind(words):
"""
Given a sequence of words, return a dictionary with groups
of letters as keys and sets of anagrams as values
"""
anagrams = defaultdict(set)
for word in words:
key = ''.join(sorted(word))
anagrams[key].add(word)
return anagrams

if __name__ == "__main__":
wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo']
anagrams = anafind(wordlist)
for letters, words in anagrams.items():
print "%s: %s" % (letters, words)

--
Jerry
Jun 27 '08 #9
"Kam-Hung Soh" <ka*********@gmail.comwrites:
I avoid creating a temporary variable if it's only used once
immediately, so I would have written:

newtlist = ''.join(sorted(list(line)))
Or ''.join(sorted(line))

as sorted() works with any iterable.

--
Arnaud
Jun 27 '08 #10
>>key = ''.join(sorted(word))

I tend to strip and lower the word as well, otherwise "Hello" and
"hello" do not compare...depends on what you want though!
Plus you might get a lot of "word\n" as keys...

My technique is the this way

def anagram_finder(words):
anagrams = {}
for word in words:
word = word.strip()
key = ''.join(sorted(word.lower()))
anagrams.setdefault(key, []).append(word)
return anagrams
Jun 27 '08 #11
On May 9, 1:45 am, cokofree...@gmail.com wrote:
>key = ''.join(sorted(word))

I tend to strip and lower the word as well, otherwise "Hello" and
"hello" do not compare...depends on what you want though!
Plus you might get a lot of "word\n" as keys...

My technique is the this way

def anagram_finder(words):
anagrams = {}
for word in words:
word = word.strip()
key = ''.join(sorted(word.lower()))
anagrams.setdefault(key, []).append(word)
return anagrams
What would be the best method to print the top results, the one's that
had the highest amount of anagrams?? Create a new histogram dict?
Jun 27 '08 #12
um******@gmail.com writes:
On May 9, 1:45 am, cokofree...@gmail.com wrote:
>>key = ''.join(sorted(word))

I tend to strip and lower the word as well, otherwise "Hello" and
"hello" do not compare...depends on what you want though!
Plus you might get a lot of "word\n" as keys...

My technique is the this way

def anagram_finder(words):
anagrams = {}
for word in words:
word = word.strip()
key = ''.join(sorted(word.lower()))
anagrams.setdefault(key, []).append(word)
return anagrams

What would be the best method to print the top results, the one's that
had the highest amount of anagrams?? Create a new histogram dict?
You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

--
Arnaud
Jun 27 '08 #13
What would be the best method to print the top results, the one's that
had the highest amount of anagrams?? Create a new histogram dict?

You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

--
Arnaud
That is the biggest list of anagrams, what if I wanted the 3 biggest
lists? Is there a way to specific top three w/ a max command??

Jun 27 '08 #14
On Sat, 10 May 2008 07:19:38 +1000, <um******@gmail.comwrote:
What would be the best method to print the top results, the one's that
had the highest amount of anagrams?? Create a new histogram dict?

You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

--
Arnaud

That is the biggest list of anagrams, what if I wanted the 3 biggest
lists? Is there a way to specific top three w/ a max command??
Built-in max() function only returns one result.

My solution is to make a sorted list and return last three items:

sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:]

--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</a>
Jun 27 '08 #15
On May 9, 5:19*pm, umpsu...@gmail.com wrote:
What would be the best method to print the top results, the one's that
had the highest amount of anagrams?? *Create a new histogram dict?
You can use the max() function to find the biggest list of anagrams:
top_results = max(anagrams.itervalues(), key=len)
--
Arnaud

That is the biggest list of anagrams, what if I wanted the 3 biggest
lists? *Is there a way to specific top three w/ a max command??
>>import heapq
help(heapq.nlargest)
Help on function nlargest in module heapq:

nlargest(n, iterable, key=None)
Find the n largest elements in a dataset.

Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
HTH,
George
Jun 27 '08 #16
On 2008-05-09 18:53:19 -0600, George Sakkis <ge***********@gmail.comsaid:
On May 9, 5:19*pm, umpsu...@gmail.com wrote:
>>>What would be the best method to print the top results, the one's that
>>>had the highest amount of anagrams?? *Create a new histogram dict?
>>You can use the max() function to find the biggest list of anagrams:
>>top_results = max(anagrams.itervalues(), key=len)
>>--
Arnaud

That is the biggest list of anagrams, what if I wanted the 3 biggest
lists? *Is there a way to specific top three w/ a max command??
>>>import heapq
help(heapq.nlargest)
Help on function nlargest in module heapq:

nlargest(n, iterable, key=None)
Find the n largest elements in a dataset.

Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
HTH,
George
I both the 'nlargest' and the 'sorted' methods.. I could only get the
sorted to work.. however it would only return values like (3, edam)
not the list of words..

Here is what I have now.. Am I over-analyzing this? It doesn't seem
like it should be this hard to get the program to print the largest set
of anagrams first...
def anafind():
fin = open('text.txt')
mapdic = {}
finalres = [] # top answers go here
for line in fin:
line = line.strip()
alphaword = ''.join(sorted(line)) #sorted word as key
if alphaword not in mapdic:
mapdic[alphaword] = [line]
else:
mapdic[alphaword].append(line)
topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:]
#top 3 answers
for key, value in topans: #
finalres.append(mapdic[value])
print finalres

Jun 27 '08 #17
On May 9, 11:19*pm, dave <squareswallo...@1ya2hoo3.netwrote:
On 2008-05-09 18:53:19 -0600, George Sakkis <george.sak...@gmail.comsaid:
On May 9, 5:19*pm, umpsu...@gmail.com wrote:
>>What would be the best method to print the top results, the one's that
>>had the highest amount of anagrams?? *Create a new histogram dict?
>You can use the max() function to find the biggest list of anagrams:
>top_results = max(anagrams.itervalues(), key=len)
>--
Arnaud
That is the biggest list of anagrams, what if I wanted the 3 biggest
lists? *Is there a way to specific top three w/ a max command??
>>import heapq
help(heapq.nlargest)
Help on function nlargest in module heapq:
nlargest(n, iterable, key=None)
* * Find the n largest elements in a dataset.
* * Equivalent to: *sorted(iterable, key=key, reverse=True)[:n]
HTH,
George

I both the 'nlargest' and the 'sorted' methods.. I could only get the
sorted to work.. however it would only return values like (3, *edam)
not the list of words..

Here is what I have now.. Am I over-analyzing this? *It doesn't seem
like it should be this hard to get the program to print the largest set
of anagrams first...

def anafind():
* * * * fin = open('text.txt')
* * * * mapdic = {}
* * * * finalres = [] * * * * * * * * * # top answers go here
* * * * for line in fin:
* * * * * * * * line = line.strip()
* * * * * * * * alphaword = ''.join(sorted(line)) * * * #sorted word as key
* * * * * * * * if alphaword not in mapdic:
* * * * * * * * * * * * mapdic[alphaword] = [line]
* * * * * * * * else:
* * * * * * * * * * * * mapdic[alphaword].append(line)
* * * * topans = sorted((len(mapdic[key]), key) for key in mapdic.keys())[-3:]
* #top 3 answers
* * * * for key, value in topans: * * * #
* * * * * * * * finalres.append(mapdic[value])
* * * * print finalres
Here is a working, cleaned up version:

from heapq import nlargest
from collections import defaultdict

def anagrams(words, top=None):
key2words = defaultdict(set)
for word in words:
word = word.strip()
key = ''.join(sorted(word))
key2words[key].add(word)
if top is None:
return sorted(key2words.itervalues(), key=len, reverse=True)
else:
return nlargest(top, key2words.itervalues(), key=len)

if __name__ == '__main__':
wordlist = ['live', 'evil', 'one', 'nose', 'vile', 'neo']
for words in anagrams(wordlist,2):
print words
By the way, try to generalize your functions (and rest of the code for
that matter) so that it can be reused easily. For example, hardcoding
the input file name in the function's body is usually undesirable.
Similarly for other constants like 'get top 3 answers'; it doesn't
cost you anything to replace 3 with 'top' and pass it as an argument
(or set it as default top=3 if that's the default case).

HTH,
George
Jun 27 '08 #18
"Kam-Hung Soh" <ka*********@gmail.comwrites:
On Sat, 10 May 2008 07:19:38 +1000, <um******@gmail.comwrote:
>What would be the best method to print the top results, the one's that
had the highest amount of anagrams?? Create a new histogram dict?

You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

--
Arnaud

That is the biggest list of anagrams, what if I wanted the 3 biggest
lists? Is there a way to specific top three w/ a max command??

Built-in max() function only returns one result.

My solution is to make a sorted list and return last three items:

sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:]
Using the key= parameter of sorted() beats explicit DSU:

sorted(anagrams.itervalues(), key=len)[-3:]

Or even

sorted(anagrams.itervalues(), key=len, reverse=True)[:3]

--
Arnaud
Jun 27 '08 #19
On Sat, 10 May 2008 18:06:17 +1000, Arnaud Delobelle
<ar*****@googlemail.comwrote:
"Kam-Hung Soh" <ka*********@gmail.comwrites:
>On Sat, 10 May 2008 07:19:38 +1000, <um******@gmail.comwrote:
>>What would be the best method to print the top results, the one's
that
had the highest amount of anagrams?? Create a new histogram dict?

You can use the max() function to find the biggest list of anagrams:

top_results = max(anagrams.itervalues(), key=len)

--
Arnaud

That is the biggest list of anagrams, what if I wanted the 3 biggest
lists? Is there a way to specific top three w/ a max command??

Built-in max() function only returns one result.

My solution is to make a sorted list and return last three items:

sorted((len(anagrams[key]), key) for key in anagrams.keys())[-3:]

Using the key= parameter of sorted() beats explicit DSU:

sorted(anagrams.itervalues(), key=len)[-3:]

Or even

sorted(anagrams.itervalues(), key=len, reverse=True)[:3]
Nice!

I just found out that DSU = Decorate-Sort-Undecorate idiom, from
http://wiki.python.org/moin/HowTo/Sorting.

--
Kam-Hung Soh <a href="http://kamhungsoh.com/blog">Software Salariman</a>
Jun 27 '08 #20

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

2 posts views Thread by GrelEns | last post: by
17 posts views Thread by Pierre Fortin | last post: by
25 posts views Thread by Steven Bethard | last post: by
7 posts views Thread by Lowell Kirsh | last post: by
7 posts views Thread by Marcio Rosa da Silva | last post: by
2 posts views Thread by Alex | last post: by
3 posts views Thread by bruce | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.