# Using the split command in a list

If you have a list, how do you use the split command to split each line into a word and its frequency?
Jun 10 '07 #1
Smygis
What?

Like:
    >>> ListOWords = ["Hello World", "I am a", "list of words"]
    >>> Stuff = [i.split() for i in ListOWords]
    >>> Stuff
    [['Hello', 'World'], ['I', 'am', 'a'], ['list', 'of', 'words']]
I guess it's totally off base, but I have no idea what you want.
Jun 10 '07 #2
bvdet
Do you have a list, a line, or a list of lines? Maybe this will help:
    >>> sentences = 'This is a sentence that we are going to split. We will also determine the frequency of each word. This sentence is here just for the heck of it.'
    >>> wordList = [s.lower() for s in sentences.split()]
    >>> wordCnt = [wordList.count(w) for w in wordList]
    >>> dd = dict(zip(wordList,wordCnt))
    >>> for item in dd:
    ...     print "Word '%s' occurs %d times." % (item, dd[item])
    ...
    Word 'just' occurs 1 times.
    Word 'sentence' occurs 2 times.
    Word 'is' occurs 2 times.
    Word 'word.' occurs 1 times.
    Word 'frequency' occurs 1 times.
    Word 'are' occurs 1 times.
    Word 'determine' occurs 1 times.
    Word 'for' occurs 1 times.
    Word 'to' occurs 1 times.
    Word 'also' occurs 1 times.
    Word 'going' occurs 1 times.
    Word 'split.' occurs 1 times.
    Word 'it.' occurs 1 times.
    Word 'we' occurs 2 times.
    Word 'that' occurs 1 times.
    Word 'here' occurs 1 times.
    Word 'a' occurs 1 times.
    Word 'this' occurs 2 times.
    Word 'of' occurs 2 times.
    Word 'will' occurs 1 times.
    Word 'heck' occurs 1 times.
    Word 'each' occurs 1 times.
    Word 'the' occurs 2 times.
Here's a way to get a list of words from lines from a file without using split:
    import re

    # lineList is assumed to already hold the lines read from the file
    pat = r"\w+"
    wordList = []

    for line in lineList:
        wordList += [w.lower() for w in re.findall(pat,line)]

    wordCnt = [wordList.count(w) for w in wordList]

    dd = dict(zip(wordList,wordCnt))

    for item in dd:
        print "Word '%s' occurs %d times." % (item, dd[item])
This way will exclude any punctuation. If you have a list of lines and you don't care about possible punctuation:
    >>> wordList = []
    >>> for line in lineList:
    ...     wordList += [s.lower() for s in line.strip().split()]
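For longer inputs, calling `wordList.count(w)` once per word rescans the whole list each time. A minimal sketch of the same tally using `collections.Counter` from the standard library might look like this. It is written in Python 3 syntax, and `line_list` and `count_words` are made-up names for illustration, not anything from the posts above.

```python
import re
from collections import Counter

def count_words(line_list):
    """Return a Counter mapping each lowercased word to its frequency."""
    counts = Counter()
    for line in line_list:
        # \w+ keeps letters, digits and underscores, so punctuation is dropped
        counts.update(w.lower() for w in re.findall(r"\w+", line))
    return counts

# Hypothetical sample data standing in for lines read from a file
line_list = ["This is a sentence.", "This sentence is short."]
counts = count_words(line_list)
for word, n in counts.most_common():
    print("Word '%s' occurs %d time%s." % (word, n, "" if n == 1 else "s"))
```

`most_common()` lists the words from most to least frequent, which may or may not be the order you want.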
Jun 11 '07 #3
bartonc
I always like to add this little touch of elegance:
    >>> for item in dd:
    ...     i = dd[item]
    ...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])
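The same idea can be wrapped in a small helper. This is only a sketch in Python 3 syntax; `plural_suffix` and `describe` are hypothetical names, and the helper uses a conditional expression instead of tuple indexing, which some readers find easier to scan.

```python
def plural_suffix(n):
    """Return '' when n is exactly 1, otherwise 's'."""
    return "" if n == 1 else "s"

def describe(word, n):
    # Same message format as the loop above
    return "Word '%s' occurs %d time%s." % (word, n, plural_suffix(n))

print(describe("just", 1))
print(describe("sentence", 2))
```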
Jun 11 '07 #4
bvdet
Barton,

You have posted something similar to this before. I am beginning to catch on. Thanks! :)

BV
Jun 11 '07 #5
bartonc
Nope. I think that this is the first opportunity. It comes up often in GUI programming where (say) you have a RadioButton and you want the screen to reflect its state elsewhere, as in:
    flag = aRadioButton.GetState() # actually an int, not bool
    stateStr = ("Off", "On")[flag]  # tuples require int indexes so there is often a cast from bool to int
I've always felt that software should be smart enough to know if it is relaying data about a thing or several things. To me it's a glaring omission on the part of the programmer when the user is told that he has 1 things.
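As a side note, `bool` is a subclass of `int` in Python, so a true/false value can index a tuple directly and the `int()` cast is optional. A quick check, with `state_label` as a made-up helper name (Python 3 syntax):

```python
def state_label(flag):
    """Map a truthy/falsy flag to a display string."""
    # bool(flag) normalizes any value to False (index 0) or True (index 1)
    return ("Off", "On")[bool(flag)]

print(isinstance(True, int))
print(state_label(True))
print(state_label(0))
```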
Jun 11 '07 #6
bvdet
This is the snippet I was referring to. I had never thought of supplying an indexed tuple or list as an argument to a string format character.
    # test utility functions and rules
    for i in range(20):
        RoleDice(dice)
        PrintDice(dice)
        print "All dice are%sequal" % [" not ", " "][AllEqual(dice)]
        print
Jun 11 '07 #7
ilikepython
Hmm, that's a clever way of doing it. I never would have thought of it.
Jun 11 '07 #8
bartonc
Yep. I figured that. As soon as I submitted, I thought "that's not on point", but what's done is done (sort of).
Jun 11 '07 #9
texas22
Ok, this is kind of making sense. Once I have pulled out, say, the three longest, shortest, and middle words, what syntax do I use to take those words and print each one on its own line with its frequency, that is, the number of times it occurs in the list?
Jun 11 '07 #10
bartonc
I s'pose you could use some complicated method of keeping track, or you can just do this:
    >>> anStr = "The fat cat ran into a fat cow"
    >>> anStr.count("fat")
    2
Basically, in Python, any time you think that an object should/could have a certain functionality, just go to the interactive interpreter and try it. (This is exactly what I have done above). If it doesn't work, then turn to the Docs. If that fails, well, you know - ask somebody.
Jun 11 '07 #11
bartonc
Sorry, I forgot that we were working on lists:
    >>> aList = anStr.split()
    >>> aList
    ['The', 'fat', 'cat', 'ran', 'into', 'a', 'fat', 'cow']
    >>> aList.count('fat')
    2
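One difference between the two calls is worth keeping in mind: `str.count` counts substring occurrences, while `list.count` counts whole elements. A small illustration with a made-up sentence (Python 3 syntax):

```python
sentence = "The fat cat ran into a fatter cow"

# str.count matches the substring "fat" inside "fatter" as well
substring_hits = sentence.count("fat")

# list.count only matches elements equal to "fat"
word_hits = sentence.split().count("fat")

print(substring_hits, word_hits)
```

So for word frequencies, counting on the split list is usually the safer choice.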
Jun 12 '07 #12
texas22
Is there a way I can use a three-dimensional array that will output the number of times a word appears in a list, with the word length in the first position, the word in the second, and the frequency in the third?
Jun 12 '07 #13
Smygis
something like:

    thing = [[len(word), word, listOfWords.count(word)] for word in listOfWords]
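Because that comprehension walks every element, a word that appears several times produces several identical rows. If one row per distinct word is wanted, a sketch along these lines might do. `word_table` is a hypothetical helper name and the sample list is made up (Python 3 syntax).

```python
from collections import Counter

def word_table(words):
    """Return sorted (length, word, frequency) rows, one per distinct word."""
    counts = Counter(words)
    return sorted((len(w), w, n) for w, n in counts.items())

list_of_words = ["fat", "cat", "a", "fat", "into"]
for length, word, freq in word_table(list_of_words):
    print(length, word, freq)
```

Sorting by length first puts the shortest words at the top and the longest at the bottom, so the first and last few rows give the shortest and longest words.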
Jun 12 '07 #14
texas22
Thanks for the help. That helped make sense of it.
Jun 12 '07 #15