Bytes | Software Development & Data Engineering Community
Using the split command in a list

26
If you have a list, how do you use the split command to split each line into a word and a frequency?
Jun 10 '07 #1
14 39852
Smygis
126 100+
What?

Like:
    >>> ListOWords = ["Hello Wold", "I am a", "list of words"]
    >>> Stuff = [i.split() for i in ListOWords]
    >>> Stuff
    [['Hello', 'Wold'], ['I', 'am', 'a'], ['list', 'of', 'words']]
I guess it's totally off base, but I have no idea what you want.
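If the list instead holds lines of the form "word frequency", the same split() call can unpack each line into its two fields. This is only a guess at the data, since the question does not show it. A minimal sketch in Python 3, where the sample lines and the names `lines` and `pairs` are made up for illustration:

```python
# Hypothetical input: each line holds a word followed by its frequency.
lines = ["apple 3", "banana 12", "cherry 1"]

pairs = []
for line in lines:
    word, freq = line.split()        # split on whitespace into two fields
    pairs.append((word, int(freq)))  # keep the frequency as a number

print(pairs)
```

The unpacking raises a ValueError on any line that does not have exactly two fields, which is a cheap way to spot malformed input.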
Jun 10 '07 #2
bvdet
2,851 Expert Mod 2GB
Do you have a list, a line, or a list of lines? Maybe this will help:
    >>> sentences = 'This is a sentence that we are going to split. We will also determine the frequency of each word. This sentence is here just for the heck of it.'
    >>> wordList = [s.lower() for s in sentences.split()]
    >>> wordCnt = [wordList.count(w) for w in wordList]
    >>> dd = dict(zip(wordList,wordCnt))
    >>> for item in dd:
    ...     print "Word '%s' occurs %d times." % (item, dd[item])
    ...
    Word 'just' occurs 1 times.
    Word 'sentence' occurs 2 times.
    Word 'is' occurs 2 times.
    Word 'word.' occurs 1 times.
    Word 'frequency' occurs 1 times.
    Word 'are' occurs 1 times.
    Word 'determine' occurs 1 times.
    Word 'for' occurs 1 times.
    Word 'to' occurs 1 times.
    Word 'also' occurs 1 times.
    Word 'going' occurs 1 times.
    Word 'split.' occurs 1 times.
    Word 'it.' occurs 1 times.
    Word 'we' occurs 2 times.
    Word 'that' occurs 1 times.
    Word 'here' occurs 1 times.
    Word 'a' occurs 1 times.
    Word 'this' occurs 2 times.
    Word 'of' occurs 2 times.
    Word 'will' occurs 1 times.
    Word 'heck' occurs 1 times.
    Word 'each' occurs 1 times.
    Word 'the' occurs 2 times.
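The session above calls wordList.count(w) once per word, so it rescans the whole list each time. On Python 2.7 or later, collections.Counter tallies the words in a single pass. This is a sketch of an alternative, not what the session above ran, and it uses Python 3 print calls; `word_counts` is a name chosen here:

```python
from collections import Counter

sentences = ("This is a sentence that we are going to split. "
             "We will also determine the frequency of each word. "
             "This sentence is here just for the heck of it.")

# Lower-case every whitespace-separated token, then tally them in one pass.
word_counts = Counter(word.lower() for word in sentences.split())

# Show the three most frequent words.
for word, count in word_counts.most_common(3):
    print("Word %r occurs %d times." % (word, count))
```

As in the original, punctuation stays attached to the tokens, so 'word.' and 'split.' are counted with their trailing periods.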
Here's a way to get a list of words from lines from a file without using split:
    import re

    # Read every line of the file (the path is a placeholder).
    lineList = open(r'X:/path/subdir/name_of_file').readlines()
    pat = r"\w+"   # raw string: one or more word characters
    wordList = []

    # Collect the lower-cased words from each line.
    for line in lineList:
        wordList += [w.lower() for w in re.findall(pat,line)]

    # Count each word's occurrences and map word -> count.
    wordCnt = [wordList.count(w) for w in wordList]

    dd = dict(zip(wordList,wordCnt))

    for item in dd:
        print "Word '%s' occurs %d times." % (item, dd[item])
This way will exclude any punctuation. If you have a list of lines and you don't care about possible punctuation:
    >>> wordList = []
    >>> for line in lineList:
    ...     wordList += [s.lower() for s in line.strip().split()]
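To see what the choice between the two approaches means for the counts, here is a small side-by-side sketch in Python 3. The sample lines in `line_list` are invented for illustration and stand in for the contents of a file:

```python
import re

# Hypothetical sample lines standing in for the contents of a file.
line_list = ["The cat sat.\n", "The cat, the dog!\n"]

split_words = []
regex_words = []
for line in line_list:
    # split() keeps punctuation glued to the neighbouring word.
    split_words += [w.lower() for w in line.split()]
    # \w+ keeps only runs of letters, digits and underscores.
    regex_words += [w.lower() for w in re.findall(r"\w+", line)]

print(split_words.count("cat"))   # "cat," is a different token here
print(regex_words.count("cat"))
```

With split(), "cat" and "cat," are tallied separately; with the regular expression they are the same word. Note that \w+ also breaks a contraction such as "don't" into two tokens.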
Jun 11 '07 #3
bartonc
6,596 Expert 4TB
I always like to add this little touch of elegance:
    >>> for item in dd:
    ...     i = dd[item]
    ...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])
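The same pluralizing idea can be written with a conditional expression (Python 2.5 and later), which some readers find easier to scan than indexing a tuple. A minimal sketch in Python 3; the helper `plural_suffix` and the sample dictionary are made up for illustration:

```python
def plural_suffix(count):
    """Return '' for exactly one item and 's' otherwise."""
    return "" if count == 1 else "s"

dd = {"sentence": 2, "just": 1}   # hypothetical counts for illustration
for word, count in sorted(dd.items()):
    print("Word '%s' occurs %d time%s." % (word, count, plural_suffix(count)))
```

Zero also gets the plural form ("0 times"), which matches normal English usage.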
Jun 11 '07 #4
bvdet
2,851 Expert Mod 2GB
Barton,

You have posted something similar to this before. I am beginning to catch on. Thanks! :)

BV
Jun 11 '07 #5
bartonc
6,596 Expert 4TB
Nope. I think that this is the first opportunity. It comes up often in GUI programming where (say) you have a RadioButton and you want the screen to reflect its state elsewhere, as in:
    flag = aRadioButton.GetState() # actually an int, not bool
    stateStr = ("Off", "On")[flag]  # tuples require int indexes so there is often a cast from bool to int
I've always felt that software should be smart enough to know if it is relaying data about a thing or several things. To me it's a glaring omission on the part of the programmer when the user is told that he has 1 things.
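A note on the cast mentioned in the code comment: in Python, bool is a subclass of int, so a True or False value can index a tuple directly and the int() call is optional. A minimal sketch in Python 3, where `flag` is a hypothetical stand-in for whatever the widget returns:

```python
# bool is a subclass of int: False behaves as 0 and True as 1.
print(isinstance(True, int))

labels = ("Off", "On")
flag = True                 # hypothetical stand-in for a widget state
state_str = labels[flag]    # works without an int() cast
print(state_str)
```

Keeping the explicit int() does no harm and can make the intent clearer to readers who do not expect a bool to be used as an index.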
Jun 11 '07 #6
bvdet
2,851 Expert Mod 2GB
This is the snippet I was referring to. I had never thought of supplying an indexed tuple or list as the argument to a string format specifier.
    # test utility functions and rules
    for i in range(20):
        RoleDice(dice)
        PrintDice(dice)
        print "All dice are%sequal" % [" not ", " "][AllEqual(dice)]
        print
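The snippet relies on helpers that are not shown in the thread. To try the indexing trick on its own, here is a self-contained sketch in Python 3. The functions `roll_dice` and `all_equal` are hypothetical stand-ins written for this example, not the originals:

```python
import random

def roll_dice(count=3):
    """Hypothetical stand-in for RoleDice: return `count` random die faces."""
    return [random.randint(1, 6) for _ in range(count)]

def all_equal(dice):
    """Hypothetical stand-in for AllEqual: True when every die matches."""
    return len(set(dice)) <= 1

for _ in range(5):
    dice = roll_dice()
    # all_equal() returns a bool, which picks index 0 or 1 from the list.
    print(dice, "All dice are%sequal" % [" not ", " "][all_equal(dice)])
```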
Jun 11 '07 #7
ilikepython
844 Expert 512MB
I always like to add this little touch of elegance:
    >>> for item in dd:
    ...     i = dd[item]
    ...     print "Word '%s' occurs %d time%s." % (item, i, ('s', '')[int(i == 1)])
Hmm, that's a clever way of doing it, never would have thought of it.
Jun 11 '07 #8
bartonc
6,596 Expert 4TB
This is the snippet I was referring to. I had never thought of supplying an indexed tuple or list as the argument to a string format specifier.
    # test utility functions and rules
    for i in range(20):
        RoleDice(dice)
        PrintDice(dice)
        print "All dice are%sequal" % [" not ", " "][AllEqual(dice)]
        print
Yep. I figured that. As soon as I submitted, I thought "that's not on point", but what's done is done (sort of).
Jun 11 '07 #9
texas22
26
OK, this is kind of making sense. Once I have pulled out, say, the three longest, shortest, and middle words, what syntax do I use to take those words and print each one on its own line with its frequency, that is, the number of times it occurs in the list?
Jun 11 '07 #10
bartonc
6,596 Expert 4TB
I s'pose you could use some complicated method of keeping track, or you can just do this:
    >>> anStr = "The fat cat ran into a fat cow"
    >>> anStr.count("fat")
    2
Basically, in Python, any time you think that an object should/could have a certain functionality, just go to the interactive interpreter and try it. (This is exactly what I have done above). If it doesn't work, then turn to the Docs. If that fails, well, you know - ask somebody.
Jun 11 '07 #11
bartonc
6,596 Expert 4TB
Sorry, I forgot that we were working on lists:
    >>> aList = anStr.split()
    >>> aList
    ['The', 'fat', 'cat', 'ran', 'into', 'a', 'fat', 'cow']
    >>> aList.count('fat')
    2
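The two count methods agree on that sentence, but they do not measure the same thing: str.count counts substring occurrences, while list.count compares whole elements. A small sketch in Python 3 with a slightly altered sentence (the wording is changed here to show the difference):

```python
an_str = "The fat cat ran into a fatter cow"

# str.count counts substring occurrences, so "fatter" is included.
print(an_str.count("fat"))           # 2

# list.count compares whole elements, so only the word "fat" matches.
print(an_str.split().count("fat"))   # 1
```

For word frequencies, splitting first and counting on the list is usually the safer choice.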
Jun 12 '07 #12
texas22
26
Is there a way I can use a three-dimensional array that will output the number of times a word appears in a list, with the length in the first column, the word in the second, and the frequency in the third?
Jun 12 '07 #13
Smygis
126 100+
Something like:

    thing = [[len(word), word, listOfWords.count(word)] for word in listOfWords]
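That comprehension produces one row per occurrence, so a repeated word shows up more than once. A possible variation, sketched in Python 3 with one row per distinct word and the rows sorted by length so that the shortest and longest words can be sliced off. The sample sentence and the names `counts`, `table`, `shortest` and `longest` are chosen here for illustration:

```python
from collections import Counter

list_of_words = "the fat cat ran into a fat cow".split()

counts = Counter(list_of_words)

# One (length, word, frequency) row per distinct word, shortest first,
# with ties broken alphabetically by the word itself.
table = sorted((len(word), word, freq) for word, freq in counts.items())

for length, word, freq in table:
    print("%2d  %-6s %d" % (length, word, freq))

shortest = table[:3]    # the three shortest words
longest = table[-3:]    # the three longest words
```

Strictly speaking this is a two-dimensional table with three columns rather than a three-dimensional array.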
Jun 12 '07 #14
texas22
26
Thanks for the help. That helped me make sense of it.
Jun 12 '07 #15
