
Parse a file by multiple delimiters

P: 2
I have a file with multiple lines of words for example:
office space.

I want to split the text on spaces, newlines, and punctuation so that I end up with a list containing "groovy", "office", "space", "bypass", "bad".
So far I have written this code:
    f = open('test.txt', 'r')
    lines =
    print(lines)
    f.close()
However, this only splits on spaces and newlines, not on punctuation. Can anyone help me?
Sep 28 '14 #1
3 Replies

Expert 100+
P: 626
Splitting on punctuation would also separate "isn't" into "isn" and "t". Punctuation at the end of a sentence is followed by a space, so splitting on spaces should get you what you want. If not, please include a sample of the input that would require a split on punctuation, and the desired output.

Note that "office space" can be divided by splitting each line on a space. Previous post about iterating over a file.
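
As a rough sketch of splitting each line on whitespace while iterating over a file (the helper name `words_from_file` and the demo file contents are assumptions for illustration, not taken from the thread):

```python
import os
import tempfile


def words_from_file(path):
    """Collect whitespace-separated tokens, reading one line at a time."""
    words = []
    with open(path, "r") as handle:
        for line in handle:
            # split() with no argument splits on any run of whitespace
            words.extend(line.split())
    return words


# Demo file; the contents are made up for illustration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("office space.\nbypass bad\n")
    demo_path = tmp.name

demo_words = words_from_file(demo_path)
os.remove(demo_path)
print(demo_words)  # ['office', 'space.', 'bypass', 'bad']
```

Note that the trailing period stays attached to "space." because only whitespace is treated as a delimiter here.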
Sep 29 '14 #2

P: 2
This is true, but you can assume that words like "isn't" won't come up. Also, "office space." can be split on spaces, but the period would still be there. I think I figured it out on my own anyway, but if you still have a suggestion, I'm open to it.
Sep 29 '14 #3

Expert 100+
P: 626
Don't know how you can get "office space." using
    lines =
which is different from split("\n"), but anyway, the code is the same either way.

Keep only the characters that are letters; that removes digits, punctuation, and any other non-letter characters. (Removing punctuation instead of keeping letters is "bass-ackwards", because you automatically keep anything that you didn't think of.) Here is a solution using a list comprehension:
    import string

    test_lines = """groo**vy
    office space."""
    words = test_lines.split()
    ## print while testing
    print(words)
    print(string.ascii_letters)

    # keep only the letters in each word (string.letters was the Python 2 name)
    for word in words:
        new_word = [ch for ch in word if ch in string.ascii_letters]
        print("".join(new_word))
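
For comparison, here is a minimal sketch of two regex-based variants using the standard `re` module. The helper names `split_on_delimiters` and `letters_only` are hypothetical, and the sample text is made up; this is one possible approach, not necessarily what the original poster ended up with.

```python
import re
import string

# Character class matching any ASCII punctuation or whitespace.
_DELIMS = re.compile("[" + re.escape(string.punctuation) + r"\s]+")


def split_on_delimiters(text):
    """Treat whitespace and ASCII punctuation as delimiters."""
    return [piece for piece in _DELIMS.split(text) if piece]


def letters_only(text):
    """Split on whitespace, then drop every non-letter from each token."""
    cleaned = (re.sub(r"[^A-Za-z]", "", token) for token in text.split())
    return [word for word in cleaned if word]


sample = "groo**vy\noffice space.\nisn't"
print(split_on_delimiters(sample))  # ['groo', 'vy', 'office', 'space', 'isn', 't']
print(letters_only(sample))         # ['groovy', 'office', 'space', 'isnt']
```

The two behave differently on purpose: treating punctuation as a delimiter breaks "groo**vy" and "isn't" apart, while keeping only letters joins the remaining pieces of each token, which matches the "groovy" output asked for in the question.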
Sep 29 '14 #4
