Parse a file by multiple delimiters

I have a file with multiple lines of words, for example:
groovy
office space.
bypass
bad

I want to delimit each word by space/newline/punctuation so that I have a list with "groovy", "office", "space", "bypass", "bad".
So far I have this code written:
with open('test.txt', 'r') as f:
    lines = f.read().split()
print(lines)
However, this only splits on spaces and newlines, not punctuation. Can anyone help me?
Sep 28 '14 #1
dwblas
Splitting on punctuation would also separate "isn't" into "isn" and "t". A sentence that ends with punctuation will have a space after it, so splitting on spaces should get you what you want. If not, please include a sample of the input that would require a split on punctuation, and the desired output.

Note that "office space" can be divided by splitting each line on a space. Previous post about iterating over a file.
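To make the apostrophe point concrete, here is a minimal sketch of what a split on every non-letter character does. The helper name split_on_non_letters and the sample text are made up for illustration, and the pattern assumes plain ASCII letters only.

```python
import re

def split_on_non_letters(text):
    """Split text wherever one or more non-letter characters occur."""
    # re.split can leave empty strings at the edges, so filter them out.
    return [piece for piece in re.split(r"[^A-Za-z]+", text) if piece]

sample = "office space.\nisn't bad"
print(split_on_non_letters(sample))
# ['office', 'space', 'isn', 't', 'bad']
```

The period is gone, but "isn't" comes back as two pieces, which is the trade-off to keep in mind when treating all punctuation as a delimiter.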
Sep 29 '14 #2
This is true, but you can assume "isn't" won't be a case. Also, "office space" can be split on spaces, but the period would still be there. I think I figured it out on my own anyway, but if you still have a suggestion I'm open to it.
Sep 29 '14 #3
dwblas
Don't know how you can get "office space." using lines = f.read().split(), which is different from split("\n"), but anyway... the code is the same either way.
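For anyone unsure about the difference between split() and split("\n"), a small sketch using the sample words from the question (the text variable stands in for what f.read() would return, assuming the file ends with a newline):

```python
text = "groovy\noffice space.\nbypass\nbad\n"

# No argument: split on any run of whitespace (spaces, tabs, newlines).
by_whitespace = text.split()

# Explicit "\n": split on newlines only, so spaces inside a line survive,
# and the trailing newline produces an empty string at the end.
by_newline = text.split("\n")

print(by_whitespace)  # ['groovy', 'office', 'space.', 'bypass', 'bad']
print(by_newline)     # ['groovy', 'office space.', 'bypass', 'bad', '']
```

Either way the period stays attached to "space.", so the punctuation still has to be handled separately.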

You would only keep characters that are letters, which removes numbers, punctuation, and any other non-letter characters (removing punctuation instead of keeping letters is "bass-ackwards" as you automatically keep anything that you didn't think of). A solution using list comprehension.
import string

test_lines = """groo**vy
office space."""
words = test_lines.split()
# print while testing
print(words)
print(string.ascii_letters)  # Python 3 name; Python 2 also had string.letters

for word in words:
    # keep only the characters that are letters
    new_word = [ch for ch in word if ch in string.ascii_letters]
    print("".join(new_word))
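The same keep-only-letters idea can be wrapped in a function that returns the list asked for in the first post. This is only a sketch: the name letters_only_words is made up, and it uses str.isalpha() instead of a fixed letter string, which is an assumption on my part since isalpha() also accepts non-ASCII letters.

```python
def letters_only_words(text):
    """Return whitespace-separated words with every non-letter removed."""
    words = []
    for raw in text.split():
        cleaned = "".join(ch for ch in raw if ch.isalpha())
        if cleaned:  # skip tokens that were pure punctuation or digits
            words.append(cleaned)
    return words

sample = "groo**vy\noffice space.\nbypass\nbad"
print(letters_only_words(sample))
# ['groovy', 'office', 'space', 'bypass', 'bad']
```

Note that this strips punctuation inside a word rather than splitting on it, so "groo**vy" stays one word. A regular expression such as re.findall(r"[A-Za-z]+", text) would instead treat the asterisks as a delimiter and return "groo" and "vy" separately.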
Sep 29 '14 #4
