Bytes IT Community

Parsing quoted values separated by spaces

P: 5
Hi, I'm beating my head against the wall trying to parse a file like this:

    Value1  10  "A string which may contain \"quotes\" or slashes \\" anothervalue
    Value2  11  "A string which may contain \"quotes\" or slashes \\"
    Value2  "A string which may contain \"quotes\" or slashes \\"    "another quote"

I want to split each line into a list.
I tried string.split("\"") and similar, but ran into problems because I don't know in advance how many quoted values a line contains.

Every value is separated by one or more spaces (except inside the quotes, of course, where the whole quoted string is always one value).

Please help me to get started.

Thanks,
Andreas
Dec 4 '07 #1
5 Replies


bvdet
Expert Mod 2.5K+
P: 2,851
The following uses a combination of the str.replace() method and re.findall():
    import re

    fStr = open('your_file').read()
    # Turn \\" into a bare closing quote, then drop every \" (the escaped quotes are discarded)
    fStr = fStr.replace('\\\\"', '"').replace('\\"', '')
    print(fStr)

    # Match a double-quoted run, or else a run of non-whitespace characters
    patt = re.compile(r'".+?"|\S+')
    for item in fStr.split('\n'):
        print([s.strip('"') for s in patt.findall(item)])
Output:
    Value1 10 "A string which may contain quotes or slashes " anothervalue
    Value2 11 "A string which may contain quotes or slashes "
    Value2 "A string which may contain quotes or slashes " "another quote"
    ['Value1', '10', 'A string which may contain quotes or slashes ', 'anothervalue']
    ['Value2', '11', 'A string which may contain quotes or slashes ']
    ['Value2', 'A string which may contain quotes or slashes ', 'another quote']
Dec 4 '07 #2

P: 5
Thanks a lot. I guess I just have to learn regular expressions to understand this... :-)
Dec 5 '07 #3

P: 1
Sorry, but this proposed solution does not preserve the literal quote or slash characters that should have been present in the final output.

From my understanding of the original request, I expected the "sentence" to appear as follows, inside each of the three output lines:

    A string which may contain "quotes" or slashes \
... and not as:

    A string which may contain quotes or slashes
Does anyone have a correct solution?
Mar 18 '08 #4

P: 43
It looks like you want to skip any backslash+quote and look for a bare quote, or for a backslash+quote+space if it is at the end of a string. So whenever you find a bare quote or a backslash+quote+space (or \n), add one to a counter; if the counter is even, slice the string after that point and replace any backslash+character with the character only. But it's difficult to tell from the 3 lines given.
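
One way to sketch this idea is with a regular expression instead of an explicit counter. It rests on an assumption about the file format that the three sample lines suggest but do not prove: inside quotes, a backslash makes the next character literal, so `\"` is a quote and `\\"` is a backslash followed by the closing quote. The names `TOKEN` and `split_line` are made up for this example.

```python
import re

# A token is either a double-quoted run in which a backslash escapes the
# next character (group 1), or a run of non-whitespace characters (group 2).
# Assumption: this escaping rule is what the file format intends.
TOKEN = re.compile(r'"((?:\\.|[^"\\])*)"|(\S+)')


def split_line(line):
    """Split one line into values, keeping escaped characters literally."""
    values = []
    for quoted, bare in TOKEN.findall(line):
        if bare:
            values.append(bare)
        else:
            # Replace each backslash+character pair with the character alone.
            values.append(re.sub(r'\\(.)', r'\1', quoted))
    return values


if __name__ == '__main__':
    sample = r'Value1  10  "A string which may contain \"quotes\" or slashes \\" anothervalue'
    print(split_line(sample))
```

The quoted part of the pattern consumes characters in pairs whenever it sees a backslash, which is why an escaped quote never ends the match early.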
Mar 19 '08 #5

Subsciber123
P: 87
Not knowing regular expressions to any useful extent, I would at least create a temporary (naively inefficient) solution until you can come up with a better one.

You could use the string's find() method to locate the first quote, check whether it is escaped by looking at the character before it, split the string there, then search the remainder, and so on. Although this is incredibly inefficient, it is better than a state machine (even less efficient) and will work for the time being. I'm just trying to give you an idea, not actual code. I've done something like this before, but I think I gave up and resorted to a state machine that progressed linearly through the string one character at a time. However, that was in a prospective code-conversion program (nowhere close to finished; it may eventually turn a subset of Python into ugly C code), so speed didn't matter (since it was going to be able to convert itself to ugly C).
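
The character-at-a-time state machine mentioned above could look roughly like the sketch below. It is not the code from that converter project; `split_quoted` and its variable names are hypothetical, and it assumes that a backslash inside quotes makes the next character literal. It tracks an explicit `escaped` flag rather than peeking at the previous character, because a one-character look-behind would misread `\\"` (an escaped backslash followed by a real closing quote) as an escaped quote.

```python
def split_quoted(line):
    """Split a line on whitespace, treating "..." as a single value.

    Assumption: inside quotes, a backslash makes the next character literal.
    """
    values = []
    current = []       # characters of the value being built
    in_value = False   # True once a value has started (so "" yields '')
    in_quotes = False
    escaped = False    # True right after a backslash inside quotes

    for ch in line:
        if in_quotes:
            if escaped:
                current.append(ch)
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
            in_value = True
        elif ch.isspace():
            if in_value:
                values.append(''.join(current))
                current = []
                in_value = False
        else:
            current.append(ch)
            in_value = True

    # Flush the last value; an unterminated quote is kept as-is.
    if in_value:
        values.append(''.join(current))
    return values
```

This visits each character once, so the work grows linearly with the length of the line. For lines shaped like the three samples, the standard library's shlex.split() should return the same lists, though it also treats single quotes as quoting characters and backslashes outside quotes as escapes.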

Good luck!
Mar 20 '08 #6
