Bytes | Software Development & Data Engineering Community

Parsing quoted values separated by spaces

5 New Member
Hi, I'm beating my head against the wall trying to parse a file like this:

    Value1  10  "A string which may contain \"quotes\" or slashes \\" anothervalue
    Value2  11  "A string which may contain \"quotes\" or slashes \\"
    Value2  "A string which may contain \"quotes\" or slashes \\"    "another quote"

I want to split each line into a list.
I tried string.split("\"") etc., but ran into problems since I don't know when a line contains several quoted strings.

Every value is separated by one or more spaces (except inside the quotes, of course, where the whole quoted string is always one value).

Please help me to get started.

Thanks,
Andreas
Dec 4 '07 #1
bvdet
2,851 Recognized Expert Moderator Specialist
The following uses a combination of the str.replace() method and re.findall():
    import re

    # Read the whole file, then strip the escaping before tokenizing
    # (note: this discards the escaped quotes and backslashes)
    fStr = open('your_file').read()
    fStr = fStr.replace('\\\\"', '"').replace('\\"', '')
    print(fStr)

    # A token is either a quoted run or a run of non-space characters
    patt = re.compile(r'".+?"|\S+')
    for item in fStr.split('\n'):
        print([s.strip('"') for s in patt.findall(item)])
Output:
>>> Value1 10 "A string which may contain quotes or slashes " anothervalue
Value2 11 "A string which may contain quotes or slashes "
Value2 "A string which may contain quotes or slashes " "another quote"
['Value1', '10', 'A string which may contain quotes or slashes ', 'anothervalue']
['Value2', '11', 'A string which may contain quotes or slashes ']
['Value2', 'A string which may contain quotes or slashes ', 'another quote']
>>>
Dec 4 '07 #2
Lasdjfk
5 New Member
Thanks a lot. Guess I just have to learn regular expressions to understand this... :-)
Dec 5 '07 #3
script0r
1 New Member
Sorry, but this proposed solution does not preserve the literal quote or slash characters that should have been present in the final output.

From my understanding of the original request, I expected the "sentence" to appear as follows, inside each of the three output lines:

    A string which may contain "quotes" or slashes \
... and not as:

    A string which may contain quotes or slashes
Does anyone have a correct solution?
Mar 18 '08 #4
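One possible answer, as a minimal sketch: the standard library's shlex.split() tokenizes on whitespace, keeps each double-quoted run as one value, and resolves \" and \\ inside the quotes. The function name split_line and the sample lines (copied from the first post) are only for illustration. It assumes shell-like rules are acceptable for the file: single quotes also count as quoting characters, and a backslash outside quotes escapes the next character, so a stray apostrophe in an unquoted value would raise ValueError.

```python
import shlex

def split_line(line):
    # shlex.split() defaults to POSIX mode: surrounding quotes are removed
    # and \" / \\ inside double quotes become " and \ respectively.
    return shlex.split(line)

sample = [
    r'Value1  10  "A string which may contain \"quotes\" or slashes \\" anothervalue',
    r'Value2  11  "A string which may contain \"quotes\" or slashes \\"',
    r'Value2  "A string which may contain \"quotes\" or slashes \\"    "another quote"',
]

for line in sample:
    print(split_line(line))
```

With the sample lines, the quoted value comes back as: A string which may contain "quotes" or slashes \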
woooee
43 New Member
It looks like you want to skip any slash+quote and look for a bare quote only, or a slash+quote+space if it is at the end of a string. So if it is a bare quote or a slash+quote+space (or \n), add one to a counter, and if the counter is even, slice the string after that point and replace any slash+character with the character only. But it's difficult to tell from the three lines given.
Mar 19 '08 #5
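A regular expression can do roughly this without the counter, by treating backslash+character as a single unit while it scans a quoted value. The sketch below is one way to write it, not the method from the earlier reply; the names TOKEN and split_with_regex are made up for illustration. It assumes values are always separated by whitespace and that every opening quote has a closing one.

```python
import re

# Either a double-quoted run, in which backslash + any character counts as
# one unit, or a run of non-space characters.
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|(\S+)')

def split_with_regex(line):
    fields = []
    for quoted, bare in TOKEN.findall(line):
        if bare:
            fields.append(bare)
        else:
            # Replace each backslash + character pair with the character only.
            fields.append(re.sub(r'\\(.)', r'\1', quoted))
    return fields

line = r'Value2  "A string which may contain \"quotes\" or slashes \\"    "another quote"'
print(split_with_regex(line))
```

The unescaping happens only after the token boundaries are known, so an escaped quote can never be mistaken for the end of a value.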
Subsciber123
87 New Member
Not knowing regular expressions to any useful extent, I would at least create a temporary (naively inefficient) solution until you can come up with a better one.

You could use the find() method of the string to find the first quote, check whether it is escaped by looking at the character before it, split the string, search the second part of the string, et cetera. Although this is incredibly inefficient, it is better than a state machine (even less efficient) and will work for the time being. Just trying to give you an idea, not actual code.

I've done something like this before, but I think I gave up and resorted to using a state machine that progressed linearly through the string one character at a time. However, this was in a prospective code-conversion program (nowhere close to finished, may eventually turn a subset of Python into ugly C code), and so speed didn't matter (since it was going to be able to convert itself to ugly C).

Good luck!
Mar 20 '08 #6
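For reference, a character-at-a-time scanner of the kind described above could look like the sketch below. The function name split_with_scanner is hypothetical, and the rules are assumptions drawn from the first post: values are separated by whitespace, and inside double quotes a backslash makes the next character literal. Each character is visited once, so the work grows linearly with the length of the line.

```python
def split_with_scanner(line):
    fields = []
    current = []        # characters of the value being built
    in_field = False    # True once a value has started (covers an empty "")
    in_quotes = False
    chars = iter(line)
    for ch in chars:
        if in_quotes:
            if ch == '\\':
                # Take the next character literally; a lone trailing
                # backslash is kept as is.
                current.append(next(chars, '\\'))
            elif ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            in_quotes = True
            in_field = True
        elif ch.isspace():
            if in_field:
                fields.append(''.join(current))
                current = []
                in_field = False
        else:
            current.append(ch)
            in_field = True
    if in_field:
        # Flush the last value (also reached if a quote was never closed).
        fields.append(''.join(current))
    return fields

print(split_with_scanner(r'Value1  10  "A string which may contain \"quotes\" or slashes \\" anothervalue'))
```

Unlike a find()-and-split loop, this never rescans or copies the remainder of the line, and the handling of unterminated quotes is easy to change in one place.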
