Reading files, splitting on a delimiter and newlines.

Hello,

I have a situation where I have a file that contains text similar to:

myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3

My first approach was to open the file, use readlines to split the
lines on the "=" delimiter into a key/value pair (to be stored in a
dict).

After processing a couple of files I noticed it's possible that a newline
can be present in the value, as shown in myValue2.

In this case it's not an option to, say, remove the newlines if it's a
"multi line" value, as the value data needs to stay intact.

I'm a bit confused as to how to go about getting this to work.

Any suggestions on an approach would be greatly appreciated!

Jul 25 '07 #1
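
For readers following along, here is a minimal sketch of why the first approach breaks. It assumes the approach looked roughly like this; the function name naive_parse is hypothetical and io.StringIO stands in for the real file.

```python
import io

# Sample data from the question; io.StringIO stands in for a real file.
sample = io.StringIO(
    "myValue1 = contents of value1\n"
    "myValue2 = contents of value2 but\n"
    "with a new line here\n"
    "myValue3 = contents of value3\n"
)


def naive_parse(lines):
    """Split every line on '=' and expect exactly one key and one value."""
    result = {}
    for line in lines:
        # Unpacking fails on a continuation line, which contains no '='.
        key, value = line.split("=")
        result[key.strip()] = value.strip()
    return result


try:
    naive_parse(sample.readlines())
    error = None
except ValueError as exc:
    error = exc

print("parse failed:", error)
```

The third line of the sample has no "=", so the split yields a single item and the unpacking raises ValueError. The replies below deal with that case by attaching such lines to the previous key.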
6 Replies


I'm confused. You don't want the newline to be present, but you can't
remove it because the data has to stay intact? If you don't want to
change it, then what's the problem?

Mike

Jul 25 '07 #2

It's obvious that simple line-by-line filtering won't handle multi-line
statements.

You could solve that by remembering the last item you added something to and,
if the line currently being handled doesn't look like an assignment, appending
it to that item. You might run into problems with data such as:

foo = modern maths
proved that 1 = 1
bar = single

If your dataset always has indentation on continuation lines, you could use
that to tell them apart. The same goes if the key's name is always just one word.

HTH,
Stargaming
Jul 25 '07 #3
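
A minimal sketch of the heuristic suggested in the reply above, not the poster's own code. It assumes a line starts a new entry only when it is not indented and the text before the first "=" is a single word; the function name parse_single_word_keys is hypothetical.

```python
def parse_single_word_keys(lines):
    """Parse 'key = value' lines; anything else continues the previous value."""
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\n")
        left, sep, right = line.partition("=")
        # New entry only if the line is not indented and the text before
        # the first '=' is exactly one word.
        is_assignment = (
            sep == "="
            and not line[:1].isspace()
            and len(left.split()) == 1
        )
        if is_assignment:
            key = left.strip()
            data[key] = right.strip()
        elif key is not None:
            # Continuation line: keep the newline so the value stays intact.
            data[key] += "\n" + line
        elif line.strip():
            raise ValueError("continuation line before any key")
    return data


text = """foo = modern maths
proved that 1 = 1
bar = single
"""
result = parse_single_word_keys(text.splitlines())
print(result)
```

With this rule "proved that 1 = 1" is attached to foo, because the text before its "=" is three words. A continuation line of the form "word = something" would still be mistaken for a new key.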

My take: all of the above, plus: given that you want to extract stuff
of the form <LHS> = <RHS>, I'd suggest developing a fairly precise
regular expression for LHS, maybe even for RHS, and trying this on as
many of these files as you can.

Why an RE for RHS? Consider:

foo = somebody said "I think that
REs = trouble
maybe_better = pyparsing"

:-)

Jul 25 '07 #4
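
A rough sketch of the regular-expression idea from the reply above, under two assumptions that are not in the thread: keys are identifier-like words, and a value containing an odd number of double quotes is an unterminated string that continues on the next line. The names ASSIGNMENT and parse_with_regex are hypothetical, and escaped quotes are not handled.

```python
import re

# Assumed key syntax: one identifier-like word at the start of the line.
ASSIGNMENT = re.compile(r"^(?P<key>[A-Za-z_]\w*)\s*=\s*(?P<value>.*)$")


def parse_with_regex(lines):
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\n")
        # An odd number of double quotes so far means a quoted string is
        # still open, so this line must belong to the current value.
        in_quotes = key is not None and data[key].count('"') % 2 == 1
        match = None if in_quotes else ASSIGNMENT.match(line)
        if match:
            key = match.group("key")
            data[key] = match.group("value")
        elif key is not None:
            data[key] += "\n" + line
        elif line.strip():
            raise ValueError("continuation line before any key")
    return data


text = '''foo = somebody said "I think that
REs = trouble
maybe_better = pyparsing"
bar = single
'''
result = parse_with_regex(text.splitlines())
print(result)
```

Here the two lines inside the quoted string stay part of foo even though each of them looks like an assignment. For anything more complicated than this, a real parser is probably the safer route, as the reply hints.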

Check the length of the list returned from split; this allows
you to append to the previously extracted value if need be.

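import io
import pprint

buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

# Stand-in for a real file object.
mockfile = io.StringIO(buf)

record = {}
key = None

for line in mockfile:
    # maxsplit=1, so an '=' inside the value stays part of the value.
    kvpair = line.split('=', 1)
    if len(kvpair) == 2:
        key, value = kvpair
        record[key] = value
    elif key is not None:
        # No '=' on this line: append it to the previous value.
        record[key] += line

pprint.pprint(record)

# strip() the keys and values to remove surrounding whitespace and newlines if needed ...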

--
Hope this helps,
Steven

Jul 26 '07 #5

I think the OP's trouble is that the value he wants gets split up by the
newline at the end of the line when he uses readline().

One can try adding the single value to the previous value in the previous
key/value pair when the split does not yield two values - a bit hackish,
but given structured input data it might work.

- Hendrik

Jul 26 '07 #6

data = {}
key = None
with open('yourfile.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            # skip empty lines
            continue
        if '=' in line:
            key, value = map(str.strip, line.split('=', 1))
            data[key] = value
        elif key is None:
            # first line without a '='
            raise ValueError("invalid format")
        else:
            # multiline: keep the newline so the value stays intact
            data[key] += "\n" + line
print(data)
=> {'myValue1': 'contents of value1', 'myValue2': 'contents of value2 but\nwith a new line here', 'myValue3': 'contents of value3'}

HTH
Jul 26 '07 #7
