Bytes IT Community

parseline (my favorite simple script): does something similar exist?

One of my all-time favorite scripts is parseline, which is printed
below:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for i in range(len(format)):
        f = format[i]
        trans = xlat.get(f,'None')
        if trans: result.append(trans(words[i]))
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

This takes a line of text, splits it, and then applies simple
formatting characters to return different Python types. For example,
given the line

H 0.000 0.000 0.000

I can call parseline(line,'sfff') and it will return the string 'H',
and three floats. If I wanted to omit the first, I could just call
parseline(line,'xfff'). If I only wanted the first 0.000, I could call
parseline(line,'xf'). Clearly I don't do all of my parsing this way,
but I find parseline useful in a surprising number of applications.
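
For anyone reading this on a current interpreter, here is a minimal sketch of the behaviour described above, written for Python 3. It is an assumption about the intent rather than the exact script: the default passed to `xlat.get` is the real `None` (not the string `'None'`), so unknown format characters are skipped in the same way as `'x'`, and the parameter is called `fmt` only to avoid shadowing the built-in `format`.

```python
def parseline(line, fmt):
    """Split a line on whitespace and convert fields per a format string.

    'x' skips a field, 's' -> str, 'f' -> float, 'd' and 'i' -> int.
    Unknown characters are skipped too (an assumption, see the note above).
    """
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    words = line.split()
    result = []
    for i, f in enumerate(fmt):
        trans = xlat.get(f)          # None for 'x' and for unknown characters
        if trans:
            result.append(trans(words[i]))
    if len(result) == 0:
        return None
    if len(result) == 1:
        return result[0]             # a single value is returned bare
    return result


line = "H 0.000 0.000 0.000"
print(parseline(line, 'sfff'))  # ['H', 0.0, 0.0, 0.0]
print(parseline(line, 'xfff'))  # [0.0, 0.0, 0.0]
print(parseline(line, 'xf'))    # 0.0
```

Note that the return type depends on how many fields survive: `None`, a bare value, or a list.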

I'm posting this here because (1) I'm feeling smug at what a bright
little coder I am, and (2) (in a more realistic and humble frame of
mind) I realize that many many people have probably found solutions to
similar needs, and I'd imagine that many are better than the above. I
would love to hear how other people do similar things.

Rick

Oct 12 '06 #1
10 Replies


"RickMuller" <rp******@gmail.com> writes:
> def parseline(line,format):
>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
>     result = []
>     words = line.split()
>     for i in range(len(format)):
>         f = format[i]
>         trans = xlat.get(f,'None')
>         if trans: result.append(trans(words[i]))
>     if len(result) == 0: return None
>     if len(result) == 1: return result[0]
>     return result

Untested, but maybe more in current Pythonic style:

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for f,w in zip(format, words):
        trans = xlat[f]
        if trans is not None:
            result.append(trans(w))
    return result

Differences:
- doesn't ignore improper format characters, raises exception instead
- always returns values in a list, including an empty list if
there are no values
- uses iterator protocol and zip to avoid ugly index variable
and subscripts
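
The differences listed above can be checked with a short Python 3 sketch. The function body follows the zip-based version; the sample inputs and the `fmt` parameter name are illustrative assumptions. One further difference not listed: `zip` stops at the shorter of the two sequences, so a format string longer than the line no longer raises IndexError.

```python
def parseline(line, fmt):
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    result = []
    for f, w in zip(fmt, line.split()):
        trans = xlat[f]              # KeyError for an unknown format character
        if trans is not None:
            result.append(trans(w))
    return result


text = "H 0.000 0.000 0.000"
print(parseline(text, "xf"))         # [0.0]  (a list, even for one value)
print(parseline(text, "xxxx"))       # []     (an empty list, not None)
print(parseline(text, "sffffff"))    # ['H', 0.0, 0.0, 0.0]  (zip truncates)

try:
    parseline(text, "sq")
except KeyError as exc:
    print("bad format character:", exc)
```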
Oct 12 '06 #2

Hi Rick,

Nice little script indeed !

You probably mean
    trans = xlat.get(f,None)
instead of
    trans = xlat.get(f,'None')
in the case where an invalid format character is supplied. The string
'None' evaluates to True, so that trans(words[i]) raises an exception.
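
A small Python 3 check of that point (the format character `'q'` is just an arbitrary invalid example):

```python
xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

trans = xlat.get('q', 'None')    # unknown character: default is the *string* 'None'
print(bool(trans))                # True, because a non-empty string is truthy

try:
    trans('0.000')                # attempts to call a str object
except TypeError as exc:
    print("TypeError:", exc)

print(xlat.get('q', None))        # None, so an `if trans:` test skips the field
```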

A variant, with a list comprehension instead of the for loop :

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    result = [ xlat[f](w) for f,w in zip(format,words)
               if xlat.get(f,None) ]
    if not result: return None
    if len(result) == 1: return result[0]
    return result

Regards,
Pierre

Oct 12 '06 #3

RickMuller wrote:
> One of my all-time favorite scripts is parseline, which is printed

here is another way to write that:

def parseline(line, format):
    trans = {'x':lambda x:None,'s':str,'f':float,'d':int,'i':int}
    return [ trans[f](w) for f,w in zip(format, line.split() ) ]

>>> parseline( 'A 1 22 3 6', 'sdxf')
['A', 1, None, 3.0]

I.

Oct 12 '06 #4

> >>> parseline( 'A 1 22 3 6', 'sdxf')
> ['A', 1, None, 3.0]

Yes, but in this case the OP expects to get ['A',1,3.0]

A shorter version :

def parseline(line,format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = [ xlat[f](w) for f,w in zip(format,line.split())
               if xlat.get(f,None) ]
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

Pierre

Oct 12 '06 #5

On 2006-10-12, Pierre Quentel <qu************@wanadoo.fr> wrote:
>> >>> parseline( 'A 1 22 3 6', 'sdxf')
>> ['A', 1, None, 3.0]
>
> Yes, but in this case the OP expects to get ['A',1,3.0]
>
> A shorter version :
>
> def parseline(line,format):
>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
>     result = [ xlat[f](w) for f,w in zip(format,line.split())
>                if xlat.get(f,None) ]
>     if len(result) == 0: return None
>     if len(result) == 1: return result[0]
>     return result

I don't like the name, since it actually seems to be parsing a
string.

--
Neil Cerutti
Oct 12 '06 #6

    Rick> def parseline(line,format):
    Rick>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    Rick>     result = []
    Rick>     words = line.split()
    Rick>     for i in range(len(format)):
    Rick>         f = format[i]
    Rick>         trans = xlat.get(f,'None')
    Rick>         if trans: result.append(trans(words[i]))
    Rick>     if len(result) == 0: return None
    Rick>     if len(result) == 1: return result[0]
    Rick>     return result

Note that your setting and testing of the trans variable is problematic. If
you're going to use xlat.get(), either spell None correctly or take the
default:

    trans = xlat.get(f)
    if trans:
        result.append(trans(words[i]))

As Paul indicated though, it would also be better not to silently let
unrecognized format characters pass. I probably wouldn't let KeyError float
up to the caller though:

    trans = xlat.get(f)
    if trans:
        result.append(trans(words[i]))
    else:
        raise ValueError, "unrecognized format character %s" % f

Finally, you might consider doing the splitting outside of this function and
pass in a list. That way you could (for example) easily pass in a row of
values read by the csv module's reader class (untested):

    def format(words, fmt):
        xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
        result = []
        for i in range(len(fmt)):
            f = fmt[i]
            trans = xlat.get(f)
            if trans:
                result.append(trans(words[i]))
            else:
                raise ValueError, "unrecognized format character %s" % f
        return result
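
A possible Python 3 rendering of the csv idea, as a sketch only. The name `convert_fields` and the sample data are hypothetical, and `raise ... from` style aside, the `raise X, "msg"` form above is Python 2 syntax, so the call form is used here. One detail worth flagging: with a plain `if trans: ... else: raise`, the skip character `'x'` maps to `None` and therefore also lands in the `else` branch, so this sketch tests for an unknown character separately.

```python
import csv
import io


def convert_fields(words, fmt):
    """Convert already-split fields according to a format string."""
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    result = []
    for i, f in enumerate(fmt):
        if f not in xlat:
            raise ValueError("unrecognized format character %s" % f)
        trans = xlat[f]
        if trans is not None:        # 'x' means: skip this field
            result.append(trans(words[i]))
    return result


# Illustrative data; any iterable of rows from csv.reader works the same way.
data = io.StringIO("H,0.000,0.000,0.000\nO,0.000,0.000,1.200\n")
for row in csv.reader(data):
    print(convert_fields(row, 'sfff'))
# ['H', 0.0, 0.0, 0.0]
# ['O', 0.0, 0.0, 1.2]
```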

    Rick> I'm posting this here because (1) I'm feeling smug at what a
    Rick> bright little coder I am, and (2) (in a more realistic and humble
    Rick> frame of mind) I realize that many many people have probably found
    Rick> solutions to similar needs, and I'd imaging that many are better
    Rick> than the above. I would love to hear how other people do similar
    Rick> things.

It seems quite clever to me.

Skip
Oct 12 '06 #7

Wow! 6 responses in just a few minutes. Thanks for all of the great
feedback!
Oct 12 '06 #8

RickMuller wrote:
> I'm posting this here because (1) I'm feeling smug at what a bright
> little coder I am

if you want to show off, and use a more pythonic interface, you can do
it with a lot fewer lines. here's one example:

def parseline(line, *types):
    result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
    return len(result) != 1 and result or result[0]

text = "H 0.000 0.000 0.000"

print parseline(text, str, float, float, float)
print parseline(text, None, float, float, float)
print parseline(text, None, float)

etc. and since you know how many items you'll get back from the
function, you might as well go for the one-liner version, and do
the unpacking on the way out:

def parseline(line, *types):
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

text = "H 0.000 0.000 0.000"

[tag, value] = parseline(text, str, float)
[value] = parseline(text, None, float)
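
for readers on Python 3, the same idea might look like the sketch below. `parseline_flat` is a hypothetical helper name used only to show the first variant's `and`/`or` expression written as a conditional expression; the two forms agree here because the list is never empty.

```python
def parseline(line, *types):
    # Pair each whitespace-separated field with a converter;
    # a converter of None skips that field.
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]


def parseline_flat(line, *types):
    # Hypothetical helper: return a bare value when exactly one field survives.
    result = parseline(line, *types)
    return result if len(result) != 1 else result[0]


text = "H 0.000 0.000 0.000"

tag, value = parseline(text, str, float)
print(tag, value)                          # H 0.0

(value,) = parseline(text, None, float)
print(value)                               # 0.0

print(parseline(text, None))               # [None], nothing was converted
print(parseline_flat(text, None, float))   # 0.0
```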

</F>

Oct 12 '06 #9

RickMuller wrote:
> One of my all-time favorite scripts is parseline, which is printed
> below
>
> def parseline(line,format):
>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
>     result = []
>     words = line.split()
>     for i in range(len(format)):
>         f = format[i]
>         trans = xlat.get(f,'None')
>         if trans: result.append(trans(words[i]))
>     if len(result) == 0: return None
>     if len(result) == 1: return result[0]
>     return result
>
> This takes a line of text, splits it, and then applies simple
> formatting characters to return different python types. For example,
> given the line
>
> H 0.000 0.000 0.000
>
> I can call parseline(line,'sfff') and it will return the string 'H',
> and three floats. If I wanted to omit the first, I could just call
> parseline(line,'xfff'). If I only wanted the first 0.000, I could call
> parseline(line,'xf').
> [...]
> I would love to hear how other people do similar things.
>
> Rick

MAP = {'s':str,'f':float,'d':int,'i':int}

def parseline( line, format, separator=' '):
    '''
    >>> parseline('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    parts = line.split(separator)
    return [f(parts[i]) for (i,f) in mapping]

def parseline2( line, format):
    '''
    >>> parseline2('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    return [f(line.split()[i]) for (i,f) in [(i, MAP[f]) for (i,f) in
            enumerate(format) if f != 'x']]

def parselines(lines, format, separator=' '):
    '''
    >>> lines = [ 'A 1 2 3 4', 'B 5 6 7 8', 'C 9 10 11 12']
    >>> list(parselines(lines, 'sdxf'))
    [['A', 1, 3.0], ['B', 5, 7.0], ['C', 9, 11.0]]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    for line in lines:
        parts = line.split(separator)
        yield [f(parts[i]) for (i,f) in mapping]

import doctest
doctest.testmod(verbose=True)

Oct 12 '06 #10

Amazing! There were lots of great suggestions to my original post, but
I think this is my favorite.

Rick

Fredrik Lundh wrote:
> RickMuller wrote:
>> I'm posting this here because (1) I'm feeling smug at what a bright
>> little coder I am
>
> if you want to show off, and use a more pythonic interface, you can do
> it with a lot fewer lines. here's one example:
>
> def parseline(line, *types):
>     result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
>     return len(result) != 1 and result or result[0]
>
> text = "H 0.000 0.000 0.000"
>
> print parseline(text, str, float, float, float)
> print parseline(text, None, float, float, float)
> print parseline(text, None, float)
>
> etc. and since you know how many items you'll get back from the
> function, you might as well go for the one-liner version, and do
> the unpacking on the way out:
>
> def parseline(line, *types):
>     return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
>
> text = "H 0.000 0.000 0.000"
>
> [tag, value] = parseline(text, str, float)
> [value] = parseline(text, None, float)
>
> </F>
Oct 15 '06 #11
