parseline (my favorite simple script): does something similar exist?

One of my all-time favorite scripts is parseline, which is printed
below

def parseline(line, format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for i in range(len(format)):
        f = format[i]
        trans = xlat.get(f,'None')
        if trans: result.append(trans(words[i]))
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

This takes a line of text, splits it, and then applies simple
formatting characters to return different python types. For example,
given the line

H 0.000 0.000 0.000

I can call parseline(line, 'sfff') and it will return the string 'H',
and three floats. If I wanted to omit the first, I could just call
parseline(line, 'xfff'). If I only wanted the first 0.000, I could call
parseline(line, 'xf'). Clearly I don't do all of my parsing this way,
but I find parseline useful in a surprising number of applications.
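
For anyone who wants to try those three calls, here is a minimal Python 3 sketch. It is a copy of the function above with two small assumptions that are not in the original: the parameter is named fmt, and the lookup uses xlat.get(...) with its default of None. Neither changes the result for valid format characters.

```python
def parseline(line, fmt):
    # Same logic as the posted function; only the lookup default differs.
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    result = []
    words = line.split()
    for i in range(len(fmt)):
        trans = xlat.get(fmt[i])
        if trans:
            result.append(trans(words[i]))
    if len(result) == 0:
        return None
    if len(result) == 1:
        return result[0]
    return result

line = "H 0.000 0.000 0.000"
all_fields = parseline(line, 'sfff')   # ['H', 0.0, 0.0, 0.0]
coords = parseline(line, 'xfff')       # [0.0, 0.0, 0.0]
first = parseline(line, 'xf')          # 0.0, a bare float rather than a list
```

Note that the return type depends on how many fields survive: None, a single value, or a list.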

I'm posting this here because (1) I'm feeling smug at what a bright
little coder I am, and (2) (in a more realistic and humble frame of
mind) I realize that many many people have probably found solutions to
similar needs, and I'd imagine that many are better than the above. I
would love to hear how other people do similar things.

Rick

Oct 12 '06 #1
"RickMuller" <rp******@gmail.com> writes:
> def parseline(line, format):
>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
>     result = []
>     words = line.split()
>     for i in range(len(format)):
>         f = format[i]
>         trans = xlat.get(f,'None')
>         if trans: result.append(trans(words[i]))
>     if len(result) == 0: return None
>     if len(result) == 1: return result[0]
>     return result

Untested, but maybe more in current Pythonic style:

def parseline(line, format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    for f,w in zip(format, words):
        trans = xlat[f]
        if trans is not None:
            result.append(trans(w))
    return result

Differences:
- doesn't ignore improper format characters, raises exception instead
- always returns values in a list, including an empty list if
  there are no values
- uses iterator protocol and zip to avoid ugly index variable
and subscripts
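
A short Python 3 sketch of those differences, assuming the zip-based version above (the parameter is renamed fmt here, and the sample lines are only illustrative). It also shows one side effect of zip that the list does not mention: zip stops at the shorter input, so a format longer than the line is truncated rather than raising IndexError.

```python
def parseline(line, fmt):
    xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}
    result = []
    for f, w in zip(fmt, line.split()):
        trans = xlat[f]              # KeyError for an unknown format character
        if trans is not None:
            result.append(trans(w))
    return result

line = "H 0.000 0.000 0.000"
one_value = parseline(line, 'xf')    # [0.0], still a list
no_values = parseline(line, 'x')     # []

try:
    parseline(line, 'sq')
    bad_format = None
except KeyError as exc:
    bad_format = exc.args[0]         # 'q'

# The format has four characters but the line has only two words.
short_line = parseline("H 1.5", 'sfff')   # ['H', 1.5]
```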
Oct 12 '06 #2
Hi Rick,

Nice little script indeed !

You probably mean
    trans = xlat.get(f,None)
instead of
    trans = xlat.get(f,'None')
in the case where an invalid format character is supplied. The string
'None' evaluates to True, so that trans(words[i]) raises an exception.
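
A minimal Python 3 sketch of that failure, using 'q' as an arbitrary unknown format character (my choice for illustration):

```python
xlat = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

trans = xlat.get('q', 'None')    # unknown character: default is the string 'None'
is_truthy = bool(trans)          # True, so an "if trans:" guard does not skip it

try:
    trans('0.000')               # tries to call a str object
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__   # 'TypeError'

# With the real None as the default, the guard skips unknown characters.
skipped = xlat.get('q', None) is None
```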

A variant, with a list comprehension instead of the for loop :

def parseline(line, format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = []
    words = line.split()
    result = [ xlat[f](w) for f,w in zip(format,words)
               if xlat.get(f,None) ]
    if not result: return None
    if len(result) == 1: return result[0]
    return result

Regards,
Pierre

Oct 12 '06 #3
RickMuller wrote:
> One of my all-time favorite scripts is parseline, which is printed

here is another way to write that:

def parseline(line, format):
    trans = {'x':lambda x:None,'s':str,'f':float,'d':int,'i':int}
    return [ trans[f](w) for f,w in zip(format, line.split()) ]

>>> parseline('A 1 22 3 6', 'sdxf')
['A', 1, None, 3.0]
I.

Oct 12 '06 #4
> >>> parseline('A 1 22 3 6', 'sdxf')
> ['A', 1, None, 3.0]
Yes, but in this case the OP expects to get ['A',1,3.0]

A shorter version :

def parseline(line, format):
    xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    result = [ xlat[f](w) for f,w in zip(format,line.split())
               if xlat.get(f,None) ]
    if len(result) == 0: return None
    if len(result) == 1: return result[0]
    return result

Pierre

Oct 12 '06 #5
On 2006-10-12, Pierre Quentel <qu************@wanadoo.fr> wrote:
>> >>> parseline('A 1 22 3 6', 'sdxf')
>> ['A', 1, None, 3.0]
>
> Yes, but in this case the OP expects to get ['A',1,3.0]
>
> A shorter version :
>
> def parseline(line, format):
>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
>     result = [ xlat[f](w) for f,w in zip(format,line.split())
>                if xlat.get(f,None) ]
>     if len(result) == 0: return None
>     if len(result) == 1: return result[0]
>     return result

I don't like the name, since it actually seems to be parsing a
string.

--
Neil Cerutti
Oct 12 '06 #6

    Rick> def parseline(line, format):
    Rick>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
    Rick>     result = []
    Rick>     words = line.split()
    Rick>     for i in range(len(format)):
    Rick>         f = format[i]
    Rick>         trans = xlat.get(f,'None')
    Rick>         if trans: result.append(trans(words[i]))
    Rick>     if len(result) == 0: return None
    Rick>     if len(result) == 1: return result[0]
    Rick>     return result

Note that your setting and testing of the trans variable is problematic. If
you're going to use xlat.get(), either spell None correctly or take the
default:

    trans = xlat.get(f)
    if trans:
        result.append(trans(words[i]))

As Paul indicated though, it would also be better not to silently let
unrecognized format characters pass. I probably wouldn't let KeyError float
up to the caller though:

    trans = xlat.get(f)
    if trans:
        result.append(trans(words[i]))
    else:
        raise ValueError("unrecognized format character %s" % f)

Finally, you might consider doing the splitting outside of this function and
pass in a list. That way you could (for example) easily pass in a row of
values read by the csv module's reader class (untested):

    def format(words, fmt):
        xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
        result = []
        for i in range(len(fmt)):
            f = fmt[i]
            trans = xlat.get(f)
            if trans:
                result.append(trans(words[i]))
            else:
                raise ValueError("unrecognized format character %s" % f)
        return result
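
A hedged Python 3 sketch of the csv idea. The function name convert, the module-level XLAT table, and the sample data are assumptions made for illustration, and io.StringIO stands in for a real file. One detail differs from the untested function as posted: there, 'x' maps to None and so would land in the else branch and raise ValueError, so this sketch tests for an unknown character separately from the "skip this field" case.

```python
import csv
import io

XLAT = {'x': None, 's': str, 'f': float, 'd': int, 'i': int}

def convert(words, fmt):
    """Convert already-split fields according to fmt ('x' skips a field)."""
    result = []
    for f, w in zip(fmt, words):
        if f not in XLAT:
            raise ValueError("unrecognized format character %s" % f)
        trans = XLAT[f]
        if trans is not None:        # None means "skip", not "unknown"
            result.append(trans(w))
    return result

# Made-up sample data; csv.reader yields each row as a list of strings.
data = io.StringIO("H,0.000,0.000,0.740\nO,1,2.5,3\n")
rows = [convert(row, 'sxff') for row in csv.reader(data)]
# rows == [['H', 0.0, 0.74], ['O', 2.5, 3.0]]
```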

    Rick> I'm posting this here because (1) I'm feeling smug at what a
    Rick> bright little coder I am, and (2) (in a more realistic and humble
    Rick> frame of mind) I realize that many many people have probably found
    Rick> solutions to similar needs, and I'd imagine that many are better
    Rick> than the above. I would love to hear how other people do similar
    Rick> things.

It seems quite clever to me.

Skip
Oct 12 '06 #7
Wow! 6 responses in just a few minutes. Thanks for all of the great
feedback!
Oct 12 '06 #8
RickMuller wrote:
> I'm posting this here because (1) I'm feeling smug at what a bright
> little coder I am

if you want to show off, and use a more pythonic interface, you can do
it with a lot fewer lines. here's one example:

def parseline(line, *types):
    result = [c(x) for (x, c) in zip(line.split(), types) if c] or [None]
    return len(result) != 1 and result or result[0]

text = "H 0.000 0.000 0.000"

print(parseline(text, str, float, float, float))
print(parseline(text, None, float, float, float))
print(parseline(text, None, float))

etc. and since you know how many items you'll get back from the
function, you might as well go for the one-liner version, and do
the unpacking on the way out:

def parseline(line, *types):
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

text = "H 0.000 0.000 0.000"

[tag, value] = parseline(text, str, float)
[value] = parseline(text, None, float)
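
A small Python 3 sketch of the same one-liner with a few more unpacking cases. The variable names are only illustrative; the last call shows the [None] fallback when no converter is kept.

```python
def parseline(line, *types):
    # Pair each word with its converter; a falsy converter (None) skips the word.
    return [c(x) for (x, c) in zip(line.split(), types) if c] or [None]

text = "H 0.000 0.000 0.000"

tag, x, y, z = parseline(text, str, float, float, float)
[first] = parseline(text, None, float)
[nothing] = parseline(text, None)    # nothing kept, so the result is [None]
```

Because the caller chooses the converters, the number of returned items is known at the call site, which is what makes the unpacking safe.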

</F>

Oct 12 '06 #9

RickMuller wrote:
> One of my all-time favorite scripts is parseline, which is printed
> below
>
> def parseline(line, format):
>     xlat = {'x':None,'s':str,'f':float,'d':int,'i':int}
>     result = []
>     words = line.split()
>     for i in range(len(format)):
>         f = format[i]
>         trans = xlat.get(f,'None')
>         if trans: result.append(trans(words[i]))
>     if len(result) == 0: return None
>     if len(result) == 1: return result[0]
>     return result
>
> This takes a line of text, splits it, and then applies simple
> formatting characters to return different python types. For example,
> given the line
>
> H 0.000 0.000 0.000
>
> I can call parseline(line, 'sfff') and it will return the string 'H',
> and three floats. If I wanted to omit the first, I could just call
> parseline(line, 'xfff'). If I only wanted the first 0.000, I could call
> parseline(line, 'xf').
> [...]
> I would love to hear how other people do similar things.
>
> Rick

MAP = {'s':str,'f':float,'d':int,'i':int}

def parseline(line, format, separator=' '):
    '''
    >>> parseline('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    parts = line.split(separator)
    return [f(parts[i]) for (i,f) in mapping]

def parseline2(line, format):
    '''
    >>> parseline2('A 1 2 3 4', 'sdxf')
    ['A', 1, 3.0]
    '''
    return [f(line.split()[i]) for (i,f) in
            [(i, MAP[f]) for (i,f) in enumerate(format) if f != 'x']]

def parselines(lines, format, separator=' '):
    '''
    >>> lines = [ 'A 1 2 3 4', 'B 5 6 7 8', 'C 9 10 11 12']
    >>> list(parselines(lines, 'sdxf'))
    [['A', 1, 3.0], ['B', 5, 7.0], ['C', 9, 11.0]]
    '''
    mapping = [ (i, MAP[f]) for (i,f) in enumerate(format) if f != 'x' ]
    for line in lines:
        parts = line.split(separator)
        yield [f(parts[i]) for (i,f) in mapping]

import doctest
doctest.testmod(verbose=True)
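
One thing that may be worth checking with the separator=' ' default: str.split(' ') does not collapse runs of spaces, so column-aligned input produces empty fields and shifts the indexes. The sketch below (Python 3) assumes a default of separator=None instead, which is my variation and not part of the code above; the sample line is made up.

```python
MAP = {'s': str, 'f': float, 'd': int, 'i': int}

def parseline(line, fmt, separator=None):
    # separator=None makes str.split() collapse runs of whitespace.
    mapping = [(i, MAP[f]) for (i, f) in enumerate(fmt) if f != 'x']
    parts = line.split(separator)
    return [f(parts[i]) for (i, f) in mapping]

aligned = "H   0.000  1.500"
fields_by_space = aligned.split(' ')   # ['H', '', '', '0.000', '', '1.500']
parsed = parseline(aligned, 'sff')     # ['H', 0.0, 1.5]
```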

Oct 12 '06 #10
