
How do I parse this ? regexp ?

serpent17@gmail.com
#1: Jul 19 '05
Hello all,

I have this line of numbers:


04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]


repeated several times in a text file and I would like each element to
be part of a vector. how do I do this ? I am not very capable in using
regexp as you can see.


Thanks in advance,


Jake.

Jorge Godoy
#2: Jul 19 '05

re: How do I parse this ? regexp ?


"serpent17@gmail.com" <serpent17@gmail.com> writes:
> Hello all,
>
> I have this line of numbers:
>
>
> 04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
> 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
> -0.960662841796875], [0.01556396484375, 0.01220703125,
> 0.01068115234375]
>
>
> repeated several times in a text file and I would like each element to
> be part of a vector. how do I do this ? I am not very capable in using
> regexp as you can see.

You don't need a regexp to do that.

Use the string split method. It splits on whitespace by default. If you want
to keep the values inside "[]" together, remove the spaces before splitting, or
split on the "[" char first and then split the first item using spaces as a
separator.
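
As a rough sketch of the second suggestion (split on "[" first, then split the first piece on whitespace), something like the following should work on Python 3. The variable names are only illustrative, and it assumes the whole record sits on a single line:

```python
line = ("04242005 18:20:42-0.000002, 271.1748608, "
        "[-4.119873046875, 3.4332275390625, 105.062255859375], "
        "[0.093780517578125, 0.041015625, -0.960662841796875], "
        "[0.01556396484375, 0.01220703125, 0.01068115234375]")

# Split on "[" first: the head holds the date/time/value fields,
# and every later piece holds one bracketed group.
head, *groups = line.split("[")

# The head is split on whitespace; trailing commas are trimmed off.
head_fields = [field.rstrip(",") for field in head.split()]

# Each group still ends in "]" (plus a comma and space); keep only the
# part before "]" and split that on commas.
vectors = [[item.strip() for item in group.split("]")[0].split(",")]
           for group in groups]

print(head_fields)
print(vectors)
```

The values come back as strings; wrapping each item in float() would turn them into numbers.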


Be seeing you,
--
Jorge Godoy <godoy@ieee.org>

serpent17@gmail.com
#3: Jul 19 '05

re: How do I parse this ? regexp ?


Hello,

I am not understanding your answer, but I probably asked the wrong
question :-)

I want to remove the commas and the square brackets [ and ], and
rewrite this whole line (and all the ones following it) in a text file
where only a space would be a delimiter. How do I do this?

I have tried this:

import re

f = open(name3, 'r')
r = r"\d+\.\d*"
for line in f:
    cols = line.split()
    data1 = re.findall(r, line)

and then I don't know what to do with either cols or data1.
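
A minimal sketch of one way to do the rewrite described above, added for illustration. It assumes Python 3 and one record per line in the input file; to_space_delimited and rewrite_file are made-up names, and the paths passed to rewrite_file are placeholders:

```python
def to_space_delimited(line):
    """Drop commas and square brackets, leaving space-separated fields."""
    # The third argument of str.maketrans lists characters to delete.
    cleaned = line.translate(str.maketrans("", "", ",[]"))
    # split() followed by join collapses any run of whitespace to one space.
    return " ".join(cleaned.split())


def rewrite_file(src_path, dst_path):
    """Write a space-delimited copy of src_path to dst_path, line by line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(to_space_delimited(line) + "\n")


sample = ("04242005 18:20:42-0.000002, 271.1748608, "
          "[-4.119873046875, 3.4332275390625, 105.062255859375], "
          "[0.093780517578125, 0.041015625, -0.960662841796875], "
          "[0.01556396484375, 0.01220703125, 0.01068115234375]")
print(to_space_delimited(sample))
```

No regexp is involved; the time field "18:20:42-0.000002" is left untouched because it contains none of the deleted characters.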

Jake.

Jeremy Bowers
#4: Jul 19 '05

re: How do I parse this ? regexp ?


On Wed, 27 Apr 2005 07:56:11 -0700, serpent17@gmail.com wrote:
> Hello all,
>
> I have this line of numbers:
>
>
> 04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
> 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
> -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]
>
>
> repeated several times in a text file and I would like each element to be
> part of a vector. how do I do this ? I am not very capable in using regexp
> as you can see.

I think, based on the responses you've gotten so far, that perhaps you
aren't being clear enough.

Some starter questions:

* Is that all on one line in your file?
* Are there ever variable numbers of the [] fields?
* What do you mean by "vectors"?

If the line format is stable (no variation in numbers), and especially if
that is all one line, given that you are not familiar with regexp I
wouldn't muck about with it. (For me, I'd still say it's borderline if I
would go with that.) Instead, follow along in the following and it'll
probably help, though as I don't precisely know what you're asking I can't
give a complete solution:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]"
>>> x.split(',', 2)
['04242005 18:20:42-0.000002', ' 271.1748608', ' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]']
>>> splitted = x.split(',', 2)
>>> splitted[2]
' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]'
>>> import re
>>> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")
>>> if safetyChecker.match(splitted[2]):
...     eval(splitted[2], {}, {})
...
([-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125,
0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375])
>>> splitted[0].split()
['04242005', '18:20:42-0.000002']
>>> splitted[0].split()[1].split('-')
['18:20:42', '0.000002']
>>>


I'd like to STRONGLY EMPHASIZE that "eval" is very dangerous if you
can't trust the source; *any* Python code will
be run. That is why I am extra paranoid and double-check that the
expression only has the characters listed in that simple regex in it.
(Anyone who can construct a malicious string out of those characters will
get my sincere admiration.) You may do as you please, of course, but I
believe it is not helpful to suggest security holes on comp.lang.python
:-) The coincidence of that part of your data, which is also the most
challenging to parse, exactly matching Python syntax is too much to pass
up.
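
For readers on a newer Python: the standard library's ast.literal_eval (added in Python 2.6, so not available in the 2.3.5 session above) evaluates only literals such as numbers, lists and tuples, and refuses anything else. The following is a sketch of the same idea using it in place of eval, assuming Python 3; the variable names are illustrative only:

```python
import ast

line = ("04242005 18:20:42-0.000002, 271.1748608, "
        "[-4.119873046875, 3.4332275390625, 105.062255859375], "
        "[0.093780517578125, 0.041015625, -0.960662841796875], "
        "[0.01556396484375, 0.01220703125, 0.01068115234375]")

# Same first step as the session above: peel off the two leading fields.
stamp, value, rest = line.split(',', 2)

# literal_eval parses the text as a Python literal (here a tuple of three
# lists of floats) and raises ValueError for calls, names, and so on.
triplets = ast.literal_eval(rest.strip())
print(triplets)

# Something that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

With literal_eval the character-whitelist regex is arguably optional, though keeping it as an extra sanity check on the input does no harm.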

This should give you some good ideas; if you post more detailed questions
we can probably be of more help.

Paul McGuire
#5: Jul 19 '05

re: How do I parse this ? regexp ?


Jake -

If regexp's give you pause, here is a pyparsing version that, while
verbose, is fairly straightforward. I made some guesses at what some
of the data fields might be, but that doesn't matter much.

Note the use of setResultsName() to give different parse fragments
names so that they are directly addressable in the results, instead of
having to count out "the 0'th group is the date, the 1'st group is the
time...". Also, there is a commented-out conversion action, to
automatically convert strings to floats during parsing.

Download pyparsing at http://pyparsing.sourceforge.net.

Good luck,
-- Paul


data = """04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]"""

from pyparsing import *

COMMA = Literal(",").suppress()
LBRACK = Literal("[").suppress()
RBRACK = Literal("]").suppress()

# define a two-digit integer, we'll need a lot of them
int2 = Word(nums,exact=2)
month = int2
day = int2
yr = Combine("20" + int2)
date = Combine(month + day + yr)

hr = int2
min = int2
sec = int2
tz = oneOf("+ -") + Word(nums) + "." + Word(nums)
time = Combine( hr + ":" + min + ":" + sec + tz )

realNum = Combine( Optional("-") + Word(nums) + "." + Word(nums) )
# uncomment the next line and reals will be converted from strings to
# floats during parsing
#realNum.setParseAction( lambda s,l,t: float(t[0]) )

triplet = Group( LBRACK + realNum + COMMA + realNum + COMMA + realNum +
                 RBRACK )
entry = Group( date.setResultsName("date") +
               time.setResultsName("time") + COMMA +
               realNum.setResultsName("temp") + COMMA +
               Group( triplet + COMMA + triplet + COMMA + triplet
                      ).setResultsName("coords") )

dataFormat = OneOrMore(entry)
results = dataFormat.parseString(data)

for d in results:
    print(d.date)
    print(d.time)
    print(d.temp)
    print(d.coords[0].asList())
    print(d.coords[1].asList())
    print(d.coords[2].asList())

returns:

04242005
18:20:42-0.000002
271.1748608
['-4.119873046875', '3.4332275390625', '105.062255859375']
['0.093780517578125', '0.041015625', '-0.960662841796875']
['0.01556396484375', '0.01220703125', '0.01068115234375']

Simon Dahlbacka
#6: Jul 19 '05

re: How do I parse this ? regexp ?


> >>> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")

...doesn't the dot (.) in your character class mean that you are allowing
EVERYTHING (except newline?)

(you would probably want \. instead)

/Simon


Peter Hansen
#7: Jul 19 '05

re: How do I parse this ? regexp ?


Simon Dahlbacka wrote:
> > >>> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")
>
> ...doesn't the dot (.) in your character class mean that you are allowing
> EVERYTHING (except newline?)

The re docs clearly say this is not the case:

'''
[]
Used to indicate a set of characters. Characters can be listed
individually, or a range of characters can be indicated by giving two
characters and separating them by a "-". Special characters are not
active inside sets.
'''

Note the last sentence in the above quotation...
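
A quick way to check this yourself is sketched below, assuming Python 3; the test strings are made up for illustration:

```python
import re

safety_checker = re.compile(r"^[-\[\]0-9,. ]*$")

# Inside [...] the dot is just a literal ".", so letters are still rejected.
only_allowed = bool(safety_checker.match("[0.5, -1.25]"))
with_letters = bool(safety_checker.match("[0.5, abc]"))

# Outside a set, "." really does match any character except a newline.
bare_dot = bool(re.match(r"^.*$", "[0.5, abc]"))

print(only_allowed, with_letters, bare_dot)  # True False True
```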

-Peter

Jeremy Bowers
#8: Jul 19 '05

re: How do I parse this ? regexp ?


On Thu, 28 Apr 2005 20:53:14 -0400, Peter Hansen wrote:
> The re docs clearly say this is not the case:
>
> '''
> []
> Used to indicate a set of characters. Characters can be listed
> individually, or a range of characters can be indicated by giving two
> characters and separating them by a "-". Special characters are not active
> inside sets.
> '''
>
> Note the last sentence in the above quotation...
>
> -Peter

Aren't regexes /fun/?

Also from that passage, Simon, note the "-" right in front of
[-\[\]0-9,. ], another one that's tripped me up more than once.
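
To make the "-" point concrete, here is a small sketch (Python 3 assumed, patterns invented for illustration) of how its position in a set changes its meaning:

```python
import re

# Between two characters, "-" defines a range: any digit from 0 to 9.
digit_range = re.compile(r"^[0-9]+$")

# Listed first in the set, "-" is a literal hyphen: this set contains
# only the three characters "-", "0" and "9".
literal_hyphen = re.compile(r"^[-09]+$")

range_matches_5 = bool(digit_range.match("5"))
literal_matches_5 = bool(literal_hyphen.match("5"))
literal_matches_dash = bool(literal_hyphen.match("-0-9"))

print(range_matches_5, literal_matches_5, literal_matches_dash)  # True False True
```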

Wheeee!

"Some people, when confronted with a problem, think ``I know, I'll use
regular expressions.'' Now they have two problems." - jwz
http://www.jwz.org/hacks/marginal.html
