How do I parse this? regexp?

serpent17

Hello all,

I have this line of numbers:
04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]
repeated several times in a text file, and I would like each element to
be part of a vector. How do I do this? I am not very capable with
regexps, as you can see.
Thanks in advance,
Jake.

Jul 19 '05 #1


Jorge Godoy

"se*******@gmail.com" <se*******@gmail.com> writes:

> Hello all,
>
> I have this line of numbers:
> 04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
> 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
> -0.960662841796875], [0.01556396484375, 0.01220703125,
> 0.01068115234375]
> repeated several times in a text file and I would like each element to
> be part of a vector. how do I do this ? I am not very capable in using
> regexp as you can see.

You don't need a regexp to do that.

Use the split string method. It will split on spaces by default. If you want
to keep the values inside "[]" together, remove the spaces before splitting or
split on the "[" char first and then split the first item using spaces as a
separator.
Be seeing you,
--
Jorge Godoy <go***@ieee.org>

Jul 19 '05 #2
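
The split-only approach suggested above can be sketched as follows. This is a minimal illustration, assuming Python 3 and that each record sits in one string; the variable names (`head`, `groups`, `scalars`, `vectors`) are made up for the example and are not from the thread.

```python
# One record from the original post, joined into a single string.
line = ("04242005 18:20:42-0.000002, 271.1748608, "
        "[-4.119873046875, 3.4332275390625, 105.062255859375], "
        "[0.093780517578125, 0.041015625, -0.960662841796875], "
        "[0.01556396484375, 0.01220703125, 0.01068115234375]")

# Split on "[" first: the head holds the scalar fields, the rest are the groups.
head, *groups = line.split("[")

# The head splits on whitespace; strip the trailing comma from each piece.
scalars = [piece.rstrip(",") for piece in head.split()]

# Each group looks like "a, b, c], "; drop the bracket and commas, then split.
vectors = [g.replace("]", "").replace(",", " ").split() for g in groups]

print(scalars)
print(vectors)
```

Everything stays a string here; converting with `float()` would be a separate step.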

serpent17

Hello,

I don't understand your answer, but I probably asked the wrong
question :-)

I want to remove the commas and the square bracket characters [ and ],
and rewrite this whole line (and all the ones following it in the text
file) so that only spaces are used as delimiters. How do I do this?

I have tried this:

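    import re

    f = open(name3, 'r')
    r = r"\d+\.\d*"    # digits, a dot, then optional digits
    for line in f:
        cols = line.split()           # whitespace-separated pieces
        data1 = re.findall(r, line)   # every decimal number in the line

and then I don't know what to do with either cols or data1.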

Jake.

Jul 19 '05 #3
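
One way to do what is asked here (drop the commas and square brackets, leaving only spaces as delimiters) is sketched below. It assumes each record occupies one physical line of the input file; `flatten`, `rewrite`, and the path parameters are hypothetical names chosen for this example.

```python
import re


def flatten(line):
    """Replace commas and square brackets with spaces, then collapse whitespace."""
    return " ".join(re.sub(r"[\[\],]", " ", line).split())


def rewrite(src_path, dst_path):
    """Copy src_path to dst_path with every non-blank line flattened."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(flatten(line) + "\n")


sample = ("04242005 18:20:42-0.000002, 271.1748608, "
          "[-4.119873046875, 3.4332275390625 , 105.062255859375]")
print(flatten(sample))
```

Note that the time and the value fused to it ("18:20:42-0.000002") stay together as one token, because nothing but the "-" separates them in the source data.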

Jeremy Bowers

On Wed, 27 Apr 2005 07:56:11 -0700, se*******@gmail.com wrote:

> Hello all,
>
> I have this line of numbers:
> 04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
> 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
> -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]
> repeated several times in a text file and I would like each element to be
> part of a vector. how do I do this ? I am not very capable in using regexp
> as you can see.

I think, based on the responses you've gotten so far, that perhaps you
aren't being clear enough.

Some starter questions:

* Is that all on one line in your file?
* Are there ever variable numbers of the [] fields?
* What do you mean by "vectors"?

If the line format is stable (no variation in numbers), and especially if
that is all one line, given that you are not familiar with regexp I
wouldn't muck about with it. (For me, I'd still say it's borderline if I
would go with that.) Instead, follow along in the following and it'll
probably help, though as I don't precisely know what you're asking I can't
give a complete solution:

Python 2.3.5 (#1, Mar 3 2005, 17:32:12)
[GCC 3.4.3 (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]"
>>> x.split(',', 2)
['04242005 18:20:42-0.000002', ' 271.1748608', ' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]']
>>> splitted = x.split(',', 2)
>>> splitted[2]
' [-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375]'
>>> import re
>>> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")
>>> if safetyChecker.match(splitted[2]):
...     eval(splitted[2], {}, {})
...
([-4.119873046875, 3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625, -0.960662841796875], [0.01556396484375, 0.01220703125, 0.01068115234375])
>>> splitted[0].split()
['04242005', '18:20:42-0.000002']
>>> splitted[0].split()[1].split('-')
['18:20:42', '0.000002']

I'd like to STRONGLY EMPHASIZE that "eval" is very dangerous if you can't
trust the source; *any* Python code in the string will be run. That is why
I am extra paranoid and double-check that the expression contains only the
characters listed in that simple regex. (Anyone who can construct a
malicious string out of those characters will get my sincere admiration.)
You may do as you please, of course, but I believe it is not helpful to
suggest security holes on comp.lang.python :-) The coincidence of that part
of your data, which is also the most challenging to parse, exactly matching
Python syntax is too much to pass up.

This should give you some good ideas; if you post more detailed questions
we can probably be of more help.

Jul 19 '05 #4
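
For readers on a newer Python, the eval step above has a standard-library alternative that was not available when this thread was written: `ast.literal_eval` (added in Python 2.6) accepts only literals, so no character whitelist is needed. The sketch below swaps it in for eval while keeping the same split steps; the variable names are illustrative, not from the post.

```python
import ast

x = ("04242005 18:20:42-0.000002, 271.1748608, "
     "[-4.119873046875, 3.4332275390625, 105.062255859375], "
     "[0.093780517578125, 0.041015625, -0.960662841796875], "
     "[0.01556396484375, 0.01220703125, 0.01068115234375]")

# Same first step as the session above: split off the two leading fields.
prefix, scalar, rest = x.split(",", 2)
date, stamp = prefix.split()

# literal_eval only accepts Python literals (numbers, lists, tuples, ...),
# so arbitrary code in the string raises an error instead of running.
vectors = ast.literal_eval(rest.strip())
value = float(scalar)

print(date, stamp, value)
print(vectors)
```

The result is a tuple of three lists of floats, the same shape the eval call returns in the session.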

Paul McGuire

Jake -

If regexp's give you pause, here is a pyparsing version that, while
verbose, is fairly straightforward. I made some guesses at what some
of the data fields might be, but that doesn't matter much.

Note the use of setResultsName() to give different parse fragments
names so that they are directly addressable in the results, instead of
having to count out "the 0'th group is the date, the 1'st group is the
time...". Also, there is a commented-out conversion action, to
automatically convert strings to floats during parsing.

Download pyparsing at http://pyparsing.sourceforge.net.

Good luck,
-- Paul
data = """04242005 18:20:42-0.000002, 271.1748608, [-4.119873046875,
3.4332275390625, 105.062255859375], [0.093780517578125, 0.041015625,
-0.960662841796875], [0.01556396484375, 0.01220703125,
0.01068115234375]"""

from pyparsing import *

COMMA = Literal(",").suppress()
LBRACK = Literal("[").suppress()
RBRACK = Literal("]").suppress()

# define a two-digit integer, we'll need a lot of them
int2 = Word(nums, exact=2)
month = int2
day = int2
yr = Combine("20" + int2)
date = Combine(month + day + yr)

hr = int2
min = int2
sec = int2
tz = oneOf("+ -") + Word(nums) + "." + Word(nums)
time = Combine( hr + ":" + min + ":" + sec + tz )

realNum = Combine( Optional("-") + Word(nums) + "." + Word(nums) )
# uncomment the next line and reals will be converted from strings to
# floats during parsing
#realNum.setParseAction( lambda s,l,t: float(t[0]) )

triplet = Group( LBRACK + realNum + COMMA + realNum + COMMA + realNum +
                 RBRACK )
entry = Group( date.setResultsName("date") +
               time.setResultsName("time") + COMMA +
               realNum.setResultsName("temp") + COMMA +
               Group( triplet + COMMA + triplet + COMMA + triplet
                      ).setResultsName("coords") )

dataFormat = OneOrMore(entry)
results = dataFormat.parseString(data)

# each parsed entry exposes its fields by the names given above
for d in results:
    print(d.date)
    print(d.time)
    print(d.temp)
    print(d.coords[0].asList())
    print(d.coords[1].asList())
    print(d.coords[2].asList())

returns:

04242005
18:20:42-0.000002
271.1748608
['-4.119873046875', '3.4332275390625', '105.062255859375']
['0.093780517578125', '0.041015625', '-0.960662841796875']
['0.01556396484375', '0.01220703125', '0.01068115234375']

Jul 19 '05 #5
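
For comparison only, the same named fields can be had from the standard library with `re` named groups. This is a rough sketch, not a replacement for the pyparsing grammar above: it reuses the guessed field names from that post (date, time, temp, coords) and assumes one record per string; `ENTRY`, `NUMBER`, and `record` are names invented for the example.

```python
import re

line = ("04242005 18:20:42-0.000002, 271.1748608, "
        "[-4.119873046875, 3.4332275390625, 105.062255859375], "
        "[0.093780517578125, 0.041015625, -0.960662841796875], "
        "[0.01556396484375, 0.01220703125, 0.01068115234375]")

# One named group per field; re.VERBOSE lets the pattern span several lines.
ENTRY = re.compile(r"""
    (?P<date>\d{8}) \s+
    (?P<time>\d\d:\d\d:\d\d[-+]\d+\.\d+) \s*,\s*
    (?P<temp>-?\d+\.\d+) \s*,\s*
    (?P<coords>\[.*\])
    """, re.VERBOSE)
NUMBER = re.compile(r"-?\d+\.\d+")

match = ENTRY.match(line)

# Pull every number out of the bracketed part and regroup them in threes.
values = [float(n) for n in NUMBER.findall(match.group("coords"))]
record = {
    "date": match.group("date"),
    "time": match.group("time"),
    "temp": float(match.group("temp")),
    "coords": [values[i:i + 3] for i in range(0, len(values), 3)],
}
print(record)
```

The regex is terser but does less checking: it does not verify that each bracketed group holds exactly three numbers, which the pyparsing `triplet` definition does.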

Simon Dahlbacka

safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")

...doesn't the dot (.) in your character class mean that you are allowing
EVERYTHING (except newline?)

(you would probably want \. instead)

/Simon

Jul 19 '05 #6

Peter Hansen

Simon Dahlbacka wrote:

> safetyChecker = re.compile(r"^[-\[\]0-9,. ]*$")
>
> ...doesn't the dot (.) in your character class mean that you are allowing
> EVERYTHING (except newline?)

The re docs clearly say this is not the case:

'''
[]
Used to indicate a set of characters. Characters can be listed
individually, or a range of characters can be indicated by giving two
characters and separating them by a "-". Special characters are not
active inside sets.
'''

Note the last sentence in the above quotation...

-Peter

Jul 19 '05 #7
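
The point from the docs is easy to check interactively. The short sketch below compares a dot inside a set with a bare dot, then runs the thread's safetyChecker pattern against one harmless and one hostile string; the test strings and variable names are made up for illustration.

```python
import re

inside_set = re.compile(r"^[.]$")   # dot inside a set: a literal "."
outside_set = re.compile(r"^.$")    # bare dot: any character except newline

dot_matches = [bool(inside_set.match(c)) for c in (".", "a")]
any_matches = [bool(outside_set.match(c)) for c in (".", "a")]
print(dot_matches, any_matches)

# The pattern from the thread, unchanged.
safety_checker = re.compile(r"^[-\[\]0-9,. ]*$")
accepted = safety_checker.match("[1.5, -2.0], [3.25]") is not None
rejected = safety_checker.match("__import__('os')") is None
print(accepted, rejected)
```

So the unescaped dot in the set admits only a literal ".", and letters, quotes, and parentheses are all rejected.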

Jeremy Bowers

On Thu, 28 Apr 2005 20:53:14 -0400, Peter Hansen wrote:

> The re docs clearly say this is not the case:
>
> '''
> []
> Used to indicate a set of characters. Characters can be listed
> individually, or a range of characters can be indicated by giving two
> characters and separating them by a "-". Special characters are not active
> inside sets.
> '''
>
> Note the last sentence in the above quotation...
>
> -Peter

Aren't regexes /fun/?

Also from that passage, Simon, note the "-" at the very front of
[-\[\]0-9,. ]: placed first in the set, it is a literal hyphen rather than
a range separator. That one has tripped me up more than once.

Wheeee!

"Some people, when confronted with a problem, think ``I know, I'll use
regular expressions.'' Now they have two problems." - jwz
http://www.jwz.org/hacks/marginal.html

Jul 19 '05 #8
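
The hyphen behaviour mentioned above can be seen with two tiny patterns. This is only an illustrative sketch; the patterns and the sample string are invented for the example.

```python
import re

leading = re.compile(r"[-a]")   # "-" comes first in the set: literal "-" or "a"
ranged = re.compile(r"[a-c]")   # "-" between two characters: the range a..c

literal_hits = leading.findall("a-b")
range_hits = ranged.findall("a-b")
print(literal_hits, range_hits)
```

In the first pattern the hyphen itself is matched; in the second it only defines the range, so the "-" in the input is skipped.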
