Using pyparsing to extract meteo data

USER nappie writes:
Hello,
I'm Peter. I'm new to Python coding, and I'm using pyparsing to
extract data from a meteorological Arpege file.
The file is long and is made up of keyword and number fields
like this:

[[code]]

GRILLE EURAT5 Coin Nord-Ouest : 46.50/ 0.50 Coin Sud-Est : 44.50/ 2.50
MODELE PA PARAMETRE P
NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27
NIVEAU MER 0 ECHEANCE 3.0 DATE 20020304000000 NB_POINTS 25
1019.80 1019.78 1019.92 1020.18 1020.34
1018.94 1019.24 1019.54 1020.08 1020.32
1018.24 1018.64 1018.94 1019.84 1019.98
1017.48 1017.88 1018.28 1018.98 1019.98
1016.62 1017.08 1017.66 1018.26 1018.34
NIVEAU MER 0 ECHEANCE 6.0 DATE 20020304000000 NB_POINTS 25
1019.37 1019.39 1019.57 ........ ........
........ .........
........
........
........
........ .........
NIVEAU MER 0 ECHEANCE 48.0 DATE 20020304000000 NB_POINTS 25
1017.84 1017.46 1017.14 1016.86 1016.58
1017.28 1016.90 1016.46 1016.48 1016.34
1016.50 1016.06 1015.62 1015.90 1015.72
1015.94 1015.30 1014.78 1014.68 1014.86
1015.86 1015.10 1014.36 1014.00 1013.90

............... ...............
MODELE PA PARAMETRE T
NIVEAU HAUTEUR 2 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
1.34 1.51 1.40 0.56 -0.36
1.73 1.43 0.89 -0.16 -0.99
2.06 1.39 1.14 -0.53 -0.99
2.12 2.22 2.15 0.76 -1.16
1.67 1.45 1.40 1.26 0.28
NIVEAU HAUTEUR 2 ECHEANCE 3.0 DATE 20020304000000 NB_POINTS 25
0.94 1.16 1.03 0.44 -0.41
0.95 0.61 0.22 ............... ............... ...............

[[code]]

I'm at the very beginning of the work, and for the moment I have written this
code to extract only the numeric data, as strings:

[[code format="python"]]

# Python 2 syntax, as posted
from pyparsing import *

# A decimal number: optional minus sign, then digit groups joined by "."
dec = Combine(Optional("-") + delimitedList(Word(nums), ".", combine=True))
datas = ZeroOrMore(dec)

f = file("arqal-Arpege.00", "r")
g = file("out3", "w")
for line in f:
    try:
        # Lines without numbers produce an empty result
        result = datas.parseString(line)
        add = result
        add1 = ";".join(add)
        print >>g, "(", add1, ")"
    except ParseException, pe:
        print pe

[[code]]

This is the resulting output in the file opened as g = file("out3", "w"):

[[code]]

( )
( )
( )
( 1020.91;1020.87;1020.91;1021.05;1021.13 )
( 1020.07;1020.27;1020.49;1020.91;1021.15 )
( 1019.37;1019.65;1019.79;1020.53;1020.77 )
( 1018.73;1018.89;1019.19;1019.83;1020.81 )
( 1018.05;1018.19;1018.75;1019.55;1020.27 )
( )
( 1019.80;1019.78;1019.92;1020.18;1020.34 )
( 1018.94;1019.24;1019.54;1020.08;1020.32 )
( 1018.24;1018.64;1018.94;1019.84;1019.98 )
( 1017.48;1017.88;1018.28;1018.98;1019.98 )
( 1016.62;1017.08;1017.66;1018.26;1018.34 )
( )
( 1019.37;1019.39;1019.57;1019.9;......;
.........
..........;1016.87)
( )
( 1017.84;1017.46;1017.14;1016.86;1016.58 )
( 1017.28;1016.90;1016.46;1016.48;1016.34 )
( 1016.50;1016.06;1015.62;1015.90;1015.72 )
( 1015.94;1015.30;1014.78;1014.68;1014.86 )
( 1015.86;1015.10;1014.36;1014.00;1013.90 )

[[code]]

So the words are gone from the output, but now I have to arrange this
numerical data into a kind of NESTED matrix, emulated in Python by a
nested dictionary:

[[code]]

{ 'P' : { MER 0 : [ (1020.91;1020.87;........;1020.27) ;
(.........) ; (1019.80;1019.78;........;1018.26;1018.34) ] ; ......;
SOL 0 : [ (.......) ; ..... ; (........) ] } ;
'T' : { SOL 0 : [ (.....;......) ; (ECHEANCE 3.0) ; (ECHEANCE 6.0) ;
(.......;........) ] ; HAUTEUR 2 : [ (.......;......;......) ] } }

==>>>>> { 'Parameter X' : { Level X : [ (forecast steps every 3 hours from +0
to +48 hours) ; ] } }

[[code]]

The outer shell is fixed by the PARAMETER dictionary. In the example it
is P = 'Pressure', but there are many others: Temperature T, Wind U and
V, etc.

The second, nested shell is set by another dictionary, the NIVEAU
(level). In the example it is MER 0 (sea level), but it can also be
SOL 0 or HAUTEUR 2,10 (HEIGHT 2,10 METERS), etc. (ground level, 1;0 meter
from the ground; the names come from French).

Every level is then associated with a LIST OF TUPLES,
[(....);(....);(....)], representing each forecast step, or expiration
hour in French: ECHEANCE XX.X, the predicted hour +3.0, +6.0, and so on
up to 48 h. So the list of tuples is [(ECHEANCE 3.0);(ECHEANCE 6.0);
(ECHEANCE XX.0);.........;(ECHEANCE 48.0)], like so:
[ (1019.37;1019.39;........;1020.27) ; (.........) ;
(1019.80;1019.78;........;1018.26;1018.34) ], where each tuple is, in the
end, the data grid: (5 x 5 points) = 25 values.

[[code]]

1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27

[[code]]
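
For reference, the layout described above might look like this in Python 3. This is only a sketch: the `records` list and the `build_nested` helper are hypothetical names, and each grid is shortened to three values instead of 25.

```python
# Hypothetical flat records: (parameter, level, forecast hour, point values).
records = [
    ("P", "MER 0", 3.0, (1019.80, 1019.78, 1019.92)),
    ("P", "MER 0", 0.0, (1020.91, 1020.87, 1020.91)),
    ("T", "HAUTEUR 2", 0.0, (1.34, 1.51, 1.40)),
]


def build_nested(records):
    """Group records as {parameter: {level: [one tuple per forecast step]}}."""
    nested = {}
    # Sort by parameter, level, and forecast hour so each list is in step order.
    for parameter, level, _echeance, points in sorted(records, key=lambda r: r[:3]):
        nested.setdefault(parameter, {}).setdefault(level, []).append(tuple(points))
    return nested


data = build_nested(records)
print(data["P"]["MER 0"][1])  # grid for the second forecast step (+3.0 h)
```

With this shape, `data["P"]["MER 0"]` is the list of grids for sea-level pressure, one tuple per forecast step.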

So I am asking what the best way is to start writing the pyparsing
grammar, so that it recognizes the 'words' inside the data file and
puts the data into the nested dictionary and list of tuples
illustrated above.
The meteo data file is attached.

Thanks a lot to anyone who can tell me anything that helps solve this big
problem (for me)!!!!

REPLY BY USER: ptmcg http://pyparsing.wikispaces.com/message/

Posted Yesterday 4:00 pm

Peter -

Your first attempt at pyparsing is a good step - just get something
working! You've got a first pattern working that detects and extracts
all decimal numbers. (I think you are the first one to model a decimal
number as a delimited list of integers with "." as the delimiter.)
The next step is to start looking for some higher-level text groups or
patterns. Your data is well structured as an n-level hierarchy, that
looks to me like:

[[code]]

- model+parameter
  - level
    - nb_points
  - level
    - nb_points
  - level
    - nb_points
- model+parameter
  - level
    - nb_points
  - level
    - nb_points
...

[[code]]

You can build your pyparsing grammar from the ground up, first to
parse individual terminal expressions (such as decimal numbers which
you already have), and then build up to more and more complex
structures within your data.
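
As an illustration of such a terminal expression, a decimal number can also be written as a single `Regex` with a parse action that converts the matched text to `float`. This is a sketch for Python 3, not the definition used elsewhere in this thread; the name `decimal` is an assumption, chosen so that it does not clash with `dec`.

```python
from pyparsing import OneOrMore, Regex

# Optional minus sign, digits, a dot, digits; converted to float at parse time.
decimal = Regex(r"-?\d+\.\d+").setParseAction(lambda tokens: float(tokens[0]))

# One row of the temperature grid from the sample data.
row = OneOrMore(decimal).parseString("1.34 1.51 1.40 0.56 -0.36").asList()
print(row)
```

Converting at parse time means later code receives numbers directly and does not need a separate `float` pass.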
The first thing to change about your approach is to start looking at
this data as a whole, instead of line by line. Instead of extracting
this first line of 5 point values:

[[code]]

1020.91 1020.87 1020.91 1021.05 1021.13

[[code]]

look at this as one piece of a larger structure, a data set for a
given niveau:

[[code]]

NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27

[[code]]

So let's create a parser for this structure that is the next step up
in the data hierarchy.

NIVEAU, ECHEANCE, DATE, and NB_POINTS are helpful labels for marking
the data, but not really important to return in the parsed results. So
I will start by creating definitions for these labels which will parse
them, but leave out (suppress) them from the returned data:

[[code]]

NIVEAU, ECHEANCE, DATE, NB_POINTS = \
    map(Suppress, "NIVEAU ECHEANCE DATE NB_POINTS".split())

[[code]]
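
A small Python 3 sketch of what suppression does to the results. The `header` expression and the input string are simplified stand-ins made up for this illustration, not part of the grammar in this thread.

```python
from pyparsing import Suppress, Word, alphas, nums

# Each label must be present in the input but is dropped from the results.
NIVEAU, ECHEANCE, DATE, NB_POINTS = map(
    Suppress, "NIVEAU ECHEANCE DATE NB_POINTS".split())

# Simplified header: label, level name, level number, label, forecast hour.
header = NIVEAU + Word(alphas) + Word(nums) + ECHEANCE + Word(nums + ".")
tokens = header.parseString("NIVEAU MER 0 ECHEANCE 3.0").asList()
print(tokens)
```

Only the values survive in `tokens`; the labels act purely as markers.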

You stated that there are several options for what a niveau identifier
can look like, so this should be its own expression:

[[code]]

niveau_ref = Literal("MER 0") | Literal("SOL 0") | \
    Combine(Literal("HAUTEUR ") + eurodec)

[[code]]

(I defined eurodec as you defined dec, but with a comma delimiter.)
I'll also define a dateString as a Word(nums) of exactly 14 digits,
but you can come back to this later and refine this as you like (build
in parse-time conversion for example).

[[code]]

dateString = Word(nums, exact=14)

[[code]]
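
The parse-time conversion mentioned above could be a function like the one sketched here for Python 3. It assumes the 14 digits are laid out as YYYYMMDDhhmmss and that ECHEANCE is a number of hours; `to_datetime` is a hypothetical helper name. A function of this shape could then be attached with `dateString.setParseAction(to_datetime)`.

```python
from datetime import datetime, timedelta


def to_datetime(tokens):
    """Convert the first token, a 14-digit YYYYMMDDhhmmss string, to a datetime."""
    return datetime.strptime(tokens[0], "%Y%m%d%H%M%S")


# Called directly here with a one-element token list, as a parse action would be.
run_time = to_datetime(["20020304000000"])
valid_time = run_time + timedelta(hours=48.0)  # ECHEANCE 48.0, assumed to be hours
print(run_time.isoformat(), valid_time.isoformat())
```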

And then you can create an expression for a full niveau's-worth of
data:

[[code]]

niveau = NIVEAU + niveau_ref + \
    ECHEANCE + dec + \
    DATE + dateString + \
    NB_POINTS + countedArray(dec)

[[code]]

Notice that we can use the pyparsing built-in countedArray to capture
all of the data point values, since NB_POINTS gives the number of
points to follow, and these are followed immediately by the points
themselves. Pyparsing will convert all of these into a nice n-element
list for us.
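
A minimal Python 3 sketch of `countedArray` on its own, using made-up integer input. The nesting of the returned list appears to differ between pyparsing releases (the output in this thread shows the items one level down), so the sketch accepts either shape; that handling is an assumption, not documented behaviour quoted from the library.

```python
from pyparsing import Word, countedArray, nums

# The leading "3" says how many items follow; the trailing "40" is left unparsed.
result = countedArray(Word(nums)).parseString("3 10 20 30 40").asList()

# Accept both a flat list of items and a list nested one level deeper.
values = result[0] if result and isinstance(result[0], list) else result
print(values)
```

The count itself does not appear in the results; only the counted items do.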
You astutely requested that these values should be accessible like
values in a dict, so we do this in pyparsing by adding results names:

[[code]]

niveau = NIVEAU + niveau_ref.setResultsName("niveau") + \
    ECHEANCE + dec.setResultsName("echeance") + \
    DATE + dateString.setResultsName("date") + \
    NB_POINTS + countedArray(dec).setResultsName("nb_points")

[[code]]

Now you should be able to search through your data file, extracting
all of the niveaux (?) and their related data:

[[code]]

f = file("arqal-Arpege.00", "r")
fdata = f.read()  # read the entire file, instead of going line-by-line
for n in niveau.searchString(fdata):
    print n.niveau
    print n.dump()
    pointValues = map(float, n.nb_points[0])
    print "- NB_POINTS mean:", sum(pointValues) / len(pointValues)
    print

[[code]]

(I also added some examples of extracting data using the results
names. You can also use dict-style notation, n["niveau"], if you
prefer.)
Gives this output (I've truncated with '...' for the sake of posting,
but the actual program gives the full lists of values):

[[code]]

MER 0
['MER 0', '0.0', '20020304000000', ['1020.91', '1020.87', ...
- date: 20020304000000
- echeance: 0.0
- nb_points: [['1020.91', '1020.87', '1020.91', '1021.05', ...
- niveau: MER 0
- NB_POINTS mean: 1020.0052

[[code]]

[[code]]

MER 0
['MER 0', '3.0', '20020304000000', ['1019.80', '1019.78', ...
- date: 20020304000000
- echeance: 3.0
- nb_points: [['1019.80', '1019.78', '1019.92', '1020.18', ...
- niveau: MER 0
- NB_POINTS mean: 1018.9736

[[code]]

[[code]]

MER 0
['MER 0', '48.0', '20020304000000', ['1017.84', '1017.46', ...
- date: 20020304000000
- echeance: 48.0
- nb_points: [['1017.84', '1017.46', '1017.14', '1016.86', ...
- niveau: MER 0
- NB_POINTS mean: 1015.9168

[[code]]

[[code]]

HAUTEUR 2
['HAUTEUR 2', '0.0', '20020304000000', ['1.34', '1.51', '1.40', ...
- date: 20020304000000
- echeance: 0.0
- nb_points: [['1.34', '1.51', '1.40', '0.56', '-0.36', '1.73', ...
- niveau: HAUTEUR 2
- NB_POINTS mean: 0.9028

[[code]]

[[code]]
HAUTEUR 2,4
['HAUTEUR 2,4', '3.0', '20020304000000', ['1.34', '1.51', '1.40', ...
- date: 20020304000000
- echeance: 3.0
- nb_points: [['1.34', '1.51', '1.40', '0.56', '-0.36', '1.73', ...
- niveau: HAUTEUR 2,4
- NB_POINTS mean: 0.9028

[[code]]
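
The first mean reported above can be checked without pyparsing. This Python 3 sketch uses the 25 values listed for MER 0, ECHEANCE 0.0 in the original post; the variable names are made up for the illustration.

```python
# The 25 point values for NIVEAU MER 0, ECHEANCE 0.0 from the sample data.
grid_text = """
1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27
"""

values = [float(v) for v in grid_text.split()]
# Cut the flat list back into the 5 x 5 grid, one list per row.
rows = [values[i:i + 5] for i in range(0, len(values), 5)]
mean = sum(values) / len(values)
print(len(rows), round(mean, 4))
```

The rounded mean agrees with the 1020.0052 shown for the first MER 0 block.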

Now I'll let you take this the next step: compose the expression for
the model+parameter hierarchy level (hint: the body of each model
+parameter value will be an expression of OneOrMore( Group( niveau ) )
- be sure to give this a results name, too).
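
One possible shape for that next step, as a Python 3 sketch rather than the answer intended in this reply. Several things here are assumptions made for the illustration: decimals and level names are matched with `Regex`, the points are collected with `OneOrMore` and the NB_POINTS count is checked in plain Python instead of using `countedArray`, the names `number`, `bloc`, and `parse_meteo` are made up, and the sample is cut down to three points per grid.

```python
from pyparsing import Group, OneOrMore, Regex, Suppress, Word, alphas, nums

SAMPLE = """\
GRILLE EURAT5 Coin Nord-Ouest : 46.50/ 0.50 Coin Sud-Est : 44.50/ 2.50
MODELE PA PARAMETRE P
NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 3
1020.91 1020.87 1020.91
NIVEAU MER 0 ECHEANCE 3.0 DATE 20020304000000 NB_POINTS 3
1019.80 1019.78 1019.92
MODELE PA PARAMETRE T
NIVEAU HAUTEUR 2 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 3
1.34 1.51 -0.36
"""

# Labels are matched but left out of the results.
MODELE, PARAMETRE, NIVEAU, ECHEANCE, DATE, NB_POINTS = map(
    Suppress, "MODELE PARAMETRE NIVEAU ECHEANCE DATE NB_POINTS".split())

number = Regex(r"-?\d+\.\d+").setParseAction(lambda t: float(t[0]))
niveau_ref = Regex(r"(?:MER|SOL|HAUTEUR) \d+(?:,\d+)?")

# One level block: header fields followed by its point values.
niveau = (NIVEAU + niveau_ref("niveau")
          + ECHEANCE + number("echeance")
          + DATE + Word(nums, exact=14)("date")
          + NB_POINTS + Word(nums)("nb_points")
          + Group(OneOrMore(number))("points"))

# One model+parameter block: its header, then one or more grouped levels.
bloc = (MODELE + Word(alphas)("modele")
        + PARAMETRE + Word(alphas)("parametre")
        + Group(OneOrMore(Group(niveau)))("niveaux"))


def parse_meteo(text):
    """Return {parameter: {level: [tuple of points per forecast step]}}."""
    data = {}
    for match in bloc.searchString(text):
        levels = data.setdefault(match.parametre, {})
        for n in match.niveaux:
            if int(n.nb_points) != len(n.points):
                raise ValueError("NB_POINTS does not match the values found")
            levels.setdefault(n.niveau, []).append(tuple(n.points))
    return data


data = parse_meteo(SAMPLE)
print(data["P"]["MER 0"][1])
```

Each `Group(niveau)` keeps its own results names, so `n.niveau` and `n.points` refer to one level block at a time, and the tuples are appended in the order the forecast steps appear in the file.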

May 4 '07 #1