Bytes | Software Development & Data Engineering Community

Help with regular expressions

I have a problem. I have written a Python-based theme for a Linux app
called SuperKaramba, which is effectively an engine for desktop applets
that uses Python as its theming language. The script basically parses
weather websites and displays the info in a visually appealing way.

A couple of other people have contributed code to this project,
particularly relating to the parsing of the websites. Unfortunately, it
is not parsing one particular part of the website properly. This is
because it is expecting the data to be in a certain form, and occasionally
it is in a different form. Unfortunately this causes the entire script to
fail to run.

Unfortunately, I know very little about regular expressions and can't get
hold of the person who wrote this part of the script. The other issue I am
struggling with is that there are no error messages as to what's going
wrong, which makes it more difficult to code round the issue.
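
Because `re.search` simply returns None when a pattern does not match, rather than raising an error, one way to find where a long pattern stops matching is to try ever-longer prefixes of it. The following is a minimal sketch in Python 3, not code from the theme; the helper name `first_failing_fragment` and the way the pattern is split into fragments are assumptions made for illustration.

```python
import re

def first_failing_fragment(fragments, text, flags=re.I | re.S | re.X):
    """Return the index of the first fragment that stops the cumulative
    pattern from matching, or None if the whole pattern matches."""
    for count in range(1, len(fragments) + 1):
        pattern = ''.join(fragments[:count])
        if re.search(pattern, text, flags) is None:
            return count - 1
    return None

# Hypothetical split of the wind part of the pattern into three pieces.
WIND_FRAGMENTS = [
    r'Wind:',
    r'.+?Info2>(?P<wind>.*?)',
    r'\sat\s(?P<speed>\d*?)&nbsp;',
]

# Sample cells in the two forms the page can take.
WINDY = ('<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1> Wind:</TD>\n'
         '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2> '
         'From the Northeast at 6&nbsp;mph</TD>')
CALM = ('<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1> Wind:</TD>\n'
        '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2> calm&nbsp;</TD>')

windy_result = first_failing_fragment(WIND_FRAGMENTS, WINDY)
calm_result = first_failing_fragment(WIND_FRAGMENTS, CALM)
```

Here `windy_result` is None (everything matched), while `calm_result` points at the last fragment, the one that requires "at".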

The issue comes down to a couple of lines in the html for the web page.
The following lines parse correctly:

<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1> Wind:</TD>
<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2> From the Northeast at 6&nbsp;mph</TD>

these don't:

<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1> Wind:</TD>
<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2> calm&nbsp;</TD>

the relevant portion of the python script is as follows:

print '==================================================='
p_current = r'''(?isx) # Ignore case, Dot matches all, Verbose
wxicons/52/(?P<icon>\d*?)\.gif # Icon
..*?obsTempTextA>(?P<temp>\d*?)&deg; # Temp
..*?obsTextA>(?P<sky>.*?)</b> # Sky
..*?Feels\sLike<br>(?P<heat>.*?)&deg; # Heat
..*?UV\sIndex:.*?Info2>(?P<uv>.*?)&nbsp
..*?Dew\sPoint:.*?Info2>(?P<dew>.*?)&deg;
..*?Humidity:.*?Info2>(?P<hum>\d+)
..*?Visibility:.+?Info2>(?P<vis>.*?)</td>
..*?Pressure:.+?Info2>(?P<baro>.*?)\sinches\sand\s(?P<change>.*?)</td>
..*?Wind:.+?Info2>(?P<wind>.*?)\sat\s(?P<speed>\d*?)&nbsp;
'''
match = re.search(p_current, data1)
if match:
    now.icon(match.group('icon'))
    now.temperature(match.group('temp'), 'F')
    now.relative_heat(match.group('heat'), 'F')
    now.sky(match.group('sky'))
    now.uv(match.group('uv'))
    now.dewpoint(match.group('dew'), 'F')
    now.humidity(match.group('hum'))
    now.visibility(match.group('vis'))
    now.pressure(match.group('baro'), 'inHg')
    now.pressure_change(match.group('change'))
    mywind = match.group('wind')
    now.wind(mywind.replace('From the ', ''))
    now.wind_speed(match.group('speed'), 'mph')

Obviously the issue is that the regular expression expects "at", and in
the second line of the html that doesn't parse, there is no at.

My question is: how do I go about fixing this? What I want is to test
whether the line contains an "at" and, if it doesn't, change it so that
it does. I'm just not sure how to code the RE in Python to do this.
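
A possible alternative to rewriting the line is to make the "at <speed>" part of the pattern optional. The sketch below is Python 3 and covers only the wind fragment of the pattern; the function name `parse_wind` and the choice to report a calm reading as speed '0' are assumptions, not something the theme or the site defines.

```python
import re

# Wind fragment only. The non-capturing group (?: ... )? makes
# " at <digits>" optional, so both forms of the cell can match.
WIND = re.compile(r'''(?isx)
    Wind:.+?Info2>\s*
    (?P<wind>.*?)                   # direction text, or "calm"
    (?:\sat\s(?P<speed>\d+))?       # optional " at <speed>"
    &nbsp;
''')

def parse_wind(html):
    """Return (direction, speed) as strings, or None if no wind cell is found."""
    match = WIND.search(html)
    if match is None:
        return None
    direction = match.group('wind').replace('From the ', '')
    # Assumption: when no speed is given (e.g. "calm"), report '0'.
    speed = match.group('speed') or '0'
    return direction, speed

WINDY = ('<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1> Wind:</TD>\n'
         '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2> '
         'From the Northeast at 6&nbsp;mph</TD>')
CALM = ('<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1> Wind:</TD>\n'
        '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2> calm&nbsp;</TD>')

windy_result = parse_wind(WINDY)
calm_result = parse_wind(CALM)
```

An unmatched optional group comes back from `match.group()` as None, which is why the speed needs a fallback before it is passed on.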

Any help would be appreciated.

Matt
Jul 18 '05 #1
dmbkiwi enlightened us with:
> A couple of other people have contributed code to this project,
> particularly relating to the parsing of the websites.
> Unfortunately, it is not parsing one particular part of the website
> properly. This is because it is expecting the data to be in a
> certain form, and occasionally it is in a different form.
> Unfortunately this causes the entire script to fail to run.


You seem to expect old HTML. Why not use XHTML only ('tidy' can
convert between them) and use a regular XML parser? Much, much, much
easier! And you won't have to be afraid of messing up your regular
expressions ;-)

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Jul 18 '05 #2
On Tue, 26 Aug 2003 08:47:33 +0000, Sybren Stuvel wrote:
> dmbkiwi enlightened us with:
>> A couple of other people have contributed code to this project,
>> particularly relating to the parsing of the websites.
>> Unfortunately, it is not parsing one particular part of the website
>> properly. This is because it is expecting the data to be in a
>> certain form, and occasionally it is in a different form.
>> Unfortunately this causes the entire script to fail to run.
>
> You seem to expect old HTML. Why not use XHTML only ('tidy' can
> convert between them) and use a regular XML parser? Much, much, much
> easier! And you won't have to be afraid of messing up your regular
> expressions ;-)
>
> Sybren


XML would be nice, but unfortunately I have no choice as to the markup
language used by the site. It's a website on the world wide web, not a
site over which I have any control. My regular expressions are at the
mercy of the developers of that site.

Any other suggestions?

Matt
Jul 18 '05 #3
dmbkiwi <dm*****@yahoo.com> writes:
> On Tue, 26 Aug 2003 08:47:33 +0000, Sybren Stuvel wrote:

[...]
>> You seem to expect old HTML. Why not use XHTML only ('tidy' can
>> convert between them) and use a regular XML parser? Much, much, much
>> easier! And you won't have to be afraid of messing up your regular
>> expressions ;-)
>>
>> Sybren
>
> XML would be nice, but unfortunately I have no choice as to the markup
> language used by the site. It's a website on the world wide web, not a
> site over which I have any control. My regular expressions are at the
> mercy of the developers of that site.


You misunderstand. HTMLTidy (or its descendant, tidylib) reads ugly,
non-conformant HTML and spits out clean, conformant XHTML (or HTML).

uTidylib is a ctypes wrapper of tidylib.

import tidy
from cStringIO import StringIO
tidydoc = tidy.parseString(html)
s = StringIO()
tidydoc.write(s)
tidied_html = s.getvalue()

mxTidy is a wrapper of a shared-library-ized HTMLTidy.

from mx.Tidy import tidy
tidied_html = tidy(html)[2]

John
Jul 18 '05 #4
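
If installing tidy is not an option, the standard library's tolerant HTML parser may be enough for cells like the ones in the original post. This is a different technique from the tidy-plus-XML route suggested above: a minimal Python 3 sketch using `html.parser.HTMLParser`, where the class name `ObsParser`, the function `parse_observations`, and the assumption that each obsInfo1 label cell is followed by its obsInfo2 value cell are all illustrative.

```python
from html.parser import HTMLParser

class ObsParser(HTMLParser):
    """Collect label/value pairs from obsInfo1/obsInfo2 table cells."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = {}
        self._cell = None    # class of the cell currently open, if any
        self._label = None   # last label seen, waiting for its value
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values keep their case.
        if tag == 'td':
            cls = dict(attrs).get('class')
            if cls in ('obsInfo1', 'obsInfo2'):
                self._cell = cls
                self._text = []

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell:
            # &nbsp; has already been decoded to a non-breaking space.
            text = ''.join(self._text).replace('\xa0', ' ').strip()
            if self._cell == 'obsInfo1':
                self._label = text.rstrip(':')
            elif self._label is not None:
                self.pairs[self._label] = text
                self._label = None
            self._cell = None

def parse_observations(html):
    """Return a dict such as {'Wind': 'calm'} from the observation cells."""
    parser = ObsParser()
    parser.feed(html)
    parser.close()
    return parser.pairs

CALM = ('<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo1> Wind:</TD>\n'
        '<TD ALIGN=LEFT VALIGN=TOP CLASS=obsInfo2> calm&nbsp;</TD>')
calm_result = parse_observations(CALM)
```

With the cell text isolated this way, the "at" test becomes a plain string check on the value rather than part of one large pattern.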
