String parsing


The string below is a piece of a longer string of about 20000
characters returned from a web page. I need to isolate the number at
the end of the line containing 'LastUpdated'. I can find
'LastUpdated' with .find but not sure about how to isolate the
number. 'LastUpdated' is guaranteed to occur only once. Would
appreciate it if one of you string parsing whizzes would take a stab
at it.

Thanks,

jh

<input type="hidden" name="RFP" value="-1"/>
<!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>
<input type="hidden" name="ExistingStatus" value="10" ?>
<table width="98%" cellpadding="0" cellspacing="0" border="0"
align="center"

May 9 '07 #1
On Tue, 08 May 2007 22:09:52 -0300, HMS Surprise <jo**@datavoiceint.com>
wrote:
The string below is a piece of a longer string of about 20000
characters returned from a web page. I need to isolate the number at
the end of the line containing 'LastUpdated'. I can find
'LastUpdated' with .find but not sure about how to isolate the
number. 'LastUpdated' is guaranteed to occur only once. Would
appreciate it if one of you string parsing whizzes would take a stab
at it.
<input type="hidden" name="RFP" value="-1"/>
<!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>
<input type="hidden" name="ExistingStatus" value="10" ?>
<table width="98%" cellpadding="0" cellspacing="0" border="0"
align="center"
You really should use an HTML parser here. But assuming that the page's
structure will not change much, you could use a regular expression like
this:

import re

expr = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)
number = expr.search(text).group(1)

(Handling of "not found" and "duplicate" cases is left as an exercise for
the reader)

Note that <input value="1178658863" type="hidden" name="LastUpdated" /> is
as valid as your html, but won't match the expression.
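
A rough sketch of how the "not found" and "duplicate" cases, and the attribute-order problem, might be handled with regular expressions alone. This is written for Python 3 (the thread itself uses Python 2 / Jython), and the helper name find_hidden_value and the choice of LookupError are assumptions made for illustration, not anything from the posts above. It is still no substitute for a real HTML parser.

```python
import re

# Match a whole <input ...> tag first, then look at its attributes,
# so the order of name= and value= no longer matters.
INPUT_TAG = re.compile(r'<input\b[^>]*>', re.IGNORECASE)
ATTR = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')
COMMENT = re.compile(r'<!--.*?-->', re.DOTALL)


def find_hidden_value(text, field_name):
    """Return the value of the single <input> named field_name.

    Hypothetical helper: raises LookupError if the field is missing
    or occurs more than once.
    """
    text = COMMENT.sub('', text)  # skip commented-out inputs
    values = []
    for tag in INPUT_TAG.findall(text):
        attrs = {k.lower(): v for k, v in ATTR.findall(tag)}
        if attrs.get('name') == field_name and 'value' in attrs:
            values.append(attrs['value'])
    if not values:
        raise LookupError('%s not found' % field_name)
    if len(values) > 1:
        raise LookupError('%s occurs %d times' % (field_name, len(values)))
    return values[0]


page = '''<input type="hidden" name="ServiceIndex" value="1"/>
<input value="1178658863" type="hidden" name="LastUpdated" />
<input type="hidden" name="ExistingStatus" value="10" ?>'''

print(find_hidden_value(page, 'LastUpdated'))
```

Only double-quoted attribute values are recognised here, which matches the sample but not HTML in general.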

--
Gabriel Genellina

May 9 '07 #2
On 8 May 2007 18:09:52 -0700, HMS Surprise <jo**@datavoiceint.com> wrote:
The string below is a piece of a longer string of about 20000
characters returned from a web page. I need to isolate the number at
the end of the line containing 'LastUpdated'. I can find
'LastUpdated' with .find but not sure about how to isolate the
number. 'LastUpdated' is guaranteed to occur only once. Would
appreciate it if one of you string parsing whizzes would take a stab
at it.
Does this help?

In [7]: s = '<input type="hidden" name="LastUpdated" value="1178658863"/>'

In [8]: int(s.split("=")[-1].split('"')[1])
Out[8]: 1178658863

There's probably a hundred different ways of doing this, but this is
the first that came to mind.

Cheers,

Tim
May 9 '07 #3
Thanks for posting. Could you recommend an HTML parser that can be
used with Python or Jython?
john
May 9 '07 #4
Yes it could, after I isolate that one string. Making sure that I
isolate that complete line and only that line is part of the problem.

thanks for posting.

jh

May 9 '07 #5
On May 8, 9:19 pm, HMS Surprise <j...@datavoiceint.com> wrote:
Yes it could, after I isolate that one string. Making sure that I
isolate that complete line and only that line is part of the problem.
It comes in as one large string...
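
Since the page arrives as one large string, one way to avoid isolating the line at all is to chain .find calls: locate the name attribute, then the next value=" after it, then the closing quote. A minimal sketch in Python 3 syntax; the helper name value_after is made up for illustration, and it assumes value= comes after name= inside the same tag, as it does in the sample posted above.

```python
def value_after(text, field_name):
    """Hypothetical helper: slice out value="..." following name="field_name".

    Returns None when the name, the value attribute, or the closing
    quote cannot be found.
    """
    marker = 'name="%s"' % field_name
    start = text.find(marker)
    if start == -1:
        return None
    # Search for value=" only from the marker onwards.
    key = 'value="'
    vstart = text.find(key, start)
    if vstart == -1:
        return None
    vstart += len(key)
    vend = text.find('"', vstart)
    if vend == -1:
        return None
    return text[vstart:vend]


data = ('<input type="hidden" name="ServiceIndex" value="1"/>\n'
        '<input type="hidden" name="LastUpdated" value="1178658863"/>\n'
        '<input type="hidden" name="NextPage" value="../active/active.php"/>')

print(int(value_after(data, 'LastUpdated')))
```

If the attributes ever appear in a different order this silently picks up the value of a later tag, so it is only as safe as the page layout is stable.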
May 9 '07 #6
On Tue, 08 May 2007 23:06:14 -0300, HMS Surprise <jo**@datavoiceint.com>
wrote:
Thanks for posting. Could you recommend an HTML parser that can be
used with Python or Jython?
Try BeautifulSoup, which handles malformed pages pretty well.

--
Gabriel Genellina

May 9 '07 #7
On 8 May 2007 19:06:14 -0700, HMS Surprise wrote
Thanks for posting. Could you recommend an HTML parser that can be
used with Python or Jython?
BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) makes HTML
parsing easy as pie, and sufficiently old versions seem to work with Jython. I
just tested this with Jython 2.2a1 and BeautifulSoup 1.x:

Jython 2.2a1 on java1.5.0_07 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("""<input type="hidden" name="LastUpdated" value="1178658863"/>""")
>>> print soup.first('input', {'name':'LastUpdated'}).get('value')
1178658863

Hope this helps,

--
Carsten Haese
http://informixdb.sourceforge.net

May 9 '07 #8
Thanks all.

Carsten, you are here early and late. Do you ever sleep? ;^)

May 9 '07 #9
This looks to be simple HTML (and I'm presuming that's a typo on
that ? ending). A quick glance at the Python library reference (you do
have a copy, don't you?) reveals at least two HTML parsing modules...
No that is not a typo and bears investigation. Thanks for the find.

I found HTMLParser but had trouble setting it up.
About five minutes work gave me this:
My effort has been orders of magnitude greater in time.....
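
For anyone else who has trouble setting up HTMLParser, here is a minimal sketch of the usual pattern: subclass it and override handle_starttag. This is not the code referred to above (that is not reproduced in this thread). It is written for Python 3, where the module is html.parser; in Python 2 and Jython the module is named HTMLParser. The class name HiddenFieldFinder is invented for the example.

```python
from html.parser import HTMLParser  # Python 2 / Jython: from HTMLParser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collect the value attribute of every <input> with a given name."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Self-closing tags
        # such as <input ... /> also end up here by default.
        if tag == 'input':
            attrs = dict(attrs)
            if attrs.get('name') == self.field_name:
                self.values.append(attrs.get('value'))


sample = '''<!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>'''

finder = HiddenFieldFinder('LastUpdated')
finder.feed(sample)
finder.close()
print(finder.values)
```

Because the parser treats the commented-out input as a comment, a search for EnteredBy would only see the live field.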

Thanks all for all the excellent suggestions.
jh

May 9 '07 #10
BTW, here's what I used, the other ideas have been squirreled away in
my neat tricks and methods folder.

for el in data.splitlines():
    if el.find('LastUpdated') != -1:
        s = el.split("=")[-1].split('"')[1]
        print 's:', s

Thanks again,

jh

May 9 '07 #11
On 9 May, 06:42, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:
[HTMLParser-based solution]

Here's another approach using libxml2dom [1] in HTML parsing mode:

import libxml2dom

# The text, courtesy of Dennis.
sample = """<input type="hidden" name="RFP" value="-1"/>
<!--<input type="hidden" name="EnteredBy" value="johnxxxx"/>-->
<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>
<input type="hidden" name="ExistingStatus" value="10" />
<table width="98%" cellpadding="0" cellspacing="0" border="0"
align="center" >"""

# Parse the string in HTML mode.
d = libxml2dom.parseString(sample, html=1)

# For all input fields having the name 'LastUpdated',
# get the value attribute.
last_updated_fields = d.xpath("//input[@name='LastUpdated']/@value")

# Assuming we find one, print the contents of the value attribute.
print last_updated_fields[0].nodeValue

Paul

[1] http://www.python.org/pypi/libxml2dom

May 9 '07 #12
Dennis Lee Bieber wrote:
>
I was trying to stay with a solution that should have been available
in the version of Python equivalent to the Jython being used by the
original poster. HTMLParser, according to the documents, was 2.2 level.
I guess I should read the whole thread before posting. ;-) I'll have
to look into libxml2 availability for Java, though, as it appears
(from various accounts) that some Java platform users struggle with
HTML parsing or have a really limited selection of decent and
performant parsers in that area.

Another thing for the "to do" list...

Paul

May 9 '07 #13
