String parsing

HMS Surprise

The string below is a piece of a longer string of about 20000
characters returned from a web page. I need to isolate the number at
the end of the line containing 'LastUpdated'. I can find
'LastUpdated' with .find but not sure about how to isolate the
number. 'LastUpdated' is guaranteed to occur only once. Would
appreciate it if one of you string parsing whizzes would take a stab
at it.

Thanks,

jh

<input type="hidden" name="RFP" value="-1"/>

<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>
<input type="hidden" name="ExistingStatus" value="10" ?>
<table width="98%" cellpadding="0" cellspacing="0" border="0"
align="center"

May 9 '07 #1

Gabriel Genellina

On Tue, 08 May 2007 22:09:52 -0300, HMS Surprise <jo**@datavoiceint.com>
wrote:

> The string below is a piece of a longer string of about 20000
> characters returned from a web page. I need to isolate the number at
> the end of the line containing 'LastUpdated'. I can find
> 'LastUpdated' with .find but not sure about how to isolate the
> number. 'LastUpdated' is guaranteed to occur only once. Would
> appreciate it if one of you string parsing whizzes would take a stab
> at it.

You really should use an HTML parser here. But assuming that the page's
structure will not change much, you could use a regular expression like
this:

import re

expr = re.compile(r'name\s*=\s*"LastUpdated"\s+value\s*=\s*"(.*?)"',
                  re.IGNORECASE)
number = expr.search(text).group(1)

(Handling of "not found" and "duplicate" cases is left as an exercise for
the reader)

Note that <input value="1178658863" type="hidden" name="LastUpdated" /> is
as valid as your HTML, but won't match the expression.
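
One way to cover the "not found" and "duplicate" cases, and to accept the attributes in either order, is sketched below. It is written for current Python 3 rather than the Jython discussed in this thread. The function name find_last_updated and the two-step approach (find each input tag first, then read its value) are assumptions made for illustration; this is not Gabriel's code, and it is still a regular expression, so a value containing ">" would defeat it.

```python
import re

# Match any <input ...> tag; its attributes are captured as one string.
INPUT_TAG = re.compile(r'<input\b([^>]*)>', re.IGNORECASE)
NAME_ATTR = re.compile(r'\bname\s*=\s*"LastUpdated"', re.IGNORECASE)
VALUE_ATTR = re.compile(r'\bvalue\s*=\s*"(.*?)"', re.IGNORECASE)


def find_last_updated(text):
    """Return the value of the LastUpdated input, or None if it is absent.

    Raises ValueError if more than one such input is present.
    """
    values = []
    for tag in INPUT_TAG.finditer(text):
        attrs = tag.group(1)
        # Attribute order no longer matters: each is searched separately.
        if NAME_ATTR.search(attrs):
            value = VALUE_ATTR.search(attrs)
            if value:
                values.append(value.group(1))
    if len(values) > 1:
        raise ValueError("LastUpdated occurs more than once")
    return values[0] if values else None
```

Called on the page text, find_last_updated(text) returns the number as a string; wrap it in int() if an integer is needed.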

--
Gabriel Genellina

May 9 '07 #2

Tim Leslie

On 8 May 2007 18:09:52 -0700, HMS Surprise <jo**@datavoiceint.com> wrote:

> The string below is a piece of a longer string of about 20000
> characters returned from a web page. I need to isolate the number at
> the end of the line containing 'LastUpdated'. I can find
> 'LastUpdated' with .find but not sure about how to isolate the
> number. 'LastUpdated' is guaranteed to occur only once. Would
> appreciate it if one of you string parsing whizzes would take a stab
> at it.

Does this help?

In [7]: s = '<input type="hidden" name="LastUpdated" value="1178658863"/>'

In [8]: int(s.split("=")[-1].split('"')[1])
Out[8]: 1178658863

There's probably a hundred different ways of doing this, but this is
the first that came to mind.

Cheers,

Tim


May 9 '07 #3

HMS Surprise

Thanks for posting. Could you recommend an HTML parser that can be
used with python or jython?
john

May 9 '07 #4

HMS Surprise

Yes it could, after I isolate that one string. Making sure that I
isolate that complete line and only that line is part of the problem.

thanks for posting.

jh

May 9 '07 #5

HMS Surprise

On May 8, 9:19 pm, HMS Surprise <j...@datavoiceint.com> wrote:

> Yes it could, after I isolate that one string. Making sure I that I
> isolate that complete line and only that line is part of the problem.

It comes in as one large string...
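
Since the page arrives as one large string, the line does not have to be isolated first: str.find accepts start and end positions, so the search can be confined to the one tag. A minimal sketch for current Python 3 follows. The function name value_after is hypothetical, and the code assumes that value="..." comes after name="..." inside the same tag, as it does in the sample above.

```python
def value_after(text, name):
    """Return the value attribute of the tag whose name attribute is `name`.

    Returns None when the name, or a value on the same tag, is not found.
    Assumes value="..." follows name="..." inside the same tag.
    """
    start = text.find('name="%s"' % name)
    if start == -1:
        return None
    # Limit the search to this tag so a later tag's value is never returned.
    tag_end = text.find('>', start)
    if tag_end == -1:
        tag_end = len(text)
    marker = 'value="'
    v_start = text.find(marker, start, tag_end)
    if v_start == -1:
        return None
    v_start += len(marker)
    v_end = text.find('"', v_start, tag_end)
    if v_end == -1:
        return None
    return text[v_start:v_end]
```

With the whole page in a variable such as data, value_after(data, 'LastUpdated') returns the number as a string.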

May 9 '07 #6

Gabriel Genellina

On Tue, 08 May 2007 23:06:14 -0300, HMS Surprise <jo**@datavoiceint.com>
wrote:

> Thanks for posting. Could you recommend an HTML parser that can be
> used with python or jython?

Try BeautifulSoup, which handles malformed pages pretty well.

--
Gabriel Genellina

May 9 '07 #7

Carsten Haese

On 8 May 2007 19:06:14 -0700, HMS Surprise wrote:

> Thanks for posting. Could you recommend an HTML parser that can be
> used with python or jython?

BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) makes HTML
parsing easy as pie, and sufficiently old versions seem to work with Jython. I
just tested this with Jython 2.2a1 and BeautifulSoup 1.x:

Jython 2.2a1 on java1.5.0_07 (JIT: null)
Type "copyright", "credits" or "license" for more information.

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup("""<input type="hidden" name="LastUpdated"
... value="1178658863"/>""")
>>> print soup.first('input', {'name':'LastUpdated'}).get('value')
1178658863

Hope this helps,

--
Carsten Haese
http://informixdb.sourceforge.net

May 9 '07 #8

HMS Surprise

Thanks all.

Carsten, you are here early and late. Do you ever sleep? ;^)

May 9 '07 #9

HMS Surprise

> This looks to be simple HTML (and I'm presuming that's a typo on
> that ?> ending). A quick glance at the Python library reference (you do
> have a copy, don't you) reveals at least two HTML parsing modules...

No that is not a typo and bears investigation. Thanks for the find.

I found HTMLParser but had trouble setting it up.

> About five minutes work gave me this:

My effort has been orders of magnitude greater in time.....

Thanks all for all the excellent suggestions.
jh
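
For anyone who also had trouble setting up HTMLParser, here is a minimal sketch for current Python 3, where the module is html.parser (in the Python 2 line it was the HTMLParser module). The names InputValueFinder and input_values are made up for this example, it has not been tried on Jython 2.2, and it is not the code posted earlier in the thread.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Collect the value attribute of every <input> with a given name."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        # Self-closing tags such as <input .../> are routed here as well.
        if tag == "input":
            attributes = dict(attrs)
            if attributes.get("name") == self.wanted_name:
                self.values.append(attributes.get("value"))


def input_values(html, name):
    """Return a list of value attributes for inputs named `name`."""
    finder = InputValueFinder(name)
    finder.feed(html)
    finder.close()
    return finder.values
```

Returning a list rather than a single string leaves the "not found" (empty list) and "duplicate" (more than one entry) cases visible to the caller.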

May 9 '07 #10

HMS Surprise

BTW, here's what I used, the other ideas have been squirreled away in
my neat tricks and methods folder.

for el in data.splitlines():
    if el.find('LastUpdated') != -1:
        s = el.split("=")[-1].split('"')[1]
        print 's:', s

Thanks again,

jh

May 9 '07 #11

Paul Boddie

On 9 May, 06:42, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:

> [HTMLParser-based solution]

Here's another approach using libxml2dom [1] in HTML parsing mode:

import libxml2dom

# The text, courtesy of Dennis.
sample = """<input type="hidden" name="RFP" value="-1"/>

<input type="hidden" name="EnteredBy" value="john"/>
<input type="hidden" name="ServiceIndex" value="1"/>
<input type="hidden" name="LastUpdated" value="1178658863"/>
<input type="hidden" name="NextPage" value="../active/active.php"/>
<input type="hidden" name="ExistingStatus" value="10" />
<table width="98%" cellpadding="0" cellspacing="0" border="0"
align="center" >"""

# Parse the string in HTML mode.
d = libxml2dom.parseString(sample, html=1)

# For all input fields having the name 'LastUpdated',
# get the value attribute.
last_updated_fields = d.xpath("//input[@name='LastUpdated']/@value")

# Assuming we find one, print the contents of the value attribute.
print last_updated_fields[0].nodeValue

Paul

[1] http://www.python.org/pypi/libxml2dom

May 9 '07 #12

Paul Boddie

Dennis Lee Bieber wrote:

> I was trying to stay with a solution that should have been available
> in the version of Python equivalent to the Jython being used by the
> original poster. HTMLParser, according to the documents, was 2.2 level.

I guess I should read the whole thread before posting. ;-) I'll have
to look into libxml2 availability for Java, though, as it appears
(from various accounts) that some Java platform users struggle with
HTML parsing or have a really limited selection of decent and
performant parsers in that area.

Another thing for the "to do" list...

Paul

May 9 '07 #13
