Parsing an HTML table with XML

I have an HTML table in the following format:

<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td></tr>
</table>

With an XSLT stylesheet, I can use for-each to grab the values in
each row.

However, I don't want to grab the very first row, because it isn't
data!

How do I iterate through each <tr> and ignore the first <tr>?

Jul 5 '06 #1


Rick Walsh wrote:

How do I iterate through each <tr> and ignore the first <tr>?
<xsl:for-each select="table/tr[position() &gt; 1]">

--

Martin Honnen
http://JavaScript.FAQTs.com/
Jul 5 '06 #2
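
For context, here is a minimal sketch of how the position() filter from the reply above might sit in a complete XSLT 1.0 stylesheet. It assumes <table> is the document element, exactly as in the question. The text output method and the comma-separated layout are illustrative choices, not part of the original answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumes <table> is the document element, as in the question. -->
  <xsl:template match="/">
    <!-- position() &gt; 1 skips the first <tr>, i.e. the header row -->
    <xsl:for-each select="table/tr[position() &gt; 1]">
      <!-- Emit the cells of one data row, separated by commas -->
      <xsl:for-each select="td">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <!-- One output line per data row -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the table in the question, this should produce three lines, one per data row (1,2 then 3,4 then 5,6). The predicate counts positions among the <tr> children of <table>, so it only works while the header stays the first row.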
Rick Walsh wrote:
I have an HTML table in the following format:

<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>
<tr><td>3</td><td>4</td></tr>
<tr><td>5</td><td>6</td></tr>
</table>

With an XSLT stylesheet, I can use for-each to grab the values in
each row.

However, I don't want to grab the very first row, because it isn't
data!
Another possibility would be to change the input by using the (X)HTML
thead and tbody elements, then selecting only tbody/tr.
--
Johannes Koch
In te domine speravi; non confundar in aeternum.
(Te Deum, 4th cent.)
Jul 5 '06 #3
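
A sketch of the thead/tbody suggestion, assuming the input can be restructured as shown in the comment at the top. The restructured markup and the text output are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Assumed input, restructured from the table in the question:
  <table>
    <thead>
      <tr><td>Header 1</td><td>Header 2</td></tr>
    </thead>
    <tbody>
      <tr><td>1</td><td>2</td></tr>
      <tr><td>3</td><td>4</td></tr>
      <tr><td>5</td><td>6</td></tr>
    </tbody>
  </table>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Only rows inside <tbody> are selected; the <thead> row is never visited -->
    <xsl:for-each select="table/tbody/tr">
      <xsl:for-each select="td">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Here the selection depends on the document structure, not on row order, so it keeps working if more header rows are added to <thead>.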

Rick Walsh wrote:
I have an HTML table in the following format:

<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>
However, I don't want to grab the very first row, because it isn't
data!
Then code it with <th>, not <td>
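
If the header cells are marked up as <th>, the data rows can be picked out by what they contain. The following is a minimal sketch, assuming the header row consists only of <th> cells and that <table> is the document element. The text output is an illustrative choice.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Assumed input: the header row uses <th> cells, e.g.
       <tr><th>Header 1</th><th>Header 2</th></tr>,
       while the data rows keep their <td> cells. -->
  <xsl:template match="/">
    <!-- tr[td] keeps only rows with at least one <td> child;
         a header row made solely of <th> cells is never selected -->
    <xsl:for-each select="table/tr[td]">
      <xsl:for-each select="td">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Unlike a position() test, this does not rely on the header being the first row.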

If this table isn't under your control, then be careful of parsing it
with an XML parser -- HTML isn't XML (XHTML on the web usually isn't
either). It's not a good assumption to make if you're trying to build
robust code - something as simple as an embedded &nbsp; might break it.

Jul 5 '06 #4
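
To illustrate the &nbsp; problem mentioned above: XML predefines only five entities (lt, gt, amp, apos, quot), so in a document with no DTD a conforming XML parser reports &nbsp; as an undeclared entity. If you can edit the input, one workaround is to declare the entity yourself, as in the sketch below. The table content is made up for illustration, and this does not help with HTML that is malformed in other ways.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The internal DTD subset below declares nbsp as character 160
     (no-break space), so an XML parser can resolve the reference.
     Without the declaration, the parser stops with a
     well-formedness error at the first use of the entity. -->
<!DOCTYPE table [
  <!ENTITY nbsp "&#160;">
]>
<table>
  <tr><td>Header&nbsp;1</td><td>Header&nbsp;2</td></tr>
  <tr><td>1</td><td>2</td></tr>
</table>
```

Writing the numeric character reference &#160; directly in the content avoids the need for a declaration altogether.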
Andy Dingley <di*****@codesmiths.com> wrote:
Rick Walsh wrote:
>>I have an HTML table in the following format:

<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>

>>However, I don't want to grab the very first row, because it isn't
data!


Then code it with <th>, not <td>

If this table isn't under your control, then be careful of parsing it
with an XML parser -- HTML isn't XML (XHTML on the web usually isn't
either). It's not a good assumption to make if you're trying to build
robust code - something as simple as an embedded &nbsp; might break it.
For this purpose, use an HTML parser. I personally use NekoHTML, which I
have included in the RefleX toolkit. With RefleX, parsing an HTML file
is as simple as parsing an XML file:
http://reflex.gforge.inria.fr/tips.html#N80178E
(section: HTML to XML)

Example:
<!--parse a non-well-balanced HTML file to XML-->
<xcl:parse-html name="htmlFile" source="file:///path/to/file.html"/>
<!--apply a stylesheet to it-->
<xcl:transform output="file:///path/to/new-file.html" source="{ $htmlFile }"
stylesheet="file:///path/to/stylesheet.xsl">

Of course, you could use XPath to select the element to transform, say the
<body> element of the parsed HTML; something like this:
<xcl:transform output="file:///path/to/new-file.html" source="{ $htmlFile/html/body }"
stylesheet="file:///path/to/stylesheet.xsl">

--
Cordialement,

///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !
Jul 5 '06 #5
