Creating an array from an HTML table

Before I try to do this myself (I remember doing it in Java years ago
and it was a pain)....

Has anyone run across a function that will take a string parameter
containing an HTML table, and return a 2-dimensional array with each
element corresponding to the contents of a table cell?

I see plenty of examples of doing the opposite: convert an array to
an HTML table. I want to go the other way, from an HTML table to an
array.

-A
Jul 20 '07 #1
Rik
On Fri, 20 Jul 2007 04:01:07 +0200, axlq <ax**@spamcop.net> wrote:
>Before I try to do this myself (I remember doing it in Java years ago
>and it was a pain)....
>
>Has anyone run across a function that will take a string parameter
>containing an HTML table, and return a 2-dimensional array with each
>element corresponding to the contents of a table cell?
>
>I see plenty of examples of doing the opposite: convert an array to
>an HTML table. I want to go the other way, from an HTML table to an
>array.

Regex could be the way to go. Before I start an elaborate pattern: any
ideas how you'd like to treat col-/rowspans?
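
For a flat, well-formed table with no nested tables and no col-/rowspans, a regex pass can stay fairly short. The following is only a rough sketch under exactly those restrictions; tableToArrayRegex is a made-up name for illustration, not an existing library function.

```php
<?php

// Hypothetical helper: handles only flat tables (no nesting, no rowspan/colspan).
function tableToArrayRegex($html)
{
    $rows = array();

    // Capture the contents of each <tr>...</tr>.
    preg_match_all('#<tr\b[^>]*>(.*?)</tr>#is', $html, $trMatches);

    foreach ($trMatches[1] as $rowHtml) {
        // Capture the contents of each <td> or <th> in the row.
        preg_match_all('#<t[dh]\b[^>]*>(.*?)</t[dh]>#is', $rowHtml, $cellMatches);

        $cells = array();
        foreach ($cellMatches[1] as $cellHtml) {
            // Drop inline markup, then decode entities such as &amp;.
            $cells[] = trim(html_entity_decode(strip_tags($cellHtml)));
        }
        $rows[] = $cells;
    }

    return $rows;
}

$table = tableToArrayRegex(
    '<table><tr><th>Name</th><th>Qty</th></tr>'
    . '<tr><td>Apples</td><td>3</td></tr></table>'
);
print_r($table);
```

The non-greedy (.*?) is what breaks on nested tables: the first </tr> it meets closes the match, whichever table that tag belongs to.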
--
Rik Wasmus
Jul 20 '07 #2
..oO(Rik)
>On Fri, 20 Jul 2007 04:01:07 +0200, axlq <ax**@spamcop.net> wrote:
>>Has anyone run across a function that will take a string parameter
>>containing an HTML table, and return a 2-dimensional array with each
>>element corresponding to the contents of a table cell?
>>[...]
>
>Regex could be the way to go.
Or maybe an XML/DOM approach, if the structure is valid.
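
A minimal sketch of what the DOM route might look like, assuming PHP 5's bundled DOM extension is available. tableToArrayDom is a hypothetical name, and the sketch ignores rowspan/colspan and does not treat nested tables specially.

```php
<?php

// Hypothetical helper: one array per <tr>, one string per direct <td>/<th> child.
function tableToArrayDom($html)
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them;
    // loadHTML() repairs what it can.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = array();
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->childNodes as $node) {
            // Skip whitespace text nodes and comments between cells.
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue;
            }
            $name = strtolower($node->nodeName);
            if ($name === 'td' || $name === 'th') {
                $cells[] = trim($node->textContent);
            }
        }
        $rows[] = $cells;
    }

    return $rows;
}

print_r(tableToArrayDom(
    '<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>'
));
```

Because loadHTML() uses libxml's forgiving HTML parser, this should cope with markup that is not strictly valid XML, which is where a plain XML parser would give up.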

Micha
Jul 20 '07 #3
Rik wrote:
>Regex could be the way to go.
Argh! No! That way lies nightmares. Get the XML_HTMLSax3 class from PEAR
and use that.

Here's an example that should parse TR, TD and TH tags (ignoring others)
including ROWSPAN and COLSPAN attributes. It creates an array of arrays
representing rows of cells. It uses 0-based indices.

<?php

class TableParser
{
    private $currow = -1;
    private $curcol = -1;

    // $shape marks which grid positions are occupied; $data holds the cell text.
    private $shape = array();
    private $data = array();

    public function openHandler ($parser, $tag, $attrs)
    {
        $tag = strtolower($tag);

        // Move to the correct cell co-ordinates.
        if ($tag=='tr')
        {
            $this->currow++;
            $this->curcol = -1;
            return;
        }

        // Ignore every other tag, so that markup nested inside a cell
        // (e.g. <b>) does not advance the column counter.
        if ($tag!='td' && $tag!='th')
            return;

        $this->curcol++;

        // This should account for rowspan and colspan: skip positions
        // already claimed by an earlier cell.
        while (!empty($this->shape[$this->currow][$this->curcol]))
            $this->curcol++;

        $rowspan = 1;
        $colspan = 1;
        foreach ($attrs as $k=>$v)
        {
            $k = strtolower($k);
            if ($k=='rowspan')
                $rowspan = (int)$v;
            elseif ($k=='colspan')
                $colspan = (int)$v;
        }

        // Mark every position this cell covers as occupied.
        for ($i=0; $i<$rowspan; $i++)
            for ($j=0; $j<$colspan; $j++)
            {
                $x = $this->currow + $i;
                $y = $this->curcol + $j;
                if (!empty($this->shape[$x][$y]))
                    error_log('Overlap!');
                $this->shape[$x][$y] = TRUE;
            }
    }

    public function closeHandler ($parser, $tag)
    {
        // Nothing to do on closing tags.
    }

    public function dataHandler ($parser, $data)
    {
        // Text outside any cell lands under index -1 and is dropped in getData().
        if (!isset($this->data[$this->currow][$this->curcol]))
            $this->data[$this->currow][$this->curcol] = '';
        $this->data[$this->currow][$this->curcol] .= $data;
    }

    public function getData ()
    {
        unset($this->data[-1]);
        foreach ($this->data as $k=>$v)
            unset($this->data[$k][-1]);
        return $this->data;
    }

}

// Requires the XML_HTMLSax3 package from PEAR.
require_once 'XML/HTMLSax3.php';
$sax = new XML_HTMLSax3;
$hdlr = new TableParser;
$sax->set_object($hdlr);
$sax->set_element_handler('openHandler', 'closeHandler');
$sax->set_data_handler('dataHandler');
$sax->parse('
<table>
    <tr>
        <td rowspan="2">Test table lalala</td>
        <td>123</td>
        <td>456</td>
    </tr>
    <tr>
        <td>789</td>
        <td>ABC</td>
    </tr>
    <tr>
        <td colspan="2" rowspan="2">123</td>
        <td>456</td>
    </tr>
    <tr>
        <td>789</td>
    </tr>
</table>
');

print_r($hdlr->getData());

?>
--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 29 days, 10:43.]

PHP Domain Class
http://tobyinkster.co.uk/blog/2007/0...-domain-class/
Jul 20 '07 #4
Rik
On Fri, 20 Jul 2007 09:56:44 +0200, Toby A Inkster
<us**********@tobyinkster.co.uk> wrote:
>Rik wrote:
>>Regex could be the way to go.
>
>Argh! No! That way lies nightmares.

Depends on how well both the regex(es) and the HTML are written. Although it
could indeed be a nightmare with nested tables.

>Get the XML_HTMLSax3 class from PEAR
>and use that.

With a lot of overhead, but it would indeed be the more robust solution.
It somewhat depends on whether the OP wants a 'fits (almost) all'
solution, or just one for a single known table.

--
Rik Wasmus
Jul 20 '07 #5
Toby A Inkster wrote:
>class TableParser
I've now published this class on my blog under the LGPL.

This means that the class itself is Open Source -- and any improvements
you make should be shared with the rest of us -- but you may use it
within closed source software if desired.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.12-12mdksmp, up 29 days, 14:30.]

Parsing an HTML Table with PEAR's XML_HTTPSax3
http://tobyinkster.co.uk/blog/2007/0...table-parsing/
Jul 20 '07 #6
Rik
On Fri, 20 Jul 2007 12:53:29 +0200, Toby A Inkster
<us**********@tobyinkster.co.uk> wrote:
>Toby A Inkster wrote:
>>class TableParser
>
>I've now published this class on my blog under the LGPL.
>
>This means that the class itself is Open Source -- and any improvements
>you make should be shared with the rest of us -- but you may use it
>within closed source software if desired.

Hehe, that's a nice way of making sure everyone is kept informed about
possible progress :-)
--
Rik Wasmus
Jul 20 '07 #7
Thanks everyone for the replies. I found exactly what I need at
http://realyshine.com - a class called tableExtractor.php.class.

It works very well.

-A

Jul 21 '07 #8
In article <se************@ophelia.g5n.co.uk>,
Toby A Inkster <us**********@tobyinkster.co.uk> wrote:
>>Regex could be the way to go.
>
>Argh! No! That way lies nightmares. Get the XML_HTMLSax3 class from PEAR
>and use that.

I agree Regex isn't what I want to mess with either. But PEAR is
unnecessary - especially if you don't run your own server and your
web-host provider doesn't support PEAR. The tableExtractor.class.php
from reallyshiny.com turned out to solve my problem, doesn't require
PEAR, and works quite well.

-A

Jul 23 '07 #9
axlq wrote:
>I agree Regex isn't what I want to mess with either. But PEAR is
>unnecessary - especially if you don't run your own server and your
>web-host provider doesn't support PEAR. The tableExtractor.class.php
>from reallyshiny.com turned out to solve my problem, doesn't require
>PEAR, and works quite well.

Actually, you can install PEAR on many providers' servers without any
special permissions.

http://www.builderau.com.au/program/...0283197,00.htm
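
One way this is commonly done on shared hosting, sketched here under assumptions: the package files (and anything they depend on) are uploaded by hand into a directory inside your own web space, and that directory is added to PHP's include path at runtime. The directory name "pear" below is hypothetical.

```php
<?php

// Assumption: PEAR packages were uploaded into a "pear" directory
// next to this script (the directory name is made up for illustration).
$localPear = dirname(__FILE__) . '/pear';

// Put the local directory first so its packages are found before any system copy.
set_include_path($localPear . PATH_SEPARATOR . get_include_path());

// With the package files in place, the usual include should then resolve:
// require_once 'XML/HTMLSax3.php';
```

This only changes where PHP looks for include files, so it needs no shell access or server configuration rights.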
Jul 23 '07 #10
