Code for returning HTML table data into array?

Sugapablo
#1: Jul 17 '05
Before I go building this, I want to know if it already exists.

I need some PHP code that will read a web page and return all text that
comes between <td></td> tags in an array.

So if there were three tables on that page, it would return the first
table's fourth row, second column in a variable such as:

$tableArray[0][3][1]
//          ^  ^  ^ - 2nd <td></td>
//          ^  ^    - 4th <tr></tr>
//          ^       - 1st <table></table>

Does something like this exist somewhere where I can grab it, or do I have
to build it from scratch?


--
[ Sugapablo ]
[ http://www.sugapablo.com <--music ]
[ http://www.sugapablo.net <--personal ]
[ sugapablo@12jabber.com <--jabber IM ]

Pedro Graca
#2: Jul 17 '05

re: Code for returning HTML table data into array?


Sugapablo wrote:
> Before I go building this, I want to know if it already exists.
>
> I need some PHP code that will read a web page and return all text that
> comes between <td></td> tags in an array.
>
> So if there were three tables on that page, it would return the first
> table's fourth row, second column in a variable such as:
>
> $tableArray[0][3][1]
> //          ^  ^  ^ - 2nd <td></td>
> //          ^  ^    - 4th <tr></tr>
> //          ^       - 1st <table></table>
>
> Does something like this exist somewhere where I can grab it, or do I have
> to build it from scratch?

I just recently posted a routine that gets all <input>s from within
<form>s. This tiny URL fetches it from the Google archive:
http://tinyurl.com/3629k

You just have to change it to fetch all <table>s, and all <tr>s from
each table, then all <td>s (maybe <th>s too?) from each <tr>.
Something like

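// Each *_regexp is expected to capture the tag's inner HTML in group 1
preg_match_all($table_regexp, $html, $tables);
foreach ($tables[1] as $table_html) {
    preg_match_all($tr_regexp, $table_html, $trs);
    foreach ($trs[1] as $tr_html) {
        preg_match_all($td_regexp, $tr_html, $tds);
        // $tds[1] now holds the cell contents of this row
    }
}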
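
A fuller sketch of that loop, filled in so it runs on its own. The three patterns and the function name extract_tables_regex are assumptions for illustration, not the routine behind the tiny URL, and the approach only holds for well-formed pages without nested tables.

```php
<?php
// Sketch only: the patterns below are assumed, not taken from the thread.
// Non-greedy matching like this is only safe when tables are NOT nested
// and every end tag is present.
function extract_tables_regex(string $html): array
{
    $table_regexp = '@<table\b[^>]*>(.*?)</table>@is';
    $tr_regexp    = '@<tr\b[^>]*>(.*?)</tr>@is';
    $td_regexp    = '@<t[dh]\b[^>]*>(.*?)</t[dh]>@is'; // <td> and <th>

    $tableArray = [];
    preg_match_all($table_regexp, $html, $tables);
    foreach ($tables[1] as $t => $table_html) {
        $tableArray[$t] = [];
        preg_match_all($tr_regexp, $table_html, $trs);
        foreach ($trs[1] as $r => $tr_html) {
            preg_match_all($td_regexp, $tr_html, $tds);
            // Keep only the text of each cell, with entities decoded.
            $tableArray[$t][$r] = array_map(
                fn ($cell) => trim(html_entity_decode(strip_tags($cell))),
                $tds[1]
            );
        }
    }
    return $tableArray;
}

$html = '<table><tr><th>Name</th><th>Qty</th></tr>'
      . '<tr><td>Apples</td><td>3</td></tr></table>'
      . '<table><tr><td>Second <b>table</b></td></tr></table>';
$tableArray = extract_tables_regex($html);
echo $tableArray[0][1][0], "\n"; // 1st table, 2nd row, 1st cell
```

The result uses the $tableArray[table][row][cell] layout from the original question.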


Happy Coding :)
--
--= my mail box only accepts =--
--= Content-Type: text/plain =--
--= Size below 10001 bytes =--
Pedro Graca
#3: Jul 17 '05

re: Code for returning HTML table data into array?


Pedro Graca wrote:
> You just have to change it to fetch all <table>s, and all <tr>s from
> each table, then all <td>s (maybe <th>s too?) from each <tr>.

Oops, I just remembered something that turns this into a nasty problem:
you can have <table>s inside <table>s (and, in fact, often do!)
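
A small sketch of how nesting trips up a simple pattern. The regexp here is an assumption about what a straightforward table pattern might look like; the point is only that a non-greedy match ends at the first </table> it meets.

```php
<?php
// Assumed pattern: a non-greedy match stops at the FIRST </table> it sees.
$table_regexp = '@<table\b[^>]*>(.*?)</table>@is';

$html = '<table><tr><td>outer A'
      . '<table><tr><td>inner</td></tr></table>'
      . '</td><td>outer B</td></tr></table>';

preg_match_all($table_regexp, $html, $tables);

// The outer table is cut off at the inner table's end tag, so the
// "outer B" cell is missing from the captured HTML.
var_dump(count($tables[1]));
var_dump(strpos($tables[1][0], 'outer B') !== false);
```

Making the pattern greedy does not help either, since it would then swallow everything between the first <table> and the last </table> on the page.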
--
--= my mail box only accepts =--
--= Content-Type: text/plain =--
--= Size below 10001 bytes =--
Chung Leong
#4: Jul 17 '05

re: Code for returning HTML table data into array?


Are there nested tables in the file? If there are, then the HTML will be
rather difficult to parse.


Chung Leong
#5: Jul 17 '05

re: Code for returning HTML table data into array?


Yeah, and another problem is missing end tags. Parsing HTML is such a pain.
It almost makes you want to try some hack like outputting the captured HTML in
an invisible inline frame, then using JavaScript to grab the data and post it
back to the server.
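
For missing end tags specifically, a server-side alternative is an error-tolerant parser. A minimal sketch, assuming PHP's DOM extension (built on libxml2) is available, which it was not before PHP 5; the function name cells_by_row is made up for illustration.

```php
<?php
// Sketch assuming PHP's DOM extension (libxml2) is installed.
function cells_by_row(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml's complaints about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            $isCell = $node instanceof DOMElement
                && in_array($node->nodeName, ['td', 'th'], true);
            if ($isCell) {
                $cells[] = trim($node->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}

// No </td>, </tr> or </table> anywhere in this input.
$rows = cells_by_row('<table><tr><td>a<td>b<tr><td>c<td>d');
print_r($rows);
```

The parser closes each open cell and row when the next one starts, so the rows come back separated even though the input never closes a tag.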


Andy Hassall
#6: Jul 17 '05

re: Code for returning HTML table data into array?


On Sun, 11 Jan 2004 20:01:30 -0000, Sugapablo <russREMOVE@sugapablo.com> wrote:

>Before I go building this, I want to know if it already exists.
>
>I need some PHP code that will read a web page and return all text that
>comes between <td></td> tags in an array.
>
>So if there were three tables on that page, it would return the first
>table's fourth row, second column in a variable such as:
>
>$tableArray[0][3][1]
>//          ^  ^  ^ - 2nd <td></td>
>//          ^  ^    - 4th <tr></tr>
>//          ^       - 1st <table></table>
>
>Does something like this exist somewhere where I can grab it, or do I have
>to build it from scratch?

Parsing HTML is not trivial, and coping with marginal and outright broken HTML
is a real pain. Perl has some excellent HTML parsing modules, and one in
particular is ideal for this: HTML::TableExtract.

You could write a Perl script, pass it the data you want parsed, and have it
return the information in some more convenient form. That's not particularly
elegant, since you have to start up a Perl interpreter for each request. The
startup cost can be mitigated with something like PersistentPerl, which keeps
the interpreter running for a while afterwards so that the next request can
reuse it. Either way, it's got to beat trying to write an HTML parser!
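
The PHP side of such a hand-off might look roughly like this. It is a sketch under assumptions: the helper name run_filter, the script name extract_tables.pl and its JSON output format are all hypothetical and do not come from this thread.

```php
<?php
// Generic helper (hypothetical name): send $input to an external command's
// stdin and return whatever it writes to stdout.
// Passing the command as an array needs PHP 7.4 or later.
function run_filter(array $command, string $input): string
{
    $spec = [
        0 => ['pipe', 'r'], // child's stdin
        1 => ['pipe', 'w'], // child's stdout
    ];
    $process = proc_open($command, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start the external command');
    }

    // Fine for a sketch; a very large page would need interleaved
    // reads and writes to avoid both sides blocking on full pipes.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);

    return $output;
}

// Hypothetical usage, assuming extract_tables.pl reads HTML on stdin
// and prints the table data as JSON:
// $tableArray = json_decode(run_filter(['perl', 'extract_tables.pl'], $html), true);
```

Each call here still starts a fresh interpreter, which is the startup cost described above.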

--
Andy Hassall <andy@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>
Sugapablo
#7: Jul 17 '05

re: Code for returning HTML table data into array?


In article <Q42dnU1AGsojQpzdRVn-jw@comcast.com>, Chung Leong wrote:
> Are there nested tables in the file? If there are, then the HTML will be
> rather difficult to parse.

There could be. But I actually don't foresee that as being too much of a
problem, as each time the script came across a new table, it would
realize it. The contents of that table cell would then be another
table variable.

I know it sounds weird but, hey...things are weird.
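
The "table inside a cell becomes another table variable" idea can be sketched recursively on top of a parsed tree. This assumes PHP's DOM extension; the function names are invented for illustration and none of this is an existing library routine.

```php
<?php
// Collect matching elements below $node without crossing into a nested table.
function direct_descendants(DOMNode $node, array $names): array
{
    $found = [];
    foreach ($node->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue;
        }
        if (in_array($child->nodeName, $names, true)) {
            $found[] = $child;
        } elseif ($child->nodeName !== 'table') {
            $found = array_merge($found, direct_descendants($child, $names));
        }
    }
    return $found;
}

// Turn one <table> into rows of cells; a cell holding tables becomes a
// list of nested table arrays, any other cell becomes its trimmed text.
function table_to_array(DOMElement $table): array
{
    $rows = [];
    foreach (direct_descendants($table, ['tr']) as $tr) {
        $cells = [];
        foreach (direct_descendants($tr, ['td', 'th']) as $cell) {
            $nested = direct_descendants($cell, ['table']);
            $cells[] = $nested
                ? array_map('table_to_array', $nested)
                : trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Parse a page and convert every top-level table.
function extract_tables(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return array_map('table_to_array', direct_descendants($doc, ['table']));
}

$tables = extract_tables(
    '<table><tr><td>x</td><td>'
    . '<table><tr><td>y</td></tr></table>'
    . '</td></tr></table>'
);
echo $tables[0][0][0], "\n";          // plain cell text
echo $tables[0][0][1][0][0][0], "\n"; // text from the nested table
```

One design choice to note: a cell that contains a table keeps only the nested tables and drops any loose text beside them, which may or may not be what a given page needs.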

--
[ Sugapablo ]
[ http://www.sugapablo.com <--music ]
[ http://www.sugapablo.net <--personal ]
[ sugapablo@12jabber.com <--jabber IM ]
Chung Leong
#8: Jul 17 '05

re: Code for returning HTML table data into array?


If you write the code that realizes it, then there's no problem. The problem
is writing the code that realizes it :-)

You can't write a regular expression pattern that would extract the data;
that's why I said it's difficult to do.

