Bytes | Software Development & Data Engineering Community

Code for returning HTML table data into array?

Before I go building this, I want to know if it already exists.

I need some PHP code that will read a web page and return all text that
comes between <td></td> tags in an array.

So if there were three tables on that page, it would return the first
table's fourth row, second column in a variable such as:

$tableArray[0][3][1]
//          ^  ^  ^ - 2nd <td></td>
//          ^  ^ - 4th <tr></tr>
//          ^ - 1st <table></table>

Does something like this exist somewhere where I can grab it, or do I have
to build it from scratch?
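To make the indexing concrete, here is a hand-written sample array in the layout described above. The data is hypothetical, not fetched from any page, and the sketch assumes ordinary zero-based PHP arrays throughout, which is why [3] is the 4th row and [1] is the 2nd cell.

```php
<?php
// Hypothetical, hand-written data showing the layout
// $tableArray[table][row][cell]; every index starts at 0.
$tableArray = [
    [                              // [0]: 1st <table>
        ['1a', '1b'],              // [0][0]: 1st <tr>
        ['2a', '2b'],
        ['3a', '3b'],
        ['4a', '4b'],              // [0][3]: 4th <tr>
    ],
    [                              // [1]: 2nd <table>
        ['x', 'y'],
    ],
];

// [0][3][1]: 1st table, 4th row, 2nd cell
echo $tableArray[0][3][1], "\n";
```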
--
[ Sugapablo ]
[ http://www.sugapablo.com <--music ]
[ http://www.sugapablo.net <--personal ]
[ su*******@12jabber.com <--jabber IM ]
Jul 17 '05 #1
Sugapablo wrote:
> Does something like this exist somewhere where I can grab it, or do I have
> to build it from scratch?


I just recently posted a routine that gets all <input>s from within
<form>s. This tiny URL fetches it from the Google archive:
http://tinyurl.com/3629k

You just have to change it to fetch all <table>s, and all <tr>s from
each table, then all <td>s (maybe <th>s too?) from each <tr>.
Something like

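$tableArray = array();
preg_match_all('#<table\b[^>]*>(.*?)</table>#is', $html, $tables);
foreach ($tables[1] as $t => $table_html) {
    preg_match_all('#<tr\b[^>]*>(.*?)</tr>#is', $table_html, $trs);
    foreach ($trs[1] as $r => $tr_html) {
        // grab <td> and <th> cells alike
        preg_match_all('#<t[dh]\b[^>]*>(.*?)</t[dh]>#is', $tr_html, $tds);
        $tableArray[$t][$r] = $tds[1];
    }
}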
Happy Coding :)
--
--= my mail box only accepts =--
--= Content-Type: text/plain =--
--= Size below 10001 bytes =--
Jul 17 '05 #2
Pedro Graca wrote:
> You just have to change it to fetch all <table>s, and all <tr>s from
> each table, then all <td>s (maybe <th>s too?) from each <tr>.


Oops, I just remembered something that turns this into a nasty problem:
you can have <table>s inside <table>s (and, in fact, often do!)
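A small hypothetical input shows why nesting breaks the regex approach: a non-greedy `(.*?)` stops at the first `</table>` it meets, and with nested tables that is the inner table's closing tag. This is only a sketch; the pattern is an assumed stand-in for whatever the table regexp would be.

```php
<?php
// Hypothetical input: a table nested inside the first cell of another table.
$html = '<table><tr><td><table><tr><td>inner</td></tr></table></td>'
      . '<td>outer</td></tr></table>';

// A typical non-greedy pattern: match from <table> to the FIRST </table>.
preg_match_all('#<table\b[^>]*>(.*?)</table>#is', $html, $m);

// The single match starts at the OUTER <table> but ends at the INNER </table>,
// so the outer table's second cell ("outer") is never captured.
echo $m[1][0], "\n";
```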
Jul 17 '05 #3
Are there nested tables in the file? If there are, then the HTML will be
rather difficult to parse.

User "Sugapablo" <ru********@sugapablo.com> wrote in message
news:sl***********************@dell.sugapablo.net...
> Does something like this exist somewhere where I can grab it, or do I have
> to build it from scratch?
Jul 17 '05 #4
Yeah, and another problem is missing end tags. Parsing HTML is such a pain.
It almost makes you want to try a hack like outputting the captured HTML in
an invisible inline frame, then using JavaScript to grab the data and post it
back to the server.
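Missing end tags are one place where an existing parser helps. As a sketch, assuming PHP 7+ with the DOM extension (bundled since PHP 5 and built on libxml2's HTML parser), `DOMDocument::loadHTML()` should close omitted `</td>` and `</tr>` tags implicitly. The input string is made up for the example.

```php
<?php
// Assumes PHP 7+ with the DOM extension.
// Hypothetical input: every </td> and </tr> end tag has been left out.
$html = '<table><tr><td>a<td>b<tr><td>c</table>';

$doc = new DOMDocument();
$prev = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($prev);

// The parser closes the open <td> and <tr> elements on its own,
// so each row comes back with its own cells.
$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = $td->textContent;
    }
    $rows[] = $cells;
    echo implode(',', $cells), "\n";
}
```

libxml2 applies its own recovery rules, which are not identical to a browser's, so unusual markup may still come out differently.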

User "Pedro Graca" <he****@hotpop.com> wrote in message
news:bt************@ID-203069.news.uni-berlin.de...
> Oops, I just remembered something that turns this into a nasty problem:
> you can have <table>s inside <table>s (and, in fact, often do!)
Jul 17 '05 #5
On Sun, 11 Jan 2004 20:01:30 -0000, Sugapablo <ru********@sugapablo.com> wrote:
> Does something like this exist somewhere where I can grab it, or do I have
> to build it from scratch?


Parsing HTML is not trivial, and coping with marginal and outright broken HTML
is a real pain. Perl has some excellent HTML parsing modules, and one in
particular is ideal for this: HTML::TableExtract.

You could write a Perl script, pass it the data you want, and have it return
the information in some more convenient form. It's not particularly elegant,
since you have to start up a Perl interpreter (although that can be mitigated
with something like PersistentPerl, which keeps the interpreter running after
a request so the next request can reuse it, saving startup time), but it's got
to beat trying to write an HTML parser!
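In the same spirit of reusing an existing parser, PHP 5 and later bundle the DOM extension, so the parsing can also stay inside PHP. The following is a minimal sketch that swaps that parser in for the Perl route, assuming PHP 7+; `table_array()` is a hypothetical helper name, not part of any library. It returns `$tableArray[table][row][cell]` as asked for in the original post.

```php
<?php
// Sketch only: assumes PHP 7+ with the bundled DOM extension.
// table_array() is a hypothetical helper, not a library function.
function table_array(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect warnings about sloppy markup quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $tables = [];
    // getElementsByTagName() returns every <table>, nested ones included,
    // in document order, so [0] is the first table opened in the page.
    foreach ($doc->getElementsByTagName('table') as $table) {
        $rows = [];
        // Only this table's own rows, not rows of tables nested inside it.
        foreach ($xpath->query('./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr', $table) as $tr) {
            $cells = [];
            foreach ($xpath->query('./td | ./th', $tr) as $cell) {
                // textContent also includes the text of any nested table.
                $cells[] = trim($cell->textContent);
            }
            $rows[] = $cells;
        }
        $tables[] = $rows;
    }
    return $tables;
}

$html = '<table>'
      . '<tr><td>1a</td><td>1b</td></tr>'
      . '<tr><td>2a</td><td>2b</td></tr>'
      . '<tr><td>3a</td><td>3b</td></tr>'
      . '<tr><td>4a</td><td>4b</td></tr>'
      . '</table>';
$tableArray = table_array($html);
echo $tableArray[0][3][1], "\n"; // 1st table, 4th row, 2nd cell
```

Because every table is listed, a nested table gets its own top-level index, and the text of the outer cell that contains it also includes the nested table's text.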

--
Andy Hassall <an**@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>
Jul 17 '05 #6
In article <Q4********************@comcast.com>, Chung Leong wrote:
> Are there nested tables in the file? If there are, then the HTML will be
> rather difficult to parse.


There could be. But I actually don't foresee that as being too much of a
problem, as each time the script comes across a new table, it would
realize it. Then whatever is in that table data would be another
table variable.

I know it sounds weird but, hey...things are weird.
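The idea of storing a nested table as another table variable can be sketched with PHP's DOM extension: when a cell contains a table, recurse into it and store the resulting array instead of the cell's text. Assumptions: PHP 7+, `rows_of()` and `nested_tables()` are hypothetical names, and only the first table that is a direct child of a cell is recognized (one wrapped in a `<div>` would be read as plain text).

```php
<?php
// Sketch only: assumes PHP 7+ with the DOM extension.
// rows_of() and nested_tables() are hypothetical helper names.

// Read one table's own rows; a cell holding a table becomes a nested array.
function rows_of(DOMXPath $xpath, DOMElement $table): array
{
    $rows = [];
    foreach ($xpath->query('./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./td | ./th', $tr) as $cell) {
            // Only a <table> that is a direct child of the cell is detected.
            $inner = $xpath->query('./table', $cell);
            if ($inner->length > 0) {
                $cells[] = rows_of($xpath, $inner->item(0)); // first nested table only
            } else {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
    return $rows;
}

function nested_tables(string $html): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $tables = [];
    // Start from outermost tables only; nested ones are reached by recursion.
    foreach ($xpath->query('//table[not(ancestor::table)]') as $table) {
        $tables[] = rows_of($xpath, $table);
    }
    return $tables;
}

$html = '<table><tr><td>top</td><td>'
      . '<table><tr><td>in1</td><td>in2</td></tr></table>'
      . '</td></tr></table>';
$t = nested_tables($html);
echo $t[0][0][1][0][1], "\n"; // cell [0][0][1] holds the inner table's rows
```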

Jul 17 '05 #7
If you write the code that realizes it, then there's no problem. The problem
is writing the code that realizes it :-)

You can't write a regular expression pattern that would extract the data,
which is why I said it's difficult to do.
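Roughly what "the code that realizes it" has to do can be sketched as follows: use a regex only to find the `<table ...>` and `</table>` tags, then pair them with a stack, which a single non-greedy match cannot do. This assumes PHP 7.1+ and that every `<table>` is properly closed; `table_spans()` is a hypothetical name, and the sketch does not skip tags inside comments or `<script>` blocks.

```php
<?php
// Sketch: pair each <table> with its own </table> using a stack.
// Assumes PHP 7.1+ and properly closed tables; table_spans() is hypothetical.
function table_spans(string $html): array
{
    // Find every opening and closing table tag together with its byte offset.
    preg_match_all('#<(/?)table\b[^>]*>#i', $html, $tags,
                   PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $stack = [];
    $spans = [];
    foreach ($tags as $tag) {
        [$text, $offset] = $tag[0];
        if ($tag[1][0] === '') {
            // Opening <table ...>: remember where it starts.
            $stack[] = $offset;
        } elseif ($stack) {
            // Closing </table>: it belongs to the most recently opened table.
            $start = array_pop($stack);
            $spans[] = substr($html, $start, $offset + strlen($text) - $start);
        }
    }
    // Spans appear in the order their </table> tags do,
    // so a nested table comes before the table that contains it.
    return $spans;
}

$html = '<table><tr><td><table><tr><td>inner</td></tr></table></td>'
      . '<td>outer</td></tr></table>';
$spans = table_spans($html);
foreach ($spans as $span) {
    echo $span, "\n";
}
```

Each returned string holds one complete table, so a row and cell routine can then work on it without the inner table cutting the outer one short.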

User "Sugapablo" <ru********@sugapablo.com> wrote in message
news:sl***********************@dell.sugapablo.net...
> There could be. But I actually don't foresee that as being too much of a
> problem, as each time the script comes across a new table, it would
> realize it. Then whatever is in that table data would be another
> table variable.

Jul 17 '05 #8
