Code for returning HTML table data into array?

Before I go building this, I want to know if it already exists.

I need some PHP code that will read a web page and return all text that
comes between <td></td> tags in an array.

So if there were three tables on that page, it would return the first
table's fourth row, second column in a variable such as:

$tableArray[0][3][1]
//          ^  ^  ^ - 2nd <td></td>
//          ^  ^ - 4th <tr></tr>
//          ^ - 1st <table></table>

Does something like this exist somewhere where I can grab it, or do I have
to build it from scratch?
--
[ Sugapablo ]
[ http://www.sugapablo.com <--music ]
[ http://www.sugapablo.net <--personal ]
[ su*******@12jabber.com <--jabber IM ]
Jul 17 '05 #1
Sugapablo wrote:
> Before I go building this, I want to know if it already exists.
>
> I need some PHP code that will read a web page and return all text that
> comes between <td></td> tags in an array.
>
> So if there were three tables on that page, it would return the first
> table's fourth row, second column in a variable such as:
>
> $tableArray[0][3][1]
> //          ^  ^  ^ - 2nd <td></td>
> //          ^  ^ - 4th <tr></tr>
> //          ^ - 1st <table></table>
>
> Does something like this exist somewhere where I can grab it, or do I have
> to build it from scratch?


I just recently posted a routine that gets all <input>s from within
<form>s. This tiny URL fetches it from the Google archive:
http://tinyurl.com/3629k

You just have to change it to fetch all <table>s, and all <tr>s from
each table, then all <td>s (maybe <th>s too?) from each <tr>.
Something like

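// $table_regexp, $tr_regexp and $td_regexp are patterns you still have to write
preg_match_all($table_regexp, $html, $tables);
foreach ($tables[0] as $table_html) {      // [0] holds the full matches
    preg_match_all($tr_regexp, $table_html, $trs);
    foreach ($trs[0] as $tr_html) {
        // cells need their own pattern, not $tr_regexp again
        preg_match_all($td_regexp, $tr_html, $tds);
    }
}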
Happy Coding :)
--
--= my mail box only accepts =--
--= Content-Type: text/plain =--
--= Size below 10001 bytes =--
Jul 17 '05 #2
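
Editor's sketch of the nested preg_match_all() idea from the post above, filled in far enough to run. The function name tables_to_array_regex and the three patterns are illustrative assumptions, not code from the thread, and the approach only holds for well-formed HTML without nested tables.

```php
<?php
// Sketch only: assumes well-formed HTML with no nested tables.
// The function name and the patterns are illustrative, not from the thread.
function tables_to_array_regex($html)
{
    $table_regexp = '#<table\b[^>]*>(.*?)</table>#is';
    $tr_regexp    = '#<tr\b[^>]*>(.*?)</tr>#is';
    $td_regexp    = '#<t[dh]\b[^>]*>(.*?)</t[dh]>#is';

    $tableArray = array();
    preg_match_all($table_regexp, $html, $tables);
    foreach ($tables[1] as $t => $table_html) {
        $tableArray[$t] = array();
        preg_match_all($tr_regexp, $table_html, $trs);
        foreach ($trs[1] as $r => $tr_html) {
            preg_match_all($td_regexp, $tr_html, $tds);
            // Strip inner markup and surrounding whitespace from each cell.
            $tableArray[$t][$r] = array_map(
                function ($cell) { return trim(strip_tags($cell)); },
                $tds[1]
            );
        }
    }
    return $tableArray;
}

$html = '<table><tr><td>a</td><td>b</td></tr>'
      . '<tr><td>c</td><td><b>d</b></td></tr></table>'
      . '<table><tr><th>x</th></tr></table>';
$tableArray = tables_to_array_regex($html);
echo $tableArray[0][1][1], "\n"; // first table, second row, second cell: d
```

Index order is [table][row][cell], matching the $tableArray[0][3][1] layout asked for in the original question.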
Pedro Graca wrote:
> You just have to change it to fetch all <table>s, and all <tr>s from
> each table, then all <td>s (maybe <th>s too?) from each <tr>.


Oops, I just remembered something that turns this into a nasty problem:
you can have <table>s inside <table>s (and, in fact, often do!)
--
--= my mail box only accepts =--
--= Content-Type: text/plain =--
--= Size below 10001 bytes =--
Jul 17 '05 #3
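
Editor's sketch of why nested tables are a nasty problem for the regex approach. The input string and the non-greedy pattern are assumptions chosen for illustration. The pattern stops at the first closing tag it meets, which belongs to the inner table.

```php
<?php
// Illustrative input: one table nested inside a cell of another.
$html = '<table><tr><td>outer'
      . '<table><tr><td>inner</td></tr></table>'
      . '</td></tr></table>';

// A typical non-greedy table pattern (an assumption, not from the thread).
preg_match_all('#<table\b[^>]*>.*?</table>#is', $html, $m);

$match = $m[0][0];
echo count($m[0]), "\n";                     // tables found: 1
echo substr_count($match, '<table'), "\n";   // opening tags in the match: 2
echo substr_count($match, '</table>'), "\n"; // closing tags in the match: 1
```

The single match starts at the outer table but ends at the inner table's closing tag, so the outer table is cut short and the inner one is never reported on its own.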
Are there nested tables in the file? If there are, then the HTML will be
rather difficult to parse.

"Sugapablo" <ru********@sugapablo.com> wrote in message
news:sl***********************@dell.sugapablo.net...
> Before I go building this, I want to know if it already exists.
>
> I need some PHP code that will read a web page and return all text that
> comes between <td></td> tags in an array.
>
> So if there were three tables on that page, it would return the first
> table's fourth row, second column in a variable such as:
>
> $tableArray[0][3][1]
> //          ^  ^  ^ - 2nd <td></td>
> //          ^  ^ - 4th <tr></tr>
> //          ^ - 1st <table></table>
>
> Does something like this exist somewhere where I can grab it, or do I have
> to build it from scratch?
> --
> [ Sugapablo ]
> [ http://www.sugapablo.com <--music ]
> [ http://www.sugapablo.net <--personal ]
> [ su*******@12jabber.com <--jabber IM ]

Jul 17 '05 #4
Yeah, and another problem is missing end tags. Parsing HTML is such a pain.
Almost makes you want to try some hack like outputting the captured HTML in
an invisible inline frame, then using JavaScript to grab the data and post it
back to the server.

"Pedro Graca" <he****@hotpop.com> wrote in message
news:bt************@ID-203069.news.uni-berlin.de...
> Pedro Graca wrote:
> > You just have to change it to fetch all <table>s, and all <tr>s from
> > each table, then all <td>s (maybe <th>s too?) from each <tr>.
>
> Oops, I just remembered something that turns this into a nasty problem:
> you can have <table>s inside <table>s (and, in fact, often do!)
> --
> --= my mail box only accepts =--
> --= Content-Type: text/plain =--
> --= Size below 10001 bytes =--

Jul 17 '05 #5
On Sun, 11 Jan 2004 20:01:30 -0000, Sugapablo <ru********@sugapablo.com> wrote:
> Before I go building this, I want to know if it already exists.
>
> I need some PHP code that will read a web page and return all text that
> comes between <td></td> tags in an array.
>
> So if there were three tables on that page, it would return the first
> table's fourth row, second column in a variable such as:
>
> $tableArray[0][3][1]
> //          ^  ^  ^ - 2nd <td></td>
> //          ^  ^ - 4th <tr></tr>
> //          ^ - 1st <table></table>
>
> Does something like this exist somewhere where I can grab it, or do I have
> to build it from scratch?


Parsing HTML is not trivial, and coping with marginal and outright broken HTML
is a real pain. Perl has some excellent HTML parsing modules, and one in
particular is ideal for this: HTML::TableExtract.

You could write a Perl script and pass it the data you want, and have it
return the information in some more convenient form. Not particularly elegant,
since you have to start up a Perl interpreter (although that can be mitigated
using something like PersistentPerl, which keeps the interpreter running for a
while afterwards so it's reusable by the next request, saving startup time),
but it's got to beat trying to write an HTML parser!

--
Andy Hassall <an**@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>
Jul 17 '05 #6
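
Editor's sketch of a PHP-only alternative to writing your own parser. It assumes PHP 5 or later with the DOM extension enabled, which is newer than this thread. DOMDocument::loadHTML() uses libxml's forgiving HTML parser, so omitted end tags are usually recovered. The helper name tables_to_array_dom is hypothetical.

```php
<?php
// Sketch assuming the DOM extension (ext-dom) is enabled.
// tables_to_array_dom is a hypothetical helper name.
function tables_to_array_dom($html)
{
    $doc = new DOMDocument();
    // Collect markup warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tableArray = array();
    foreach ($xpath->query('//table') as $table) {
        $rows = array();
        // Only rows that belong to this table, not to a table nested in it.
        $trs = $xpath->query('./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr', $table);
        foreach ($trs as $tr) {
            $cells = array();
            foreach ($xpath->query('./td | ./th', $tr) as $cell) {
                $cells[] = trim($cell->textContent);
            }
            $rows[] = $cells;
        }
        $tableArray[] = $rows;
    }
    return $tableArray;
}

// The </td> and </tr> end tags are deliberately omitted here.
$html = '<table><tr><td>a<td>b<tr><td>c<td>d</table>';
$tableArray = tables_to_array_dom($html);
echo $tableArray[0][1][1], "\n"; // first table, second row, second cell
```

In this sketch a nested table appears as its own top-level entry in document order, and its text is also included in the enclosing cell's textContent.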
In article <Q4********************@comcast.com>, Chung Leong wrote:
> Are there nested tables in the file? If there are, then the HTML will be
> rather difficult to parse.


There could be. But I actually don't foresee that as being too much of a
problem, as each time the script came across a new table, it would
realize it. Then what would be in that table data would be another
table variable.

I know it sounds weird but, hey... things are weird.

--
[ Sugapablo ]
[ http://www.sugapablo.com <--music ]
[ http://www.sugapablo.net <--personal ]
[ su*******@12jabber.com <--jabber IM ]
Jul 17 '05 #7
If you write the code that realizes it, then there's no problem. The problem
is writing the code that realizes it :-)

You can't write a single regular expression pattern that reliably extracts
the data from nested tables; that's why I said it's difficult to do.

"Sugapablo" <ru********@sugapablo.com> wrote in message
news:sl***********************@dell.sugapablo.net...
> In article <Q4********************@comcast.com>, Chung Leong wrote:
> > Are there nested tables in the file? If there are, then the HTML will be
> > rather difficult to parse.
>
> There could be. But I actually don't foresee that as being too much of a
> problem, as each time the script came across a new table, it would
> realize it. Then what would be in that table data would be another
> table variable.
>
> I know it sounds weird but, hey... things are weird.
>
> --
> [ Sugapablo ]
> [ http://www.sugapablo.com <--music ]
> [ http://www.sugapablo.net <--personal ]
> [ su*******@12jabber.com <--jabber IM ]

Jul 17 '05 #8
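
Editor's sketch of the "table data would be another table variable" idea discussed above, written as a recursive walk over a parsed DOM tree rather than with regular expressions. It assumes PHP 5 or later with the DOM extension. The helper names table_to_array and page_tables are hypothetical. As a simplification, a cell that holds several sibling tables keeps only the first one.

```php
<?php
// Sketch assuming the DOM extension; helper names are hypothetical.
function table_to_array(DOMElement $table, DOMXPath $xpath)
{
    $rows = array();
    $trs = $xpath->query('./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr', $table);
    foreach ($trs as $tr) {
        $cells = array();
        foreach ($xpath->query('./td | ./th', $tr) as $cell) {
            // Results come back in document order, so item(0) is the
            // outermost table inside this cell, if there is one.
            $nested = $xpath->query('.//table', $cell);
            $cells[] = $nested->length > 0
                ? table_to_array($nested->item(0), $xpath) // recurse into it
                : trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

function page_tables($html)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tables = array();
    // Start only from tables that are not inside another table.
    foreach ($xpath->query('//table[not(ancestor::table)]') as $table) {
        $tables[] = table_to_array($table, $xpath);
    }
    return $tables;
}

$html = '<table><tr><td>outer</td><td>'
      . '<table><tr><td>inner</td></tr></table>'
      . '</td></tr></table>';
$tableArray = page_tables($html);
echo $tableArray[0][0][0], "\n";       // plain text cell: outer
echo $tableArray[0][0][1][0][0], "\n"; // cell of the nested table: inner
```

A cell is then either a string or an array of rows, so callers can check is_array() on a cell before indexing into it.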
