Bytes | Software Development & Data Engineering Community

Help with parsing data out of an HTML file?

I run a league for a game where we get an export in HTML format after
every session.

The export is broken down into 3 sections, and each section has a table
with the results.

Right now I have to create 3 CSV files manually out of it.

I would like to learn how to create a PHP script that can parse the data
out of the HTML results for me and create the CSV files that I need.

Can anyone give me some good (newbie-friendly) references on where I can go
to learn how to do this? I have been searching for a few days and have not
really been able to find anything that helps me.

Thank you in advance
Jul 17 '05 #1
I can't write the code for you, but I suggest you have a look at
strip_tags() to remove the HTML code, then use something like explode()
or implode() to do whatever it is you want to do...
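
A minimal sketch of that idea, assuming a hypothetical two-cell table row (the real export may look different). A space is put in front of every tag before strip_tags() runs, because otherwise neighbouring cells would run together into one string; whitespace is then collapsed so that explode() does not return empty pieces.

```php
<?php
// Hypothetical sample row; the real export's markup may differ.
$html = '<tr><td>1</td><td>79</td></tr>';

// Pad each tag with a leading space so adjacent cells stay separate,
// strip the tags, then collapse runs of whitespace to a single space.
$text = trim(preg_replace('/\s+/', ' ', strip_tags(str_replace('<', ' <', $html))));

// explode() splits the text into one array element per value.
$values = explode(' ', $text);

// implode() is the reverse step: join the values into one CSV-style line.
echo implode(',', $values), "\n";
```

This only works while the values themselves contain no spaces; a name such as "J Doe" would be split in two.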

You could also try making a sample HTML file available online, because
the spaces or other delimiters that signify the end of one value and the
beginning of the next need clarifying. As you look at your HTML
table, you know that the border of the table separates one value from
another. If your table cells are plain numeric and there's nothing
else to confuse things, then it shouldn't be too difficult. The biggest
problem is not the data, but the crap that might sit around it...

If you still have difficulties, let us know and I'll try my hand at helping
out... but I'd really want to see some sort of sample output to know
what it is you want to chew and spit.

randelld
Jul 17 '05 #2
Cliff Roman wrote:
I would like to learn how to create a PHP script that can parse the
data for me out of the html results and create the csv files that I
need


The new XML features of PHP5 could probably do that with ease. Check
out http://slides.bitflux.ch/phpconf2003/slide_23.html or the whole
presentation at http://slides.bitflux.ch/phpconf2003/

If upgrading your web server to PHP5 is not an option, you could
still install the PHP5 CLI just for running this conversion.
Bob
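
As a rough illustration of the parser-based route, here is a sketch that uses PHP's DOM extension (DOMDocument::loadHTML) to read table cells. This is an assumption about which XML feature fits the job, not necessarily what the linked slides show, and the sample table is hypothetical.

```php
<?php
// Hypothetical sample; the real export's markup may differ.
$html = '<table>'
      . '<tr><td>1</td><td>15</td><td>J_Doe</td><td>22.288</td></tr>'
      . '<tr><td>2</td><td>2</td><td>J_Smith</td><td>22.310</td></tr>'
      . '</table>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    if ($cells !== []) {   // skip rows without <td> cells, e.g. headers built from <th>
        $rows[] = $cells;
    }
}

// Print one comma-separated line per table row.
foreach ($rows as $row) {
    echo implode(',', $row), "\n";
}
```

Because the parser understands the table structure, cell values that contain spaces stay in one piece, which the strip-and-split approach cannot guarantee.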

--
| B. Johannessen <bo*@db.org> +47 97 15 20 09 - http://db.org/
| Mail & Spam - News, Drafts & Standards - http://db.org/blog/
| On The Origin Of Spam; Spam Statistics - http://db.org/spam/
Jul 17 '05 #3
Imagine that this file looked like this

<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>

Here is the part that I am unsure of. My first step, I guess, is learning
how to say, basically: after "Session:", the next word or string would be
$session (in this case Practice)

Or, in a repeating area (like a table):

<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>

How would I end up saying: after "<TD style="background : #CEDAEB">", the
next item would be $rank (in this case 1), then have it say: after the next
"<TD style="background : #CEDAEB">", it would be $score,

etc.

Maybe I am approaching this all wrong, I am unsure.

Thanks
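
One possible way to express "after X, the next item is Y" in PHP is a regular expression with a capture group. The sketch below runs preg_match() and preg_match_all() over the snippets from this post; treating the first two cells as rank and score is an assumption taken from the description above.

```php
<?php
// Sample taken from the snippets in this post, joined into one string.
$html = <<<HTML
<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>
<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>
HTML;

// "After 'Session:' the next word": \s* skips whitespace, (\S+) captures the word.
$session = null;
if (preg_match('/Session:\s*(\S+)/', $html, $m)) {
    $session = $m[1];
}

// Every <TD ...>value</TD> pair; the i flag ignores case, the s flag lets . span newlines.
preg_match_all('/<td\b[^>]*>\s*(.*?)\s*<\/td>/is', $html, $matches);
list($rank, $score) = $matches[1];   // assumes the first two cells are rank and score

echo "$session $rank $score\n";
```

Regular expressions are fine for a fixed, machine-generated export like this one, but they break easily if the markup changes, for example if cells start to contain nested tags.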
Jul 17 '05 #4
Ignore the html tags, but look left to right, like you would be reading
a book and confirm a few things for me...

First, (again, reading from left to right) the start of a new table
begins with "Session:" true?

Second, the next description that is fixed is "Date:" true?

Third, until the next "Session", everything else that follows is
numeric, true?

Fourth, how many columns wide is your table, or is it variable?

Last, are your tables one under the other, or side by side? Or do they
have anything else that might get in the way?

Why?

Well, I can try and bash out a script once I know some rough facts. I
can use strip_tags() to get rid of the HTML, and after that we are left
with a stream of text. We can use explode() to put each word/number into
an element of an array on its own. We can use "Session:" as a flag to
indicate that a new table of scores is starting or ending... and we can have
everything space-delimited, which makes it easy to read and rewrite
everything else...
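
A sketch of how that token walk might look, treating the "Session:" heading as the flag for a new table. The two-table sample and its values are invented for illustration, and the code assumes that no value contains a space.

```php
<?php
// Hypothetical two-table sample; headings and values are invented for illustration.
$html = '<h3>Session: Practice</h3><table><tr><td>1</td><td>79</td></tr></table>'
      . '<h3>Session: Qualifying</h3><table><tr><td>1</td><td>15</td></tr></table>';

// Strip the markup (padding tags with spaces so cells stay separate),
// collapse whitespace, and split into single tokens.
$text   = trim(preg_replace('/\s+/', ' ', strip_tags(str_replace('<', ' <', $html))));
$tokens = explode(' ', $text);

$tables  = [];     // session name => list of values
$current = null;   // name of the table being filled, or null before the first one

for ($i = 0; $i < count($tokens); $i++) {
    if ($tokens[$i] === 'Session:' && isset($tokens[$i + 1])) {
        // Flag found: the next token names the new table.
        $current = $tokens[++$i];
        $tables[$current] = [];
    } elseif ($current !== null) {
        // Any other token belongs to the table currently being filled.
        $tables[$current][] = $tokens[$i];
    }
}

print_r($tables);
```

Once the number of columns per table is known, array_chunk() could cut each flat list of values back into rows.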

I'll keep an eye here for your answer and I will *try* to help further,
however I can't guarantee...

randelld
Jul 17 '05 #5
I would need the script to create 3 CSV files for me. If you can show me
an example of the first one, then I would be more than happy to work
through it and figure it out. I am just not sure where to start.

So let me just look at the first one.

Let's say, for example, the script.php file was in its own directory and the
results were in a /results directory. Let's assume the file is called
results.html

If I remove the HTML tags it would look like this:

Session: Qualifying
P # DRIVER TIME
1 15 J_Doe 22.288
2 2 J_Smith 22.310
3 7 M_Johnson 22.376
etc..

The final result I would need would be something like this.. (in a file
called qual.csv)

1,15,J_Doe,22.288
2,2,J_Smith,22.310
3,7,M_Johnson,22.376
etc
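
A possible starting point for that conversion, assuming the tag-free text really has one result per line with whitespace-separated columns, as in the example above. The function names textToRows() and writeCsv() are hypothetical, and the sketch writes to the system temp directory rather than next to the script.

```php
<?php
/**
 * Turn the tag-free text of one session into CSV rows.
 * Rows whose first column is not a number (the "Session:" line and the
 * "P # DRIVER TIME" header) are skipped.
 */
function textToRows(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $cols = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        if ($cols !== [] && ctype_digit($cols[0])) {
            $rows[] = $cols;
        }
    }
    return $rows;
}

/** Write the rows to $path, one comma-separated line per row. */
function writeCsv(string $path, array $rows): void
{
    $out = fopen($path, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    foreach ($rows as $row) {
        fwrite($out, implode(',', $row) . "\n");
    }
    fclose($out);
}

// Sample text copied from the example above.
$text = "Session: Qualifying\nP # DRIVER TIME\n"
      . "1 15 J_Doe 22.288\n2 2 J_Smith 22.310\n3 7 M_Johnson 22.376\n";

$csvPath = sys_get_temp_dir() . '/qual.csv';
writeCsv($csvPath, textToRows($text));
```

For the real export, the text could come from something like strip_tags(file_get_contents(__DIR__ . '/results/results.html')), though whether that leaves one result per line depends on how the export lays out its markup. If a value could ever contain a comma, fputcsv() would be the safer way to write each row, since it adds quoting.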

I really appreciate the help you have given so far. Like I said, I have no
problem working it out on my own if I can get an example.

Thanks,
Cliff
Jul 17 '05 #6
