Help with parsing data out of a HTML File?

I have a league for a game where we get exports after every session in HTML
format.

It is broken down into three sections, and each section has a table with the
results.

Right now I have to create three CSV files from it by hand.

I would like to learn how to write a PHP script that parses the data out of
the HTML results and creates the CSV files I need.

Can anyone give me some good (newbie-friendly) references on where I can
learn how to do this? I have been searching for a few days and have not
really found anything that helps.

Thank you in advance
Jul 17 '05 #1
I can't write the code for you, but I suggest you have a look at
strip_tags() to remove the HTML code, then use something like explode() to
split what is left into separate values (implode() does the reverse and
joins them back together), and do whatever it is you want to do with them...
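A minimal sketch of the strip_tags() idea, assuming the values in the export are separated by spaces and newlines. It uses preg_split() rather than explode(), because explode() splits on one fixed string while the export separates values with runs of whitespace. The function name tokensFromHtml() is hypothetical.

```php
<?php
// Turn an HTML fragment into a flat list of text tokens.
function tokensFromHtml(string $html): array
{
    // strip_tags() deletes tags without leaving a gap, so "<TD>1</TD><TD>79</TD>"
    // would become "179"; putting a space in front of every tag first avoids that.
    $text = strip_tags(str_replace('<', ' <', $html));
    // Split on any run of whitespace; PREG_SPLIT_NO_EMPTY drops empty strings
    // produced by leading or trailing whitespace.
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Sample cells in the style of the league export.
$sample = '<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>';

echo implode(' ', tokensFromHtml($sample)), "\n"; // 1 79
```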

You could also try making a sample HTML file available online, because we
need to know which spaces or other delimiters mark the end of one value and
the beginning of the next. When you look at your HTML table, you can see
that the table borders separate one value from another. But if your table
cells are plain numeric and there's nothing else to confuse things, it
shouldn't be too difficult. The biggest problem is not the data, but the
crap that might sit around it...

If you still have difficulties, let us know and I'll try my hand at helping
out... but I'd really want to see some sample output to know what it is you
want to chew and spit.

randelld
Jul 17 '05 #2
Cliff Roman wrote:
> I would like to learn how to create a PHP script that can parse the
> data for me out of the html results and create the csv files that I
> need


The new XML features of PHP5 could probably do that with ease. Check
out http://slides.bitflux.ch/phpconf2003/slide_23.html or the whole
presentation at http://slides.bitflux.ch/phpconf2003/
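As a rough illustration of the DOM route (not taken from the linked slides), here is a sketch using DOMDocument::loadHTML() and DOMXPath, written for a current PHP version. It assumes each result row in the export is a <TR> whose cells are <TD> elements; rowsFromHtml() is a hypothetical name.

```php
<?php
// Parse the HTML export and return one array of cell texts per table row.
function rowsFromHtml(string $html): array
{
    $doc = new DOMDocument();
    // Exported HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    // The HTML parser lowercases tag names, so '//tr' also matches <TR>.
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells) {               // skip rows without <td> cells, e.g. <th> headers
            $rows[] = $cells;
        }
    }
    return $rows;
}

$html = '<table><tr><th>P</th><th>#</th></tr>'
      . '<tr><td>1</td><td>15</td><td>J_Doe</td><td>22.288</td></tr></table>';
foreach (rowsFromHtml($html) as $row) {
    echo implode(',', $row), "\n";
}
```

Working on the parsed tree avoids counting characters around tags, which is the fragile part of the string-based approaches.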

If upgrading your web server to PHP5 is not an option, you could
still install the PHP5 CLI just for running this conversion.
Bob

--
| B. Johannessen <bo*@db.org> +47 97 15 20 09 - http://db.org/
| Mail & Spam - News, Drafts & Standards - http://db.org/blog/
| On The Origin Of Spam; Spam Statistics - http://db.org/spam/
Jul 17 '05 #3
Imagine that the file looked like this:

<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>

Here is the part I am unsure of. My first step, I guess, is learning how
to say, basically: after "Session:", the next word or string goes into
$session (in this case, Practice).
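One way to express "the word after Session:" is a regular expression with preg_match(). In this sketch, \s* skips the space or newline after the colon and (\S+) captures everything up to the next whitespace. It assumes the session name is a single word; a name with spaces would need something like ([^<\r\n]+) followed by trim().

```php
<?php
// The heading fragment from the export.
$html = '<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>';

$session = null;
if (preg_match('/Session:\s*(\S+)/', $html, $m)) {
    $session = $m[1];   // first capture group: the word after "Session:"
}

$date = null;
if (preg_match('/Date:\s*(\S+)/', $html, $m)) {
    $date = $m[1];      // first capture group: the value after "Date:"
}

echo "$session $date\n"; // Practice 02/18/04
```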

Or, in a repeating area (like a table):

<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>

How would I say something like this:
After "<TD style="background : #CEDAEB">", the next item goes into $rank
(in this case 1); then after the next "<TD style="background : #CEDAEB">",
the item goes into $score; and so on?
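Rather than matching one specific <TD> at a time, a common approach is to collect every cell in document order with preg_match_all() and then assign the values by position. This is a sketch; the rank/score column order is an assumption taken from the example above.

```php
<?php
$html = '<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>';

// /i: match <TD> and <td> alike; /s: let ".*?" span the line breaks
// between the tag and the value. The non-greedy ".*?" stops at the first </td>.
preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $html, $matches);
$cells = array_map('trim', $matches[1]);   // strip the surrounding newlines

// Hypothetical layout: first cell is the rank, second the score.
list($rank, $score) = $cells;
echo "rank=$rank score=$score\n"; // rank=1 score=79
```

For a table with N columns, array_chunk($cells, N) groups the flat list into one array per row, which is usually easier than naming each cell.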

Maybe I am approaching this all wrong; I am not sure.

Thanks

Jul 17 '05 #4
Ignore the HTML tags, but read left to right, as you would a book, and
confirm a few things for me...

First (again, reading from left to right), the start of a new table
begins with "Session:", true?

Second, the next fixed description is "Date:", true?

Third, until the next "Session:", everything else that follows is
numeric, true?

Fourth, how many columns wide is your table, or is it variable?

Last, are your tables one under the other, or side by side? Or is there
anything else that might get in the way?

Why?

Well, I can try to bash out a script once I know some rough facts. I
can use strip_tags() to get rid of the HTML, which leaves us with a stream
of text. We can use explode() to put each word/number into its own array
element, use "Session:" as a flag to indicate that a new table of scores is
starting (and the previous one ending), and write everything out space
delimited, which makes it easy to read and rewrite...
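A minimal sketch of that plan, assuming the token right after "Session:" is a one-word session name. It groups the text tokens by session, treating the "Session:" heading as the flag that starts a new table, and prints each group space-delimited. tokensBySession() is a hypothetical name, and preg_split() stands in for explode() so that runs of whitespace count as a single separator.

```php
<?php
// Group the export's text tokens by session name.
function tokensBySession(string $html): array
{
    // Space before each tag so strip_tags() cannot glue neighbouring cells together.
    $text = strip_tags(str_replace('<', ' <', $html));
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $sessions = [];
    $current = null;                 // nothing is collected before the first "Session:"
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        if ($tokens[$i] === 'Session:' && $i + 1 < $count) {
            $current = $tokens[++$i];    // consume the session name
            $sessions[$current] = [];
        } elseif ($current !== null) {
            $sessions[$current][] = $tokens[$i];
        }
    }
    return $sessions;
}

$html = "<H3>\nSession: Practice\n</H3>\n<TD>1</TD><TD>79</TD>\n"
      . "<H3>\nSession: Qualifying\n</H3>\n<TD>2</TD><TD>81</TD>";
foreach (tokensBySession($html) as $name => $tokens) {
    echo $name, ': ', implode(' ', $tokens), "\n";   // space-delimited output
}
```

Labels such as "Date:" or column headers would also end up in each session's list, so a real script would still need to skip them.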

I'll keep an eye out here for your answer and will *try* to help further,
but I can't guarantee anything...

randelld
Jul 17 '05 #5
I would need the script to create three CSV files for me. If you can show me
an example for the first one, I would be more than happy to work through it
and figure out the rest. I am just not sure where to start.

So let me just look at the first one.

Let's say, for example, that script.php is in its own directory and the
results are in a /results directory, in a file called results.html.

If I remove the HTML tags, it looks like this:

Session: Qualifying
P # DRIVER TIME
1 15 J_Doe 22.288
2 2 J_Smith 22.310
3 7 M_Johnson 22.376
etc..

The final result I need would be something like this (in a file called
qual.csv):

1,15,J_Doe,22.288
2,2,J_Smith,22.310
3,7,M_Johnson,22.376
etc
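Here is a sketch of a script for that first file, under several assumptions the thread does not confirm: each result row reads position, car number, driver, lap time, in that order; driver names contain no spaces (as in J_Doe); and the qualifying table ends at the next "Session:" heading. It recognises a row as "integer, integer, name, decimal time", which skips header words such as "P # DRIVER TIME" and any "Date:" line. The function names are hypothetical; the paths follow the layout described above.

```php
<?php
// Extract qualifying result rows from the HTML export.
function qualifyingRows(string $html): array
{
    $text = strip_tags(str_replace('<', ' <', $html)); // space keeps adjacent cells apart
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $rows = [];
    $inQual = false;
    $n = count($tokens);
    for ($i = 0; $i < $n; $i++) {
        if ($tokens[$i] === 'Session:') {
            $inQual = ($i + 1 < $n && $tokens[$i + 1] === 'Qualifying');
            $i++;                                  // skip the session name too
            continue;
        }
        // A result row: integer, integer, driver name, decimal time.
        if ($inQual && $i + 3 < $n
            && ctype_digit($tokens[$i]) && ctype_digit($tokens[$i + 1])
            && preg_match('/^\d+\.\d+$/', $tokens[$i + 3])) {
            $rows[] = array_slice($tokens, $i, 4);
            $i += 3;                               // move to the row's last field
        }
    }
    return $rows;
}

// Write rows to a CSV file; fputcsv() quotes a field only when it needs it.
function writeCsv(array $rows, string $path): void
{
    $fh = fopen($path, 'w');
    foreach ($rows as $row) {
        fputcsv($fh, $row, ',', '"', '\\');        // delimiter, enclosure, escape
    }
    fclose($fh);
}

// Layout from the post: script.php next to a results/ directory.
$in = __DIR__ . '/results/results.html';
if (is_file($in)) {
    writeCsv(qualifyingRows(file_get_contents($in)), __DIR__ . '/qual.csv');
}
```

The other two files could follow the same pattern with a different session name and row test, once their column layouts are known.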

I really appreciate the help you have given so far. Like I said, I have no
problem working it out on my own if I can get an example.

Thanks,
Cliff

Jul 17 '05 #6