Help with parsing data out of a HTML File?

I have a league for a game where we get exports after every session in HTML
format

It is broken down into 3 sections and each section has a Table with the
results

Right now I have to create 3 csv files manually out of it

I would like to learn how to create a PHP script that can parse the data for
me out of the html results and create the csv files that I need

Can anyone give me some good (for newbie) references on where I can go to
learn how to do this? I have been searching for a few days and I have not
really been able to find anything that helps me.

Thank you in advance
Jul 17 '05 #1
Cliff Roman wrote:
I have a league for a game where we get exports after every session in HTML
format

It is broken down into 3 sections and each section has a Table with the
results

Right now I have to create 3 csv files manually out of it

I would like to learn how to create a PHP script that can parse the data for
me out of the html results and create the csv files that I need

Can anyone give me some good (for newbie) references on where I can go to
learn how to do this? I have been searching for a few days and I have not
really been able to find anything that helps me.

Thank you in advance


I can't write the code for you, but I suggest you have a look at
strip_tags() to remove the html code, then use explode() to split what is
left into separate values and implode() to join them back together in
whatever format you want...
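
As a rough illustration of that suggestion (a sketch only: the sample markup is made up, since no real export had been posted at this point), strip_tags() removes the markup and the remaining text is split into values. preg_split() is used here in place of explode() because explode(' ', ...) returns empty strings wherever two spaces sit next to each other.

```php
<?php
// Hypothetical sample; the real export's markup is an assumption.
$html = '<table><tr><td>1</td><td>79</td></tr><tr><td>2</td><td>64</td></tr></table>';

// Put a space in front of every tag so neighbouring cell values do not
// run together once the tags are gone, then strip the markup.
$text = strip_tags(str_replace('<', ' <', $html));

// Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
$values = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

// implode() goes the other way: join the values into one comma-separated line.
$line = implode(',', $values);
echo $line, "\n";
```

This only recovers a flat list of values. Knowing where one row or one table ends still depends on what the real file looks like.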

You could also try making a sample html file available online, because
the spaces or other delimiters that signify the end of one value
and the beginning of another need clarifying. As you look at your html
table, you know that the border of the table separates one value from
another. If your table cells are plain numeric and there's nothing
else to confuse things, then it shouldn't be too difficult. The biggest
problem is not the data, but the crap that might sit around it...

If you still have difficulties, let us know and I'll try my hand to help
out... but I'd really want to see some sort of sample output to know
what it is you want to chew and spit.

randelld
Jul 17 '05 #2
Cliff Roman wrote:
I would like to learn how to create a PHP script that can parse the
data for me out of the html results and create the csv files that I
need


The new XML features of PHP5 could probably do that with ease. Check
out http://slides.bitflux.ch/phpconf2003/slide_23.html or the whole
presentation at http://slides.bitflux.ch/phpconf2003/

If upgrading your web server to PHP5 is not an option, you could
still install the PHP5 CLI just for running this conversion.
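
A minimal sketch of what that could look like with PHP 5's DOM extension. This is not necessarily the approach shown on the linked slides, and the sample table (including header cells written as plain td cells) is an assumption.

```php
<?php
// Hypothetical sample; the real export's markup is an assumption.
$html = '<table>
<tr><td>P</td><td>#</td><td>DRIVER</td><td>TIME</td></tr>
<tr><td>1</td><td>15</td><td>J_Doe</td><td>22.288</td></tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every table row and collect the text of its cells.
$xpath = new DOMXPath($doc);
$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $cells = [];
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Each entry of $rows is one table row, so the row and column structure survives, which plain tag stripping does not give you.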
Bob

--
| B. Johannessen <bo*@db.org> +47 97 15 20 09 - http://db.org/
| Mail & Spam - News, Drafts & Standards - http://db.org/blog/
| On The Origin Of Spam; Spam Statistics - http://db.org/spam/
Jul 17 '05 #3
Imagine that this file looked like this

<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>

Here is the part that I am unsure of.. My first step I guess is learning
how to say basically
After "Session:" The next word or string would = $session (in this case
Practice)

or in a repeating area (like a table)

<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>

How I would end up saying
After "<TD style="background : #CEDAEB">", the next item would = $rank (in
this case 1)
then have it say
After the next "<TD style="background : #CEDAEB">" it would = $score

etc
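
The "after X, the next item is Y" idea can be sketched with regular expressions. This is only one possible approach; it assumes the markup looks exactly like the two fragments above, and the variable names follow the ones used in this post.

```php
<?php
$html = <<<HTML
<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>
<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>
HTML;

// "After Session: the next word": \S+ captures the following run of
// non-space characters. The same pattern works for "Date:".
$session = preg_match('/Session:\s*(\S+)/', $html, $m) ? $m[1] : null;
$date    = preg_match('/Date:\s*(\S+)/', $html, $m) ? $m[1] : null;

// Every TD's content in document order: /i ignores case, /s lets "." cross newlines.
preg_match_all('/<TD\b[^>]*>(.*?)<\/TD>/is', $html, $cells);
$values = array_map('trim', $cells[1]);

// First cell is the rank, second the score (assumed column order).
list($rank, $score) = $values;

echo "$session $date $rank $score\n";
```

Regular expressions are fragile against changes in the markup, so this fits best when the export format is fixed.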

Maybe I am approaching this all wrong, I am unsure

Thanks

Jul 17 '05 #4
Cliff Roman wrote:
Imagine that this file looked like this

<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>

Here is the part that I am unsure of.. My first step I guess is learning
how to say basically
After "Session:" The next word or string would = $session (in this case
Practice)

or in a repeating area (like a table)

<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>

How I would end up saying
After "<TD style="background : #CEDAEB">", the next item would = $rank (in
this case 1)
then have it say
After the next "<TD style="background : #CEDAEB">" it would = $score

etc

Maybe I am approaching this all wrong, I am unsure

Thanks


Ignore the html tags, but look left to right, like you would be reading
a book and confirm a few things for me...

First, (again, reading from left to right) the start of a new table
begins with "Session:" true?

Second, the next description that is fixed is "Date:" true?

Third, until the next "Session", everything else that follows is
numeric, true?

Fourth, how many columns wide is your table, or is it variable?

Last, are your tables one under the other, or side by side? Or do they
have anything else that might get in the way.

Why?

Well, I can try and bash out a script once I know some rough facts. I
can use strip_tags() to get rid of the html, and after that we are left
with a stream of text. We can use explode() to put each word/number into
an element of an array on its own. We can use "Session:" as a flag to
indicate that a new table of scores is starting or ending... and we can have
everything space delimited, which makes it easy to read and re-write
everything else...
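
A rough sketch of that plan, treating "Session:" (the label in the posted sample) as the flag word. The sample markup, the two session names and the column layout are assumptions, since the real file had not been posted.

```php
<?php
// Hypothetical sample with two tables; the column layout is an assumption.
$html = '<h3>Session: Practice</h3><table>'
      . '<tr><td>P</td><td>#</td><td>DRIVER</td><td>TIME</td></tr>'
      . '<tr><td>1</td><td>15</td><td>J_Doe</td><td>22.288</td></tr></table>'
      . '<h3>Session: Qualifying</h3><table>'
      . '<tr><td>P</td><td>#</td><td>DRIVER</td><td>TIME</td></tr>'
      . '<tr><td>1</td><td>2</td><td>J_Smith</td><td>22.310</td></tr></table>';

// Strip the tags (padding each with a space first) and split into tokens.
$text   = strip_tags(str_replace('<', ' <', $html));
$tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

$tables  = [];
$current = null;
for ($i = 0; $i < count($tokens); $i++) {
    if ($tokens[$i] === 'Session:' && isset($tokens[$i + 1])) {
        // Flag word: the next token names the table that is starting.
        $current = $tokens[++$i];
        $tables[$current] = [];
    } elseif ($current !== null) {
        // Anything else belongs to the table currently being read.
        $tables[$current][] = $tokens[$i];
    }
}
print_r($tables);
```

Each table ends up as a flat list of tokens, so the number of columns still has to be known in order to cut the list back into rows.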

I'll keep an eye here for your answer and I will *try* to help further,
however I can't guarantee...

randelld
Jul 17 '05 #5
I would need the script to create 3 csv files for me.. if you can show me
an example of the first one then I would be more than happy/willing to work
through it and figure it out. I am just not sure where to start

So let me just look at the first one

Let's say for example the script.php file was in its own directory and the
results were in a /results directory. Let's assume the file is called
results.html

If I remove the html tags it would look like this

Session: Qualifying
P # DRIVER TIME
1 15 J_Doe 22.288
2 2 J_Smith 22.310
3 7 M_Johnson 22.376
etc..

The final result I would need would be something like this.. (in a file
called qual.csv)

1,15,J_Doe,22.288
2,2,J_Smith,22.310
3,7,M_Johnson,22.376
etc
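
For illustration, the conversion from the stripped text above to those CSV lines could be sketched as follows. sessionTextToCsv() is a hypothetical helper name, and the sketch assumes one result per line, four whitespace-separated columns, and a first column that is always a whole number.

```php
<?php
// Turn the tag-free text of one session into CSV lines.
// Lines that do not look like a result row, such as "Session: Qualifying"
// or the "P # DRIVER TIME" header, are skipped.
function sessionTextToCsv($text)
{
    $lines = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        if (count($fields) === 4 && preg_match('/^\d+$/', $fields[0])) {
            $lines[] = implode(',', $fields);
        }
    }
    return $lines ? implode("\n", $lines) . "\n" : '';
}

$text = "Session: Qualifying\n"
      . "P # DRIVER TIME\n"
      . "1 15 J_Doe 22.288\n"
      . "2 2 J_Smith 22.310\n"
      . "3 7 M_Johnson 22.376\n";

$csv = sessionTextToCsv($text);
echo $csv;

// To write the file next to the script instead of printing it:
// file_put_contents(__DIR__ . '/qual.csv', $csv);
```

If a driver name could ever contain a comma or a quote, fputcsv() would be the safer way to write each row, because it adds the quoting that a plain implode() does not.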

I really appreciate the help you have given so far.. like I said, I have no
problem working it out on my own if I can get an example

Thanks,
Cliff

Jul 17 '05 #6
