Help with parsing data out of a HTML File?

I run a league for a game, and after every session we get an export of the
results in HTML format.

The export is broken down into 3 sections, and each section has a table with
the results.

Right now I have to create 3 CSV files from it by hand.

I would like to learn how to write a PHP script that can parse the data out
of the HTML results and create the CSV files that I need.

Can anyone give me some good (for newbie) references on where I can go to
learn how to do this? I have been searching for a few days and I have not
really been able to find anything that helps me.

Thank you in advance
Jul 17 '05 #1
I can't write the code for you, but I suggest you have a look at
strip_tags() to remove the HTML markup, then use something like explode()
to split the remaining text into separate values (and implode() to join
them back together with commas).
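
To make that suggestion concrete, here is a minimal sketch of the strip_tags() / explode() / implode() idea. The sample row is made up, since the real export had not been posted at this point, so treat the markup as an assumption.

```php
<?php
// Hypothetical sample row; the real export's markup was not posted.
$html = '<TR><TD>1</TD><TD>79</TD></TR>';

// Put a space in front of every tag first, so neighbouring cells
// do not run together once the tags are gone ("1" and "79", not "179").
$spaced = str_replace('<', ' <', $html);

// strip_tags() removes the markup and leaves only the text.
$text = trim(strip_tags($spaced));

// Collapse runs of whitespace, then split on single spaces.
$text  = preg_replace('/\s+/', ' ', $text);
$cells = explode(' ', $text);

// implode() goes the other way: join the values into one CSV line.
$csvLine = implode(',', $cells);
echo $csvLine, "\n";
```

Splitting on spaces only works while no value contains a space of its own, which is one reason a real sample of the file matters.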

You could also make a sample HTML file available online, because it needs
to be clear which spaces or other delimiters mark the end of one value and
the beginning of the next. When you look at your HTML table, you know that
the cell borders separate one value from another. If your table cells are
plain numeric and there's nothing else to confuse things, it shouldn't be
too difficult. The biggest problem is not the data but the junk that might
sit around it...

If you still have difficulties, let us know and I'll try my hand to help
out... but I'd really want to see some sort of sample output to know
what it is you want to chew and spit.

randelld
Jul 17 '05 #2
Cliff Roman wrote:
I would like to learn how to create a PHP script that can parse the
data for me out of the html results and create the csv files that I
need


The new XML features of PHP5 could probably do that with ease. Check
out http://slides.bitflux.ch/phpconf2003/slide_23.html or the whole
presentation at http://slides.bitflux.ch/phpconf2003/

If upgrading your web server to PHP5 is not an option, you could
still install the PHP5 CLI just for running this conversion.
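
As a rough illustration of the XML-style approach, the sketch below uses the DOM extension that ships with PHP 5 and later. It may or may not be the exact feature the slides cover, and the table fragment is a made-up stand-in for the real export.

```php
<?php
// Hypothetical fragment standing in for the real export.
$html = '<table>'
      . '<tr><td>1</td><td>15</td><td>J_Doe</td><td>22.288</td></tr>'
      . '<tr><td>2</td><td>2</td><td>J_Smith</td><td>22.310</td></tr>'
      . '</table>';

$doc = new DOMDocument();
// Exported HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Build one array of cell texts per table row.
$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $row = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $row[] = trim($td->textContent);
    }
    if ($row) {
        $rows[] = $row;   // skip rows without any <td> cells
    }
}

foreach ($rows as $row) {
    echo implode(',', $row), "\n";
}
```

Run from the command line (php script.php), this prints one comma-separated line per table row. The parser keeps the row and cell boundaries, so nothing depends on guessing where one value ends and the next begins.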
Bob

--
| B. Johannessen <bo*@db.org> +47 97 15 20 09 - http://db.org/
| Mail & Spam - News, Drafts & Standards - http://db.org/blog/
| On The Origin Of Spam; Spam Statistics - http://db.org/spam/
Jul 17 '05 #3
Imagine that this file looked like this

<H3 style="color : #173D54">
Session: Practice
</H3>
<H3 style="color : #173D54">
Date: 02/18/04
</H3>

Here is the part that I am unsure of. My first step, I guess, is learning
how to say, basically:
after "Session:", the next word or string would = $session (in this case
Practice)

or in a repeating area (like a table)

<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>

How I would end up saying
After "<TD style="background : #CEDAEB">", the next item would = $rank (in
this case 1)
then have it say
After the next "<TD style="background : #CEDAEB">" it would = $score

etc
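
One way to express "after X, the next item is Y" in PHP is a regular expression. The following is only a sketch under the assumption that the file looks like the fragment quoted in this post. The variable names $session, $rank and $score come from the post; everything else is illustrative.

```php
<?php
// Fragment modelled on the markup quoted above.
$html = <<<HTML
<H3 style="color : #173D54">
Session: Practice
</H3>
<TD style="background : #CEDAEB">
1
</TD>
<TD style="background : #CEDAEB">
79
</TD>
HTML;

// "After Session: the next word": capture the first run of non-space characters.
$session = null;
if (preg_match('/Session:\s*(\S+)/', $html, $m)) {
    $session = $m[1];
}

// Capture the text inside every <TD ...>...</TD>, whatever its attributes.
// 'i' ignores case, 's' lets '.' span the line breaks inside each cell.
preg_match_all('/<TD\b[^>]*>(.*?)<\/TD>/is', $html, $matches);
$cells = array_map('trim', $matches[1]);

// First cell is the rank, second is the score (assumed column order).
list($rank, $score) = $cells;

echo "$session: rank $rank, score $score\n";
```

Matching on the tag name rather than on the full style attribute means the pattern keeps working if the cell colour changes between rows.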

Maybe I am approaching this all wrong, I am unsure

Thanks
Jul 17 '05 #4
Ignore the html tags, but look left to right, like you would be reading
a book and confirm a few things for me...

First, (again, reading from left to right) the start of a new table
begins with "Session:" true?

Second, the next description that is fixed is "Date:" true?

Third, until the next "Session", everything else that follows is
numeric, true?

Fourth, how many columns wide is your table, or is it variable?

Last, are your tables one under the other, or side by side? Or do they
have anything else that might get in the way?

Why?

Well, I can try and bash out a script once I know some rough facts. I
can use strip_tags() to get rid of the HTML; after that we are left
with a stream of text. We can use explode() to put each word/number into
an element of an array on its own. We can use "Session:" as a flag to
indicate that a new table of scores is starting (and the previous one
ending), and we can have everything space delimited, which makes it easy
to read and re-write everything else...
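
A minimal sketch of that plan might look like this. The input text, the two-column width and the session names are assumptions, since the answers to the questions above were still pending, and preg_split() is swapped in for explode() so that runs of spaces or line breaks do not produce empty elements.

```php
<?php
// Hypothetical tag-stripped text for two sessions, following the layout discussed above.
$text = "Session: Practice Date: 02/18/04 1 79 2 75 "
      . "Session: Qualifying Date: 02/18/04 1 81";

// One array element per word or number.
$tokens = preg_split('/\s+/', trim($text));

$tables  = [];   // session name => flat list of values
$current = null;

for ($i = 0; $i < count($tokens); $i++) {
    $tok = $tokens[$i];
    if ($tok === 'Session:') {
        // The flag: the following token names the table that is starting.
        $current = $tokens[++$i];
        $tables[$current] = [];
    } elseif ($tok === 'Date:') {
        $i++;    // skip the date value; it is not part of the scores
    } elseif ($current !== null) {
        $tables[$current][] = $tok;
    }
}

// Assumes two columns (rank, score); the real width was still an open question.
foreach ($tables as $name => $values) {
    foreach (array_chunk($values, 2) as $row) {
        echo $name, ',', implode(',', $row), "\n";
    }
}
```

The sketch does no error handling: a file that ends right after "Session:" or "Date:" would need an extra check before reading the next token.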

I'll keep an eye here for your answer and I will *try* to help further,
however I can't guarantee...

randelld
Jul 17 '05 #5
I would need the script to create 3 csv files for me.. if you can show me
an example of the first one then I would be more than happy/willing to work
through it and figure it out. I am just not sure where to start

So let me just look at the first one

Let's say for example the script.php file was in its own directory and the
results were in a /results directory. Let's assume the file is called
results.html

If I remove the html tags it would look like this

Session: Qualifying
P # DRIVER TIME
1 15 J_Doe 22.288
2 2 J_Smith 22.310
3 7 M_Johnson 22.376
etc..

The final result I would need would be something like this.. (in a file
called qual.csv)

1,15,J_Doe,22.288
2,2,J_Smith,22.310
3,7,M_Johnson,22.376
etc
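
For the qualifying table, a starting point could look like the sketch below. It works on the tag-stripped text exactly as shown in this post and writes qual.csv in the current directory. Reading the real file (for example with strip_tags(file_get_contents('results/results.html'))) is an assumption about the layout and is left out here.

```php
<?php
// Tag-stripped text as shown in the post.
$text = "Session: Qualifying\n"
      . "P # DRIVER TIME\n"
      . "1 15 J_Doe 22.288\n"
      . "2 2 J_Smith 22.310\n"
      . "3 7 M_Johnson 22.376\n";

$rows = [];
foreach (explode("\n", $text) as $line) {
    // Split each line on whitespace, dropping empty pieces.
    $fields = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    // Keep only result rows: four fields, the first being the position number.
    // This skips the "Session:" line and the column headings.
    if (count($fields) === 4 && ctype_digit($fields[0])) {
        $rows[] = $fields;
    }
}

$out = fopen('qual.csv', 'w');
foreach ($rows as $row) {
    // Delimiter, enclosure and escape are spelled out so the call behaves
    // the same on old and new PHP versions.
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

This relies on driver names containing no spaces, as in the sample (J_Doe, J_Smith). If names can contain spaces, reading the cells from the table markup is the safer route.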

I really appreciate the help you have given so far.. like I said, I have no
problem working it out on my own if I can get an example

Thanks,
Cliff
Jul 17 '05 #6
