IceOnFire wrote:
I am working on a script to extract statistics (which are updated daily) from
a website and insert them into a MySQL database. I want to take this
website:
http://www.usatoday.com/sports/baske...layers0304.htm
strip off all the HTML tags and so on, so that it looks like
http://www.enlhoops.com/ratings/parsed.txt
and then insert each player's stat line into the database.
I have begun writing the script (getting the file, stripping the HTML tags
off), but that doesn't seem to work too well. If anyone can help me get started,
or suggest a function or anything else, that would be helpful. Thanks.
Here is some example code I wrote to do a very similar thing for the
BBC's Fantasy Football system (so I can view them on my Nokia 3650
phone). It's not perfect (in fact it's quite dirty) but it does the
trick and it may help get you started:
<?php
print '<?xml version="1.0"?>';
?>
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN"
"http://www.wapforum.org/DTD/xhtml-mobile10.dtd" >
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>BBC Fantasy Football</title>
<style type="text/css">
p, body, td, th { font-family: Arial, Helvetica,
Sans-Serif; font-size: medium; }
th { background-color: #efefce; }
td { background-color: #ffffde; }
h1 { font-size: large; }
</style>
</head>
<body>
<p align='center'><img src="bbcsport_logo.gif" alt="BBC Sport"
/></p>
<h1>Team for <?=$name?></h1>
<div align='center'>
<?php
// $id (the team's PIN) and $name are assumed to have been set before this point.
$page = file_get_contents("http://bbcfootball.fantasyleague.co.uk/team/teamscreen.asp?pin=" . urlencode($id));
// Put the whole page on one line so the patterns below can span what were several lines.
$page = str_replace("\n", "", $page);
if (preg_match("/CURRENT FIRST 11(.*?)<\/table>/m", $page, $matches)) {
    print "<table><tr><th>Player</th><th width='20'>P</th><th width='30'>C</th><th width='20'>W</th><th width='20'>M</th></tr>";
    $table = $matches[1];
    // Pull out each table row, then pick the individual cells out of it.
    preg_match_all("/(<tr>.*?<\/tr>)/", $table, $matches);
    for ($n = 0; $n < count($matches[1]); $n++) {
        if (preg_match("/^.*?<td ..*?\/td><td.*?>(\d+).*<\/td><td.*>(\S+)<\/td><td.*>(\S+)<\/td>.*?squad_(\S).gif.*?<td.*>(\S+)<\/td><td.*>(\S+)<\/td><td.*>(\S+)<\/td><td.*>(\S+)<\/td>/", $matches[1][$n], $player)) {
            // The letter in the squad_X.gif icon name gives the player's position.
            switch ($player[4]) {
                case "g": $pos = 'GK'; break;
                case "f": $pos = 'FB'; break;
                case "c": $pos = 'CB'; break;
                case "m": $pos = 'MF'; break;
                case "s": $pos = 'SK'; break;
                default:  $pos = '';   break; // don't reuse the previous row's position
            }
            $club = str_replace(" ", "", $player[5]);
            print "<tr><td align='left'>$player[2]$player[3]</td><td align='center'>$pos</td><td align='center'>$club</td><td align='center'>$player[7]</td><td align='center'>$player[8]</td></tr>";
        }
    }
    print "</table>";
}
else {
    print "<p><b>Currently updating...</b></p>";
}
?>
</div>
</body>
</html>
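For the stats page in the question, one long pattern per row may be more than is needed. A rougher but more forgiving route is to split the table into rows and cells first and only then call strip_tags() on each cell. The following is a minimal sketch of that idea, not a tested scraper for the real page: the function name table_rows_to_arrays() and the sample markup are made up for illustration, and it assumes the page is a plain table with one player per row.

```php
<?php
// Hypothetical helper: turn every <tr> in an HTML fragment into an array
// of plain-text cell values.
function table_rows_to_arrays(string $html): array
{
    $rows = [];
    // "s" lets "." match newlines, "i" ignores the case of the tag names.
    preg_match_all('/<tr\b[^>]*>(.*?)<\/tr>/si', $html, $trMatches);
    foreach ($trMatches[1] as $rowHtml) {
        preg_match_all('/<t[dh]\b[^>]*>(.*?)<\/t[dh]>/si', $rowHtml, $cellMatches);
        $cells = [];
        foreach ($cellMatches[1] as $cellHtml) {
            // Drop nested tags, decode entities such as &amp;, trim whitespace.
            $cells[] = trim(html_entity_decode(strip_tags($cellHtml), ENT_QUOTES, 'UTF-8'));
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample markup, standing in for the downloaded page.
$sample = '<table>
<tr><th>Player</th><th>Team</th><th>Pts</th></tr>
<tr><td><a href="#">Smith, J</a></td><td>AAA</td><td align="right">12.5</td></tr>
<tr><td><b>Jones, K</b></td><td>BBB</td><td align="right">9.0</td></tr>
</table>';

$rows = table_rows_to_arrays($sample);
array_shift($rows); // discard the header row

// One tab-separated line per player, similar in spirit to parsed.txt.
foreach ($rows as $cells) {
    print implode("\t", $cells) . "\n";
}
```

Each entry in $rows is then a plain array of values for one player, which is a convenient shape to hand to whatever INSERT statement you end up using.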
Best of luck,
Andy