xml parsing script dying with "Premature end of script headers" error

I have been using an xml parsing script to parse a number of rss feeds
and return relevant results to a database. The script has worked well
for a couple of years, despite having very crude error-trapping (if it
finds an error in one of the xml files, the script stops). Recently, the
script has stopped working because one of the xml files is badly formed.

So I decided to rewrite the script with better error trapping; the
script should continue with the well-formed xml files and send me an
email telling me what happened.

The prototype script is failing with a "Premature end of script headers"
error. I am trying to work out if:

- this is a problem with my script, or
- a problem with the web server configuration

I have been over the code with as close as I have to a fine toothcomb,
and I can't see anything which would cause a problem.

Here is my code:

<?php

# code to parse multiple RSS .xml files, identify stories with keywords in them, and enter those stories into DB

##### the first section of code is unchanged from the previous (working) version

# SELECT, INSERT user privs for this page
$privs = "insert";

# create list of RSS feeds to parse
$feedsource = array(
"http://news.bbc.co.uk/rss/newsonline_uk_edition/uk/rss091.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/england/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/scotland/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/wales/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/northern_ireland/rss.xml"
);

# these provide db connection and various query functions used below
include("../includes/config.inc");
include("../includes/sql.inc");

$insideitem = FALSE;
$tag = "";
$title = "";
$description = "";
$textdump = "";
$link = "";
$itemcount = FALSE;
$body1 = "";

function startElement($parser, $name, $attrs) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = TRUE;
    }
}

function endElement($parser, $name) {
    global $insideitem, $tag, $title, $description, $link, $keywords, $feed;
    $numkeywords = count($keywords);
    $duplicate = FALSE;

    if ($name == "ITEM") {
        for($counter=0; $counter < $numkeywords; $counter++) {

            # create regex which matches a whole word anywhere in a string
            $regex = "/\b(" . $keywords[$counter] . ")\b/";

            if(preg_match($regex, $title) || preg_match($regex, $description)) {
                # if title or description string of parsed story matches the word
                # get all news stories from db
                $result = getTotalNews();
                while ($row = mysql_fetch_array($result)) {
                    # loop through each existing news story
                    if($row[txtLink] == trim($link)) {
                        # if new link matches existing link, flag as duplicate
                        $duplicate = TRUE;
                    }
                }
                if($duplicate == FALSE) {
                    $itemcount = TRUE;
                    $datetime = date("Y-m-d H:i:s");
                    $title = trim(str_replace("'", "\'", $title));
                    $description = trim(str_replace("'", "\'", $description));
                    $link = trim($link);
                    $result = insertNews($title, $description, $link, $feed, $datetime);
                    $body1 .= "Item added: ";
                    $body1 .= $title;
                    $body1 .= " (link: ";
                    $body1 .= $link;
                    $body1 .= ") - ";
                    $body1 .= $description;
                    $body1 .= "\n\n";
                    mail("in*****@invalid.co.uk", "News Item Added: " . $title, $body1,
                        "FROM: ne******@railwaysarchive.co.uk");
                } else {
                    $duplicate = FALSE;
                }
            }
        }
        $title = "";
        $description = "";
        $link = "";
        $insideitem = FALSE;
    }
}

function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        switch ($tag) {
            case "TITLE":
                $title .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
        }
    }
}

##### from here onwards the script has been rewritten

# initialise feed counter
$count = 0;
$passed = TRUE;
$body = "RSS parse results:\n";

foreach ($feedsource as $feed) {
    # loop through each RSS file in turn
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");

    if(fopen("$feed", "r")) {
        # if file can be opened

        $fp = fopen("$feed", "r");
        $body .= "Success opening " . $feed . "\n";

        while ($data = fread($fp, 4096)) {
            # loop through feed contents

            if(xml_parse($xml_parser, $data, feof($fp))) {
                # success

                $body .= "Success parsing " . $feed . "\n";

            } else {
                # fail

                $body .= "Failed to parse " . $feed . ": XML error " .
                    xml_error_string(xml_get_error_code($xml_parser)) . " at line " .
                    xml_get_current_line_number($xml_parser) . "\n";
                $passed = FALSE;

            }

        }

    } else {
        # failed to open file

        $body .= "Failed to open " . $feed . "\n";
        $passed = FALSE;

    }

    # close file
    fclose($fp);
    # free up xml parser
    xml_parser_free($xml_parser);

}

if($passed) {
    # if no errors
    $passText = "no errors";
} else {
    $passText = "ERRORS";
}

$subject = "Newsfeed report: " . $passText . " at " . date("d-m-Y G:i");

$to = "in*****@invalid.com";

mail($to, $subject, $body);

?>
Oct 26 '08 #1
GazK wrote:
> <?php

[snipped some function and variable declarations]

> ##### from here onwards the script has been rewritten
>
> # initialise feed counter
> $count = 0;
> $passed = TRUE;
> $body = "RSS parse results:\n";
>
> foreach ($feedsource as $feed) {
>     # loop through each RSS file in turn
>     $xml_parser = xml_parser_create();
>     xml_set_element_handler($xml_parser, "startElement", "endElement");
>     xml_set_character_data_handler($xml_parser, "characterData");
>
>     if(fopen("$feed", "r")) {
>         # if file can be opened
>
>         $fp = fopen("$feed", "r");

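First of all, this is not a good way to test whether fopen() succeeded: it opens the feed twice, and the handle returned by the first call is never closed. Assign the handle and test it in one step: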

if ($fp = fopen($feed, 'r')) {
...
}

Also, wrapping a lone variable in double quotes ("$var") is a bad habit,
because it converts the value to a string and you may run into some
unexpected typing troubles:

var_dump("$int"); // string
var_dump($int); // integer
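
A minimal sketch of both points, not the original poster's code: reportOpenResults() is a hypothetical helper name, and using the @ operator to silence PHP's own warning is an assumption about how failures should be reported. It opens each feed once, closes only handles that were actually opened, and then shows the string-versus-integer difference.

```php
<?php
// Hypothetical helper (not from the thread): try to open each feed exactly once.
function reportOpenResults(array $feeds)
{
    $body = '';
    foreach ($feeds as $feed) {
        // One fopen() call: assign the handle and test it in the same step.
        // $feed is passed as-is; wrapping it in quotes adds nothing here.
        // The @ hides PHP's own warning because the failure is reported below.
        if ($fp = @fopen($feed, 'r')) {
            $body .= "Success opening " . $feed . "\n";
            fclose($fp); // close only a handle that was actually opened
        } else {
            $body .= "Failed to open " . $feed . "\n";
        }
    }
    return $body;
}

// The typing point: interpolation always produces a string.
$int = 42;
var_dump("$int" === $int); // bool(false): "42" is a string, 42 is an integer
var_dump("$int" == $int);  // bool(true): loose comparison hides the difference
```

Keeping fclose() inside the success branch also avoids calling it on a handle that was never opened, which the original loop does whenever fopen() fails.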

>         $body .= "Success opening " . $feed . "\n";
>
>         while ($data = fread($fp, 4096)) {
>             # loop through feed contents

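Here's where your problem probably lies. When the RSS data exceeds the
4096-byte buffer, the while-statement starts another iteration to get
more data, continuing in this manner until EOF is reached, so each call
to xml_parse() sees only a fragment of the feed. xml_parse() can accept
a document in chunks, but only if its third argument is true for the
last chunk and for no other. feof() may not yet report end-of-file when
the last chunk arrives, particularly on a network stream, so the parser
may never be told that the document is complete. The loop also keeps
calling xml_parse() after a failure, and it appends "Success parsing"
once per chunk rather than once per feed.

It is simpler to collect all of the data before parsing it. Use
file_get_contents() and eliminate the loop entirely.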

Here's a small scale example of what *might* be happening to you, with
your current approach:

<?php
$rss = <<<EORSS
<?xml version="1.0"?>
<rss version="2.0">
<foo>
<bar>baz</bar>
</foo>
</rss>
EORSS;
$file = 'fread_test.rss';
// fread() expects an integer length, so cast the result of ceil()
$bufferTooSmall = (int) ceil(strlen($rss) / 2);

// write the data - error checking removed for brevity
file_put_contents($file, $rss);

if ($fp = fopen($file, 'r')) {
    $i = 1;
    // each pass prints only a fragment of the XML document
    while ($data = fread($fp, $bufferTooSmall)) {
        echo "Iteration $i:\n$data\n\n";
        $i++;
    }
    fclose($fp);
}
?>
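
A rough sketch of the file_get_contents() approach, under stated assumptions: parseFeedXml() and checkFeeds() are hypothetical names, the thread's element and character-data handlers are left out, and the report strings simply mirror those in the original script. Each feed is read in full and parsed with a single xml_parse() call, and a failure on one feed does not stop the others.

```php
<?php
// Hypothetical helper: parse one complete XML document.
// Returns null on success, or an error message string on failure.
function parseFeedXml($xml)
{
    $parser = xml_parser_create();
    // The thread's startElement/endElement/characterData handlers would be
    // registered here with xml_set_element_handler() and
    // xml_set_character_data_handler(); they are omitted to keep this short.
    $error = null;
    // The third argument (true) marks this as the final and only chunk.
    if (!xml_parse($parser, $xml, true)) {
        $error = "XML error " . xml_error_string(xml_get_error_code($parser))
               . " at line " . xml_get_current_line_number($parser);
    }
    return $error; // the parser is released when it goes out of scope
}

// Hypothetical helper: read and parse every feed, never stopping early.
// Returns array($passed, $body) for use in a report email.
function checkFeeds(array $feeds)
{
    $body = "RSS parse results:\n";
    $passed = true;
    foreach ($feeds as $feed) {
        $xml = @file_get_contents($feed); // false when the feed cannot be read
        if ($xml === false) {
            $body .= "Failed to open " . $feed . "\n";
            $passed = false;
            continue; // move on to the next feed
        }
        $error = parseFeedXml($xml);
        if ($error === null) {
            $body .= "Success parsing " . $feed . "\n";
        } else {
            $body .= "Failed to parse " . $feed . ": " . $error . "\n";
            $passed = false;
        }
    }
    return array($passed, $body);
}
```

Something like list($passed, $body) = checkFeeds($feedsource); would then give the flag and the report text that the original script passes to mail().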

>             if(xml_parse($xml_parser, $data, feof($fp))) {
>                 # success
>
>                 $body .= "Success parsing " . $feed . "\n";
>
>             } else {
>                 # fail
>
>                 $body .= "Failed to parse " . $feed . ": XML error " .
>                     xml_error_string(xml_get_error_code($xml_parser)) . " at line " .
>                     xml_get_current_line_number($xml_parser) . "\n";
>                 $passed = FALSE;
>
>             }
>
>         }
>
>     } else {
>         # failed to open file
>
>         $body .= "Failed to open " . $feed . "\n";
>         $passed = FALSE;
>
>     }
>
>     # close file
>     fclose($fp);
>     # free up xml parser
>     xml_parser_free($xml_parser);
>
> }
>
> if($passed) {
>     # if no errors
>     $passText = "no errors";
> } else {
>     $passText = "ERRORS";
> }
>
> $subject = "Newsfeed report: " . $passText . " at " . date("d-m-Y G:i");
>
> $to = "in*****@invalid.com";
>
> mail($to, $subject, $body);
>
> ?>

--
Curtis
$eMail = str_replace('sig.invalid', 'gmail.com', $from);
Oct 27 '08 #2
Curtis, thanks for the assistance. I will give the file_get_contents()
approach a go - it looks much simpler in any case.

Garry
Oct 27 '08 #3
Update - script is now working much more reliably. Old script has been
binned. Thanks!
Oct 29 '08 #4
