xml parsing script dying with "Premature end of script headers" error

I have been using an xml parsing script to parse a number of rss feeds
and return relevant results to a database. The script has worked well
for a couple of years, despite having very crude error-trapping (if it
finds an error in one of the xml files, the script stops). Recently, the
script has stopped working because one of the xml files is badly formed.

So I decided to rewrite the script with better error trapping; the
script should continue with the well-formed xml files and send me an
email telling me what happened.

The prototype script is failing with a "Premature end of script headers"
error. I am trying to work out if:

- this is a problem with my script, or
- a problem with the web server configuration

I have been over the code with as close as I have to a fine toothcomb,
and I can't see anything which would cause a problem.

Here is my code:

<?php

# code to parse multiple RSS .xml files, identify stories with keywords
# in them, and enter those stories into DB

##### the first section of code is unchanged from the previous (working)
##### version

# SELECT, INSERT user privs for this page
$privs = "insert";

# create list of RSS feeds to parse
$feedsource = array(
"http://news.bbc.co.uk/rss/newsonline_uk_edition/uk/rss091.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/england/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/scotland/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/wales/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/northern_ireland/rss.xml"
);

# these provide db connection and various query functions used below
include("../includes/config.inc");
include("../includes/sql.inc");

$insideitem = FALSE;
$tag = "";
$title = "";
$description = "";
$textdump = "";
$link = "";
$itemcount = FALSE;
$body1 = "";

function startElement($parser, $name, $attrs) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        $tag = $name;
    } elseif ($name == "ITEM") {
        $insideitem = TRUE;
    }
}

function endElement($parser, $name) {
    global $insideitem, $tag, $title, $description, $link, $keywords, $feed;
    $numkeywords = count($keywords);
    $duplicate = FALSE;

    if ($name == "ITEM") {
        for($counter=0; $counter < $numkeywords; $counter++) {

            # create regex which matches a whole word anywhere in a string
            $regex = "/\b(" . $keywords[$counter] . ")\b/";

            if(preg_match($regex, $title) || preg_match($regex, $description)) {
                # if title or description string of parsed story matches the word
                # get all news stories from db
                $result = getTotalNews();
                while ($row = mysql_fetch_array($result)) {
                    # loop through each existing news story
                    if($row[txtLink] == trim($link)) {
                        # if new link matches existing link, flag as duplicate
                        $duplicate = TRUE;
                    }
                }
                if($duplicate == FALSE) {
                    $itemcount = TRUE;
                    $datetime = date("Y-m-d H:i:s");
                    $title = trim(str_replace("'", "\'", $title));
                    $description = trim(str_replace("'", "\'", $description));
                    $link = trim($link);
                    $result = insertNews($title, $description, $link, $feed, $datetime);
                    $body1 .= "Item added: ";
                    $body1 .= $title;
                    $body1 .= " (link: ";
                    $body1 .= $link;
                    $body1 .= ") - ";
                    $body1 .= $description;
                    $body1 .= "\n\n";
                    mail("in*****@invalid.co.uk", "News Item Added: " . $title, $body1,
                        "FROM: ne******@railwaysarchive.co.uk");
                } else {
                    $duplicate = FALSE;
                }
            }
        }
        $title = "";
        $description = "";
        $link = "";
        $insideitem = FALSE;
    }
}

function characterData($parser, $data) {
    global $insideitem, $tag, $title, $description, $link;
    if ($insideitem) {
        switch ($tag) {
            case "TITLE":
                $title .= $data;
                break;
            case "DESCRIPTION":
                $description .= $data;
                break;
            case "LINK":
                $link .= $data;
                break;
        }
    }
}

##### from here onwards the script has been rewritten

# initialise feed counter
$count = 0;
$passed = TRUE;
$body = "RSS parse results:\n";

foreach ($feedsource as $feed) {
    # loop through each RSS file in turn
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");

    if(fopen("$feed", "r")) {
        # if file can be opened

        $fp = fopen("$feed", "r");
        $body .= "Success opening " . $feed . "\n";

        while ($data = fread($fp, 4096)) {
            # loop through feed contents

            if(xml_parse($xml_parser, $data, feof($fp))) {
                # success
                $body .= "Success parsing " . $feed . "\n";
            } else {
                # fail
                $body .= "Failed to parse " . $feed . ": XML error " .
                    xml_error_string(xml_get_error_code($xml_parser)) . " at line " .
                    xml_get_current_line_number($xml_parser) . "\n";
                $passed = FALSE;
            }
        }

    } else {
        # failed to open file
        $body .= "Failed to open " . $feed . "\n";
        $passed = FALSE;
    }

    # close file
    fclose($fp);
    # free up xml parser
    xml_parser_free($xml_parser);
}

if($passed) {
    # if no errors
    $passText = "no errors";
} else {
    $passText = "ERRORS";
}

$subject = "Newsfeed report: " . $passText . " at " . date("d-m-Y G:i");

$to = "in*****@invalid.com";

mail($to, $subject, $body);

?>
Oct 26 '08 #1
GazK wrote:
> <?php
> [snipped some function and variable declarations]
> ##### from here onwards the script has been rewritten
>
> # initialise feed counter
> $count = 0;
> $passed = TRUE;
> $body = "RSS parse results:\n";
>
> foreach ($feedsource as $feed) {
>     # loop through each RSS file in turn
>     $xml_parser = xml_parser_create();
>     xml_set_element_handler($xml_parser, "startElement", "endElement");
>     xml_set_character_data_handler($xml_parser, "characterData");
>
>     if(fopen("$feed", "r")) {
>         # if file can be opened
>
>         $fp = fopen("$feed", "r");

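First of all, this is not a good way to test if fopen() succeeded: it
opens the feed twice, and the first handle is never assigned or closed.
Assign and test in a single step instead:

if ($fp = fopen($feed, 'r')) {
    ...
}

Also, doing "$var" is a bad habit, because the quotes convert the value
to a string, and you may run into some unexpected typing troubles:

$int = 42;
var_dump("$int"); // string(2) "42"
var_dump($int);   // int(42)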

>         $body .= "Success opening " . $feed . "\n";
>
>         while ($data = fread($fp, 4096)) {
>             # loop through feed contents

Here's where your problem probably lies. Each pass through this loop
hands xml_parse() only one 4096-byte chunk, so whenever a feed is larger
than the buffer the parser is working on an incomplete document.
xml_parse() does accept chunked input, provided the third argument is
TRUE only on the final chunk, but the loop as written also adds
"Success parsing" to $body once per chunk and keeps feeding the parser
after a failure.

Reading the whole feed first is simpler and sidesteps both problems:
use file_get_contents(), and eliminate the loop entirely.
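
As a rough sketch of that suggestion (an illustration only, not tested against the real feeds): the helper name parseWholeFeed() is made up here, and the literal strings stand in for what file_get_contents($feed) would return in the real script. The handlers from the original script are left out to keep it short.

```php
<?php
// Hypothetical helper (not part of the original script): parse one complete
// XML string. Returns null on success, or an error description on failure.
function parseWholeFeed($xml)
{
    $parser = xml_parser_create();
    // The original script's handlers would be registered here with
    // xml_set_element_handler() and xml_set_character_data_handler().

    // TRUE as the third argument tells the parser that this call carries
    // the final (here: the only) piece of the document.
    $ok = xml_parse($parser, $xml, true);

    $error = null;
    if (!$ok) {
        $error = xml_error_string(xml_get_error_code($parser))
               . ' at line ' . xml_get_current_line_number($parser);
    }
    unset($parser); // drop the reference so the parser can be released
    return $error;
}

// Stand-ins for the result of file_get_contents($feed)
$good = '<?xml version="1.0"?><rss version="2.0"><foo><bar>baz</bar></foo></rss>';
$bad  = '<?xml version="1.0"?><rss version="2.0"><foo><bar>baz</foo></rss>';

var_dump(parseWholeFeed($good)); // NULL
var_dump(parseWholeFeed($bad));  // a string describing the mismatched tag
```

Because the whole document goes through a single xml_parse() call, there is exactly one success or failure to report per feed.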

Here's a small-scale example of what *might* be happening to you with
your current approach:

<?php
$rss = <<<EORSS
<?xml version="1.0"?>
<rss version="2.0">
<foo>
<bar>baz</bar>
</foo>
</rss>
EORSS;
$file = 'fread_test.rss';
// deliberately pick a buffer about half the size of the data
$bufferTooSmall = (int) ceil(strlen($rss) / 2);

// write the data - error checking removed for brevity
file_put_contents($file, $rss);

if ($fp = fopen($file, 'r')) {
    $i = 1;
    // each iteration only ever sees part of the document
    while ($data = fread($fp, $bufferTooSmall)) {
        echo "Iteration $i:\n$data\n\n";
        $i++;
    }
    fclose($fp);
}
?>

>             if(xml_parse($xml_parser, $data, feof($fp))) {
>                 # success
>                 $body .= "Success parsing " . $feed . "\n";
>             } else {
>                 # fail
>                 $body .= "Failed to parse " . $feed . ": XML error " .
>                     xml_error_string(xml_get_error_code($xml_parser)) . " at line " .
>                     xml_get_current_line_number($xml_parser) . "\n";
>                 $passed = FALSE;
>             }
>         }
>     } else {
>         # failed to open file
>         $body .= "Failed to open " . $feed . "\n";
>         $passed = FALSE;
>     }
>
>     # close file
>     fclose($fp);
>     # free up xml parser
>     xml_parser_free($xml_parser);
> }
>
> if($passed) {
>     # if no errors
>     $passText = "no errors";
> } else {
>     $passText = "ERRORS";
> }
>
> $subject = "Newsfeed report: " . $passText . " at " . date("d-m-Y G:i");
>
> $to = "in*****@invalid.com";
>
> mail($to, $subject, $body);
>
> ?>

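
Note that the quoted loop calls fclose($fp) even when the feed could not be opened. Here is one possible shape for the reporting loop that avoids this and carries on past a bad feed. It is only a sketch under assumptions: describeXmlError() and buildReport() are hypothetical names, local temporary files stand in for the feed URLs, and the mail() call is replaced by echo so it can be run anywhere.

```php
<?php
// Hypothetical helper: null if $xml is well formed, otherwise a message.
function describeXmlError($xml)
{
    $parser = xml_parser_create();
    $ok = xml_parse($parser, $xml, true); // whole document in one call
    $error = $ok ? null
        : xml_error_string(xml_get_error_code($parser))
          . ' at line ' . xml_get_current_line_number($parser);
    unset($parser);
    return $error;
}

// Hypothetical helper: check every source, never stop at the first failure.
// Returns array($passed, $body) in the spirit of the original variables.
function buildReport(array $sources)
{
    $passed = true;
    $body = "RSS parse results:\n";
    foreach ($sources as $source) {
        // @ hides the warning; failure is detected through the false return
        $xml = @file_get_contents($source);
        if ($xml === false) {
            $body .= "Failed to open " . $source . "\n";
            $passed = false;
            continue; // nothing to close, move on to the next feed
        }
        $error = describeXmlError($xml);
        if ($error === null) {
            $body .= "Success parsing " . $source . "\n";
        } else {
            $body .= "Failed to parse " . $source . ": XML error " . $error . "\n";
            $passed = false;
        }
    }
    return array($passed, $body);
}

// Demo data: one well-formed file, one malformed file, one missing file
$dir      = sys_get_temp_dir();
$goodFile = $dir . '/feed_good.rss';
$badFile  = $dir . '/feed_bad.rss';
$missing  = $dir . '/feed_missing_' . uniqid() . '.rss';
file_put_contents($goodFile, '<rss><item><title>ok</title></item></rss>');
file_put_contents($badFile, '<rss><item><title>broken</item></rss>');

list($passed, $body) = buildReport(array($badFile, $missing, $goodFile));
echo $body;
echo "Newsfeed report: " . ($passed ? "no errors" : "ERRORS") . "\n";

// tidy up the demo files
unlink($goodFile);
unlink($badFile);
```

The well-formed file is still reported as parsed even though it comes after two failures, which is the behaviour the rewrite was aiming for.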
--
Curtis
$eMail = str_replace('sig.invalid', 'gmail.com', $from);
Oct 27 '08 #2
Curtis, thanks for the assistance. I will give the file_get_contents()
approach a go - it looks much simpler in any case.

Garry
Oct 27 '08 #3
Update - script is now working much more reliably. Old script has been
binned. Thanks!
Oct 29 '08 #4