xml parsing script dying with "Premature end of script headers" error

I have been using an xml parsing script to parse a number of rss feeds
and return relevant results to a database. The script has worked well
for a couple of years, despite having very crude error-trapping (if it
finds an error in one of the xml files, the script stops). Recently, the
script has stopped working because one of the xml files is badly formed.

So I decided to rewrite the script with better error trapping; the
script should continue with the well-formed xml files and send me an
email telling me what happened.

The prototype script is failing with a "Premature end of script headers"
error. I am trying to work out if:

- this is a problem with my script, or
- a problem with the web server configuration

I have been over the code with as close as I have to a fine toothcomb,
and I can't see anything which would cause a problem.

Here is my code:

<?php

# code to parse multiple RSS .xml files, identify stories with keywords
# in them, and enter those stories into DB

##### the first section of code is unchanged from the previous (working)
##### version

# SELECT, INSERT user privs for this page
$privs = "insert";

# create list of RSS feeds to parse
$feedsource = array(
"http://news.bbc.co.uk/rss/newsonline_uk_edition/uk/rss091.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/england/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/scotland/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/wales/rss.xml",
"http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/northern_ireland/rss.xml"
);

# these provide db connection and various query functions used below
include("../includes/config.inc");
include("../includes/sql.inc");

$insideitem = FALSE;
$tag = "";
$title = "";
$description = "";
$textdump = "";
$link = "";
$itemcount = FALSE;
$body1 = "";

function startElement($parser, $name, $attrs) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
$tag = $name;
} elseif ($name == "ITEM") {
$insideitem = TRUE;
}
}

function endElement($parser, $name) {
global $insideitem, $tag, $title, $description, $link, $keywords, $feed;
$numkeywords = count($keywords);
$duplicate = FALSE;

if ($name == "ITEM") {
for($counter=0; $counter < $numkeywords; $counter++) {

# create regex which matches a whole word anywhere in a string
$regex = "/\b(" . $keywords[$counter] . ")\b/";

if(preg_match($regex, $title) || preg_match($regex, $description)) {
# if title or description string of parsed story matches the word
# get all news stories from db
$result = getTotalNews();
while ($row = mysql_fetch_array($result)) {
# loop through each existing news story
if($row[txtLink] == trim($link)) {
# if new link matches existing link, flag as duplicate
$duplicate = TRUE;
}
}
if($duplicate == FALSE) {
$itemcount = TRUE;
$datetime = date("Y-m-d H:i:s");
$title = trim(str_replace("'", "\'", $title));
$description = trim(str_replace("'", "\'", $description));
$link = trim($link);
$result = insertNews($title, $description, $link, $feed, $datetime);
$body1 .= "Item added: ";
$body1 .= $title;
$body1 .= " (link: ";
$body1 .= $link;
$body1 .= ") - ";
$body1 .= $description;
$body1 .= "\n\n";
mail("in*****@invalid.co.uk", "News Item Added: " . $title, $body1,
"FROM: ne******@railwaysarchive.co.uk");
} else {
$duplicate = FALSE;
}
}
}
$title = "";
$description = "";
$link = "";
$insideitem = FALSE;
}
}

function characterData($parser, $data) {
global $insideitem, $tag, $title, $description, $link;
if ($insideitem) {
switch ($tag) {
case "TITLE":
$title .= $data;
break;
case "DESCRIPTION":
$description .= $data;
break;
case "LINK":
$link .= $data;
break;
}
}
}

##### from here onwards the script has been rewritten

# initialise feed counter
$count = 0;
$passed = TRUE;
$body = "RSS parse results:\n";

foreach ($feedsource as $feed) {
# loop through each RSS file in turn
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if(fopen("$feed", "r")) {
# if file can be opened

$fp = fopen("$feed", "r");
$body .= "Success opening " . $feed . "\n";

while ($data = fread($fp, 4096)) {
# loop through feed contents

if(xml_parse($xml_parser, $data, feof($fp))) {
# success

$body .= "Success parsing " . $feed . "\n";

} else {
# fail

$body .= "Failed to parse " . $feed . ": XML error " .
xml_error_string(xml_get_error_code($xml_parser)) . " at line " .
xml_get_current_line_number($xml_parser) . "\n";
$passed = FALSE;

}

}

} else {
# failed to open file

$body .= "Failed to open " . $feed . "\n";
$passed = FALSE;

}

# close file
fclose($fp);
# free up xml parser
xml_parser_free($xml_parser);

}

if($passed) {
# if no errors
$passText = "no errors";
} else {
$passText = "ERRORS";
}

$subject = "Newsfeed report: " . $passText . " at " . date("d-m-Y G:i");

$to = "in*****@invalid.com";

mail($to, $subject, $body);

?>
Oct 26 '08 #1
GazK wrote:
> <?php
> [snipped some function and variable declarations]
> ##### from here onwards the script has been rewritten
>
> # initialise feed counter
> $count = 0;
> $passed = TRUE;
> $body = "RSS parse results:\n";
>
> foreach ($feedsource as $feed) {
> # loop through each RSS file in turn
> $xml_parser = xml_parser_create();
> xml_set_element_handler($xml_parser, "startElement", "endElement");
> xml_set_character_data_handler($xml_parser, "characterData");
>
> if(fopen("$feed", "r")) {
> # if file can be opened
>
> $fp = fopen("$feed", "r");

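First of all, this is not a good way to test if fopen() succeeded: it
opens the feed twice (so each feed is requested twice) and throws the
first handle away. Assign and test in one step instead: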

if ($fp = fopen($feed, 'r')) {
...
}
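
To spell that out a bit more, here is a rough sketch of the single-open pattern. It also only calls fclose() when the open actually succeeded, whereas the posted loop calls fclose($fp) even on the failure branch. The helper name readFeed() and the temporary file are made up for illustration, so the sketch runs without touching the real feeds:

```php
<?php
// Hypothetical helper: read a whole file or URL through a single fopen() call.
// Returns the contents as a string, or FALSE if the open failed.
function readFeed($source)
{
    // @ hides the warning fopen() raises on failure; the result is still tested.
    if ($fp = @fopen($source, 'r')) {
        $contents = '';
        while (!feof($fp)) {
            $contents .= fread($fp, 4096);
        }
        fclose($fp); // only reached when the open succeeded
        return $contents;
    }
    return false;
}

// Local stand-in for a feed so the sketch runs without network access.
$file = tempnam(sys_get_temp_dir(), 'rss');
file_put_contents($file, "<rss></rss>");

$good = readFeed($file);
$missing = readFeed($file . '.does-not-exist');
unlink($file);

var_dump($good);    // string(11) "<rss></rss>"
var_dump($missing); // bool(false)
```

The caller then only has to compare the result against FALSE (with ===) to decide whether to log "Failed to open" and move on to the next feed.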

Also, doing "$var" is a bad habit, because you may run into some
unexpected typing troubles:

var_dump("$int"); // string
var_dump($int); // integer

> $body .= "Success opening " . $feed . "\n";
>
> while ($data = fread($fp, 4096)) {
> # loop through feed contents

Here's where your problem probably lies. You should not parse your RSS
data until you're finished collecting all the data. What happens when
your RSS data exceeds the buffer? The answer is that the
while-statement will start another iteration to get more data,
continuing in this manner until EOF is reached. This will cause
xml_parse() and other xml functions to attempt to operate on the
incomplete RSS feed.

Instead, use file_get_contents(), and eliminate the loop entirely.

Here's a small scale example of what *might* be happening to you, with
your current approach:

<?php
$rss = <<<EORSS
<?xml version="1.0"?>
<rss version="2.0">
<foo>
<bar>baz</bar>
</foo>
</rss>
EORSS;
$file = 'fread_test.rss';
// fread() expects an integer length, so cast the result of ceil()
$bufferTooSmall = (int) ceil(strlen($rss) / 2);

// write the data - error checking removed for brevity
file_put_contents($file, $rss);

if ($fp = fopen($file, 'r')) {
$i = 1;
while ($data = fread($fp, $bufferTooSmall)) {
echo "Iteration $i:\n$data\n\n";
$i++;
}
fclose($fp);
}
?>
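
For comparison, a minimal sketch of the file_get_contents() route: hand the parser one complete document per feed, record any error, and carry on with the next feed. This is only one way to do it. xml_parse() does accept a document in pieces as long as its third argument flags the last piece, so parsing in one go is a simplification rather than a requirement. The helper name parseFeedString() and the two inline documents are invented for the example; in the real script each string would come from file_get_contents($feed), which returns FALSE when the fetch fails.

```php
<?php
// Hypothetical helper: parse one complete XML document held in a string.
// Returns array(bool $ok, string $message).
function parseFeedString($xml)
{
    $parser = xml_parser_create();
    // TRUE marks this as the final (here: the only) piece of the document.
    if (xml_parse($parser, $xml, true)) {
        return array(true, "parsed OK");
    }
    $message = "XML error " . xml_error_string(xml_get_error_code($parser))
             . " at line " . xml_get_current_line_number($parser);
    return array(false, $message);
}

// Inline stand-ins for the feeds: the first is deliberately badly formed.
$feeds = array(
    "bad"  => "<rss><channel><item><title>a</item></channel></rss>",
    "good" => "<rss><channel><item><title>a</title></item></channel></rss>",
);

$passed = true;
$body = "RSS parse results:\n";
foreach ($feeds as $name => $xml) {
    list($ok, $message) = parseFeedString($xml);
    $body .= ($ok ? "Success parsing " : "Failed to parse ") . $name . ": " . $message . "\n";
    if (!$ok) {
        $passed = false; // remember the failure, but keep going with the next feed
    }
}
echo $body;
```

Because each feed gets its own parser and its own single xml_parse() call, a badly formed feed yields exactly one "Failed to parse" line and the remaining feeds are still processed.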
> if(xml_parse($xml_parser, $data, feof($fp))) {
> # success
>
> $body .= "Success parsing " . $feed . "\n";
>
> } else {
> # fail
>
> $body .= "Failed to parse " . $feed . ": XML error " .
> xml_error_string(xml_get_error_code($xml_parser)) . " at line " .
> xml_get_current_line_number($xml_parser) . "\n";
> $passed = FALSE;
>
> }
>
> }
>
> } else {
> # failed to open file
>
> $body .= "Failed to open " . $feed . "\n";
> $passed = FALSE;
>
> }
>
> # close file
> fclose($fp);
> # free up xml parser
> xml_parser_free($xml_parser);
>
> }
>
> if($passed) {
> # if no errors
> $passText = "no errors";
> } else {
> $passText = "ERRORS";
> }
>
> $subject = "Newsfeed report: " . $passText . " at " . date("d-m-Y G:i");
>
> $to = "in*****@invalid.com";
>
> mail($to, $subject, $body);
>
> ?>

--
Curtis
$eMail = str_replace('sig.invalid', 'gmail.com', $from);
Oct 27 '08 #2
Curtis, thanks for the assistance. I will give the file_get_contents()
approach a go - it looks much simpler in any case.

Garry
Oct 27 '08 #3
Update - script is now working much more reliably. Old script has been
binned. Thanks!
Oct 29 '08 #4
