Reading and parsing an XML file using the content

Hi all,

I have to read and parse an XML file and save the data in a database.

Unfortunately, the data sometimes ends up wrong. It seems it is not read
correctly: sometimes only part of it is OK.

Is there any way to read an XML file sequentially based on its content,
instead of using $data = fread($fp, 4096); which may be the source of the
problem?

I get wrong data when the files are big. When they are small I don't seem to
have problems.

I also have a memory limit and an execution time limit.

Thanks for helping.

Bob
Mar 20 '08 #1
>If you're parsing chunks of data at a time, it's possible your block of
>4096 bytes is breaking in the middle of a tag. In fact, much more than
>possible - it would be highly unlikely for a tag to end right at a 4096
>byte boundary.

So how do I fix it? I can't read the entire file because of the memory limit
on my server (and I can't change servers).
Mar 27 '08 #2
Bob Bedford wrote:
>>If you're parsing chunks of data at a time, it's possible your block of
>>4096 bytes is breaking in the middle of a tag. In fact, much more than
>>possible - it would be highly unlikely for a tag to end right at a 4096
>>byte boundary.
>
>So how do I fix it? I can't read the entire file because of the memory
>limit on my server (and I can't change servers).

How big is your xml file?

If you can't read the entire file in, and can't change hosts, you'll
have to parse it manually. That will be a lot of work.

Why can't you change servers?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Mar 27 '08 #3
>How big is your xml file?

Around 2-3MB. I've written a PHP script that splits bigger files into chunks
of 2-3MB. I've already checked the split files and they are OK.

I also can't change hosts for other reasons, mainly because the service
offered is much the same as a dedicated server (50GB of space, unlimited
emails, unlimited traffic, with only execution time and memory limits). I
can't move to a dedicated server since I know nothing about managing a
dedicated or virtual server, and I have no time to learn and manage one. It
would also be a problem because security is very important to me, and my
current ISP handles that well.

Anyway, the problem is that I process the XML content when I reach the
closing tag, and it seems to process the content even when it hasn't reached
that closing tag. It's as if, when it arrives at the end of the file without
reaching the end tag, it executes the code anyway. Is that the case, or must
the end tag absolutely be reached?

Thanks for your help.

Bob

Mar 27 '08 #4
Bob Bedford wrote:
>>How big is your xml file?
>
>Around 2-3MB. I've written a PHP script that splits bigger files into
>chunks of 2-3MB. I've already checked the split files and they are OK.
>
>I also can't change hosts for other reasons, mainly because the service
>offered is much the same as a dedicated server (50GB of space, unlimited
>emails, unlimited traffic, with only execution time and memory limits). I
>can't move to a dedicated server since I know nothing about managing a
>dedicated or virtual server, and I have no time to learn and manage one.
>It would also be a problem because security is very important to me, and
>my current ISP handles that well.

I would still look for another hosting company. There are others around
with high limits, and ones which will allow for larger memory limits.
Or, go to a managed dedicated server or vps.

>Anyway, the problem is that I process the XML content when I reach the
>closing tag, and it seems to process the content even when it hasn't
>reached that closing tag. It's as if, when it arrives at the end of the
>file without reaching the end tag, it executes the code anyway. Is that
>the case, or must the end tag absolutely be reached?
>
>Thanks for your help.
>
>Bob

If the file is incomplete, the parser will consider it as malformed xml.
It will do its best with the xml, but the results probably will not be
what you want.

So you're left with handling the file on another system or parsing the
file with your own code.
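
A small sketch of what "malformed" means in practice, under stated
assumptions: the truncated document and the $closed array are made up for
illustration, and PHP's xml extension is assumed to be available. It feeds
an incomplete document to the parser and marks it as the final chunk.
Handlers for elements that were read completely still run, the element cut
off by the end of the data should never reach the end-tag handler, and
xml_parse() reports failure.

```php
<?php
// Hypothetical truncated document: the second <item> is never closed.
$truncated = '<items><item>first</item><item>seco';

$closed = [];   // names of elements whose end tag was actually seen

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Start-tag handler: nothing to do in this sketch.
    function ($p, $name, $attrs) {
    },
    // End-tag handler: record the element name.
    function ($p, $name) use (&$closed) {
        $closed[] = $name;   // names are upper-cased by default
    }
);

// Third argument true means "this is the last chunk of the document".
$ok    = xml_parse($parser, $truncated, true);
$error = xml_error_string(xml_get_error_code($parser));

printf("xml_parse returned %d (%s)\n", $ok, $error);
printf("end tags seen: %s\n", implode(', ', $closed));
```

If this behaves the same on your PHP build, code placed in the end-element
handler only runs for closing tags that were actually read, and the failure
shows up in the return value of the final xml_parse() call.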

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Mar 27 '08 #5
Bob Bedford wrote:
>So how do I fix it? I can't read the entire file because of the memory
>limit on my server (and I can't change servers).

How are you parsing the file?

Try a stream-based XML parser like expat <http://www.php.net/xml>. This
reads the XML file as a stream of tags and content, calling your functions
when it encounters things that you're interested in. It shouldn't use up
as much memory as DOM-like XML parsers, which read the whole file into RAM
before they parse it.

The first example at the following URL is a good starting point:

http://uk.php.net/manual/en/function.xml-set-object.php

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 1 day, 39 min.]

Best... News... Story... Ever!
http://tobyinkster.co.uk/blog/2008/03/23/hypnotist/
Mar 27 '08 #6
Toby A Inkster wrote:
>Bob Bedford wrote:
>>So how do I fix it? I can't read the entire file because of the memory
>>limit on my server (and I can't change servers).
>
>How are you parsing the file?
>
>Try a stream-based XML parser like expat <http://www.php.net/xml>. This
>reads the XML file as a stream of tags and content, calling your functions
>when it encounters things that you're interested in. It shouldn't use up
>as much memory as DOM-like XML parsers, which read the whole file into RAM
>before they parse it.
>
>The first example at the following URL is a good starting point:
>
>http://uk.php.net/manual/en/function.xml-set-object.php

Hi, Toby,

Have you found it saves much memory? In my experience the difference
isn't all that much. It looks like the xml parser caches a lot of the
file in memory.

Or maybe it was just the structure of the xml files I was using which
caused the extra memory usage.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Mar 27 '08 #7

"Toby A Inkster" <us**********@tobyinkster.co.uk> wrote in message
news: dr************@ophelia.g5n.co.uk...
>Bob Bedford wrote:
>>So how do I fix it? I can't read the entire file because of the memory
>>limit on my server (and I can't change servers).
>
>How are you parsing the file?
>
>Try a stream-based XML parser like expat <http://www.php.net/xml>. This
>reads the XML file as a stream of tags and content, calling your functions
>when it encounters things that you're interested in. It shouldn't use up
>as much memory as DOM-like XML parsers, which read the whole file into RAM
>before they parse it.
>
>The first example at the following URL is a good starting point:
>
>http://uk.php.net/manual/en/function.xml-set-object.php

Hi Toby,

the XML parser code I use:

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "FDataHandler");

while (!feof($fp)) {
    $data = fread($fp, 4096);
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);

It's as if it parses and then executes the endElement function once it has
read the 4096 bytes, even though it hasn't found the closing tag.
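
One thing worth checking, sketched below with made-up sample data ($xml,
$chunkSize, $buffer and $items are illustrative names, not taken from the
code above): the character data handler can be called more than once for a
single run of text, for example when a chunk boundary falls inside it. A
handler that assigns the text instead of appending it would then keep only a
fragment, which could look like data that is only partly read. This sketch
appends in the character data handler and uses the value only in the
end-element handler.

```php
<?php
// Hypothetical input; a tiny chunk size forces text (and tags) to be split.
$xml       = '<items><item>first value</item><item>second value</item></items>';
$chunkSize = 7;

$buffer = '';   // text collected for the element currently open
$items  = [];   // completed <item> values

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Start tag: reset the buffer so text from earlier elements is dropped.
    function ($p, $name, $attrs) use (&$buffer) {
        $buffer = '';
    },
    // End tag: only at this point is the text known to be complete.
    function ($p, $name) use (&$buffer, &$items) {
        if ($name === 'ITEM') {   // names are upper-cased by default
            $items[] = $buffer;
        }
        $buffer = '';
    }
);
// May fire several times per text run, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($p, $text) use (&$buffer) {
        $buffer .= $text;
    }
);

$length = strlen($xml);
for ($offset = 0; $offset < $length; $offset += $chunkSize) {
    $chunk   = substr($xml, $offset, $chunkSize);
    $isFinal = ($offset + $chunkSize >= $length);
    if (!xml_parse($parser, $chunk, $isFinal)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
}

print_r($items);
```

The 7-byte chunks in this sketch also cut through tags. The parser is
expected to hold on to a partial tag until the next xml_parse() call, as
long as the final flag is false for every chunk except the last one.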

Bob
Mar 27 '08 #8
Bob Bedford wrote:
>while (!feof($fp)) {
>    $data = fread($fp, 4096);
>    if (!xml_parse($xml_parser, $data, feof($fp))) {
>        die(sprintf("XML error: %s at line %d",
>            xml_error_string(xml_get_error_code($xml_parser)),
>            xml_get_current_line_number($xml_parser)));
>    }
>}

The basic idea is to catch the failure of xml_parse(), and if !feof() then
read another few bytes from the file, append them to $data and then try
parsing again.

Untested code:

while (!feof($fp))
{
    $data = fread($fp, 4096);
    if (!xml_parse($xml_parser, $data, feof($fp)))
    {
        if (feof($fp))
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser)));

        while (!feof($fp) && !xml_parse($xml_parser, $data, feof($fp)))
            $data .= fread($fp, 10);
    }
}
--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 1 day, 2:11.]

Best... News... Story... Ever!
http://tobyinkster.co.uk/blog/2008/03/23/hypnotist/
Mar 27 '08 #9
