Reading and parsing an XML file using the content

Hi all,

I have to read and parse an XML file and save the data in a database.

Unfortunately, the data sometimes comes out wrong. It seems it is not read
correctly: sometimes only part of it is OK.

Is there any way to read an XML file sequentially, based on its content,
instead of using $data = fread($fp, 4096); which may be the source of the
problem?

I get wrong data when the files are big. With small files I don't seem to
have problems.

I also have a memory limit and an execution time limit.

Thanks for helping.

Bob
Mar 20 '08 #1
> If you're parsing chunks of data at a time, it's possible your block of
> 4096 bytes is breaking in the middle of a tag. In fact, much more than
> possible - it'd be highly unlikely that a tag would end right at a
> 4096-byte boundary.

So how do I fix it? I can't read the entire file because of my server's
memory limit (and I can't change servers).
Mar 27 '08 #2
Bob Bedford wrote:
>> If you're parsing chunks of data at a time, it's possible your block of
>> 4096 bytes is breaking in the middle of a tag. In fact, much more than
>> possible - it'd be highly unlikely that a tag would end right at a
>> 4096-byte boundary.
>
> So how do I fix it? I can't read the entire file because of my server's
> memory limit (and I can't change servers).

How big is your xml file?

If you can't read the entire file in, and can't change hosts, you'll
have to parse it manually. That will be a lot of work.

Why can't you change servers?

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Mar 27 '08 #3
> How big is your xml file?

Around 2-3 MB. I've written a PHP script that splits bigger files into
chunks of 2-3 MB. I've already checked the split files and they are OK.

I also can't change servers, for several reasons. Mainly, the service I have
is much the same as a dedicated server (50 GB of space, unlimited email,
unlimited traffic, with only execution time and memory limits). I can't move
to a dedicated server because I know nothing about managing a dedicated or
virtual server and have no time to learn. It would also be a problem because
security is very important to me, and my current ISP handles that well.

Anyway, the problem is that I process the XML content when I reach the
closing tag, and it seems to process the content even when the closing tag
has not been reached. It's as if, when it arrives at the end of the file
without reaching the end tag, it executes the code anyway. Is that the case,
or must the end tag absolutely be reached?

Thanks for your help.

Bob
Mar 27 '08 #4
Bob Bedford wrote:
>> How big is your xml file?
>
> Around 2-3 MB. I've written a PHP script that splits bigger files into
> chunks of 2-3 MB. I've already checked the split files and they are OK.
>
> I also can't change servers, for several reasons. Mainly, the service I
> have is much the same as a dedicated server (50 GB of space, unlimited
> email, unlimited traffic, with only execution time and memory limits). I
> can't move to a dedicated server because I know nothing about managing a
> dedicated or virtual server and have no time to learn. It would also be a
> problem because security is very important to me, and my current ISP
> handles that well.

I would still look for another hosting company. There are others around
with higher limits, and some that will let you raise the memory limit.
Or go to a managed dedicated server or VPS.

> Anyway, the problem is that I process the XML content when I reach the
> closing tag, and it seems to process the content even when the closing tag
> has not been reached. It's as if, when it arrives at the end of the file
> without reaching the end tag, it executes the code anyway. Is that the
> case, or must the end tag absolutely be reached?
>
> Thanks for your help.
>
> Bob

If the file is incomplete, the parser will treat it as malformed XML.
It will do its best with the XML, but the results probably will not be
what you want.

So you're left with handling the file on another system or parsing the
file with your own code.
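
To make the end-tag question concrete, here is a minimal sketch (not code from this thread) that feeds a deliberately truncated document to PHP's XML parser in a single call. The sample XML and the variable names are made up for illustration, and using closures as handlers assumes PHP 5.3 or later. The expectation is that the end handler runs only for closing tags the parser has actually seen, and that the final xml_parse() call reports the truncation as an error:

```php
<?php
// Feed a deliberately truncated document to the parser and record
// which handlers fire before the error is reported.
$started = array();
$ended   = array();

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$started) {
        $started[] = $name;   // called when a start tag has been parsed
    },
    function ($parser, $name) use (&$ended) {
        $ended[] = $name;     // called when an end tag has been parsed
    }
);

// Invented sample data: the second <item> and the root are never closed.
$truncated = '<items><item>one</item><item>tw';

// The third argument (true) tells the parser this is the last chunk.
$ok = xml_parse($parser, $truncated, true);

echo 'parse ok: ', $ok ? 'yes' : 'no', "\n";
echo 'started:  ', implode(', ', $started), "\n";
echo 'ended:    ', implode(', ', $ended), "\n";
if (!$ok) {
    echo 'error:    ', xml_error_string(xml_get_error_code($parser)), "\n";
}
```

Work that must only happen for complete records therefore belongs in the end handler, and a failed xml_parse() is the signal to discard whatever was half-collected.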

--
Jerry Stuckle

Mar 27 '08 #5
Bob Bedford wrote:
> So how do I fix it? I can't read the entire file because of my server's
> memory limit (and I can't change servers).

How are you parsing the file?

Try a stream-based XML parser like expat <http://www.php.net/xml>. This
reads the XML file as a stream of tags and content, calling your functions
when it encounters things that you're interested in. It shouldn't use up
as much memory as DOM-like XML parsers, which read the whole file into RAM
before they parse it.

The first example at the following URL is a good starting point:

http://uk.php.net/manual/en/function.xml-set-object.php
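
A minimal sketch of that callback style, assuming PHP 5.3 or later (for closures) and the standard xml extension. count_elements() and the sample XML are hypothetical, made up for illustration. It reads from a file handle a few bytes at a time and counts start tags, so only one chunk is handed to the parser per call:

```php
<?php
// Hypothetical helper: stream-parse an open file handle in small chunks
// and count how many times each element name occurs.
function count_elements($fp, $chunkSize = 4096)
{
    $counts = array();
    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$counts) {
            // Start tag: bump the counter for this element name.
            $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
        },
        function ($parser, $name) {
            // End tag: nothing to do in this sketch.
        }
    );

    while (!feof($fp)) {
        $data = fread($fp, $chunkSize);
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $data, feof($fp))) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }
    return $counts;
}

// Demo on an in-memory stream, read 7 bytes at a time so that tags are
// deliberately split across chunks.
$fp = fopen('php://memory', 'w+');
fwrite($fp, '<cars><car id="1">Fiat</car><car id="2">Saab</car></cars>');
rewind($fp);
$counts = count_elements($fp, 7);
fclose($fp);
print_r($counts);
```

With a real file you would pass the handle from fopen() on that file and a larger chunk size such as 4096.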

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 1 day, 39 min.]

Best... News... Story... Ever!
http://tobyinkster.co.uk/blog/2008/03/23/hypnotist/
Mar 27 '08 #6
Toby A Inkster wrote:
> Bob Bedford wrote:
>> So how do I fix it? I can't read the entire file because of my server's
>> memory limit (and I can't change servers).
>
> How are you parsing the file?
>
> Try a stream-based XML parser like expat <http://www.php.net/xml>. This
> reads the XML file as a stream of tags and content, calling your functions
> when it encounters things that you're interested in. It shouldn't use up
> as much memory as DOM-like XML parsers, which read the whole file into RAM
> before they parse it.
>
> The first example at the following URL is a good starting point:
>
> http://uk.php.net/manual/en/function.xml-set-object.php

Hi, Toby,

Have you found it saves much memory? In my experience the difference
isn't all that much. It looks like the xml parser caches a lot of the
file in memory.

Or maybe it was just the structure of the xml files I was using which
caused the extra memory usage.

--
Jerry Stuckle

Mar 27 '08 #7

"Toby A Inkster" <us**********@tobyinkster.co.uk> wrote in message
news:dr************@ophelia.g5n.co.uk...
> Bob Bedford wrote:
>> So how do I fix it? I can't read the entire file because of my server's
>> memory limit (and I can't change servers).
>
> How are you parsing the file?
>
> Try a stream-based XML parser like expat <http://www.php.net/xml>. This
> reads the XML file as a stream of tags and content, calling your functions
> when it encounters things that you're interested in. It shouldn't use up
> as much memory as DOM-like XML parsers, which read the whole file into RAM
> before they parse it.
>
> The first example at the following URL is a good starting point:
>
> http://uk.php.net/manual/en/function.xml-set-object.php

Hi Toby,

Here is the XML parser code I use:

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "FDataHandler");

// $fp is the already opened XML file; feed it to the parser 4096 bytes at a time
while (!feof($fp)) {
    $data = fread($fp, 4096);
    // the third argument tells the parser whether this is the last chunk
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);

It's as if it parses and executes the endElement function when it has read
the 4096 bytes and hasn't found the closing tag.
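
One possible explanation, offered as an assumption about the handlers since they are not shown: the parser copes with a tag that is split across two fread() chunks, but it may call the character data handler several times for the same piece of text, for example once on each side of a chunk boundary. If FDataHandler overwrites a variable instead of appending to it, endElement will only see the last piece. A minimal sketch of the accumulate-then-process pattern follows; the element names and sample data are invented, and closures assume PHP 5.3 or later:

```php
<?php
// Collect text in a buffer and only use it once the closing tag
// has actually been parsed.
$buffer = '';        // text collected for the element currently open
$values = array();   // completed <name> values
$calls  = 0;         // how often the character data handler ran

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$buffer) {
        $buffer = '';                    // new element: start with an empty buffer
    },
    function ($parser, $name) use (&$buffer, &$values) {
        if ($name === 'name') {
            $values[] = trim($buffer);   // complete text of <name>...</name>
        }
        $buffer = '';
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$buffer, &$calls) {
        $calls++;
        $buffer .= $data;                // append, never overwrite
    }
);

// Invented sample data, fed 5 bytes at a time to force awkward boundaries.
$xml    = '<cars><name>Fiat Punto</name><name>Saab 900</name></cars>';
$chunks = str_split($xml, 5);
$last   = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, $chunk, $i === $last)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
}

print_r($values);
echo "character data handler calls: $calls\n";
```

The same idea carries over to named functions such as startElement, endElement and FDataHandler by keeping the buffer in a global or in an object property.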

Bob
Mar 27 '08 #8
Bob Bedford wrote:
> while (!feof($fp)) {
>     $data = fread($fp, 4096);
>     if (!xml_parse($xml_parser, $data, feof($fp))) {
>         die(sprintf("XML error: %s at line %d",
>             xml_error_string(xml_get_error_code($xml_parser)),
>             xml_get_current_line_number($xml_parser)));
>     }
> }

The basic idea is to catch the failure of xml_parse() and, if !feof(), read
a few more bytes from the file, append them to $data and then try parsing
again.

Untested code:

while (!feof($fp))
{
    $data = fread($fp, 4096);
    if (!xml_parse($xml_parser, $data, feof($fp)))
    {
        if (feof($fp))
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser)));

        while (!feof($fp) && !xml_parse($xml_parser, $data, feof($fp)))
            $data .= fread($fp, 10);
    }
}
--
Toby A Inkster BSc (Hons) ARCS
Mar 27 '08 #9
