Parsing Large XML files

How can I parse a large XML file that is too large for memory? I am
currently using PHP 5.0.3 and the libxml parser. I would like to read
it incrementally from a file, but the parser takes the entire contents
as a string?

Jul 17 '05 #1
doug wrote:
> How can I parse a large XML file that is too large for memory? I am
> currently using PHP 5.0.3 and the libxml parser. I would like to read
> it incrementally from a file, but the parser takes the entire contents
> as a string?


The standard XML solution to this problem is to use a SAX parser instead of
a DOM one. However, there doesn't seem to be a SAX parser in the PHP XML
library. One solution appears to be:

http://www.engageinteractive.com/mam...tent&task=view
&id=3628&Itemid=10159

Google might help find others. Or maybe use an external SAX-based tool to
boil the XML down to something a bit smaller that you can manipulate from
PHP?

--
The email address used to post is a spam pit. Contact me at
http://www.derekfountain.org
Jul 17 '05 #2
"doug" <dd*****@shortbus.net> wrote in message
news:11**********************@g14g2000cwa.googlegroups.com...
> How can I parse a large XML file that is too large for memory? I am
> currently using PHP 5.0.3 and the libxml parser. I would like to read
> it incrementally from a file, but the parser takes the entire contents
> as a string?


Use the expat functions instead.

http://us4.php.net/manual/en/ref.xml.php
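
For reference, a minimal sketch of those expat functions reading straight from a file handle. The helper name countElements and the sample document are made up for illustration, and the closures assume PHP 5.3 or later (on 5.0.x, named functions passed as strings do the same job). It only counts element names, so nothing from the document is kept beyond the tally.

```php
<?php
// Hypothetical helper: count how often each element name occurs in an XML
// stream, feeding the parser one chunk at a time.
function countElements($fp, $chunkSize = 4096)
{
    $counts = array();
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start-tag handler. Names arrive upper-cased because case folding
        // is on by default.
        function ($parser, $name, $attrs) use (&$counts) {
            $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
        },
        // End-tag handler: nothing to do for a simple count.
        function ($parser, $name) {
        }
    );

    while (!feof($fp)) {
        $data = fread($fp, $chunkSize);
        if ($data === false) {
            throw new RuntimeException('could not read XML input');
        }
        // The third argument is true only for the last chunk.
        if (!xml_parse($parser, $data, feof($fp))) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }

    return $counts;
}

// Usage with a small in-memory stream standing in for a large file.
$fp = fopen('php://temp', 'w+');
fwrite($fp, '<catalog><book id="1"/><book id="2"/><cd/></catalog>');
rewind($fp);
$counts = countElements($fp, 16); // tiny chunks to force several xml_parse() calls
fclose($fp);
print_r($counts);
```

For a real file, pass the result of fopen('data.xml', 'r') instead of the temp stream.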
Jul 17 '05 #3
How do I use this SAX parser to read XML directly from a file?

Can you show me an instantiation example?

Jul 17 '05 #4
So in the example below, every line is parsed individually with no
contents stored in memory but the current line?
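<?php
$file = "data.xml";
// Current nesting depth. A plain integer is enough for a single parser and
// avoids using the parser handle as an array key.
$depth = 0;

// Called for every opening tag: print the name indented by nesting depth.
function startElement($parser, $name, $attrs)
{
    global $depth;
    echo str_repeat("  ", $depth), "$name\n";
    $depth++;
}

// Called for every closing tag.
function endElement($parser, $name)
{
    global $depth;
    $depth--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

// Feed the file to the parser 4096 bytes at a time (chunks, not lines).
// The third argument tells the parser when the last chunk has been passed.
while (!feof($fp)) {
    $data = fread($fp, 4096);
    if ($data === false) {
        die("could not read XML input");
    }
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
// Needed on PHP versions where the parser is a resource.
xml_parser_free($xml_parser);
?>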

Jul 17 '05 #5
doug wrote:
> How do I use this SAX parser to read XML directly from a file?

Erm, read the docs that come with it! Sounds like you need to read up on SAX
first.

> Can you show me an instantiation example?

No, I've never used it. You're actually going to have to do some work here
yourself!

If it were me, and the application allowed it, I'd be writing a separate
utility to do the XML grunt work. PHP, for all its strengths, isn't ideal
for heavy-duty XML parsing.

--
The email address used to post is a spam pit. Contact me at
http://www.derekfountain.org
Jul 17 '05 #6
"doug" <dd*****@shortbus.net> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com...
> So in the example below, every line is parsed individually with no
> contents stored in memory but the current line?


The expat parser reuses the same buffers in calls to the handlers. If PHP
garbage collection works correctly, then the XML data should not take up
memory if your handlers don't save it.
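
To illustrate handlers that don't save the data, here is a rough sketch that totals a value while holding only the text of the element currently being read. The function name sumPrices, the element names and the sample data are all hypothetical, and the closures assume PHP 5.3 or later. Note that the input is fed in fixed-size chunks rather than lines; tags and text may be split across chunk boundaries, which is why the character data handler appends instead of assigning.

```php
<?php
// Hypothetical example: add up every <price> in a document without keeping
// anything except the text of the <price> currently being read.
function sumPrices($fp, $chunkSize = 4096)
{
    $state = array('text' => '', 'inPrice' => false, 'total' => 0.0, 'count' => 0);
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$state) {
            // Element names are upper-cased by default (case folding).
            if ($name === 'PRICE') {
                $state['inPrice'] = true;
                $state['text'] = '';
            }
        },
        function ($parser, $name) use (&$state) {
            if ($name === 'PRICE') {
                // Use the value, then drop it so nothing accumulates.
                $state['total'] += (float) $state['text'];
                $state['count']++;
                $state['inPrice'] = false;
                $state['text'] = '';
            }
        }
    );

    xml_set_character_data_handler($parser, function ($parser, $data) use (&$state) {
        // Text can arrive in several pieces, so append rather than assign.
        if ($state['inPrice']) {
            $state['text'] .= $data;
        }
    });

    while (!feof($fp)) {
        $data = fread($fp, $chunkSize);
        if ($data === false) {
            throw new RuntimeException('could not read XML input');
        }
        if (!xml_parse($parser, $data, feof($fp))) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }

    return array($state['total'], $state['count']);
}

// Usage: 1000 generated records in a temp stream, read 7 bytes at a time so
// that tags and text are regularly split across chunks.
$xml = '<orders>' . str_repeat('<order><price>2.50</price></order>', 1000) . '</orders>';
$fp = fopen('php://temp', 'w+');
fwrite($fp, $xml);
rewind($fp);
list($total, $count) = sumPrices($fp, 7);
fclose($fp);
printf("%d prices, total %.2f\n", $count, $total);
```

If a handler instead pushed every value onto an array, memory use would grow with the size of the document, which is the situation to avoid here.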
Jul 17 '05 #7