Parsing Large XML files

How can I parse an XML file that is too large to fit in memory? I am
currently using PHP 5.0.3 and the libxml parser. I would like to read
the file incrementally, but the parser seems to take the entire
contents as a single string?

Jul 17 '05 #1
doug wrote:
> How can I parse a large XML file that is too large for memory? I am
> currently using php 5.0.3 and the libxml parser, I would like to read
> it incrementally from a file, but the parser gets the entire contents
> from as String?


The standard XML solution to this problem is to use a SAX parser instead of
a DOM one. However, there doesn't seem to be a SAX parser in the PHP XML
library. One solution appears to be:

http://www.engageinteractive.com/mam...tent&task=view&id=3628&Itemid=10159

Google might help find others. Or maybe use an external SAX based tool to
boil the XML down to something a bit smaller that you can manipulate from
PHP?

--
The email address used to post is a spam pit. Contact me at
http://www.derekfountain.org : Derek Fountain
Jul 17 '05 #2
"doug" <dd*****@shortbus.net> wrote in message
news:11**********************@g14g2000cwa.googlegroups.com...
> How can I parse a large XML file that is too large for memory? I am
> currently using php 5.0.3 and the libxml parser, I would like to read
> it incrementally from a file, but the parser gets the entire contents
> from as String?


Use the expat functions instead.

http://us4.php.net/manual/en/ref.xml.php
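
For illustration, here is a minimal sketch of how those functions accept a document in pieces: xml_parse() can be called repeatedly, and its third argument marks the last piece. The sample chunks, the $seen array and the use of closures are assumptions made for this example (closures need a much newer PHP than 5.0.3; on an old version, named handler functions would take their place).

```php
<?php
// Illustrative sketch: feed a document to the expat-style parser in pieces.
// $seen and the sample chunks are hypothetical, chosen only for this demo.
$seen = array();

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Start-tag handler: record the element name.
    // Names arrive upper-cased because case folding is on by default.
    function ($p, $name, $attrs) use (&$seen) {
        $seen[] = $name;
    },
    // End-tag handler: nothing to do here.
    function ($p, $name) {
    }
);

// Chunks may split the markup anywhere, even in the middle of a tag.
$chunks = array("<root><it", "em id='1'/><item id", "='2'/></root>");
$last = count($chunks) - 1;

foreach ($chunks as $i => $chunk) {
    // The third argument is true only for the final chunk.
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}

echo implode(" ", $seen), "\n";
```

The parser keeps its own state between calls, so the caller never has to hold the whole document in one string.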
Jul 17 '05 #3
How do I use this SAX parser to read XML directly from a file?

Can you show me an instantiation example?

Jul 17 '05 #4
So in the example below, every line is parsed individually with no
contents stored in memory but the current line?
<?php
$file = "data.xml";
$depth = 0; // current nesting level of the element being parsed

// Called at each opening tag: print the name indented by its depth.
function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth; $i++) {
        echo "  ";
    }
    echo "$name\n";
    $depth++;
}

// Called at each closing tag.
function endElement($parser, $name)
{
    global $depth;
    $depth--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

// Feed the file to the parser 4096 bytes at a time.
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
?>

Jul 17 '05 #5
doug wrote:
> How do I use this SAX parser to read XML directly from a file?

Erm, read the docs that come with it! Sounds like you need to read up on SAX
first.

> Can you show me an instantiation example?


No, I've never used it. You're actually going to have to do some work here
yourself!

If it were me, and the application allowed it, I'd be writing a separate
utility to do the XML grunt work. PHP, for all its strengths, isn't ideal
for heavy-duty XML parsing.

--
The email address used to post is a spam pit. Contact me at
http://www.derekfountain.org : Derek Fountain
Jul 17 '05 #6
"doug" <dd*****@shortbus.net> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com...
> So in the example below, every line is parsed individually with no
> contents stored in memory but the current line?


The expat parser reuses the same buffers in calls to the handlers. If PHP
garbage collection works correctly, then the XML data should not take up
memory if your handlers don't save it.
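
A rough sketch of that point, under stated assumptions: the file is read in fixed 4096-byte chunks (not lines, so a chunk boundary can fall anywhere), and the handlers keep only a counter. The scratch file, the element names and the $rows variable are made up for this demo, and the closures assume a newer PHP than 5.0.3.

```php
<?php
// Illustrative sketch: stream a generated file through the parser while the
// handlers keep only a counter. File contents and names are hypothetical.
$path = tempnam(sys_get_temp_dir(), "xml");
$out = fopen($path, "w");
fwrite($out, "<rows>");
for ($i = 0; $i < 50000; $i++) {
    fwrite($out, "<row id=\"$i\">some text payload</row>");
}
fwrite($out, "</rows>");
fclose($out);

$rows = 0;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Count ROW elements; nothing from the document is stored.
    function ($p, $name, $attrs) use (&$rows) {
        if ($name === "ROW") {
            $rows++;
        }
    },
    function ($p, $name) {
    }
);

$fp = fopen($path, "r");
while (!feof($fp)) {
    // Each chunk replaces the previous one in $data.
    $data = fread($fp, 4096);
    if (!xml_parse($parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
fclose($fp);
unlink($path);

printf("%d rows, peak memory %d KB\n",
    $rows, (int) (memory_get_peak_usage() / 1024));
```

Comparing the reported peak against the size of the input file gives a quick check of whether memory use stays independent of document size on a given setup. If a handler appends data to an array or string, memory grows with the document again.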
Jul 17 '05 #7
