How can I parse a large XML file that is too large for memory? I am
currently using PHP 5.0.3 and the libxml parser. I would like to read
it incrementally from a file, but the parser expects the entire
contents as a string.
doug wrote:
> How can I parse a large XML file that is too large for memory? I am
> currently using PHP 5.0.3 and the libxml parser. I would like to read
> it incrementally from a file, but the parser expects the entire
> contents as a string.
The standard XML solution to this problem is to use a SAX parser instead of
a DOM one. However, there doesn't seem to be a SAX parser in the PHP XML
library. One solution appears to be:
http://www.engageinteractive.com/mam...tent&task=view&id=3628&Itemid=10159
Google might help find others. Or maybe use an external SAX-based tool to
boil the XML down to something a bit smaller that you can manipulate from
PHP?
--
The email address used to post is a spam pit. Contact me at http://www.derekfountain.org
Derek Fountain
"doug" <dd*****@shortbus.net> wrote in message
news:11**********************@g14g2000cwa.googlegroups.com...
> How can I parse a large XML file that is too large for memory? I am
> currently using PHP 5.0.3 and the libxml parser. I would like to read
> it incrementally from a file, but the parser expects the entire
> contents as a string.
Use the expat functions instead. http://us4.php.net/manual/en/ref.xml.php
How do I use this SAX parser to read XML directly from a file?
Can you show me an instantiation example?
So in the example below, every line is parsed individually with no
contents stored in memory but the current line?
<?php
$file = "data.xml";
// Current nesting level. A plain counter avoids using the parser handle
// as an array key and avoids reading an index that was never set.
$depth = 0;

// Called at each opening tag: print the element name, indented by depth.
function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth; $i++) {
        echo " ";
    }
    echo "$name\n";
    $depth++;
}

// Called at each closing tag: step back out one level.
function endElement($parser, $name)
{
    global $depth;
    $depth--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
// Feed the parser up to 4096 bytes at a time; feof() flags the last chunk.
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
fclose($fp);
xml_parser_free($xml_parser);
?>
doug wrote:
> How do I use this SAX parser to read XML directly from a file?
Erm, read the docs that come with it! Sounds like you need to read up on SAX
first.
> Can you show me an instantiation example?
No, I've never used it. You're actually going to have to do some work here
yourself!
If it were me, and the application allowed it, I'd be writing a separate
utility to do the XML grunt work. PHP, for all its strengths, isn't ideal
for heavy-duty XML parsing.
--
Derek Fountain
"doug" <dd*****@shortbus.net> wrote in message
news:11**********************@l41g2000cwc.googlegroups.com...
> So in the example below, every line is parsed individually with no
> contents stored in memory but the current line?
The expat parser reuses the same buffers in calls to the handlers. If PHP
garbage collection works correctly, then the XML data should not take up
memory if your handlers don't save it.
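
To make that concrete, here is a minimal sketch of a handler that saves nothing except a counter. It assumes PHP 5.3 or later (for closures) and the bundled xml extension; countElements() and its $chunkSize parameter are hypothetical names chosen for illustration, not something from the manual. Note that the file is read in fixed-size chunks rather than line by line, so a chunk may start or end in the middle of a tag; the parser copes with that. This version also makes a final empty call with the "last chunk" flag set, which is one way to avoid depending on feof() turning true straight after the last read.

```php
<?php
// Hypothetical helper, not from the thread: count the elements in an XML
// file while holding only one chunk of the file in memory at a time.
function countElements($path, $chunkSize = 4096)
{
    $count = 0;
    $parser = xml_parser_create();

    // The start handler only bumps a counter; no document data is kept.
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$count) {
            $count++;
        },
        function ($parser, $name) {
            // Nothing to do at closing tags in this sketch.
        }
    );

    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("could not open $path");
    }

    $ok = true;
    while ($ok && !feof($fp)) {
        // Each read overwrites $chunk, so the previous chunk can be freed.
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $ok = (bool) xml_parse($parser, $chunk, false);
    }
    if ($ok) {
        // An empty final call tells the parser that the input is complete.
        $ok = (bool) xml_parse($parser, '', true);
    }
    fclose($fp);

    if (!$ok) {
        throw new RuntimeException(sprintf(
            "XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $count;
}
```

Calling countElements("data.xml") returns the number of opening tags in the file. Because the handlers store nothing, memory use should depend on the chunk size rather than on the size of the file.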