split large xml files

Hi all,

I have an XML file that takes longer than the hosting time limit to be read by
a PHP script.

What I'd like to do is split the large XML file (can be more than 30MB) into
smaller parts and keep the header in every file.

Here is the idea:

<total>
<head>
</head>
<info>
</info>
<info>
</info>
<info>
</info>
....
</total>

The only thing that changes is the number of "info" elements. What I'd like is
to split the file into smaller ones with the same <head></head> data, but each
with fewer <info> tags (say, limited to 3 per file).

Is there any simple way? This will only be done if the file is bigger
than 1MB.

Bob
Aug 10 '07 #1
> $xml = simplexml_load_file($xmlFile);
> And take it from there. Have a quick read of the simplexml docs. You should
> have your solution in very little time.

Thanks for replying....
after a quick search, I've to say I'm still in PHP 4 !!! damn !!!
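
For readers on PHP 5 or later whose files still fit in memory, the SimpleXML suggestion quoted above might be taken further roughly like this. It is only a sketch: the function name splitWithSimpleXml, the output file prefix, and the parameter names are made up for illustration, the 1MB threshold and the 3-per-file limit come from the original question, and attributes on the root element are not copied.

```php
<?php
// Hypothetical helper, not from the thread: split a <total> document into
// files holding the shared <head> plus at most $perFile <info> elements.
// Loads the whole document into memory, so it only suits moderate sizes.
function splitWithSimpleXml(string $source, string $outPrefix, int $perFile = 3, int $minBytes = 1048576): array
{
    // The question only asks for a split when the file is bigger than 1MB.
    if (filesize($source) <= $minBytes) {
        return [];
    }

    $xml = simplexml_load_file($source);
    if ($xml === false) {
        throw new RuntimeException("Cannot parse $source");
    }

    $root = $xml->getName();
    $head = isset($xml->head) ? $xml->head->asXML() : '';

    // Serialise every <info> child once, then group them.
    $infos = [];
    foreach ($xml->info as $info) {
        $infos[] = $info->asXML();
    }

    $files = [];
    foreach (array_chunk($infos, $perFile) as $i => $chunk) {
        $name = sprintf('%s%03d.xml', $outPrefix, $i + 1);
        $body = "<$root>\n$head\n" . implode("\n", $chunk) . "\n</$root>\n";
        file_put_contents($name, $body);
        $files[] = $name;
    }
    return $files;
}
```

Called as splitWithSimpleXml('big.xml', 'part_'), this would write part_001.xml, part_002.xml and so on, and return the list of file names (an empty list when the input is at or below the threshold).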
Aug 10 '07 #2
Hem, what to say more than thank you !!!

I'll implement it...thanks
Aug 10 '07 #3
On 10.08.2007 11:21 David Gillen wrote:
> Bob Bedford said:
>> Hi all,
>>
>> I've an XML file that takes more than the hosting time limit to be read by
>> a PHP script.
>> [...]
>
> $xml = simplexml_load_file($xmlFile);
> And take it from there. Have a quick read of the simplexml docs. You should
> have your solution in very little time.

I haven't tested it, but I doubt SimpleXML would be able to load a 30MB XML
file. I think the OP's best option is to use a tool that can read and
parse in small chunks, like expat (see
http://www.php.net/manual/en/function.xml-parse.php)
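
A chunked expat parse along those lines might look something like the sketch below. It is untested against real 30MB data and makes several assumptions that are not in the thread: the function name splitWithExpat, the output file prefix, the 8KB read size, and the decision to ignore any top-level children other than <head> and <info>. It is written for current PHP; the xml_* functions themselves exist in PHP 4, but the closures would have to become named functions there.

```php
<?php
// Hypothetical helper, not from the thread: feed the file to expat in small
// chunks and rebuild each <head>/<info> subtree as text while parsing.
function splitWithExpat(string $source, string $outPrefix, int $perFile = 3, int $chunkSize = 8192): array
{
    $state = ['depth' => 0, 'root' => '', 'head' => '', 'buffer' => '', 'batch' => [], 'files' => []];

    // Write the current batch of <info> elements, wrapped in root + head.
    $flush = function () use (&$state, $outPrefix) {
        if (!$state['batch']) {
            return;
        }
        $name = sprintf('%s%03d.xml', $outPrefix, count($state['files']) + 1);
        $body = '<' . $state['root'] . ">\n" . $state['head'] . "\n"
              . implode("\n", $state['batch']) . "\n</" . $state['root'] . ">\n";
        file_put_contents($name, $body);
        $state['files'][] = $name;
        $state['batch'] = [];
    };

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep tag case

    xml_set_element_handler(
        $parser,
        function ($p, $tag, $attrs) use (&$state) {
            $state['depth']++;
            if ($state['depth'] === 1) {          // root element: remember its name
                $state['root'] = $tag;
                return;
            }
            if ($state['depth'] === 2) {          // new top-level child: fresh buffer
                $state['buffer'] = '';
            }
            $state['buffer'] .= '<' . $tag;
            foreach ($attrs as $k => $v) {
                $state['buffer'] .= ' ' . $k . '="' . htmlspecialchars($v, ENT_QUOTES | ENT_XML1) . '"';
            }
            $state['buffer'] .= '>';
        },
        function ($p, $tag) use (&$state, $flush, $perFile) {
            if ($state['depth'] >= 2) {
                $state['buffer'] .= '</' . $tag . '>';
            }
            if ($state['depth'] === 2) {          // a top-level child just closed
                if ($tag === 'head') {
                    $state['head'] = $state['buffer'];
                } elseif ($tag === 'info') {
                    $state['batch'][] = $state['buffer'];
                    if (count($state['batch']) >= $perFile) {
                        $flush();
                    }
                }
            }
            $state['depth']--;
        }
    );
    xml_set_character_data_handler($parser, function ($p, $text) use (&$state) {
        if ($state['depth'] >= 2) {               // skip whitespace between children
            $state['buffer'] .= htmlspecialchars($text, ENT_NOQUOTES | ENT_XML1);
        }
    });

    $fh = fopen($source, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $source");
    }
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if (!xml_parse($parser, $chunk, feof($fh))) {
            $msg = xml_error_string(xml_get_error_code($parser));
            $line = xml_get_current_line_number($parser);
            fclose($fh);
            throw new RuntimeException("XML error: $msg at line $line");
        }
    }
    fclose($fh);
    $flush();                                     // write the last partial batch
    return $state['files'];
}
```

Only one <info> subtree plus the current batch is held in memory at a time, so memory use should stay roughly constant regardless of file size. Whether the script fits within the hosting time limit is a separate question that this sketch does not address.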
--
gosha bine

makrell ~ http://www.tagarga.com/blok/makrell
php done right ;) http://code.google.com/p/pihipi
Aug 10 '07 #4
On Aug 10, 2:34 am, "Bob Bedford" <b...@bedford.com> wrote:
>> $xml = simplexml_load_file($xmlFile);
>> And take it from there. Have a quick read of the simplexml docs. You should
>> have your solution in very little time.
>
> Thanks for replying....
> after a quick search, I've to say I'm still in PHP 4 !!! damn !!!

If you have files that big, SimpleXML is not an option: it reads the
whole file into memory and makes a copy of it, so the memory will run
out. What you really want is XML parsing in "streaming" or "pull
parsing" mode. You can read about it here:

http://www.ibm.com/developerworks/xm...nxw06XMLReader

However, I guess this is also not very helpful since you're running
PHP 4 and XMLReader was introduced in PHP 5. I am fighting this at
the moment as well (with no solution yet), as I have to parse huge ONIX
files from book publishers (some are 90 MB!). Let me know if you get
lucky.
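
For anyone who does have PHP 5 or later, the pull-parsing approach mentioned above could be sketched with XMLReader roughly as follows. The function name splitWithXmlReader, the output file prefix, and the parameter names are invented for the example; the <head>/<info> layout and the limit of 3 per file come from the original question. Attributes on the root element are not carried over, and other top-level children are skipped.

```php
<?php
// Hypothetical helper, not from the thread: walk the document with XMLReader
// and copy each top-level <head>/<info> subtree without loading the whole file.
function splitWithXmlReader(string $source, string $outPrefix, int $perFile = 3): array
{
    $reader = new XMLReader();
    if (!$reader->open($source)) {
        throw new RuntimeException("Cannot open $source");
    }

    $root = '';
    $head = '';
    $batch = [];
    $files = [];

    // Write the current batch of <info> elements, wrapped in root + head.
    $flush = function () use (&$root, &$head, &$batch, &$files, $outPrefix) {
        if (!$batch) {
            return;
        }
        $name = sprintf('%s%03d.xml', $outPrefix, count($files) + 1);
        $body = "<$root>\n$head\n" . implode("\n", $batch) . "\n</$root>\n";
        file_put_contents($name, $body);
        $files[] = $name;
        $batch = [];
    };

    $more = $reader->read();
    while ($more) {
        $isElement = $reader->nodeType === XMLReader::ELEMENT;
        if ($isElement && $reader->depth === 0) {
            // Root element: remember its name and step inside.
            $root = $reader->name;
            $more = $reader->read();
        } elseif ($isElement && $reader->depth === 1) {
            if ($reader->name === 'head') {
                $head = $reader->readOuterXml();
            } elseif ($reader->name === 'info') {
                $batch[] = $reader->readOuterXml();
                if (count($batch) >= $perFile) {
                    $flush();
                }
            }
            // Jump past this element's subtree to whatever follows it.
            $more = $reader->next();
        } else {
            // Whitespace, comments, end tags: just keep reading.
            $more = $reader->read();
        }
    }
    $reader->close();
    $flush(); // write the last partial batch
    return $files;
}
```

Usage would be along the lines of splitWithXmlReader('onix.xml', 'part_', 3), which returns the names of the files it wrote. Only one <info> subtree and the current batch are expanded at a time, which is the point of pull parsing for 30MB to 90MB inputs.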

Aug 10 '07 #5
.oO(Pavel Lepin)

>And your point is..?
Exactly what I said. The posted code doesn't follow any coding
guidelines and is _very_ hard to read and understand.

Micha
Aug 14 '07 #6

Michael Fesser <ne*****@gmx.de> wrote in
<45********************************@4ax.com>:
> .oO(Pavel Lepin)
>>And your point is..?
>
> Exactly what I said. The posted code doesn't follow any
> coding guidelines

The code I posted follows the PHP coding style guidelines
(the variant for short code snippets in our dev dept's CMS)
of the organisation I'm working for. I don't think I should
snap out of my habits (which weren't all that easy to
develop to boot, since the coding style I personally prefer
uses *way* more whitespace than the snippet in my OP) just
for the sake of your ease of understanding. Not only are you
not signing my paychecks, other people might actually
find the code easier to read in the style I used, so there's
no reason to give you any preference.

> and is _very_ hard to read and understand.

I find the coding style promoted by Zend IDE ugly and hard
to parse even with syntax highlighting, let alone by naked
eye. It's a matter of perception, and if you believe
there's any sort of consensus on preferable coding style
even in the PHP community alone, you're sadly mistaken.

--
"Patience is a minor form of despair, disguised as
virtue." -- Ambrose Bierce
Aug 14 '07 #7
