How to edit a large xml file (250MB)?

How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
read this file because it is too big. SmEdit reads only the first MB of the
file and doesn't support UTF-8 (I need a program which supports it). Now I use
XVI32, which is a hexadecimal editor, but it is useful only for editing a
small number of characters - deleting and inserting characters in large
files is very tiring.

I don't need an xml editor. It can be any text editor without xml validation
etc. I don't know how such a program should work, but in my opinion there
should be such a program.

Aug 23 '06 #1
setar wrote:
How can I edit an xml file which has 250MB?
Don't make XML files that are 250MB in size.

Editing is simple. So if you can't even edit it, how are you going to
process it? If you run XPath on it, what do you think performance will
be like?

There are (rare) times when XML works in these volumes, but in general
it doesn't. If you're looking for a stream-based format (easy to work
with in huge volumes) then XML's single root element constraint works
against you. If you're trying to build a database, then XML's lack of
efficient querying is a performance hit. If you want 250MB files as an
encapsulated data format (maybe ETL on a database) then it's workable,
but the document lifecycle is a fairly short
create-transfer-load-delete.

So if your application requires a 250MB data entity, then think
carefully about the tools you're using. Life might be simpler that way.

I also have lots of 250MB files around, but I don't edit them by hand.
I have computers to do that sort of thing for me instead.

Aug 23 '06 #2
setar wrote:
I don't need an xml editor. It can be any text editor without xml validation
etc. I don't know how such a program should work, but in my opinion there
should be such a program.
Use vim, the improved vi editor. I have edited such
large XML files with vi several times and you hardly
notice the difference between 10 MB and 200 MB files.
Current versions of vim (when configured properly)
can also edit any UTF-8 characters, for example Japanese.
Aug 23 '06 #3
setar wrote:
How can I edit an xml file which has 250MB?
Emacs also supports UTF-8, of course.

How much swap space have you got? That's what's going to control your
maximum buffer size, assuming you've got a reasonably intelligent editor
implementation.

Another alternative is a stream editor -- the Unix tool "sed" or
something equivalent. Downside of that is that it isn't interactive; you
have to essentially write a program that tells it how to find the points
you want changed and what you want done with them.
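
To give an idea of what such a "program" can look like, here is a minimal
sketch in Python rather than sed. It assumes the file is UTF-8 and that each
thing to be changed sits on a single line; the function name stream_replace
is made up for this example. It reads and writes one line at a time, so
memory use does not depend on the file size.

```python
import os
import re
import tempfile


def stream_replace(path, pattern, replacement, encoding="utf-8"):
    """Rewrite `path` line by line, roughly like `sed 's/pattern/replacement/g'`.

    Returns the number of substitutions made.
    """
    regex = re.compile(pattern)
    changed = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file next to the original and swap it in at the
    # end, so the original survives if something fails halfway through.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # newline="" keeps the original line endings untouched.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
             open(path, "r", encoding=encoding, newline="") as src:
            for line in src:
                new_line, count = regex.subn(replacement, line)
                changed += count
                dst.write(new_line)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return changed
```

A call such as stream_replace("big.xml", "<city>Warsawa</city>",
"<city>Warszawa</city>") would then fix one misspelled value everywhere it
occurs; the file name and the values here are only placeholders.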

If you'd rather stay in the XML world, you could find or write a stream
editor based on SAX streams; this is one of the classic situations where
SAX can have advantages over DOM-based processing.
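
As a rough illustration of a SAX-based stream editor, the following Python
sketch uses the standard xml.sax module. It assumes the element to be fixed
contains only text (no child elements); the class name TextFixer and the
function fix_file are invented for this example. Every event is passed
through unchanged except the text of the chosen element.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class TextFixer(XMLFilterBase):
    """Pass all SAX events through, rewriting the text of one element."""

    def __init__(self, parent, element, old, new):
        super().__init__(parent)
        self._element = element
        self._old = old
        self._new = new
        self._inside = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == self._element:
            self._inside = True
            self._buffer = []
        super().startElement(name, attrs)

    def characters(self, content):
        if self._inside:
            # Text can arrive in several pieces, so collect it first.
            self._buffer.append(content)
        else:
            super().characters(content)

    def endElement(self, name):
        if name == self._element:
            text = "".join(self._buffer)
            super().characters(text.replace(self._old, self._new))
            self._inside = False
        super().endElement(name)


def fix_file(src_path, dst_path, element, old, new):
    """Stream src_path to dst_path, replacing `old` with `new` in <element>."""
    with open(src_path, "rb") as src, \
         open(dst_path, "w", encoding="utf-8") as out:
        fixer = TextFixer(xml.sax.make_parser(), element, old, new)
        fixer.setContentHandler(XMLGenerator(out, encoding="utf-8"))
        fixer.parse(src)
```

Note that the output of this simple setup is not byte-identical to the input:
comments and the DOCTYPE declaration are not written back, so it only suits
files where that does not matter.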

Or find/write a tool that will handle your document in chunks, either
text-based or SAX-based. Again, that presumes that what you're doing
divides up nicely.
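
A text-based version of the chunk idea could be sketched in Python like this,
assuming a UTF-8 (or other ASCII-compatible) file and that the line numbers of
the region to edit are already known. The function names are made up for the
example. The first function copies a range of lines into a small file that any
editor can open; the second writes a new copy of the big file with that range
replaced by the edited chunk.

```python
import itertools
import shutil


def extract_lines(path, first, last, chunk_path):
    """Copy lines first..last (1-based, inclusive) of `path` to `chunk_path`."""
    with open(path, "rb") as src, open(chunk_path, "wb") as dst:
        dst.writelines(itertools.islice(src, first - 1, last))


def splice_lines(path, first, last, chunk_path, out_path):
    """Write `path` to `out_path` with lines first..last replaced by the chunk."""
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        # Lines before the edited region.
        dst.writelines(itertools.islice(src, first - 1))
        # The edited chunk, which may have more or fewer lines than before.
        with open(chunk_path, "rb") as chunk:
            shutil.copyfileobj(chunk, dst)
        # Skip the old version of the region.
        for _ in itertools.islice(src, last - first + 1):
            pass
        # Lines after the edited region.
        dst.writelines(src)
```

Both functions work in binary mode, so the bytes outside the edited region
are copied unchanged.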

Which of these approaches/tools makes the most sense depends on exactly
what you're trying to do to the file.
--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Aug 23 '06 #4
setar schreef:
How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
read this file because it is too big. SmEdit reads only the first MB of the
file and doesn't support UTF-8 (I need a program which supports it). Now I use
XVI32, which is a hexadecimal editor, but it is useful only for editing a
small number of characters - deleting and inserting characters in large
files is very tiring.

I don't need an xml editor. It can be any text editor without xml validation
etc. I don't know how such a program should work, but in my opinion there
should be such a program.

Use a native XML database to store your xml data, and edit it using
XQuery.
Databases already exist that support xml file sizes into the
multiple GB range:

http://exist.sourceforge.net/
http://xml.apache.org/xindice/
Aug 23 '06 #5
Tjerk Wolterink wrote:
Use a native XML database to store your xml data, and edit it using XQuery.
Databases already exist that support xml file sizes into the
multiple GB range:

http://exist.sourceforge.net/
http://xml.apache.org/xindice/
IBM's DB2 now has a native-XML data format, making it a world-class XML
database as well as a world-class relational database.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Aug 23 '06 #6
In case you haven't got the hang of vim yet :-) ...

If you're on Windows you could try TextPad (you can get a full-featured
evaluation version to test) or EmEditor (free standard version with
most features). Obviously your system's resources will determine
whether this works for you and how well, but I can open a 250MB text
file with those text editors and it looks as though I could edit.
Performance seems better on EmEditor; TextPad doesn't have full Unicode
display support but seems like it might cope... That said, I've never
opened such large files except out of curiosity...

Also check that you aren't using UTF-16 as a file encoding --
conversion to UTF-8 could save you some space.
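
If the file does turn out to be UTF-16, a conversion can be done without
loading everything into memory. The Python sketch below is only one possible
way; it assumes the file starts with a byte order mark (so the "utf-16" codec
can detect the byte order), and the function name utf16_to_utf8 is made up.
It also rewrites the encoding named in the XML declaration, because a UTF-8
file that still declares UTF-16 will be rejected by parsers.

```python
import re


def utf16_to_utf8(src_path, dst_path, block_size=1 << 20):
    """Re-encode a UTF-16 file as UTF-8, one block of characters at a time."""
    declaration = re.compile(r'''\A(<\?xml\s[^>]*?encoding=["'])[^"']+''')
    first = True
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if first:
                # Only the very start of the file can hold the declaration.
                block = declaration.sub(r"\g<1>UTF-8", block, count=1)
                first = False
            dst.write(block)
```

For text that is mostly ASCII markup, the UTF-8 copy is roughly half the size
of the UTF-16 original.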

XML editors will obviously have problems opening such large files
because they have to parse the file (some XML editors have an option
which you can set so that files aren't automatically parsed on
opening). One good open-source XML editor which aims at efficiency is
XML Copy Editor which you'll find on sourceforge. It won't manage files
of that size, though.

Tim

setar wrote:
How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
read this file because it is too big. SmEdit reads only the first MB of the
file and doesn't support UTF-8 (I need a program which supports it). Now I use
XVI32, which is a hexadecimal editor, but it is useful only for editing a
small number of characters - deleting and inserting characters in large
files is very tiring.

I don't need an xml editor. It can be any text editor without xml validation
etc. I don't know how such a program should work, but in my opinion there
should be such a program.
Aug 23 '06 #7

User "Andy Dingley" wrote:
Don't make XML files that are 250MB in size.
The file wasn't created by me. It contains about 100,000 records which I
import into my program. Everything is working. Unfortunately, several records
in the file have errors which I want to correct. I don't want to write
additional code to be able to correct imported data; I prefer to make some
changes in the source file. Of course I could write code for editing imported
data, but I don't need this functionality except for correcting the mentioned
errors. I also have no access to the editor which exported the xml file.

User "Juergen Kahrs" wrote:
Use vim, the improved vi editor. I have edited such
large XML files with vi several times ....
Thanks! I've checked it and it's a good solution for me.
With this configuration:
- set enc=utf-8 (UTF-8 encoding)
- set undolevels=-1 (disables undo; maybe vim is faster with this ...)
the timings for the editing subtasks in gvim are:
- opening the 250MB xml file: 15 seconds
- searching for a word (case sensitive): up to 20 seconds (depending on
  its place in the file)
  In my opinion it could be better, because for example in Total
  Commander's default viewer it takes only 2 seconds!
  But it is acceptable, because I only want to make a few dozen
  changes.
- going to a specified line of the file by giving the line number or by
  dragging the vertical slider with the mouse: veeeery long, so don't do this!
- making small changes (for example inserting and deleting some lines of
  text; writing something): smooth
- writing the changes to the file (for example after making all changes): 15
  seconds
I have an Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM
were free.

User "Juergen Kahrs" wrote:
... and you hardly
notice the difference between 10 MB and 200 MB files.
Current versions of vim (when configured properly)
can also edit any UTF-8 characters, for example Japanese.
I can notice the difference between searches which take 2 seconds and 20
seconds :) But you are right that "making small changes (for example
inserting and deleting some lines of text; writing something)" is very fast.

User "Joe Kesselman" wrote:
>Another alternative is a stream editor -- the Unix tool "sed" or
something equivalent. Downside of that is that it isn't interactive; you
have to essentially write a program that tells it how to find the points
you want changed and what you want done with them.
I would prefer something interactive, because every change will be
different ... I don't want to write a program every time ...
>Or find/write a tool that will handle your document in chunks, either
text-based or SAX-based. Again, that presumes that what you're doing
divides up nicely.
Unfortunately I can't find such a tool ...

User ac*******@yahoo.co.uk wrote:
>If you're on Windows you could try TextPad (you can get a full-featured
evaluation version to test) or EmEditor (free standard version with
most features).
Here are the statistics with the default configuration: ;)
- opening the 250MB xml file: 70 seconds
- searching for a word at the end of the file: 45 seconds
- dragging the vertical slider with the mouse: smooth :)
- making small changes (for example inserting and deleting some lines of
  text; writing something): sometimes 0.5 seconds, sometimes 30 seconds :(((
  30 seconds is long, but maybe it will be acceptable for someone ...
- writing the changes to the file (for example after making all changes): not
  tested ;)

P.S. Sorry for errors, my English isn't good.

Aug 23 '06 #8
setar wrote:
the timings for the editing subtasks in gvim are:
- opening the 250MB xml file: 15 seconds
7 seconds on my AMD Sempron 2800+ (SuSE Linux 10.1).
- searching for a word (case sensitive): up to 20 seconds (depending on
  its place in the file)
18 seconds on my PC for searching until end of file.
- going to a specified line of the file by giving the line number or by
  dragging the vertical slider with the mouse: veeeery long, so don't do this!
You shouldn't use gvim but the original vim on Linux.
Going to line number 5000000 works instantly on my PC.
- writing the changes to the file (for example after making all changes): 15
  seconds
15 seconds also on my PC.
I have an Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM
were free.
300 MB used by vim on my PC also.
I can notice the difference between searches which take 2 seconds and 20
seconds :) But you are right that "making small changes (for example
inserting and deleting some lines of text; writing something)" is very fast.
That's true, I also noticed a "slight" difference.
>Or find/write a tool that will handle your document in chunks, either
text-based or SAX-based. Again, that presumes that what you're doing
divides up nicely.

Unfortunately I can't find such a tool ...
Before you choose a tool you have to find out if you
can assume that XML files are well-formed. If they _are_
well-formed, then you can choose among a large set of
tools on the market. Otherwise, you have to use an editor.
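
One way to find this out, sketched in Python on the assumption that the
standard xml.sax parser is good enough for the check (the function name is
made up for this example). The parser streams through the file and discards
all events, so it should cope with large files, and it reports where the
first error is.

```python
import xml.sax


def first_wellformedness_error(path):
    """Return None if `path` parses cleanly, else (line, column, message)."""
    parser = xml.sax.make_parser()
    # The default content handler ignores every event, so nothing is kept
    # in memory while the file streams through the parser.
    try:
        with open(path, "rb") as src:
            parser.parse(src)
    except xml.sax.SAXParseException as err:
        return err.getLineNumber(), err.getColumnNumber(), err.getMessage()
    return None
```

The reported line number can then be used to jump straight to the problem in
an editor.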

I guess you are better off using vim.
But if you consider using a tool, have a look at this one:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/

Good luck.
Aug 23 '06 #9
setar wrote:
How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
read this file because it is too big. SmEdit reads only the first MB of the
file and doesn't support UTF-8 (I need a program which supports it). Now I use
XVI32, which is a hexadecimal editor, but it is useful only for editing a
small number of characters - deleting and inserting characters in large
files is very tiring.
Emacs. With psgml and xxml and onsgmls if you want DTD validation.

///Peter
Aug 23 '06 #10

User "Peter Flynn" wrote:
Emacs. With psgml and xxml and onsgmls if you want DTD validation.
I installed GNU Emacs 21.3 on Windows XP. Emacs displays this message while
opening the file:
"find-file-noselect-1: Maximum buffer size exceeded"
and doesn't load the file.
I've found this information in the gnu.emacs.help newsgroup, written by Stefan
Monnier on 11 January 2005:

--------------------------------------------------
Emacs 21.3.1 did not open a 150Mb text file in windows XP. Is there
a way to make emacs open larger files ?
On 32bit systems, the maximum file size in Emacs-21.3 is 128MB.
In Emacs-CVS, it's been pushed to 256MB.
It can fairly easily be pushed further to 512MB, tho the corresponding
patch is not in Emacs-CVS.

If that's not good enough:
1 - use a 64bit system (with an Emacs compiled accordingly).
2 - split your file into smaller chunks.
3 - use XEmacs whose max is 1GB.
--------------------------------------------------

So ... did you mean using XEmacs?
Aug 24 '06 #11
