Hi,
I am writing an application that transforms huge XML files (~150 MB) into CSV files,
and I am facing a strange problem. The first 1,000 rows are parsed in 1 second; after
20,000 rows the speed drops to 100 rows per second, and after 70,000 rows it drops to
20 rows per second (I need to parse ~2,500,000 rows).
It looks like a GC problem to me, but I have no idea how to fix it :(
Any ideas are welcome.
--
Thanks,
Maxim
Maxim Kazitov wrote: I am writing an application that transforms huge XML files (~150 MB) into CSV files, and I am facing a strange problem. The first 1,000 rows are parsed in 1 second; after 20,000 rows the speed drops to 100 rows per second, and after 70,000 rows it drops to 20 rows per second (I need to parse ~2,500,000 rows).
It looks like a GC problem to me, but I have no idea how to fix it :(
If you do this transform by reading the whole file into a
representation of the XML file and then generating CSV, you are
imposing serious memory pressure. If you can read an XML element and
write a CSV element, without each iteration adding (much, if at all)
to your working set, you might go much faster.
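A minimal sketch of the read-one-element, write-one-row idea, not the original poster's code. It assumes each record is a `row` element whose children are simple text fields; the element name `row`, the class `StreamingCsv` and the method `Convert` are hypothetical, and CSV quoting is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingCsv
{
    // Reads <row> elements one at a time and writes one CSV line per row.
    // Nothing is kept between iterations, so the working set should stay flat.
    public static int Convert(TextReader xml, TextWriter csv)
    {
        int rows = 0;
        var fields = new List<string>();            // reused for every row
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (reader.ReadToFollowing("row"))   // hypothetical element name
            {
                fields.Clear();
                using (XmlReader row = reader.ReadSubtree())
                {
                    row.Read();                     // position on <row>
                    row.Read();                     // move to the first child, if any
                    while (!row.EOF)
                    {
                        if (row.NodeType == XmlNodeType.Element)
                            // Returns the text and already advances past the element.
                            fields.Add(row.ReadElementContentAsString());
                        else
                            row.Read();
                    }
                }
                csv.WriteLine(string.Join(",", fields));
                rows++;
            }
        }
        return rows;
    }
}
```

In a real run the `TextWriter` would be a `StreamWriter` over the output file, so each line leaves memory as soon as it is written.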
If you do need to build a representation of the whole file, and each
XML attribute name and value is a distinct string, you can often save
a lot by "interning" string values, eliminating duplicate string
values.
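To illustrate what "interning" means here, a small sketch of a string pool that hands back one shared instance for equal strings. The `StringPool` class is hypothetical; the built-in `string.Intern` does the same job, but strings interned that way stay alive for the lifetime of the process, whereas a pool like this can be dropped when the conversion is done.

```csharp
using System;
using System.Collections.Generic;

// A small, collectable string pool: equal strings share one instance.
public sealed class StringPool
{
    private readonly Dictionary<string, string> pool = new Dictionary<string, string>();

    public string Intern(string value)
    {
        if (value == null) return null;
        string existing;
        if (pool.TryGetValue(value, out existing))
            return existing;          // duplicate: hand back the first copy
        pool.Add(value, value);
        return value;
    }

    // Number of distinct strings currently held.
    public int Count { get { return pool.Count; } }
}
```

This only pays off when the same values repeat many times; for mostly unique values the pool itself becomes the thing that grows.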
It's also entirely possible that this has nothing to do with the GC.
What you describe is compatible with some code that's walking a linked
list that keeps growing ....
-- www.midnightbeach.com
Hi Jon,
I use XmlTextReader, so I don't read the whole XML file at once. During parsing
I build small XML documents (one XmlDocument per row) and apply a set of
XPath expressions to each document. I have a couple of Hashtables in my code, but they
are pretty small.
Thanks,
Max
"Jon Shemitz" wrote: If you do this transform by reading the whole file into a representation of the XML file and then generating CSV, you are imposing serious memory pressure. ...
Are you creating a new XmlDocument for each row or reusing the same one? You should
make sure that you reuse a single instance and load each XML string into
it.
I ran into memory issues when I created a lot of XmlDocument instances.
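A rough sketch of the reuse suggested above: one `XmlDocument` instance, with each row fragment loaded into it in turn. `LoadXml` replaces whatever the document held before. The class `ReusedDocument`, the method `Extract` and the idea of passing rows as strings are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ReusedDocument
{
    // Loads every row fragment into the same XmlDocument instance.
    // LoadXml discards the previous content, so only one row is held at a time.
    public static List<string> Extract(IEnumerable<string> rowFragments, string xpath)
    {
        var doc = new XmlDocument();
        var values = new List<string>();
        foreach (string fragment in rowFragments)
        {
            doc.LoadXml(fragment);
            XmlNode node = doc.SelectSingleNode(xpath);
            // A missing node becomes an empty field rather than an error.
            values.Add(node != null ? node.InnerText : string.Empty);
        }
        return values;
    }
}
```

Whether this actually cures the slowdown would need measuring; it only removes one source of per-row allocation.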
Maxim,
The reason is probably the way you build your CSV files.
If you first create them as long strings in memory, then the problem is
clear.
Can you show that code?
Cor
On Mon, 28 Mar 2005 00:01:40 -0500, "Maxim Kazitov" <mv*****@tut.by>
wrote: I use XmlTextReader, so I don't read the whole XML file at once. During parsing I build small XML documents (one XmlDocument per row) and apply a set of XPath expressions to each document. I have a couple of Hashtables in my code, but they are pretty small.
1. Make sure that you "let go" of each XmlDocument when you no longer
use it. All references must have gone out of scope, or set to null
references, or reassigned to the new XmlDocument. The old documents
must not stay around in memory.
2. Call System.GC.Collect() immediately before you create a new
XmlDocument. Microsoft pretends this can't happen but I've seen it
myself that the garbage collector's performance can completely break
down if you repeatedly allocate large pools of objects without manual
Collect calls in-between.
-- http://www.kynosarges.de
"Maxim Kazitov" <mv*****@tut.by> wrote:
I am writing an application that transforms huge XML files (~150 MB) into CSV files, and I am facing a strange problem. The first 1,000 rows are parsed in 1 second; after 20,000 rows the speed drops to 100 rows per second, and after 70,000 rows it drops to 20 rows per second (I need to parse ~2,500,000 rows).
It looks like a GC problem to me, but I have no idea how to fix it :(
I could be wrong, but it looks like you are using more memory than is physically
available, and as a result the system starts paging and finally starts
thrashing. That would mean you are holding references to objects that could
otherwise be collected by the GC, so it's not a GC problem, it's a design
problem.
I suggest you start by looking at the memory consumption using Perfmon (the GC Gen
0, 1 and 2 memory counters) and at the paging activity.
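As a rough complement to Perfmon (not a replacement, and it says nothing about paging), the managed heap size and per-generation collection counts can also be logged from inside the conversion loop. The `GcSnapshot` class and the log format below are made up for illustration; `GC.GetTotalMemory` and `GC.CollectionCount` are the standard calls.

```csharp
using System;

public static class GcSnapshot
{
    // Formats the managed heap size and the number of collections per generation.
    public static string Describe(int rowsDone)
    {
        return string.Format(
            "rows={0} heapBytes={1} gen0={2} gen1={3} gen2={4}",
            rowsDone,
            GC.GetTotalMemory(false),   // false: do not force a collection first
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2));
    }
}
```

Writing `GcSnapshot.Describe(rows)` to the console every 10,000 rows or so should show whether the heap keeps growing as the row count climbs, which would point at retained references rather than at the collector.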
If it looks like I'm right, you should check your object allocation pattern and
check whether you are holding references that could otherwise be released;
for instance, references stored in arrays/collections that are no longer
needed should be set to null.
Willy.
You should also use StringBuilder to build your output string. If
you are using string concatenation, you are creating many string
instances, and that is very inefficient. If you are concatenating a
string in a loop, always use StringBuilder.
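For illustration, a sketch of building one CSV line with a `StringBuilder` instead of repeated `+` concatenation. The `CsvRow` class and the quoting rules (quote a field that contains a comma, a double quote or a line break, and double any embedded quotes) are assumptions, not the original poster's format.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvRow
{
    private static readonly char[] Special = { ',', '"', '\r', '\n' };

    // Joins the fields into one CSV line, quoting only where needed.
    public static string Format(IEnumerable<string> fields)
    {
        var sb = new StringBuilder();
        bool first = true;
        foreach (string field in fields)
        {
            if (!first) sb.Append(',');
            first = false;
            string value = field ?? string.Empty;
            if (value.IndexOfAny(Special) >= 0)
                sb.Append('"').Append(value.Replace("\"", "\"\"")).Append('"');
            else
                sb.Append(value);
        }
        return sb.ToString();
    }
}
```

Note that a `StringBuilder` holding the whole output file would bring back the memory problem at a larger scale; the sketch assumes each line is written to the output stream as soon as it is built.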
Maxim,
Do you need the XmlDocument? Have you considered using the XPathDocument class
instead? I don't know if it's more memory friendly than XmlDocument, but I do
know it is faster than XmlDocument...
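A small sketch of what the XPathDocument route could look like for a single row, with the XPath expression compiled once and reused. The `RowQuery` class, the `/row/name` expression and the row layout are hypothetical; whether this is faster or lighter than XmlDocument for this workload would have to be measured.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class RowQuery
{
    // Compiled once and reused for every row (hypothetical expression).
    private static readonly XPathExpression NameExpr =
        XPathExpression.Compile("/row/name");

    // Returns the text of /row/name, or null when the row has no such element.
    public static string GetName(string rowXml)
    {
        var doc = new XPathDocument(new StringReader(rowXml));  // read-only store
        XPathNavigator nav = doc.CreateNavigator();
        XPathNavigator hit = nav.SelectSingleNode(NameExpr);
        return hit != null ? hit.Value : null;
    }
}
```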
Have you used PerfMon or CLR Profiler to see what the lifetime of your
objects is? I would use PerfMon first, as Willy suggests, and if it points to a
memory problem, then use CLR Profiler to identify specific problems...
Info on the CLR Profiler:
http://msdn.microsoft.com/library/de...nethowto13.asp
http://msdn.microsoft.com/library/de...anagedapps.asp
Hope this helps
Jay
I already use StringBuilder
"Pat A" wrote: You should also use StringBuilder to build your output string. If you are using string concatenation, you are creating many string instances, and that is very inefficient. If you are concatenating a string in a loop, always use StringBuilder.