Bytes | Software Development & Data Engineering Community

C# & GC

Hi,

I am writing an application that transforms huge XML files (~150 MB) to CSV
files, and I am facing a strange problem. The first 1,000 rows are parsed in
1 second; after 20,000 rows the speed drops to 100 rows per second, and after
70,000 rows it drops to 20 rows per second (I need to parse ~2,500,000 rows).

To me it looks like a GC problem, but I have no idea how to fix it :(

Any ideas are welcome.

--
Thanks,
Maxim
Jul 21 '05 #1
If you do this transform by reading the whole file into an in-memory
representation of the XML and then generating CSV, you are
imposing serious memory pressure. If you can read an XML element and
write a CSV record, without each iteration adding (much, if at all)
to your working set, you might go much faster.
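
A minimal sketch of that streaming approach, assuming a current .NET runtime. The element name "row", the class name StreamingXmlToCsv, and the idea that every child element of a row is one flat CSV column are all assumptions for illustration; the thread does not show the real schema.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingXmlToCsv
{
    // Reads one <row> element at a time and writes one CSV line per row,
    // so nothing from earlier rows stays reachable.
    public static int Convert(TextReader xmlInput, TextWriter csvOutput)
    {
        int rows = 0;
        using (XmlReader reader = XmlReader.Create(xmlInput))
        {
            while (reader.ReadToFollowing("row"))
            {
                var fields = new List<string>();
                using (XmlReader row = reader.ReadSubtree())
                {
                    row.Read(); // position on the <row> element itself
                    row.Read(); // move to its first child, or to the end
                    while (!row.EOF)
                    {
                        if (row.NodeType == XmlNodeType.Element)
                        {
                            // Reads the text and advances past the end tag.
                            fields.Add(Escape(row.ReadElementContentAsString()));
                        }
                        else
                        {
                            row.Read();
                        }
                    }
                }
                csvOutput.WriteLine(string.Join(",", fields));
                rows++;
            }
        }
        return rows;
    }

    // Quotes a field only when it contains a comma, quote, or line break.
    private static string Escape(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return value;
        }
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

Passing a StreamReader over the input file and a StreamWriter over the output file keeps the working set roughly constant, since each row's field list becomes garbage as soon as its line is written.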

If you do need to build a representation of the whole file, and each
XML attribute name and value is a distinct string, you can often save
a lot by "interning" string values, eliminating duplicate string
values.
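
One way to sketch the interning idea is a small pool that hands back a single shared instance per distinct value. The StringPool class below is hypothetical, not something from the thread. The built-in string.Intern does the same job, but strings interned that way are not likely to be released until the runtime shuts down, which is why a pool you can discard may be preferable.

```csharp
using System.Collections.Generic;

public sealed class StringPool
{
    private readonly Dictionary<string, string> pool =
        new Dictionary<string, string>();

    // Number of distinct values seen so far.
    public int Count
    {
        get { return pool.Count; }
    }

    // Returns the first instance seen for this value, so equal strings
    // share one object instead of each keeping its own copy.
    public string Get(string value)
    {
        if (value == null)
        {
            return null;
        }
        string existing;
        if (pool.TryGetValue(value, out existing))
        {
            return existing;
        }
        pool[value] = value;
        return value;
    }
}
```

This only pays off when the same values repeat often and the strings are kept alive for a long time; for a row-at-a-time transform there may be nothing to save.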

It's also entirely possible that this has nothing to do with the GC.
What you describe is compatible with some code that's walking a linked
list that keeps growing ....

--

www.midnightbeach.com
Jul 21 '05 #2
Hi Jon,

I use XmlTextReader, so I don't read all the XML at once. During parsing
I build small XML documents (one XmlDocument per row) and apply a set of
XPath expressions to each document. I have a couple of Hashtables in my code,
but they are pretty small.
Thanks,
Max
Jul 21 '05 #3
Are you creating a new XmlDocument for each row or reusing the same one? You
should make sure that you reuse a single instance and load each XML string
into that same instance.
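
A minimal sketch of reusing one XmlDocument, assuming each row is available as an XML string. The class and method names are hypothetical. LoadXml replaces the previous content of the document, so only one row's tree is reachable at a time. Whether this actually helps in the original poster's case is not established in the thread.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class RowDocumentReuse
{
    // Loads every fragment into the same XmlDocument and returns the text
    // of the first node matching the XPath (empty string when none matches).
    public static List<string> ExtractFirstValues(
        IEnumerable<string> rowXmlFragments, string xpath)
    {
        var doc = new XmlDocument(); // created once, outside the loop
        var results = new List<string>();
        foreach (string fragment in rowXmlFragments)
        {
            doc.LoadXml(fragment); // discards the previous row's nodes
            XmlNode node = doc.SelectSingleNode(xpath);
            results.Add(node == null ? "" : node.InnerText);
        }
        return results;
    }
}
```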

I ran into memory issues when I used XmlDocument instances a lot.
Jul 21 '05 #4
Maxim,

The cause is probably in how you build your CSV files.
If you first create them in memory as long strings, then the problem is
clear.

Can you show that code?

Cor
Jul 21 '05 #5
On Mon, 28 Mar 2005 00:01:40 -0500, "Maxim Kazitov" <mv*****@tut.by>
wrote:
I use XmlTextReader, so I don't read all the XML at once. During parsing
I build small XML documents (one XmlDocument per row) and apply a set of
XPath expressions to each document. I have a couple of Hashtables in my code,
but they are pretty small.


1. Make sure that you "let go" of each XmlDocument when you no longer
use it. All references must have gone out of scope, been set to null,
or been reassigned to the new XmlDocument. The old documents
must not stay around in memory.

2. Call System.GC.Collect() immediately before you create a new
XmlDocument. Microsoft pretends this can't happen, but I've seen for
myself that the garbage collector's performance can completely break
down if you repeatedly allocate large pools of objects without manual
Collect calls in between.
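
The two points above can be sketched as follows. The names PerRowDocuments, handleRow, and collectEvery are hypothetical. The post suggests collecting before every new document; the collectEvery parameter here is an assumption that lets you try that (pass 1) or a less frequent interval. Forced collection is generally discouraged, so treat it as an experiment to confirm or rule out the GC rather than a fix.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class PerRowDocuments
{
    // Builds one XmlDocument per row, hands it to handleRow, and lets it
    // go out of scope. handleRow must not store the document anywhere
    // long-lived, or the old documents stay reachable.
    public static int Process(
        IEnumerable<string> rowXmlFragments,
        Action<XmlDocument> handleRow,
        int collectEvery)
    {
        int rows = 0;
        foreach (string fragment in rowXmlFragments)
        {
            var doc = new XmlDocument(); // local to this iteration
            doc.LoadXml(fragment);
            handleRow(doc);
            rows++;

            // Optional manual collection every collectEvery rows (0 = never).
            if (collectEvery > 0 && rows % collectEvery == 0)
            {
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
        }
        return rows;
    }
}
```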
--
http://www.kynosarges.de
Jul 21 '05 #6
I could be wrong, but it looks like you are using more memory than is
physically available, and as a result the system starts paging and finally
starts thrashing. That would mean you are holding references to objects that
could otherwise be collected by the GC, so it's not a GC problem; it's a
design problem.
I suggest you start by looking at the memory consumption using Perfmon (GC
Gen 0, 1 and 2 memory counters) and the paging activity.
If it looks like I'm right, you should check your object allocation pattern
and check whether you are holding references that could otherwise be
released. For instance, references stored in arrays/collections that are no
longer needed should be set to null.
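
As an in-process complement to Perfmon (not a replacement, and not something suggested in the thread), a hypothetical helper like the one below can log collection counts and managed heap size every few thousand rows. If the managed byte count keeps climbing as rows are processed, something is still holding references.

```csharp
using System;

public static class GcSnapshot
{
    // Returns a one-line summary of GC activity so far in this process.
    public static string Describe(int rowsProcessed)
    {
        return string.Format(
            "rows={0} gen0={1} gen1={2} gen2={3} managedBytes={4}",
            rowsProcessed,
            GC.CollectionCount(0),     // generation 0 collections so far
            GC.CollectionCount(1),     // generation 1 collections so far
            GC.CollectionCount(2),     // generation 2 (full) collections so far
            GC.GetTotalMemory(false)); // approximate bytes allocated, no forced GC
    }
}
```

For example, calling `Console.WriteLine(GcSnapshot.Describe(rows))` every 10,000 rows would show whether the slowdown lines up with heap growth or with a rising generation 2 count.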

Willy.

Jul 21 '05 #7
You should also use StringBuilder to build your output string. If
you are using string concatenation, you are creating many string
instances, and that is very inefficient. If you are concatenating a
string in a loop, always use StringBuilder.
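
A minimal sketch of that advice, assuming each row arrives as an array of field values. CsvRowWriter is a hypothetical name, and field escaping is left out to keep the example short. One StringBuilder is cleared and reused per row, and each finished line is written out immediately so the whole file is never held as one long string.

```csharp
using System.IO;
using System.Text;

public static class CsvRowWriter
{
    // Builds one CSV line in the supplied buffer and writes it out.
    public static void WriteRow(
        TextWriter output, StringBuilder buffer, string[] fields)
    {
        buffer.Length = 0; // reuse the buffer instead of allocating a new one
        for (int i = 0; i < fields.Length; i++)
        {
            if (i > 0)
            {
                buffer.Append(',');
            }
            buffer.Append(fields[i]);
        }
        output.WriteLine(buffer.ToString());
    }
}
```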

Jul 21 '05 #8
Maxim,
Do you need the XmlDocument? Have you considered using the XPathDocument
class instead? I don't know if it's more memory friendly than XmlDocument,
but I do know it is faster than XmlDocument...
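
A minimal sketch of that alternative, assuming each row is available as an XML string. XPathRowQuery is a hypothetical name. The expression is compiled once with XPathExpression.Compile and reused for every row, and the read-only XPathDocument takes the place of the per-row XmlDocument.

```csharp
using System.IO;
using System.Xml.XPath;

public static class XPathRowQuery
{
    // Loads one row into a read-only XPathDocument and returns the value
    // of the first node matching the precompiled expression.
    public static string Evaluate(string rowXml, XPathExpression expression)
    {
        var doc = new XPathDocument(new StringReader(rowXml));
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNavigator hit = navigator.SelectSingleNode(expression);
        return hit == null ? "" : hit.Value;
    }
}
```

Typical use would be `XPathExpression expr = XPathExpression.Compile("/row/a");` once before the loop, then `XPathRowQuery.Evaluate(rowXml, expr)` per row.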

Have you used PerfMon or CLR Profiler to see the lifetime of your
objects? I would use PerfMon first, as Willy suggests, and if it points to a
memory problem, then use CLR Profiler to identify specific problems...

Info on the CLR Profiler:
http://msdn.microsoft.com/library/de...nethowto13.asp

http://msdn.microsoft.com/library/de...anagedapps.asp

Hope this helps
Jay
Jul 21 '05 #9
I already use StringBuilder.

"Pat A" <pw*******@gmail.com> wrote in message
news:11*********************@o13g2000cwo.googlegroups.com...
You should also use StringBuilder to build your output string. If
you are using string concatenation, you are creating many string
instances, and that is very inefficient. If you are concatenating a
string in a loop, always use StringBuilder.

Jul 21 '05 #10
