String Interning and Thread Locking

I've spent some time recently looking into optimizing memory usage in
our products. Much of this was done through string interning. I
took the time to check the numbers on both x86 and x64, and have published
the results here:
http://www.coversant.com/dotnetnuke/...=88&EntryID=24

The memory benefits for our SoapBox suite of products are pretty compelling.

Before I roll the changes into our products, I have a very real concern:

The intern pool is (according to Richter) process-wide, and therefore shared
by any number of AppDomains. By implication, this also means all the threads
in the process share a single intern pool.

Our products are very heavily multi-threaded, and I'm worried that accessing
the intern pool from a number of threads is going to introduce significant
locking that I don't have any control over.

Does anyone know what locking algorithm the intern pool uses? Is it
Monitor semantics, reader-writer lock semantics, or (I hope not) a remoting
construct to allow cross-AppDomain synchronization?

I don't have any visibility into the intern pool, and without that
visibility I'm very hesitant to use it. I poked around with Reflector, but
it ends up in an InternalCall method pretty quickly.

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins
Nov 8 '06 #1
Hi Chris,

There's a good chance that you can see for yourself in our shared source
release
(http://www.microsoft.com/downloads/d...displaylang=en)
by following that internal call or just searching for a string literal
implementation.

If we don't provide the implementation, I encourage you to write a
multi-threaded test with a representative number of readers and writers. I
believe you will find things very performant, and to check you could compare
against your own implementation of any of those locking schemes.
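
A test along those lines might be set up roughly as follows. This is only a sketch, and it is written in Java rather than C#, so `synchronized` stands in for Monitor and `ConcurrentHashMap` stands in for a low-lock hashtable. The names `InternBench`, `MonitorPool`, `ConcurrentPool` and `run` are made up for illustration, and the thread and key counts are arbitrary; substitute numbers that are representative of your own workload.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.function.UnaryOperator;

// Hypothetical harness: every name here is illustrative, not a library API.
final class InternBench {

    // Pool guarded by a single exclusive lock (monitor-style).
    static final class MonitorPool {
        private final Map<String, String> map = new HashMap<>();

        synchronized String intern(String s) {
            String existing = map.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }

    // Pool backed by a concurrent hash map instead of one global lock.
    static final class ConcurrentPool {
        private final ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();

        String intern(String s) {
            String existing = map.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }
    }

    // Starts `threads` workers, each interning `opsPerThread` strings drawn from
    // `distinctKeys` values, and returns the elapsed wall-clock nanoseconds.
    static long run(UnaryOperator<String> intern, int threads, int opsPerThread, int distinctKeys) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int seed = t;
            workers[t] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < opsPerThread; i++) {
                    // Concatenation builds a fresh String, so the pool does real work.
                    intern.apply("key-" + ((seed + i) % distinctKeys));
                }
            });
            workers[t].start();
        }
        long begin = System.nanoTime();
        start.countDown(); // release all workers at once
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        MonitorPool monitor = new MonitorPool();
        ConcurrentPool concurrent = new ConcurrentPool();
        System.out.println("monitor    ns: " + run(monitor::intern, 8, 200_000, 1_000));
        System.out.println("concurrent ns: " + run(concurrent::intern, 8, 200_000, 1_000));
    }
}
```

A single run like this says little on its own; repeat it several times, vary the ratio of distinct keys to operations (which controls how many calls are writes), and discard the first runs as warm-up.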

Hope this helps,
Dave
Nov 13 '06 #2
My issue is that there's no guarantee the Shared Source implementation and
the production .NET implementation are the same.

Likewise, unless it's documented somewhere in the ECMA standards, there's no
guarantee it won't change out from under me.

Rather than relying on the built-in string interning, I'm just using a
custom implementation. This will also allow me to sweep the intern table
from time to time and remove unused strings. In a long-running server
process, a string table that grows without bound and provides
no mechanism for removing values just doesn't have much appeal.
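
To give a rough idea of what a sweepable table can look like, here is a minimal sketch in Java (not the actual SoapBox code, and not C#). It assumes "unused" means "not requested since the previous sweep", which is only one possible definition; a weak-reference table would be another. The class name `SweepableInternPool` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sweepable intern pool; names and the eviction rule are assumptions.
final class SweepableInternPool {

    // Canonical string plus the sweep epoch in which it was last requested.
    private static final class Entry {
        final String value;
        long lastUsedEpoch;

        Entry(String value, long epoch) {
            this.value = value;
            this.lastUsedEpoch = epoch;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();
    private long epoch = 0;

    // Returns the canonical instance for s, adding s to the table if it is new.
    synchronized String intern(String s) {
        Entry e = table.get(s);
        if (e == null) {
            e = new Entry(s, epoch);
            table.put(s, e);
        } else {
            e.lastUsedEpoch = epoch;
        }
        return e.value;
    }

    // Removes every string not interned since the previous sweep and
    // returns how many entries were dropped.
    synchronized int sweep() {
        int removed = 0;
        Iterator<Entry> it = table.values().iterator();
        while (it.hasNext()) {
            if (it.next().lastUsedEpoch < epoch) {
                it.remove();
                removed++;
            }
        }
        epoch++;
        return removed;
    }

    synchronized int size() {
        return table.size();
    }
}
```

One consequence of this design: a swept string may still be referenced elsewhere, so a later `intern` call for the same text yields a new canonical instance. Callers should therefore compare with `equals` rather than rely on reference equality across sweeps.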

--
Chris Mullins, MCSD.NET, MCPD:Enterprise
http://www.coversant.net/blogs/cmullins
Nov 14 '06 #3
Chris,

I'm certainly not an expert on the internal workings of the CLR, but if
I were absolutely forced to guess, I would say that the intern
pool probably uses some sort of very fast low-lock (or even lock-free)
hashtable implementation, though a monitor-like implementation also seems
reasonable. I think a reader-writer lock would be terribly slow in
this situation since the operation is not bound to I/O or another blocking
resource. A remoting construct seems unlikely since the call quickly
dives into internal CLR code, where it's not unreasonable to think
the standard AppDomain access rules do not apply.

Brian

Nov 14 '06 #4
Hi Brian,

I'm curious, what makes you think about the lock constructs that way?

For example, in a number of cases, lock-free is slower than locking. It's
only under certain conditions that these two reverse roles, and lock-free
becomes the quicker of the two.

Also, I'm curious as to why you think a reader-writer lock would be slower than
a monitor. There's nothing inherently I/O-related about the reader-writer lock
that I know of. For an intern pool, I would imagine 90%+ of the operations
would be reads against a hashtable of some sort, which would suggest a
reader-writer lock as an applicable algorithm. What am I missing here?
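
To make the read-mostly case concrete, here is a minimal sketch of a pool guarded by a reader-writer lock. It is in Java, with `ReentrantReadWriteLock` standing in for .NET's ReaderWriterLock, and `ReadWriteInternPool` is a made-up name. It says nothing about how the CLR's intern pool actually works.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical intern pool guarded by a reader-writer lock.
final class ReadWriteInternPool {
    private final Map<String, String> table = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    String intern(String s) {
        // Common case: the string is already pooled, so a shared read lock is enough.
        lock.readLock().lock();
        try {
            String existing = table.get(s);
            if (existing != null) {
                return existing;
            }
        } finally {
            lock.readLock().unlock();
        }

        // Miss: take the exclusive lock and re-check, because another thread
        // may have added the same string between the two lock acquisitions.
        lock.writeLock().lock();
        try {
            String existing = table.putIfAbsent(s, s);
            return existing != null ? existing : s;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Readers never block each other here, but every call still pays for acquiring and releasing the read lock, which is typically more bookkeeping than a plain monitor needs. Whether that overhead is repaid depends on contention and on how long the lookup holds the lock, so it has to be measured.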

I agree with your assessment of "standard AppDomain access rules do not
apply", but it's always good to know for sure.

--
Chris Mullins
Nov 14 '06 #5
Chris,

Honestly, it's nothing more than a guess. And I'm more than willing to
be very wrong about it.

You bring up a good point that low-lock strategies can be slower in
some scenarios. I hadn't originally considered that.

Similarly, the reader-writer lock would only be faster in some
scenarios as well. It's been my experience (admittedly limited) that
reader-writer locks work best in scenarios where the critical section
spends some time in a wait state. In the case of the intern pool I
would be surprised if a reader-writer lock were faster than a monitor
even if reads outnumbered writes 10 to 1. It might be worthwhile for
me to reexamine the ReaderWriterLock and compare it against the Monitor
to see in which scenarios it performs better and where the break-even point
is.

Brian

Nov 15 '06 #6
