String constructor returning interned string?

I've just noticed something rather odd and disturbing. The following
code displays "True":

using System;

class Test
{
    public static void Main(string[] args)
    {
        string x = new string ("".ToCharArray());
        string y = new string ("".ToCharArray());
        Console.WriteLine (object.ReferenceEquals (x, y));
    }
}

In other words, new string(...) is *not* returning a new string
reference.

This worries me - not so much for the specific example, but for the
precedent set. What other new ... expressions might return non-new
references? This could have significant implications in multi-threading,
where you may rely on two references being different for locking
purposes.
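
To make the locking concern concrete, here is a minimal sketch. The class name LockIdentityDemo and the helper SameReference are made up for illustration. It compares two "new" empty strings with two plain objects; the string result is the behaviour reported in this post and may depend on the runtime version, while each new object() is a separate allocation.

```csharp
using System;

class LockIdentityDemo
{
    // Hypothetical helper: true when both references point at the same object.
    public static bool SameReference(object a, object b)
    {
        return object.ReferenceEquals(a, b);
    }

    public static void Main(string[] args)
    {
        // Two "new" empty strings. This post reports that they are the
        // same reference, so locking on them would share one monitor.
        string s1 = new string("".ToCharArray());
        string s2 = new string("".ToCharArray());
        Console.WriteLine(SameReference(s1, s2));

        // Two plain objects are always separate allocations, which is
        // why a dedicated object is the safer choice for a private lock.
        object lock1 = new object();
        object lock2 = new object();
        Console.WriteLine(SameReference(lock1, lock2)); // False

        lock (lock1)
        {
            // Code here does not contend with code that locks on lock2.
        }
    }
}
```

If two pieces of code must never block each other, giving each its own new object() as a lock avoids depending on what the string constructor returns.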

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #1

"Jon Skeet" <sk***@pobox.com> wrote in message
news:MP************************@news.microsoft.com ...
> This worries me - not so much for the specific example, but for the
> precedent set. What other new ... expressions might return non-new
> references? This could have significant implications in multi-threading,
> where you may rely on two references being different for locking
> purposes.

This is odd...quite honestly. However, have you checked to make sure the JIT
isn't making some kind of very clever optimization here? Perhaps it realizes
you're creating two strings from the same source (the char array of an already
interned string) and, instead of creating new objects, sets both x
and y to the same reference?
Otherwise this is a quite disturbing find, indeed.

Nov 15 '05 #2
Daniel O'Connell <onyxkirx@--NOSPAM--comcast.net> wrote:
>> This worries me - not so much for the specific example, but for the
>> precedent set. What other new ... expressions might return non-new
>> references? This could have significant implications in multi-threading,
>> where you may rely on two references being different for locking
>> purposes.
>
> This is odd...quite honestly. However, have you checked to make sure the JIT
> isn't making some kind of very clever optimization here? Perhaps it realizes
> you're creating two strings from the same source (the char array of an already
> interned string) and, instead of creating new objects, sets both x
> and y to the same reference?

It only happens with the empty string, as far as I can see.

> Otherwise this is a quite disturbing find, indeed.

I think it's disturbing either way - basically, if you rely on the new
operator always returning a previously unknown reference, you've got
problems.

However, I've looked at the String(char[]) docs, and the remarks say:

<quote>
If value is a null reference (Nothing in Visual Basic) or contains no
element, an Empty instance is initialized.
</quote>

I suspect if I hadn't known what that meant beforehand (due to seeing
this) I wouldn't have understood it.

I'd have thought this would actually take more work, and that there
wouldn't really be that much benefit in it. I just wonder where else
this might be lurking...

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Nov 15 '05 #3
There's more on .NET's string interning over on Chris Brumme's blog:

http://blogs.gotdotnet.com/cbrumme/P...3-3d7a0dbba270

He even has some example code snippets to illustrate how it can bite you
when calling into unmanaged code.

/kel

Nov 15 '05 #4
Hi, Jon,
A null or empty value is the only case where this happens.
This is to avoid having too many instances of the empty string in the CLR.
But I agree that new String(...) should probably always create a new string.
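
A small sketch for checking these cases on a given runtime follows. The class name EmptyStringCases and the helper IsSharedEmpty are hypothetical. It assumes the String(char[]) behaviour quoted from the docs earlier in the thread, where a null or empty array yields the Empty instance; only the non-empty result is annotated, since it does not depend on that behaviour.

```csharp
using System;

class EmptyStringCases
{
    // Hypothetical helper: did the constructor hand back the shared
    // string.Empty instance instead of a fresh object?
    public static bool IsSharedEmpty(char[] value)
    {
        string s = new string(value);
        return object.ReferenceEquals(s, string.Empty);
    }

    public static void Main(string[] args)
    {
        // The two cases named in this reply: null and an empty array.
        Console.WriteLine(IsSharedEmpty(null));
        Console.WriteLine(IsSharedEmpty(new char[0]));

        // A non-empty array cannot produce the empty string at all.
        Console.WriteLine(IsSharedEmpty(new char[] { 'a' })); // False

        // Two non-empty strings built from equal contents, to check
        // that the constructor returns separate references for them.
        string a = new string(new char[] { 'a' });
        string b = new string(new char[] { 'a' });
        Console.WriteLine(object.ReferenceEquals(a, b));
    }
}
```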

Gang Peng
[MS]
Nov 15 '05 #5
