String.Replace Anomaly

Normally, I never have any problems with String.Replace(). However, I
found that I need to replace multiple instances of the character
"ª" (\xAA) with a # symbol. The input file is a simple one-line
file. I read the file into a string called strLine. Then when I
do a simple replace ... here is what I have tried:

strLine = strLine.Replace("ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"\xAA", "#"); // Doesn't replace ...

if (strLine.Contains(@"\xAA")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains("ª")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains(@"ª")) MessageBox.Show("found one"); // No message box ...

Any ideas either what I'm doing wrong, or a better way to try to
replace this persistent character that just won't go away?

Sep 18 '07 #1
Levidikus wrote:
Normally, I never have any problems with String.Replace(). However, I
found that I need to replace multiple instances of the character
"ª" (\xAA) with a # symbol. The input file is a simple one-line
file. I read the file into a string called strLine. Then when I
do a simple replace ... here is what I have tried:

strLine = strLine.Replace("ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"\xAA", "#"); // Doesn't replace ...

if (strLine.Contains(@"\xAA")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains("ª")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains(@"ª")) MessageBox.Show("found one"); // No message box ...

Any ideas either what I'm doing wrong, or a better way to try to
replace this persistent character that just won't go away?

I think that you have tried every possible combination except the one
that works... Try this:

strLine = strLine.Replace("\xAA", "#");

--
Göran Andersson
_____
http://www.guffa.com
Sep 18 '07 #2
Looks like there is a problem with your strLine.

The following works as expected:

string test = "\xAA ª";
MessageBox.Show(
    String.Concat(test, Environment.NewLine,
        test.Replace("ª", "#"),
        Environment.NewLine,
        test.Replace("\xAA", "#")));

Sep 18 '07 #3

On Sep 18, 2:51 pm, Levidikus <james.h...@doveq.net> wrote:
found that I need to replace multiple instances of the character
"ª" (\xAA) with a # symbol. The input file is a simple one line

The following works for me:

string feminineIndicatorChar = Char.ConvertFromUtf32(0xaa);
string b = feminineIndicatorChar.Replace(feminineIndicatorChar, "#"); // b == "#"

Are you sure your strLine really contains the feminine ordinal
indicator? Can you check in a debugger?

Sep 18 '07 #4
On Sep 18, 1:51 pm, Levidikus <james.h...@doveq.net> wrote:
Normally, I never have any problems with String.Replace(). However, I
found that I need to replace multiple instances of the character
"ª" (\xAA) with a # symbol. The input file is a simple one-line
file. I read the file into a string called strLine. Then when I
do a simple replace ... here is what I have tried:
<snip>
strLine = strLine.Replace(@"\xAA", "#"); // Doesn't replace ...

Your mistake is using a verbatim string literal. That is looking for a
substring of backslash, x, A, A. You want it to look for the string
represented by Unicode U+00AA, i.e. '\xAA'. In other words, you *want*
the character escaping which verbatim string literals remove. Just get
rid of the @ and it will be fine.

I would warn against using \x though - because the number of
characters used varies. For instance:

\xAAOkay - does what you want
\xAABad - doesn't do what you want (it'll be U+AABA and then 'd')

Use \u00aa instead - then there's no ambiguity.

Jon

Sep 18 '07 #5
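
The difference between the three literal forms discussed in the post above can be checked directly. This is only a minimal sketch: the class name EscapeDemo and its members are made up for illustration and are not from the thread.

```csharp
using System;

public static class EscapeDemo
{
    // Regular literal: \u00AA is processed as an escape, giving the
    // single character U+00AA (feminine ordinal indicator).
    public const string Escaped = "\u00AA";

    // Verbatim literal: no escape processing, so this is the four
    // characters backslash, 'x', 'A', 'A'.
    public const string Verbatim = @"\xAA";

    // \x consumes up to four hex digits, so "AABa" is taken as one
    // code unit (U+AABA) and only the trailing 'd' is left over.
    public const string Greedy = "\xAABad";

    // Replaces every U+00AA in the input with '#'.
    public static string ReplaceIndicator(string input)
    {
        return input.Replace("\u00AA", "#");
    }

    public static void Main()
    {
        Console.WriteLine(Escaped.Length);   // 1
        Console.WriteLine(Verbatim.Length);  // 4
        Console.WriteLine(Greedy.Length);    // 2
        Console.WriteLine(ReplaceIndicator("a\u00AAb\u00AAc")); // a#b#c
    }
}
```

Because \u always takes exactly four hex digits, it reads the same no matter which characters follow it. That is the reason given above for preferring it over \x.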

On Sep 18, 3:28 pm, Roman Wagner <roman.wag...@gmail.com> wrote:
Looks like there is a problem with your strLine.
Actually, no; there is a problem with the OP's understanding of the @
character when used on strings.

Sep 18 '07 #6
On Sep 18, 8:51 am, Levidikus <james.h...@doveq.net> wrote:
Normally, I never have any problems with String.Replace(). However, I
found that I need to replace multiple instances of the character
"ª" (\xAA) with a # symbol. The input file is a simple one-line
file. I read the file into a string called strLine. Then when I
do a simple replace ... here is what I have tried:

strLine = strLine.Replace("ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"ª", "#"); // Doesn't replace ...

strLine = strLine.Replace(@"\xAA", "#"); // Doesn't replace ...

if (strLine.Contains(@"\xAA")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains("ª")) MessageBox.Show("found one"); // No message box ...

if (strLine.Contains(@"ª")) MessageBox.Show("found one"); // No message box ...

Any ideas either what I'm doing wrong, or a better way to try to
replace this persistent character that just won't go away?

*HOW* are you reading your text file? You need to match the encoding
with the encoding of the file. In this case, you'll probably need to
read the file with UTF7 encoding unless there are the encoding
specifiers at the beginning of the file.

I tried a file (and used File.ReadAllText()) with only 0xAA characters
and it would not run the replace unless I read the file UTF7...

Sep 18 '07 #7
On Sep 18, 3:07 pm, Doug Semler <dougsem...@gmail.com> wrote:
*HOW* are you reading your text file? You need to match the encoding
with the encoding of the file. In this case, you'll probably need to
read the file with UTF7 encoding unless there are the encoding
specifiers at the beginning of the file.

I tried a file (and used File.ReadAllText()) with only 0xAA characters
and it would not run the replace unless I read the file UTF7...
UTF-7 is *very* rarely used - basically it's used in mail and that's
virtually it, as far as I'm aware. How did you save your file?

That isn't the problem in this case, however.

Jon

Sep 18 '07 #8
On Sep 18, 10:14 am, "Jon Skeet [C# MVP]" <sk...@pobox.com> wrote:
On Sep 18, 3:07 pm, Doug Semler <dougsem...@gmail.com> wrote:
*HOW* are you reading your text file? You need to match the encoding
with the encoding of the file. In this case, you'll probably need to
read the file with UTF7 encoding unless there are the encoding
specifiers at the beginning of the file.
I tried a file (and used File.ReadAllText()) with only 0xAA characters
and it would not run the replace unless I read the file UTF7...

UTF-7 is *very* rarely used - basically it's used in mail and that's
virtually it, as far as I'm aware. How did you save your file?

That isn't the problem in this case, however.

Jon
Then why couldn't I get the string replace to work if I didn't specify
the encoding as UTF7?
Sep 18 '07 #9
Thank you very much for all the valuable information!

I am reading the file in using a standard StreamReader, without any
special flags.

Sep 18 '07 #10
On Sep 18, 3:41 pm, Doug Semler <dougsem...@gmail.com> wrote:
P.S. Not specifying an encoding gives me the ASCII result.
Specifying Encoding.Default (which resolves to SBCSCodePageEncoding)
gives me the correct behavior.
And that's because Encoding.Default uses the same as what "ANSI" means
in Notepad. UTF-7 just *happened* to work - and I suspect it shouldn't
really have done.

When you don't specify an encoding, almost everything in .NET assumes
UTF-8.

Jon

Sep 18 '07 #11
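
A minimal sketch of why the replace finds nothing when the bytes are decoded as UTF-8. It assumes the file holds the raw single byte 0xAA, as in the test described earlier in the thread. ISO-8859-1 stands in for an "ANSI" code page here, because both map byte 0xAA to U+00AA. The names DecodeDemo and Decode are illustrative only.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Decodes raw bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        byte[] raw = { 0x41, 0xAA, 0x42 }; // 'A', 0xAA, 'B'

        // A lone 0xAA is not a valid UTF-8 sequence, so the decoder
        // substitutes the replacement character U+FFFD.
        string asUtf8 = Decode(raw, Encoding.UTF8);

        // ISO-8859-1 maps byte 0xAA straight to U+00AA.
        string asLatin1 = Decode(raw, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine(asUtf8.Contains("\u00AA"));        // False
        Console.WriteLine(asUtf8.Contains("\uFFFD"));        // True
        Console.WriteLine(asLatin1.Replace("\u00AA", "#"));  // A#B
    }
}
```

Once the character has been decoded to U+FFFD, no search for U+00AA can match, whichever literal form is used.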
On Sep 18, 11:56 am, "Jon Skeet [C# MVP]" <sk...@pobox.com> wrote:
On Sep 18, 3:41 pm, Doug Semler <dougsem...@gmail.com> wrote:
P.S. Not specifying an encoding gives me the ASCII result.
Specifying Encoding.Default (which resolves to SBCSCodePageEncoding)
gives me the correct behavior.

And that's because Encoding.Default uses the same as what "ANSI" means
in Notepad. UTF-7 just *happened* to work - and I suspect it shouldn't
really have done.

When you don't specify an encoding, almost everything in .NET assumes
UTF-8.
Right. But my entire point is that the OP needs to specify the
correct encoding when opening the file. If he doesn't do that, NONE
of the (correct) solutions pointed out earlier will work. In this
case Encoding.Default (if you say UTF7 is wrong) needs to be passed to
the StreamReader constructor.

Sep 18 '07 #12
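
Passing an encoding to the StreamReader constructor might look like the sketch below. ISO-8859-1 is an assumption about the file, not something the thread confirms, and the names ReaderDemo and ReadFirstLine are made up. Encoding.Default could be passed the same way on .NET Framework. On .NET Core and later, Encoding.Default is always UTF-8, so it would not help there.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReaderDemo
{
    // Reads the first line of a file, decoding it with an explicit encoding.
    public static string ReadFirstLine(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadLine();
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Simulate the one-line input file: '1', raw byte 0xAA, '2'.
            File.WriteAllBytes(path, new byte[] { 0x31, 0xAA, 0x32 });

            // Assumed encoding of the file (hypothetical choice).
            Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

            string strLine = ReadFirstLine(path, latin1);
            Console.WriteLine(strLine.Replace("\u00AA", "#")); // 1#2
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```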

On Sep 18, 4:41 pm, Doug Semler <dougsem...@gmail.com> wrote:
P.S. Not specifying an encoding gives me the ASCII result.
Specifying Encoding.Default (which resolves to SBCSCodePageEncoding)
gives me the correct behavior.
The default single-byte character set code page encoding (==
SBCSCodePageEncoding) is there to provide an encoding-less encoding,
as far as I can tell. It is to encodings what InvariantCulture is to
cultures: you can use it if you don't care about the encoding and
nobody but you will read what you've written using it. (In other
words; if and only if you wrote the file using Encoding.Default on the
same OS installation, it's safe to use Encoding.Default to read it
back.)

Sep 18 '07 #13
UL-Tomten <to****@gmail.com> wrote:
On Sep 18, 4:41 pm, Doug Semler <dougsem...@gmail.com> wrote:
P.S. Not specifying an encoding gives me the ASCII result.
Specifying Encoding.Default (which resolves to SBCSCodePageEncoding)
gives me the correct behavior.

The default single-byte character set code page encoding (==
SBCSCodePageEncoding) is there to provide an encoding-less encoding,
as far as I can tell. It is to encodings what InvariantCulture is to
cultures: you can use it if you don't care about the encoding and
nobody but you will read what you've written using it. (In other
words; if and only if you wrote the file using Encoding.Default on the
same OS installation, it's safe to use Encoding.Default to read it
back.)
The bit in brackets is right - but it's *not* the same as saying it's
an "encoding-less encoding".

An encoding is basically a mapping between byte sequences and character
sequences. 8859-1 is as close to an "encoding-less encoding" as you'll
get, as it maps bytes 0-255 to Unicode 0-255; Encoding.Default doesn't
necessarily do that (and indeed doesn't in most environments).

For instance, on my box byte 128 converts to U+20AC (the Euro symbol).

Use of Encoding.Default should be regarded as "legacy" really - few
things should just use the default encoding for the OS.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Sep 19 '07 #14
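
The claim that ISO-8859-1 maps bytes 0-255 to Unicode 0-255 can be checked with a short loop. This is a sketch only, and the names Latin1Demo and IsIdentityMapping are made up for illustration. UTF-8 is included as a contrast because it is available everywhere. Checking a Windows code page such as 1252 would need that code page to be available on the runtime in use, so it is left out.

```csharp
using System;
using System.Text;

public static class Latin1Demo
{
    // Returns true if every single byte 0..255 decodes to the one
    // character with the same numeric value.
    public static bool IsIdentityMapping(Encoding encoding)
    {
        for (int b = 0; b <= 255; b++)
        {
            string s = encoding.GetString(new[] { (byte)b });
            if (s.Length != 1 || s[0] != (char)b)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsIdentityMapping(Encoding.GetEncoding("iso-8859-1"))); // True
        Console.WriteLine(IsIdentityMapping(Encoding.UTF8));                      // False
    }
}
```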

On Sep 19, 8:43 am, Jon Skeet [C# MVP] <sk...@pobox.com> wrote:
The default single-byte character set code page encoding (==
SBCSCodePageEncoding) is there to provide an encoding-less encoding,
as far as I can tell. It is to encodings what InvariantCulture is to
cultures: you can use it if you don't care about the encoding and
nobody but you will read what you've written using it. (In other
words; if and only if you wrote the file using Encoding.Default on the
same OS installation, it's safe to use Encoding.Default to read it
back.)

The bit in brackets is right - but it's *not* the same as saying it's
an "encoding-less encoding".
Well, since an encoding by definition specifies encoding rules, I
thought that much was obvious... =]

Maybe there should have been an Encoding.InvariantEncoding instead of
an Encoding.Default, to communicate that the resulting bits are
unknown at compile-time, and perhaps avoid the temptation of using it
for text others might read back.

Sep 19 '07 #15

On Sep 19, 8:43 am, Jon Skeet [C# MVP] <sk...@pobox.com> wrote:
8859-1 is as close to an "encoding-less encoding" as you'll
get, as it maps bytes 0-255 to Unicode 0-255;
I've always thought of that more as a curse than a blessing.

Sep 19 '07 #16
On Sep 19, 8:11 am, UL-Tomten <tom...@gmail.com> wrote:
The bit in brackets is right - but it's *not* the same as saying it's
an "encoding-less encoding".

Well, since an encoding by definition specifies encoding rules, I
thought that much was obvious... =]
But there's such a thing as a "trivial" encoding, which pretty much
sums up ISO-8859-1.
Maybe there should have been an Encoding.InvariantEncoding instead of
an Encoding.Default, to communicate that the resulting bits are
unknown at compile-time, and perhaps avoid the temptation of using it
for text others might read back.
InvariantEncoding sounds like it would do the same on all boxes,
regardless of environment though. Encoding.Default *isn't* invariant -
it varies by environment. I'd have preferred
Encoding.OperatingSystemDefault or something similar. Certainly it
gets confusing that Encoding.Default isn't the encoding which is used
by default by most .NET classes :)

Jon

Sep 19 '07 #17
[snip]

Thank you again for all of the outstanding responses.

The file that I am working with originated on a Solaris 8 Unix
system. How would I go about identifying the correct encoding?
Also, with File.ReadAllText(), would I even need a StreamReader
for that?

Once again, thanks for all the feedback!

James
Oct 13 '07 #18
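
On the last two questions: File.ReadAllText has an overload that takes an Encoding, so no StreamReader is needed. There is no fully reliable way to detect an encoding from the bytes alone. One rough heuristic, sketched below, is to test whether the bytes are valid UTF-8 and otherwise fall back to an encoding you have reason to expect. The fallback used here (ISO-8859-1) is purely an assumption about the Solaris file, and the names EncodingGuess, IsValidUtf8 and ReadAll are made up.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingGuess
{
    // True if the bytes decode as UTF-8 without any invalid sequences.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // Second argument: throw on invalid bytes instead of
        // silently substituting U+FFFD.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Reads the whole file as UTF-8 when valid, else with the fallback.
    public static string ReadAll(string path, Encoding fallback)
    {
        byte[] bytes = File.ReadAllBytes(path);
        Encoding chosen = IsValidUtf8(bytes) ? Encoding.UTF8 : fallback;
        return File.ReadAllText(path, chosen); // no StreamReader needed
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 0x31, 0xAA, 0x32 });
            Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // assumed fallback
            string text = ReadAll(path, latin1);
            Console.WriteLine(text.Replace("\u00AA", "#")); // 1#2
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The check only tells you that a file is not UTF-8. It cannot distinguish one single-byte code page from another, so the fallback still has to come from knowledge of how the file was produced.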
