Simple Encoding Question
Hello,
I am playing a little with Encoding, and I have what is possibly (forgive
me) a newbie-type question.
I have a function that takes a string and a codepage (based upon the basic
MSDN help - look for "Using Unicode Encoding").
Anyway, I want to pass the contents of a textbox "as is", but always get a
literal string. Using the example from MSDN, say I want to pass in this
string:
"\u307b,\u308b,\u305a,\u3042,\u306d"
If I pass in Textbox.Text, I get
@"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.
In other words, instead of "Unicode character 307b, unicode 308b", etc., I
get "slash u three..."
Thanks,
pagates
re: Simple Encoding Question
pagates <pagates@discussions.microsoft.com> wrote:
> I am playing a little with Encoding, and I have what is possibly (forgive
> me) a newbie-type question.
>
> I have a function that takes a string and a codepage (based upon the basic
> MSDN help - look for "Using Unicode Encoding").
>
> Anyway, I want to pass the contents of a textbox "as is", but always get a
> literal string. Using the example from MSDN, say I want to pass in this
> string:
> "\u307b,\u308b,\u305a,\u3042,\u306d"
> If I pass in Textbox.Text, I get
> @"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.
>
> In other words, instead of "Unicode character 307b, unicode 308b", etc., I
> get "slash u three..."
Could you post a short but complete program which demonstrates the
problem?
See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
Here's a sample program which seems to go against your post:
using System.Windows.Forms;
using System.Drawing;
using System;

public class Test
{
    static void Main()
    {
        Form f = new Form();
        f.Size = new Size(200, 200);
        TextBox tb = new TextBox();
        tb.Text = "\u307b,\u308b,\u305a,\u3042,\u306d";
        f.Controls.Add(tb);
        Application.Run(f);
    }
}
While that only displays boxes and commas on my box, it makes the point
that it's *not* displaying "\u307b (etc)".
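The same distinction can be seen without a form. Below is a minimal console sketch; the class and member names are made up for illustration, and it assumes that text typed into a TextBox arrives just like the verbatim string, with no escape processing applied.

```csharp
using System;

public static class LiteralDemo
{
    // The C# compiler processes the escape: this is ONE character, U+307B.
    public const string Compiled = "\u307b";

    // A verbatim literal keeps the backslash: SIX characters, \ u 3 0 7 b.
    // Assumption: this is what tb.Text holds when the user types the escape.
    public const string Typed = @"\u307b";

    public static void Main()
    {
        Console.WriteLine(Compiled.Length);             // 1
        Console.WriteLine(Typed.Length);                // 6
        Console.WriteLine(Typed[0] == '\\');            // True
        Console.WriteLine((int)Compiled[0] == 0x307b);  // True
    }
}
```

Escape sequences are a feature of C# source code, so they are resolved at compile time and never at run time.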
--
Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
re: Simple Encoding Question
Hi Jon,
Thanks for the reply, but you have my problem in reverse. Here is a short but
complete program that demonstrates what I'm trying to achieve:
using System.Windows.Forms;
using System.Drawing;
using System.Text;

public class frmTest : Form
{
    private Button btn;
    private TextBox tb;
    private ListView lv;
    private ColumnHeader colByte;
    private ColumnHeader colChar;

    public frmTest()
    {
        InitializeComponent();
    }

    private void InitializeComponent()
    {
        btn = new Button();
        lv = new ListView();
        colByte = new ColumnHeader();
        colChar = new ColumnHeader();
        tb = new TextBox();

        // btnEncode
        btn.Location = new Point(424, 8);
        btn.Size = new Size(64, 24);
        btn.Text = "Encode";
        btn.Click += new System.EventHandler(btn_Click);

        // lv
        lv.Columns.AddRange(new ColumnHeader[] { colByte, colChar });
        lv.Location = new Point(0, 80);
        lv.Size = new Size(496, 240);
        lv.View = System.Windows.Forms.View.Details;

        // colByte, colChar
        colByte.Text = "Byte";
        colByte.Width = 33;
        colChar.Text = "Character";
        colChar.Width = 58;

        // tb
        tb.Location = new Point(8, 8);
        tb.Size = new Size(408, 21);
        tb.Text = "This is the text that will be encoded.";

        // frmTest
        ClientSize = new Size(496, 318);
        Controls.Add(tb);
        Controls.Add(lv);
        Controls.Add(btn);
    }

    private void PrintCPBytes(string str, int codePage)
    {
        Encoding targetEncoding;
        byte[] encodedChars;

        targetEncoding = Encoding.GetEncoding(codePage);

        // Gets the byte representation of the specified string.
        encodedChars = targetEncoding.GetBytes(str);

        for (int i = 0; i < encodedChars.Length; i++)
        {
            ListViewItem lItem = new ListViewItem(i.ToString());
            lItem.SubItems.Add(encodedChars[i].ToString());
            lv.Items.Add(lItem);
        }
    }

    private void btn_Click(object sender, System.EventArgs e)
    {
        lv.Items.Clear();
        PrintCPBytes(tb.Text, 1252); // 1252 is Latin, 932 is Japanese
        PrintCPBytes(tb.Text, 932); // 1252 is Latin, 932 is Japanese
    }

    static void Main()
    {
        Application.Run(new frmTest());
    }
}

I'd like to put "\u307b" (etc) into the TextBox, and apply that to the
PrintCPBytes function.
Thanks,
pagates
re: Simple Encoding Question
On Mon, 10 Oct 2005 10:29:04 -0700, "pagates"
<pagates@discussions.microsoft.com> wrote:
> Anyway, I want to pass the contents of a textbox "as is", but always get a
> literal string. Using the example from MSDN, say I want to pass in this
> string:
> "\u307b,\u308b,\u305a,\u3042,\u306d"
>
> If I pass in Textbox.Text, I get
> @"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.
>
> In other words, instead of "Unicode character 307b, unicode 308b", etc., I
> get "slash u three..."
So you basically want to apply escape-sequence parsing to the textbox
string. I don't know if the framework has any such function. Someone
else may answer that; otherwise, here is a method that implements the
basics, including the Unicode escape parsing that you want. If you
need to handle other escape codes, you will have to add them
manually:
static string ParseBackSlashString(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '\\')
        {
            i++;
            if (i >= s.Length) // There must be a character after the backslash
                throw new ApplicationException("String may not end with a \\.");
            switch (s[i])
            {
                case '\\':
                    sb.Append('\\');
                    break;
                case 'u':
                    if (i + 4 >= s.Length)
                        throw new ApplicationException("Unrecognized escape sequence.");
                    else
                    {
                        int value = 0;
                        for (int j = 1; j <= 4; j++)
                        {
                            char c = s[i + j];
                            if (c >= '0' && c <= '9')
                                value += (int)Math.Pow(16, 4 - j) * (c - '0');
                            else if (c >= 'a' && c <= 'f')
                                value += (int)Math.Pow(16, 4 - j) * (c - 'a' + 10);
                            else if (c >= 'A' && c <= 'F')
                                value += (int)Math.Pow(16, 4 - j) * (c - 'A' + 10);
                            else
                                throw new ApplicationException("Unrecognized escape sequence.");
                        }
                        sb.Append((char)value);
                        i += 4;
                    }
                    break;
                default:
                    throw new ApplicationException("Unrecognized escape sequence.");
            }
        }
        else // This is the default when there isn't a backslash
        {
            sb.Append(s[i]);
        }
    }
    return sb.ToString();
}
--
Marcus Andrén
re: Simple Encoding Question
pagates <pagates@discussions.microsoft.com> wrote:
<snip>
> I'd like to put "\u307b" (etc) into the TextBox, and apply that to the
> PrintCPBytes function.
In that case, you'll have to parse the text. The TextBox itself doesn't
(and shouldn't!) care about C# escaping rules.
You'll need to look for \u in a string, and then parse the next 4
characters as a hex number (e.g. using Convert.ToInt32(string, int),
specifying 16 as the base).
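One way to follow this advice, sketched under stated assumptions rather than offered as a definitive implementation: a regular expression finds each \u followed by exactly four hex digits, and Convert.ToInt32(string, int) with base 16 turns those digits into a UTF-16 code unit. The class and method names (UnicodeEscapes, Decode) are hypothetical. Unlike the ParseBackSlashString method posted earlier in the thread, this sketch does not treat \\ as an escaped backslash, and it leaves malformed sequences untouched instead of throwing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeEscapes
{
    // A literal backslash, 'u', then exactly four hexadecimal digits.
    private static readonly Regex Pattern = new Regex(@"\\u([0-9a-fA-F]{4})");

    // Replaces every \uXXXX sequence in the input with the character it names.
    // Anything that does not match the pattern is copied through unchanged.
    public static string Decode(string input)
    {
        return Pattern.Replace(input, m =>
        {
            // The regex has already guaranteed four valid hex digits,
            // so this conversion cannot fail on format.
            int codeUnit = Convert.ToInt32(m.Groups[1].Value, 16);
            return ((char)codeUnit).ToString();
        });
    }
}
```

With this in place, the button handler could call something like PrintCPBytes(UnicodeEscapes.Decode(tb.Text), 932) instead of passing tb.Text directly.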
--
Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
re: Simple Encoding Question
> if (i + 4 >= s.Length)
>     throw new ApplicationException("Unrecognized escape sequence.");
> else
> {
>     int value = 0;
>     for (int j = 1; j <= 4; j++)
>     {
>         char c = s[i + j];
>         if (c >= '0' && c <= '9')
>             value += (int)Math.Pow(16, 4 - j) * (c - '0');
>         else if (c >= 'a' && c <= 'f')
>             value += (int)Math.Pow(16, 4 - j) * (c - 'a' + 10);
>         else if (c >= 'A' && c <= 'F')
>             value += (int)Math.Pow(16, 4 - j) * (c - 'A' + 10);
>         else
>             throw new ApplicationException("Unrecognized escape sequence.");
>     }
>     sb.Append((char)value);
>     i += 4;
> }
> break;
After having read Jon Skeet's reply, I would just like to mention that
the code quoted above would be more readable if it used the
Convert.ToInt32 method that he mentioned:
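try
{
    // Substring throws ArgumentOutOfRangeException if fewer than
    // four characters remain after the 'u'.
    int value = Convert.ToInt32(s.Substring(i + 1, 4), 16);
    sb.Append((char)value);
    i += 4;
}
catch (System.FormatException)
{
    throw new ApplicationException("Unrecognized escape sequence.");
}
catch (System.ArgumentException)
{
    // Covers ArgumentOutOfRangeException (a subclass) as well as the
    // ArgumentException that Convert.ToInt32 throws for a leading
    // minus sign when the base is not 10.
    throw new ApplicationException("Unrecognized escape sequence.");
}
break;
re: Simple Encoding Question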
Jon,
The C# compiler already does this. In the interest of "reusable code",
don't you think that, since the code is already written, it would have been
nice to have this method available/exposed? I don't even know where to look
(using Roeder's Reflector) to see if this is possible or not...
In addition to this specific case, I sometimes find myself writing
code to reproduce something the framework already does. (Of course, I can't
remember exactly what it was I was trying to do.)
Scott
re: Simple Encoding Question
Scott Coonce <sdcoonce@gmail.HEY_YOU.com> wrote:
> The C# compiler already does this. In the interest of "reusable code",
> don't you think, since the code is already written, it would have been nice
> to have this method available/exposed? --I don't even know where to look
> (using Roeder's Reflector) to see if this is possible or not...
Hmmm... I'm not at all sure. It's quite possible that it does this
escaping within an internal structure which shouldn't be exposed, or at
the same time as maintaining other state. There may be some way of
doing it at the moment using the compiler services, but it's likely to
be much more tortuous than just writing the code by hand.
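For what it's worth, one framework method that may cover the simple case is Regex.Unescape in System.Text.RegularExpressions, which converts \uXXXX sequences among others. The sketch below is hedged accordingly: the method follows regular-expression escape rules, not C# literal rules, so it also accepts sequences such as \x41 and throws ArgumentException on escapes it does not recognise (for example \w). The wrapper class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FrameworkUnescape
{
    // Thin wrapper around Regex.Unescape. Assumption: the input uses only
    // escapes that are valid in BOTH C# literals and regular expressions,
    // such as \uXXXX, \\, \n and \t.
    public static string Decode(string typed)
    {
        return Regex.Unescape(typed);
    }

    public static void Main()
    {
        string decoded = Decode(@"\u307b,\u308b");
        Console.WriteLine(decoded.Length);                    // 3
        Console.WriteLine(((int)decoded[0]).ToString("x4"));  // 307b
    }
}
```

Whether this is acceptable depends on how strictly the input has to follow C# escaping; a hand-written parser gives full control over which sequences are allowed.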
--
Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
re: Simple Encoding Question
Thanks, all. I was afraid that I was going to have to parse it, but I wanted
to make sure I wasn't reinventing the .NET wheel.
Thanks again,
pagates