Simple Encoding Question

Hello,

I am playing a little with Encoding, and I have what is possibly (forgive
me) a newbie-type question.

I have a function that takes a string and a codepage (based upon the basic
MSDN help - look for "Using Unicode Encoding").

Anyway, I want to pass the contents of a textbox "as is", but always get a
literal string. Using the example from MSDN, say I want to pass in this
string:
"\u307b,\u308b,\u305a,\u3042,\u306d"

If I pass in Textbox.Text, I get
@"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.

In other words, instead of "Unicode character 307b, unicode 308b", etc., I
get "slash u three..."

Thanks,
pagates
Nov 17 '05 #1
pagates <pa*****@discussions.microsoft.com> wrote:
> I am playing a little with Encoding, and I have what is possibly (forgive
> me) a newbie-type question.
>
> I have a function that takes a string and a codepage (based upon the basic
> MSDN help - look for "Using Unicode Encoding").
>
> Anyway, I want to pass the contents of a textbox "as is", but always get a
> literal string. Using the example from MSDN, say I want to pass in this
> string:
> "\u307b,\u308b,\u305a,\u3042,\u306d"
> If I pass in Textbox.Text, I get
> @"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.
>
> In other words, instead of "Unicode character 307b, unicode 308b", etc., I
> get "slash u three..."


Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

Here's a sample program which seems to go against your post:

using System.Windows.Forms;
using System.Drawing;
using System;

public class Test
{
    static void Main()
    {
        Form f = new Form();
        f.Size = new Size(200, 200);
        TextBox tb = new TextBox();
        tb.Text = "\u307b,\u308b,\u305a,\u3042,\u306d";
        f.Controls.Add(tb);
        Application.Run(f);
    }
}

While that only displays boxes and commas on my box, it makes the point
that it's *not* displaying "\u307b (etc)".
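
The same point can be checked without a form. The C# compiler translates \uXXXX escapes in a regular string literal at compile time, whereas a verbatim literal (like text typed into a TextBox at run time) keeps the backslash, the "u" and the digits as ordinary characters. A minimal sketch; the class name LiteralDemo is made up for illustration:

```csharp
using System;

public static class LiteralDemo
{
    // Regular literal: the compiler replaces each \uXXXX with one UTF-16 code unit.
    public const string Escaped = "\u307b,\u308b";

    // Verbatim literal: backslash, 'u' and four hex digits stay as six separate
    // characters, which is also what a user typing into a TextBox produces.
    public const string Verbatim = @"\u307b,\u308b";

    public static void Main()
    {
        Console.WriteLine(Escaped.Length);          // 3
        Console.WriteLine(Verbatim.Length);         // 13
        Console.WriteLine(Verbatim[0] == '\\');     // True
    }
}
```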

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #2
Hi Jon,

Thanks for the reply, but you have my problem in reverse. Here is a short
but complete program that demonstrates what I'm trying to achieve:

using System.Windows.Forms;
using System.Drawing;
using System.Text;

public class frmTest : Form
{
    private Button   btn;
    private TextBox  tb;
    private ListView lv;
    private ColumnHeader colByte;
    private ColumnHeader colChar;

    public frmTest()
    {
        InitializeComponent();
    }

    private void InitializeComponent()
    {
        btn = new Button();
        lv = new ListView();
        colByte = new ColumnHeader();
        colChar = new ColumnHeader();
        tb = new TextBox();

        // btnEncode
        btn.Location = new Point(424, 8);
        btn.Size = new Size(64, 24);
        btn.Text = "Encode";
        btn.Click += new System.EventHandler(btn_Click);

        // lv
        lv.Columns.AddRange(new ColumnHeader[] { colByte, colChar });
        lv.Location = new Point(0, 80);
        lv.Size = new Size(496, 240);
        lv.View = System.Windows.Forms.View.Details;

        // colByte, colChar
        colByte.Text = "Byte";
        colByte.Width = 33;
        colChar.Text = "Character";
        colChar.Width = 58;

        // tb
        tb.Location = new Point(8, 8);
        tb.Size = new Size(408, 21);
        tb.Text = "This is the text that will be encoded.";

        // frmTest
        ClientSize = new Size(496, 318);
        Controls.Add(tb);
        Controls.Add(lv);
        Controls.Add(btn);
    }

    private void PrintCPBytes(string str, int codePage)
    {
        Encoding targetEncoding;
        byte[] encodedChars;

        targetEncoding = Encoding.GetEncoding(codePage);

        // Gets the byte representation of the specified string.
        encodedChars = targetEncoding.GetBytes(str);

        for (int i = 0; i < encodedChars.Length; i++)
        {
            ListViewItem lItem = new ListViewItem(i.ToString());
            lItem.SubItems.Add(encodedChars[i].ToString());
            lv.Items.Add(lItem);
        }
    }

    private void btn_Click(object sender, System.EventArgs e)
    {
        lv.Items.Clear();
        PrintCPBytes(tb.Text, 1252);   // 1252 is Latin, 932 is Japanese
        PrintCPBytes(tb.Text, 932);    // 1252 is Latin, 932 is Japanese
    }

    static void Main()
    {
        Application.Run(new frmTest());
    }
}

I'd like to put "\u307b" (etc) into the TextBox, and apply that to the
PrintCPBytes function.
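
To make the difference concrete: the encoder only sees the characters the string actually holds, so the typed text and the unescaped text produce very different bytes. The sketch below is a rough illustration that uses Encoding.UTF8 and Encoding.ASCII instead of code pages 932 and 1252 (an assumption made so that it also runs on newer .NET runtimes, where those code pages need CodePagesEncodingProvider registered first). BytesDemo and Hex are illustrative names, not part of the program above:

```csharp
using System;
using System.Text;

public static class BytesDemo
{
    // Formats bytes as space-separated hex, e.g. "E3 81 BB".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        string real = "\u307b";     // one character, U+307B
        string typed = @"\u307b";   // six ASCII characters, as typed into the TextBox

        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(real)));    // E3 81 BB
        Console.WriteLine(Hex(Encoding.UTF8.GetBytes(typed)));   // 5C 75 33 30 37 62
        // ASCII cannot represent U+307B, so the default fallback substitutes '?'.
        Console.WriteLine(Hex(Encoding.ASCII.GetBytes(real)));   // 3F
    }
}
```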

Thanks,
pagates
Nov 17 '05 #3
On Mon, 10 Oct 2005 10:29:04 -0700, "pagates"
<pa*****@discussions.microsoft.com> wrote:
> Anyway, I want to pass the contents of a textbox "as is", but always get a
> literal string. Using the example from MSDN, say I want to pass in this
> string:
> "\u307b,\u308b,\u305a,\u3042,\u306d"
>
> If I pass in Textbox.Text, I get
> @"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.
>
> In other words, instead of "Unicode character 307b, unicode 308b", etc., I
> get "slash u three..."


So you basically want to apply escape-sequence parsing to the textbox
string. I don't know whether the framework has such a function; someone
else may be able to answer that. Otherwise, here is a method that
implements the basics, including the Unicode escape parsing that you
want. If you need to handle other escape codes, you will have to add
them manually:
// Requires "using System;" and "using System.Text;".
static string ParseBackSlashString(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '\\')
        {
            i++;
            if (i >= s.Length) // There must be a character after a backslash
                throw new ApplicationException("String may not end with a \\.");
            switch (s[i])
            {
                case '\\':
                    sb.Append('\\');
                    break;
                case 'u':
                    if (i + 4 >= s.Length) // Four hex digits must follow the 'u'
                        throw new ApplicationException("Unrecognized escape sequence.");
                    else
                    {
                        // Accumulate the four hex digits, most significant first
                        int value = 0;
                        for (int j = 1; j <= 4; j++)
                        {
                            char c = s[i + j];
                            if (c >= '0' && c <= '9')
                                value += (int)Math.Pow(16, 4 - j) * (c - '0');
                            else if (c >= 'a' && c <= 'f')
                                value += (int)Math.Pow(16, 4 - j) * (c - 'a' + 10);
                            else if (c >= 'A' && c <= 'F')
                                value += (int)Math.Pow(16, 4 - j) * (c - 'A' + 10);
                            else
                                throw new ApplicationException("Unrecognized escape sequence.");
                        }
                        sb.Append((char)value);
                        i += 4; // Skip the four digits just consumed
                    }
                    break;
                default:
                    throw new ApplicationException("Unrecognized escape sequence.");
            }
        }
        else // This is the default when there isn't a backslash
        {
            sb.Append(s[i]);
        }
    }
    return sb.ToString();
}

--
Marcus Andrén
Nov 17 '05 #4
pagates <pa*****@discussions.microsoft.com> wrote:

<snip>
> I'd like to put "\u307b" (etc) into the TextBox, and apply that to the
> PrintCPBytes function.


In that case, you'll have to parse the text. The TextBox itself doesn't
(and shouldn't!) care about C# escaping rules.

You'll need to look for \u in a string, and then parse the next 4
characters as a hex number (eg using Convert.ToInt32(string,int),
specifying 16 as the base).
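
The two steps above can be sketched as follows. This is only one possible reading of the suggestion, not a definitive implementation: the names UnicodeEscapes, Decode and IsFourHexDigits are invented here, and the choice to copy anything that is not a well-formed \uXXXX sequence through unchanged (rather than throw) is an assumption.

```csharp
using System;
using System.Text;

public static class UnicodeEscapes
{
    // True if s holds four hex digits starting at index start.
    private static bool IsFourHexDigits(string s, int start)
    {
        if (start + 4 > s.Length) return false;
        for (int k = start; k < start + 4; k++)
        {
            if (!Uri.IsHexDigit(s[k])) return false;
        }
        return true;
    }

    // Replaces each well-formed \uXXXX with the corresponding UTF-16 code unit.
    // All other text, including other backslash sequences, is copied unchanged.
    public static string Decode(string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        int i = 0;
        while (i < s.Length)
        {
            if (s[i] == '\\' && i + 1 < s.Length && s[i + 1] == 'u'
                && IsFourHexDigits(s, i + 2))
            {
                // Parse the four digits as a base-16 number.
                sb.Append((char)Convert.ToInt32(s.Substring(i + 2, 4), 16));
                i += 6; // backslash + 'u' + four digits
            }
            else
            {
                sb.Append(s[i]);
                i++;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Decode(@"\u0041\u0042"));   // AB
        Console.WriteLine(Decode(@"\u12"));           // \u12 (too short, left as typed)
    }
}
```

With something like this in place, the button handler would pass Decode(tb.Text) to PrintCPBytes instead of tb.Text.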

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #5
> if (i + 4 >= s.Length)
>     throw new ApplicationException("Unrecognized escape sequence.");
> else
> {
>     int value = 0;
>     for (int j = 1; j <= 4; j++)
>     {
>         char c = s[i+j];
>         if (c >= '0' && c <= '9')
>             value += (int)Math.Pow(16, 4-j) * (c-'0');
>         else if (c >= 'a' && c <= 'f')
>             value += (int)Math.Pow(16, 4 - j) * (c - 'a' + 10);
>         else if (c >= 'A' && c <= 'F')
>             value += (int)Math.Pow(16, 4 - j) * (c - 'A' + 10);
>         else
>             throw new ApplicationException("Unrecognized escape sequence.");
>     }
>     sb.Append((char)value);
>     i += 4;
> }
> break;


After reading Jon Skeet's reply, I would just like to mention that the
code quoted above would be more readable using the Convert.ToInt32
method that he mentioned. The body of the 'u' case then becomes:

try
{
    int value = Convert.ToInt32(s.Substring(i + 1, 4), 16);
    sb.Append((char)value);
    i += 4;
    break;
}
catch (System.FormatException)
{
    throw new ApplicationException("Unrecognized escape sequence.");
}
catch (System.ArgumentOutOfRangeException)
{
    throw new ApplicationException("Unrecognized escape sequence.");
}
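
One caveat with this shorter version, as far as the documented behaviour of Convert.ToInt32(string, int) goes: with base 16 it also accepts a "0x" or "0X" prefix, so input such as \u0x1f would be read as a valid escape. If exactly four hex digits are wanted, a stricter check is possible with int.TryParse and NumberStyles.AllowHexSpecifier. The sketch below shows the difference; HexParseDemo and TryParseCodeUnit are hypothetical names introduced for this example.

```csharp
using System;
using System.Globalization;

public static class HexParseDemo
{
    // Strict parse of exactly four hex digits; returns false instead of throwing.
    // AllowHexSpecifier alone permits no sign, no whitespace and no "0x" prefix.
    public static bool TryParseCodeUnit(string fourChars, out char result)
    {
        int value;
        if (fourChars != null && fourChars.Length == 4 &&
            int.TryParse(fourChars, NumberStyles.AllowHexSpecifier,
                         CultureInfo.InvariantCulture, out value))
        {
            result = (char)value;
            return true;
        }
        result = '\0';
        return false;
    }

    public static void Main()
    {
        char c;
        Console.WriteLine(Convert.ToInt32("0x1f", 16));        // 31: the prefix is accepted
        Console.WriteLine(TryParseCodeUnit("0x1f", out c));    // False
        Console.WriteLine(TryParseCodeUnit("307b", out c));    // True
    }
}
```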
Nov 17 '05 #6
Jon,

The C# compiler already does this. In the interest of "reusable code",
don't you think, since the code is already written, it would have been nice
to have this method available/exposed? --I don't even know where to look
(using Roeder's Reflector) to see if this is possible or not...

In addition to this specific case, I sometimes find myself writing code
to reproduce something the framework already does. (Of course, I can't
remember exactly what it was I was trying to do.)

Scott
Nov 17 '05 #7
Scott Coonce <sd******@gmail.HEY_YOU.com> wrote:
> The C# compiler already does this. In the interest of "reusable code",
> don't you think, since the code is already written, it would have been nice
> to have this method available/exposed? --I don't even know where to look
> (using Roeder's Reflector) to see if this is possible or not...


Hmmm... I'm not at all sure. It's quite possible that it does this
escaping within an internal structure which shouldn't be exposed, or at
the same time as maintaining other state. There may be some way of
doing it at the moment using the compiler services, but it's likely to
be much more tortuous than just writing the code by hand.
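
One framework method that comes close, without involving the compiler, is Regex.Unescape in System.Text.RegularExpressions. It is not mentioned elsewhere in this thread, so treat the following as a hedged sketch rather than a recommendation: the method follows regular-expression escaping rules rather than C# literal rules, and it can throw ArgumentException on input it cannot interpret. RegexUnescapeDemo and Decode are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUnescapeDemo
{
    // Converts \uXXXX (and other regex-style escapes such as \\ or \t) to characters.
    // The rules are those of regular expressions, which only approximate C# literals.
    public static string Decode(string typed)
    {
        return Regex.Unescape(typed);
    }

    public static void Main()
    {
        string decoded = Decode(@"\u307b,\u308b,\u305a,\u3042,\u306d");
        Console.WriteLine(decoded.Length);               // 9
        Console.WriteLine(decoded[0] == (char)0x307b);   // True
    }
}
```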

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #8
Thanks, all. I was afraid that I was going to have to parse it, but I wanted
to make sure I wasn't reinventing the .NET wheel.

Thanks again,
pagates
