Reading Unicode escape sequences from File

Hello,

I have a file that contains ordinary text and some special characters in
Unicode escape sequences (\uxxxx).

When I read the file using e.g. StreamReader, the Unicode escape sequences are
not converted to their character representation. They are shown exactly the same
way as in the file. Literals in C# code's variables are shown correctly.

Can anyone tell me how to read Unicode escape sequences from a file so that they
are presented like literals?

Thanks,
Jun 27 '08 #1
John Ztwin <Jo****@mail.com> wrote:
> I have a file that contains ordinary text and some special characters in
> Unicode escape sequences (\uxxxx).
>
> When I read the file using e.g. StreamReader Unicode escape sequences are
> not converted to their character representation.
No, I wouldn't expect them to be. That's done by the C# compiler - it
would be a big mistake for it to be done by StreamReader.
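
To illustrate the distinction, here is a minimal sketch (not from the thread; the class name `EscapeDemo` and the sample text are made up). A verbatim string stands in for text read from a file, where the backslash and the `u` are ordinary characters, while the regular literal is converted by the compiler.

```csharp
public static class EscapeDemo
{
    // Stands in for a line read from a file: the verbatim string keeps the
    // backslash, the 'u' and the four hex digits as six separate characters.
    public static readonly string FromFile = @"caf\u00E9";

    // A regular literal: the compiler turns \u00E9 into the single char 'é'.
    public static readonly string Literal = "caf\u00E9";
}
```

Under these assumptions `EscapeDemo.FromFile.Length` is 9, while `EscapeDemo.Literal.Length` is 4.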
> They are shown exactly the same
> way as in the file. Literals in C# code's variables are shown correctly.
>
> Can anyone tell how to read Unicode escape sequences from file so that they
> are presented like literals?
You basically need to parse the text you've read, just like the C#
compiler does. You can search for \u fairly easily, then take the next
four digits, complain if they're not all hex, convert the hex to a
char, then replace the whole section with the character value.
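
The steps described above could be sketched roughly as follows. This is one possible reading of them, not code from the thread: the names `EscapeParser` and `Unescape` are made up, and the sketch assumes that only `\u` followed by exactly four hex digits counts as an escape and that a doubled backslash has no special meaning.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EscapeParser
{
    // Replaces every \uXXXX in the input with the UTF-16 code unit it names.
    // Throws FormatException when \u is not followed by four hex digits.
    public static string Unescape(string text)
    {
        var result = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            // Copy anything that does not start a \u escape unchanged.
            if (text[i] != '\\' || i + 1 >= text.Length || text[i + 1] != 'u')
            {
                result.Append(text[i]);
                i++;
                continue;
            }

            // Take the next four characters and complain if they are not all hex.
            ushort codeUnit;
            if (i + 6 > text.Length ||
                !ushort.TryParse(text.Substring(i + 2, 4), NumberStyles.AllowHexSpecifier,
                                 CultureInfo.InvariantCulture, out codeUnit))
            {
                throw new FormatException("Invalid \\u escape at index " + i);
            }

            result.Append((char)codeUnit);
            i += 6; // skip the backslash, the 'u' and the four digits
        }
        return result.ToString();
    }
}
```

Text read with StreamReader can then be passed through `EscapeParser.Unescape` line by line or as a whole. Building the result in a StringBuilder in a single pass avoids re-scanning text that has already been converted.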

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #2
A little bit more work than in Java, if I remember right.
Thanks for the reply!

Jun 27 '08 #3
John Ztwin <Jo****@mail.com> wrote:
> A little bit more work than in Java if I remember right,
Well, not if you use the normal BufferedReader and InputStreamReader in
Java.

Java's Properties class will do the unescaping for properties files,
but it isn't general purpose.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #4
John Ztwin wrote:
> I have a file that contains ordinary text and some special characters in
> Unicode escape sequences (\uxxxx).
>
> When I read the file using e.g. StreamReader Unicode escape sequences are
> not converted to their character representation. They are shown exactly the same
> way as in the file. Literals in C# code's variables are shown correctly.
>
> Can anyone tell how to read Unicode escape sequences from file so that they
> are presented like literals?
You will need to do a text replacement.

Example code:

using System.Globalization;
using System.Text.RegularExpressions;

// Replaces each \uXXXX escape (four hex digits, upper or lower case)
// with the character it names, in a single pass over the input.
public static string U2U(string s)
{
    return Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})",
        m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
}

Arne
Jun 27 '08 #5
John Ztwin wrote:
> I have a file that contains ordinary text and some special characters in
> Unicode escape sequences (\uxxxx).
If the file only ever uses \u, then matching \u alone is safe. However, some
standards (like the C# spec) also allow \U (uppercase U) escape sequences
with eight hex digits:

unicode-escape-sequence:
    \u hex-digit hex-digit hex-digit hex-digit
    \U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

http://msdn.microsoft.com/en-us/library/aa664812.aspx
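
A replacement that covers both forms in the grammar above might look like the sketch below. It is an illustration only: the names `LongEscapes` and `Unescape` are made up, and it assumes that the eight-digit form names a whole code point, which `char.ConvertFromUtf32` turns into one or two UTF-16 chars (it throws ArgumentOutOfRangeException for values that are not valid code points).

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class LongEscapes
{
    // \u followed by 4 hex digits, or \U followed by 8 hex digits.
    private static readonly Regex Escape =
        new Regex(@"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})");

    public static string Unescape(string text)
    {
        return Escape.Replace(text, m =>
        {
            if (m.Groups[1].Success)
            {
                // Short form: a single UTF-16 code unit.
                int codeUnit = int.Parse(m.Groups[1].Value,
                    NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
                return ((char)codeUnit).ToString();
            }

            // Long form: a full code point, possibly outside the BMP.
            int codePoint = int.Parse(m.Groups[2].Value,
                NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);
            return char.ConvertFromUtf32(codePoint);
        });
    }
}
```

With this sketch, the input text `\U0001F600` becomes a surrogate pair (two chars), and `\u00e9` becomes the single char 'é'.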
Best regards
--
Michael Justin
SCJP, SCJA
betasoft - Software for Delphi™ and for the Java™ platform
http://www.mikejustin.com - http://www.betabeans.de
Jun 27 '08 #6
