Reading Unicode escape sequences from File

Hello,

I have a file that contains ordinary text and some special characters written as
Unicode escape sequences (\uxxxx).

When I read the file using e.g. StreamReader, the Unicode escape sequences are
not converted to the characters they represent. They are shown exactly as they
appear in the file. Literals in C# source code, by contrast, are shown correctly.

Can anyone tell me how to read Unicode escape sequences from a file so that they
are converted the way literals are?

Thanks,
Jun 27 '08 #1
John Ztwin <Jo****@mail.co m> wrote:
> I have a file that contains ordinary text and some special characters in
> Unicode escape sequences (\uxxxx).
>
> When I read the file using e.g. StreamReader Unicode escape sequences are
> not converted to their character representation.

No, I wouldn't expect them to be. That's done by the C# compiler - it
would be a big mistake for it to be done by StreamReader.

> They are shown exactly the same
> way as in the file. Literals in C# code's variables are shown correctly.
>
> Can anyone tell how to read Unicode escape sequences from file so that they
> are presented like literals?

You basically need to parse the text you've read, just like the C#
compiler does. You can search for \u fairly easily, then take the next
four digits, complain if they're not all hex, convert the hex to a
char, then replace the whole section with the character value.
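
The steps described above (find \u, take the next four characters, reject anything that is not hex, convert, substitute) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name UnicodeEscapes and the method name Unescape are made up for this example, and it handles only the four-digit \u form.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UnicodeEscapes
{
    // Hypothetical helper: expands \uXXXX escapes in text that was read from a file.
    public static string Unescape(string text)
    {
        var result = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            // Copy everything that is not the start of a \u escape unchanged.
            if (text[i] != '\\' || i + 1 >= text.Length || text[i + 1] != 'u')
            {
                result.Append(text[i]);
                i++;
                continue;
            }

            // A \u escape must be followed by exactly four hex digits.
            if (i + 6 > text.Length)
                throw new FormatException("Truncated \\u escape at index " + i);

            string hex = text.Substring(i + 2, 4);
            int code;
            // AllowHexSpecifier accepts hex digits only (no sign, no whitespace).
            if (!int.TryParse(hex, NumberStyles.AllowHexSpecifier,
                              CultureInfo.InvariantCulture, out code))
                throw new FormatException("Invalid \\u escape at index " + i + ": " + hex);

            // Replace the six-character escape with the single character it names.
            result.Append((char)code);
            i += 6;
        }
        return result.ToString();
    }
}
```

With something like this, the text returned by StreamReader.ReadToEnd() (or File.ReadAllText) can be passed through UnicodeEscapes.Unescape before use. Note that the sketch treats every backslash followed by u as an escape; a file format that also allows an escaped backslash would need extra handling.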

--
Jon Skeet - <sk***@pobox.co m>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #2
A little bit more work than in Java, if I remember right.
Thanks for the reply!
Jun 27 '08 #3
John Ztwin <Jo****@mail.co m> wrote:
> A little bit more work than in Java if I remember right,

Well, not if you use the normal BufferedReader and InputStreamReader in
Java.

Java's Properties class will do the unescaping for properties files,
but it isn't general purpose.

--
Jon Skeet - <sk***@pobox.co m>
Jun 27 '08 #4
John Ztwin wrote:
> I have a file that contains ordinary text and some special characters in
> Unicode escape sequences (\uxxxx).
>
> When I read the file using e.g. StreamReader Unicode escape sequences are
> not converted to their character representation. They are shown exactly the same
> way as in the file. Literals in C# code's variables are shown correctly.
>
> Can anyone tell how to read Unicode escape sequences from file so that they
> are presented like literals?

You will need to do a text replacement.

Example code:

using System.Globalization;
using System.Text.RegularExpressions;

// Replaces each \uXXXX escape (four hex digits, either case) with its character.
public static string U2U(string s)
{
    // One Regex.Replace pass, so already-substituted text is never scanned again.
    return Regex.Replace(s, @"\\u([0-9A-Fa-f]{4})",
        m => ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
}

Arne
Jun 27 '08 #5
John Ztwin wrote:
> I have a file that contains ordinary text and some special characters in
> Unicode escape sequences (\uxxxx).

If the file always uses \u (lowercase), then there is no risk. However, some
standards (like the C# spec) also allow \U (uppercase) escape sequences, which
take eight hex digits instead of four:

unicode-escape-sequence:
    \u hex-digit hex-digit hex-digit hex-digit
    \U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

http://msdn.microsoft.com/en-us/library/aa664812.aspx
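
If the file might contain both forms, one possible approach is to match the eight-digit \U form alongside the four-digit \u form. The following is a minimal sketch, not a definitive implementation: the class name WideUnicodeEscapes and its Unescape method are invented for illustration, and it assumes \U should be treated as a full code point, as in the C# grammar quoted above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class WideUnicodeEscapes
{
    // Group 1 captures the eight digits of \U, group 2 the four digits of \u.
    private static readonly Regex Escape =
        new Regex(@"\\U([0-9A-Fa-f]{8})|\\u([0-9A-Fa-f]{4})");

    // Hypothetical helper: expands both \uXXXX and \UXXXXXXXX escapes.
    public static string Unescape(string text)
    {
        return Escape.Replace(text, m =>
        {
            if (m.Groups[1].Success)
            {
                // Eight-digit form: a code point that may lie outside the BMP.
                // char.ConvertFromUtf32 returns a surrogate pair where needed and
                // throws ArgumentOutOfRangeException for values that are not
                // valid code points (above 10FFFF or in the surrogate range).
                int codePoint = int.Parse(m.Groups[1].Value,
                    NumberStyles.HexNumber, CultureInfo.InvariantCulture);
                return char.ConvertFromUtf32(codePoint);
            }

            // Four-digit form: a single UTF-16 code unit, so a plain cast is enough.
            int unit = int.Parse(m.Groups[2].Value,
                NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            return ((char)unit).ToString();
        });
    }
}
```

The four-digit branch deliberately uses a cast instead of char.ConvertFromUtf32, so that a surrogate pair written as two consecutive \u escapes still passes through unchanged.
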
Best regards
--
Michael Justin
SCJP, SCJA
betasoft - Software for Delphi™ and for the Java™ platform
http://www.mikejustin.com - http://www.betabeans.de
Jun 27 '08 #6
