Converting Unicode

Hi @all,

I'm looking for a solution to the following problem:
I want to replace all Unicode characters in a string with a valid
substitution.

For example:

string s = "Catalán";
string s2 = ModifyMyString(s); // s2 = "Catal\xC3\xA1n"

Since replacing Unicode characters in a string this way should be a
very common task, I wondered whether there is a function in the
.NET Framework that does this job. Calling s.Replace("á", "\xC3\xA1")
for each character would not be very efficient, because there are "many"
Unicode characters. :-)

Thanks for help
Sams

Nov 17 '05 #1
Hi,

You can probably use regular expressions to replace multiple occurrences of
the same character with the substitution sequence.
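
A minimal sketch of the regular-expression idea, assuming the wanted
substitution is the UTF-8 bytes of each non-ASCII character written as
literal \xNN text. The class name Utf8Escaper is hypothetical, and
ModifyMyString is only a stand-in for the function named in the question;
whether a given database driver accepts this escape format is not
established anywhere in this thread.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the .NET Framework.
public static class Utf8Escaper
{
    // Matches runs of characters outside the 7-bit ASCII range.
    // Matching whole runs keeps surrogate pairs together.
    private static readonly Regex NonAscii = new Regex(@"[^\u0000-\u007F]+");

    // Stand-in for the ModifyMyString function from the question.
    public static string ModifyMyString(string input)
    {
        return NonAscii.Replace(input, delegate(Match m)
        {
            StringBuilder sb = new StringBuilder();
            // Write each UTF-8 byte of the matched run as the text \xNN.
            foreach (byte b in Encoding.UTF8.GetBytes(m.Value))
            {
                sb.Append(@"\x").Append(b.ToString("X2"));
            }
            return sb.ToString();
        });
    }
}
```

With this sketch, Utf8Escaper.ModifyMyString("Catalán") returns the
14-character text Catal\xC3\xA1n, with real backslashes in it. A single
pattern covers every non-ASCII character, so no per-character Replace
calls are needed.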

--
Sincerely,
Dmytro Lapshyn [Visual Developer - Visual C# MVP]

Nov 17 '05 #2
What is the nature of your substitution? Are you trying to convert
Unicode to UTF-8? If so, there are methods for doing this within the
Framework.

If the encoding is totally your own then you would need to create a new
subclass of Encoding (if you want to build the Cadillac version).

Nov 17 '05 #3
Thanks so far for your suggestions.

To answer Bruce's question about the nature of the substitution:
I'm reading content from a SQL Server 2000 database and want to insert it
into a PostgreSQL database. I think the database driver I use only accepts
ASCII encodings (characters [0..9][A..z] and the replacement strings
I already mentioned). The database itself, of course, is Unicode compatible.
Since my knowledge of Unicode/UTF-8 is not sufficient, I'm going
to look for the functions Bruce mentioned. I will keep you up to date. If
someone has another idea, I would be very happy to hear it.

Sams

Nov 17 '05 #4
Sams,

The real question is how you are going to pass UTF-8 characters to the
driver. Remember that System.String is *always* Unicode, so unless you have
a way to pass a byte array, you might have a hard time passing a UTF-8 string.
Can you please elaborate on the driver interface you are using?

--
Sincerely,
Dmytro Lapshyn [Visual Developer - Visual C# MVP]

Nov 17 '05 #6
Characters, and thus strings, in .NET are always Unicode. There's no
difference between replacing characters with characters and replacing
Unicode characters with characters. And "\xC3\xA1" is not a character,
but a string that says
\xC3\xA1

You seem to be confusing these things with character encoding.

Cheers,
--
http://www.joergjooss.de
mailto:ne****** **@joergjooss.d e
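
To make the character-versus-encoding distinction concrete, here is a
small sketch that counts the chars in a string and the bytes of its UTF-8
encoding. The class name CharsVersusBytes and the method Describe are
hypothetical names chosen for illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration only.
public static class CharsVersusBytes
{
    // Reports the number of chars (UTF-16 code units) in the string
    // and the number of bytes its UTF-8 encoding would occupy.
    public static string Describe(string s)
    {
        int chars = s.Length;
        int utf8Bytes = Encoding.UTF8.GetByteCount(s);
        return chars + " char(s), " + utf8Bytes + " UTF-8 byte(s)";
    }
}
```

Describe("\u00E1"), the single character á, gives "1 char(s), 2 UTF-8 byte(s)".
The verbatim literal @"\xC3\xA1" is eight ordinary ASCII characters and
gives "8 char(s), 8 UTF-8 byte(s)". The regular C# literal "\xC3\xA1" is
different again: the compiler reads it as the two characters U+00C3 and
U+00A1, so it gives "2 char(s), 4 UTF-8 byte(s)" and is not á at all.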
Nov 17 '05 #7
> string s = "Catalán";
> string s2 = ModifyMyString(s); // s2 = "Catal\xC3\xA1n"


C3 A1 are the bytes that represent á in UTF-8.
A .NET string is Unicode (stored as UTF-16), so what you probably
want is to convert the string to a UTF-8 byte array.
If that is the case, take a look at System.Text.UTF8Encoding.

But depending on what mechanism you are using to interact with the database,
you may not need to do your own conversion.

--
Mihai Nita [Microsoft MVP, Windows - SDK]
------------------------------------------
Replace _year_ with _ to get the real email
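
A short sketch of the System.Text.UTF8Encoding route mentioned above:
convert the string to a UTF-8 byte array, show the bytes as hex, and
decode them back. The class name Utf8Demo and its method names are
hypothetical; only UTF8Encoding and BitConverter come from the Framework.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration only.
public static class Utf8Demo
{
    // Passing false means GetPreamble() returns no byte order mark;
    // GetBytes itself never adds one either way.
    private static readonly UTF8Encoding Utf8 = new UTF8Encoding(false);

    // Encodes a .NET (UTF-16) string as a UTF-8 byte array.
    public static byte[] ToUtf8(string s)
    {
        return Utf8.GetBytes(s);
    }

    // Decodes a UTF-8 byte array back into a .NET string.
    public static string FromUtf8(byte[] bytes)
    {
        return Utf8.GetString(bytes);
    }

    // Formats bytes as space-separated uppercase hex pairs.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }
}
```

Utf8Demo.ToHex(Utf8Demo.ToUtf8("Catalán")) gives
"43 61 74 61 6C C3 A1 6E", where C3 A1 is the á. Utf8Demo.FromUtf8
turns those bytes back into the original string.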
Nov 17 '05 #8
