problem with unicode

Hi all,

I have a string from a .txt file that was saved as UTF-8.
The file contains the string "frédéric". My problem is that when I
read this .txt file, the bytes of é come out as 101, 180.

é takes 2 bytes in the UTF-8 file. How can I change this length from 2 to 1?

I want é to be a single byte, 233, and not the 2 bytes 101, 180.

Can I correct this after reading the .txt file?

Please help me.

Mar 15 '06 #1
Thus wrote tvin,
> I have a string from a .txt file that was saved as UTF-8.
> The file contains the string "frédéric". My problem is that when I
> read this .txt file, the bytes of é come out as 101, 180.
> é takes 2 bytes in the UTF-8 file. How can I change this length from
> 2 to 1?

That's how UTF-8 works.

> I want é to be a single byte, 233, and not the 2 bytes 101, 180.

Then use an 8-bit encoding like Windows-1252 or ISO-8859-1 or -15.

But why on earth do you want *one* byte? It's not 1976 anymore, and no 8-bit
encoding on this planet has coverage comparable to Unicode. BTW, all functionality
in the BCL to process text files uses UTF-8 by default.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Mar 15 '06 #2
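
A minimal VB.NET sketch of the bytes-versus-encoding point above, not taken from the thread. It assumes the precomposed character é (U+00E9); the module and variable names are made up for illustration. Note that for this character UTF-8 produces the bytes 195 and 169, not the 101 and 180 reported in the question.

```vbnet
Imports System
Imports System.Text

Module AccentBytesDemo
    Sub Main()
        ' U+00E9 (é) is built from its code point so that the encoding of
        ' this source file cannot affect the result.
        Dim accented As String = ChrW(&HE9)

        ' The same single character, encoded two different ways.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(accented)
        Dim latin1Bytes As Byte() = Encoding.GetEncoding("ISO-8859-1").GetBytes(accented)

        Console.WriteLine(accented.Length)        ' 1 character
        Console.WriteLine(utf8Bytes.Length)       ' 2 bytes in UTF-8
        Console.WriteLine(CInt(utf8Bytes(0)))     ' 195
        Console.WriteLine(CInt(utf8Bytes(1)))     ' 169
        Console.WriteLine(latin1Bytes.Length)     ' 1 byte in ISO-8859-1
        Console.WriteLine(CInt(latin1Bytes(0)))   ' 233
    End Sub
End Module
```

The string length counts characters, so it is 1 in both cases; only the byte arrays differ.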
Joerg,

I can use 2 bytes, 3 bytes... that is not a problem.
But the length of frédéric is 10: len(frédéric) = 10.

The length of frédéric should be 8 for it to be inserted correctly into the SQL database.
Please help me, Joerg.
Mar 15 '06 #3
Thus wrote tvin,
> I can use 2 bytes, 3 bytes... that is not a problem.
> But the length of frédéric is 10: len(frédéric) = 10.
> The length of frédéric should be 8 for it to be inserted correctly
> into the SQL database. Please help me, Joerg.

You're confusing bytes and characters. Frédéric has 10 characters, but it
may have 10 or more bytes depending on the character encoding being used
-- if you were using UTF-32, it would be a whopping 40 ;-)

Doesn't your database support the nvarchar type for Unicode characters?
--
Joerg Jooss
ne********@joergjooss.de
Mar 15 '06 #4
Hi Joerg,

My parameters are nvarchar in the SQL database, but Frédéric was
inserted like this: "Frédéric ". I used trim(chr(31),chr(30)),
but the result is the same.
What should I do so that Frédéric is inserted like this: "Frédéric"?
I feel the problem is in how I open the .txt file to get Frédéric: I
read it as 10 characters, but it should be 8 to be inserted correctly
into the SQL database.
Mar 16 '06 #5
tvin <tv**@discussions.microsoft.com> wrote:
> My parameters are nvarchar in the SQL database, but Frédéric was
> inserted like this: "Frédéric ". I used trim(chr(31),chr(30)),
> but the result is the same.
> What should I do so that Frédéric is inserted like this: "Frédéric"?
> I feel the problem is in how I open the .txt file to get Frédéric: I
> read it as 10 characters, but it should be 8 to be inserted correctly
> into the SQL database.

When you say "when I open the .txt file" - what are you using to open
it? You need to use something that understands UTF-8.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 16 '06 #6
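
A small sketch of why the reader has to understand UTF-8, not taken from the thread. It writes a throwaway file (the temp file and all names here are assumptions for illustration) and decodes the same bytes twice, once as UTF-8 and once as ISO-8859-1.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReadUtf8Demo
    Sub Main()
        ' "frédéric", with é built from its code point U+00E9.
        Dim eAcute As String = ChrW(&HE9)
        Dim original As String = "fr" & eAcute & "d" & eAcute & "ric"

        ' Write a throwaway UTF-8 file without a byte order mark.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, original, New UTF8Encoding(False))

        ' Decoding the bytes as UTF-8 gives back the 8 characters.
        Using reader As New StreamReader(tempFile, Encoding.UTF8)
            Console.WriteLine(reader.ReadToEnd().Length)   ' 8
        End Using

        ' Decoding the same 10 bytes as ISO-8859-1 yields one character
        ' per byte, so each é becomes two characters.
        Using reader As New StreamReader(tempFile, Encoding.GetEncoding("ISO-8859-1"))
            Console.WriteLine(reader.ReadToEnd().Length)   ' 10
        End Using

        File.Delete(tempFile)
    End Sub
End Module
```

A length of 10 is therefore what a non-UTF-8 reader would report for this word, which is one reason the question of how the file is opened matters.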
Hi Jon,

I am using this code to read the test.txt file, which was saved as UTF-8:
Dim file As New System.IO.StreamReader("c:\test.txt")
Dim words As String = file.ReadToEnd()
Console.WriteLine(words)
file.Close()
I think the problem occurs when I read the file, because the length of
"frédéric" is 10: len(words) = 10. After reading, I think "words" should
have length 8 (because I have 8 characters) so that it converts correctly
to Unicode and is inserted correctly into the SQL database.

After reading test.txt I convert this word to Unicode with this code:

Dim uni As New UnicodeEncoding()
Dim encodedBytes As Byte() = uni.GetBytes(words)
Dim decodedString As String = uni.GetString(encodedBytes)

Please help me.


Mar 16 '06 #7
tvin <tv**@discussions.microsoft.com> wrote:
>> When you say "when I open the .txt file" - what are you using to open
>> it? You need to use something that understands UTF-8.
> I am using this code to read the test.txt file, which was saved as UTF-8:
> Dim file As New System.IO.StreamReader("c:\test.txt")
> Dim words As String = file.ReadToEnd()
> Console.WriteLine(words)
> file.Close()

Okay - that will be reading it as a UTF-8 file.

> I think the problem occurs when I read the file, because the length of
> "frédéric" is 10: len(words) = 10. After reading, I think "words" should
> have length 8 (because I have 8 characters) so that it converts correctly
> to Unicode and is inserted correctly into the SQL database.

I suspect the problem isn't the accents at all - my guess is that
you've got a carriage return and line feed at the end of the file, and
*that's* what's making the length 10.

> After reading test.txt I convert this word to Unicode with this code:
>
> Dim uni As New UnicodeEncoding()
> Dim encodedBytes As Byte() = uni.GetBytes(words)
> Dim decodedString As String = uni.GetString(encodedBytes)

No, you don't need to do that. The above is effectively a no-op - *all*
strings in .NET are in Unicode.

--
Jon Skeet - <sk***@pobox.com>
Mar 16 '06 #8
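
A minimal sketch of the two points in the reply above, not taken from the thread. It assumes, as the reply guesses, that the file ends with a carriage return and line feed; the string below is a hypothetical stand-in for what ReadToEnd() returned. As a side note, chr(31) and chr(30) are the unit separator and record separator control characters, so trimming them would not remove a carriage return (13) or line feed (10).

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module TrailingNewlineDemo
    Sub Main()
        ' Hypothetical file contents: the 8-character word plus CR and LF.
        Dim eAcute As String = ChrW(&HE9)
        Dim words As String = "fr" & eAcute & "d" & eAcute & "ric" & vbCrLf
        Console.WriteLine(words.Length)     ' 10

        ' Strip only trailing carriage returns and line feeds.
        Dim trimmed As String = words.TrimEnd(ControlChars.Cr, ControlChars.Lf)
        Console.WriteLine(trimmed.Length)   ' 8

        ' The GetBytes/GetString round trip from the question returns an
        ' equal string, so it changes nothing.
        Dim uni As New UnicodeEncoding()
        Dim roundTripped As String = uni.GetString(uni.GetBytes(trimmed))
        Console.WriteLine(roundTripped = trimmed)   ' True
    End Sub
End Module
```

If the length is still 10 after trimming, the extra characters are somewhere other than a trailing line break.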
Jon, thanks for helping me, but
when I use another word instead of "frédéric" I don't have any problem.
When I insert "frédéric" into SQL, it ends up like this: "frédéric ".

Well, I don't know what the problem is.

Thanks, Jon.

Mar 16 '06 #9
Thus wrote Joerg,
> Thus wrote tvin,
>> I can use 2 bytes, 3 bytes... that is not a problem.
>> But the length of frédéric is 10: len(frédéric) = 10.
>> The length of frédéric should be 8 for it to be inserted correctly
>> into the SQL database. Please help me, Joerg.
>
> You're confusing bytes and characters. Frédéric has 10 characters, but
> it may have 10 or more bytes depending on the character encoding being
> used

Oops, the old counting issue crept up again... make that 8 characters ;-)
--
Joerg Jooss
ne********@joergjooss.de
Mar 16 '06 #10
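
A short sketch of the corrected count, not taken from the thread: the 8 characters of "Frédéric" (assuming precomposed é, U+00E9) measured in several encodings. The names are made up for illustration. With 8 characters the UTF-32 figure comes out as 32 bytes rather than the 40 mentioned earlier.

```vbnet
Imports System
Imports System.Text

Module ByteCountDemo
    Sub Main()
        ' "Frédéric", with é built from its code point U+00E9.
        Dim eAcute As String = ChrW(&HE9)
        Dim name As String = "Fr" & eAcute & "d" & eAcute & "ric"

        Console.WriteLine(name.Length)                                           ' 8 characters
        Console.WriteLine(Encoding.GetEncoding("ISO-8859-1").GetByteCount(name)) ' 8 bytes
        Console.WriteLine(Encoding.UTF8.GetByteCount(name))                      ' 10 bytes
        Console.WriteLine(Encoding.Unicode.GetByteCount(name))                   ' 16 bytes (UTF-16)
        Console.WriteLine(Encoding.UTF32.GetByteCount(name))                     ' 32 bytes
    End Sub
End Module
```

The character count stays at 8 throughout; only the number of bytes depends on the encoding.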
tvin <tv**@discussions.microsoft.com> wrote:
> When I use another word instead of "frédéric" I don't have any problem.
> When I insert "frédéric" into SQL, it ends up like this: "frédéric ".
>
> Well, I don't know what the problem is.

See http://www.pobox.com/~skeet/csharp/d...ngunicode.html

--
Jon Skeet - <sk***@pobox.com>
Mar 16 '06 #11
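
One possibility the thread does not confirm: the reported values 101, 180 (the letter e followed by an accent) and the fact that only this word misbehaves would both fit a file that stores each é as "e" plus a separate accent character. The sketch below is an assumption, not the poster's data. It assumes the accent is the combining acute U+0301, in which case normalizing to form C merges each pair into a single é. If the file instead holds the standalone accent U+00B4, normalization would not merge it.

```vbnet
Imports System
Imports System.Text

Module NormalizeDemo
    Sub Main()
        ' Hypothetical decomposed spelling: e followed by U+0301
        ' (combining acute accent), used twice.
        Dim accent As String = ChrW(&H301)
        Dim decomposed As String = "fre" & accent & "de" & accent & "ric"
        Console.WriteLine(decomposed.Length)   ' 10

        ' Form C composes each e + accent pair into the single character é.
        Dim composed As String = decomposed.Normalize(NormalizationForm.FormC)
        Console.WriteLine(composed.Length)     ' 8
        Console.WriteLine(AscW(composed(2)))   ' 233
    End Sub
End Module
```

Printing AscW for each character of the string read from the file would show which of these cases, if any, applies.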
