Problem with Unicode

tvin

Hi all,

I read a string from a .txt file that was saved as UTF-8.
The .txt file contains the string "frédéric". My problem is that when I
read this .txt file, the bytes of é come out like this: 101,180.

é has a length of 2 in the UTF-8 file. How can I change this length from 2 to 1?

My problem is that I want to use é as the single byte 233 and not as the 2 bytes 101,180.

Please help me.

Can I correct this problem after reading the .txt file?

Mar 15 '06 #1

Joerg Jooss

Thus wrote tvin,

> Hi all,
>
> I read a string from a .txt file that was saved as UTF-8.
> The .txt file contains the string "frédéric". My problem is that when I
> read this .txt file, the bytes of é come out like this: 101,180.
> é has a length of 2 in the UTF-8 file. How can I change this length from 2 to 1?

That's how UTF-8 works.

> My problem is that I want to use é as the single byte 233 and not as the 2 bytes
> 101,180.

Then use an 8-bit encoding like Windows-1252 or ISO-8859-1 or -15.

But why on earth do you want *one* byte? It's not 1976 anymore, and no 8-bit
encoding on this planet has coverage comparable to Unicode's. BTW, all functionality
in the BCL to process text files uses UTF-8 by default.
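
To make the one-byte versus two-byte difference concrete, here is a minimal sketch, assuming é is stored as the single code point U+00E9. The module name is made up for illustration; `Encoding.UTF8` and `Encoding.GetEncoding("iso-8859-1")` are standard .NET calls.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module EAcuteBytes
    Sub Main()
        ' é as a single code point, U+00E9 (decimal 233).
        Dim eAcute As String = ChrW(233)

        ' UTF-8 needs two bytes for this character: 195 and 169.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(eAcute)
        Console.WriteLine("UTF-8 byte count: {0}", utf8Bytes.Length)
        Console.WriteLine("UTF-8 bytes: {0},{1}", utf8Bytes(0), utf8Bytes(1))

        ' ISO-8859-1 is an 8-bit encoding, so the same character is one byte: 233.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim latin1Bytes As Byte() = latin1.GetBytes(eAcute)
        Console.WriteLine("ISO-8859-1 byte count: {0}", latin1Bytes.Length)
        Console.WriteLine("ISO-8859-1 byte: {0}", latin1Bytes(0))
    End Sub
End Module
```

Either way the string itself still holds one character; only the encoded byte form differs.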

Cheers,
--
Joerg Jooss
ne********@joergjooss.de

Mar 15 '06 #2

tvin

Joerg,
I can use 2 bytes, 3 bytes... I don't have a problem with that,
but the length of frédéric is len(frédéric) = 10.

The length of frédéric should be 8 for it to be inserted correctly into the SQL database.
Please help me, Joerg.

Mar 15 '06 #3

Joerg Jooss

Thus wrote tvin,

> I can use 2 bytes, 3 bytes... I don't have a problem with that,
> but the length of frédéric is len(frédéric) = 10.
> The length of frédéric should be 8 for it to be inserted correctly into the SQL
> database. Please help me, Joerg.

You're confusing bytes and characters. Frédéric has 10 characters, but it
may have 10 or more bytes depending on the character encoding being used
-- if you were using UTF-32, it would be a whopping 40 ;-)

Doesn't your database support the nvarchar type for Unicode characters?
--
Joerg Jooss
ne********@joergjooss.de

Mar 15 '06 #4

tvin

Hi joerg

Joerg, my parameters are nvarchar in the SQL database, but
the Frédéric that was inserted into SQL looks like this: "Frédéric ". I used
trim(chr(31),chr(30))
but the result is the same.
What should I do to insert Frédéric into SQL like this: "Frédéric"?
I feel that the problem occurs when I open the .txt file to read Frédéric: I
get 10 characters, but it should be 8 to be inserted correctly into the SQL
database.

Mar 16 '06 #5

Jon Skeet [C# MVP]

tvin <tv**@discussions.microsoft.com> wrote:

> Joerg, my parameters are nvarchar in the SQL database, but
> the Frédéric that was inserted into SQL looks like this: "Frédéric ". I used
> trim(chr(31),chr(30))
> but the result is the same.
> What should I do to insert Frédéric into SQL like this: "Frédéric"?
> I feel that the problem occurs when I open the .txt file to read Frédéric: I
> get 10 characters, but it should be 8 to be inserted correctly into the SQL
> database.

When you say "when I open the .txt file" - what are you using to open
it? You need to use something that understands UTF-8.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Mar 16 '06 #6

tvin

hi jon

I am using this code to read the test.txt file, which was saved as UTF-8:

Dim file As New System.IO.StreamReader("c:\test.txt")
Dim words As String = file.ReadToEnd()
Console.WriteLine(words)
file.Close()

I think the problem occurs when I read the file, because the length of
"frédéric" is 10 (len(words)=10) after reading. I think "words" should
be 8 (because I have 8 characters) to convert correctly to Unicode and to
be inserted correctly into the SQL database.

After reading test.txt, I convert this word to Unicode with this code:

Dim uni As New UnicodeEncoding()
Dim encodedBytes As Byte() = uni.GetBytes(words)
Dim decodedString As String = uni.GetString(encodedBytes)

Please help me.

Mar 16 '06 #7

Jon Skeet [C# MVP]

tvin <tv**@discussions.microsoft.com> wrote:

>> When you say "when I open the .txt file" - what are you using to open
>> it? You need to use something that understands UTF-8.
>
> I am using this code to read the test.txt file, which was saved as UTF-8:
> Dim file As New System.IO.StreamReader("c:\test.txt")
> Dim words As String = file.ReadToEnd()
> Console.WriteLine(words)
> file.Close()

Okay - that will be reading it as a UTF-8 file.
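
For anyone who prefers to spell the encoding out, a minimal sketch of the same read with the encoding passed explicitly. The path is the one from the quoted code; the module and variable names are only illustrative.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReadUtf8File
    Sub Main()
        ' Path taken from the quoted code; adjust as needed.
        Dim fileName As String = "c:\test.txt"

        ' Passing Encoding.UTF8 states the intended decoding explicitly.
        ' Using closes the reader even if ReadToEnd throws.
        Using reader As New StreamReader(fileName, Encoding.UTF8)
            Dim words As String = reader.ReadToEnd()
            Console.WriteLine("{0} characters read", words.Length)
        End Using
    End Sub
End Module
```

The `Using` block replaces the manual `file.Close()` call, so the file handle is released on every code path.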

> I think the problem occurs when I read the file, because the length of
> "frédéric" is 10 (len(words)=10) after reading. I think "words" should
> be 8 (because I have 8 characters) to convert correctly to Unicode and to
> be inserted correctly into the SQL database.

I suspect the problem isn't the accents at all - my guess is that
you've got a carriage return and line feed at the end of the file, and
*that's* what's making the length 10.
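
One way to test that guess is to print the code of every character and then trim the line ending. This is a minimal sketch in which a hard-coded string stands in for the file contents, on the assumption that the file really does end in CR LF.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TrailingNewlineCheck
    Sub Main()
        ' Stand-in for the text read from the file: the name followed by CR LF.
        Dim words As String = "fr" & ChrW(233) & "d" & ChrW(233) & "ric" & vbCrLf

        Console.WriteLine(words.Length)   ' 10: 8 letters plus CR and LF

        ' Print each character code so invisible characters show up (13 = CR, 10 = LF).
        For Each c As Char In words
            Console.Write("{0} ", AscW(c))
        Next
        Console.WriteLine()

        ' Strip carriage returns and line feeds from the end only.
        Dim cleaned As String = words.TrimEnd(ControlChars.Cr, ControlChars.Lf)
        Console.WriteLine(cleaned.Length) ' 8
    End Sub
End Module
```

If the codes printed for the real file end in something other than 13 and 10, those are the characters to pass to `TrimEnd` instead.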

> After reading test.txt, I convert this word to Unicode with this code:
>
> Dim uni As New UnicodeEncoding()
> Dim encodedBytes As Byte() = uni.GetBytes(words)
> Dim decodedString As String = uni.GetString(encodedBytes)

No, you don't need to do that. The above is effectively a no-op - *all*
strings in .NET are in Unicode.
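
A minimal sketch that shows the round trip changing nothing, using a hard-coded string in place of the file contents (the module name is illustrative):

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module RoundTripNoOp
    Sub Main()
        Dim words As String = "fr" & ChrW(233) & "d" & ChrW(233) & "ric"

        ' Encode to UTF-16 bytes and decode straight back again.
        Dim uni As New UnicodeEncoding()
        Dim encodedBytes As Byte() = uni.GetBytes(words)
        Dim decodedString As String = uni.GetString(encodedBytes)

        Console.WriteLine(encodedBytes.Length)   ' 16: two bytes per character
        Console.WriteLine(String.Equals(words, decodedString, StringComparison.Ordinal)) ' True
    End Sub
End Module
```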

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Mar 16 '06 #8

tvin

Jon, thanks for helping me,
but
when I use another word instead of "frédéric" I don't have any problem.
When I insert "frédéric" into SQL, it looks like this: "frédéric ".

Well, I don't know what the problem is.

Thanks, Jon.

Mar 16 '06 #9

Joerg Jooss

Thus wrote Joerg,

> Thus wrote tvin,
>> I can use 2 bytes, 3 bytes... I don't have a problem with that,
>> but the length of frédéric is len(frédéric) = 10.
>> The length of frédéric should be 8 for it to be inserted correctly into the SQL
>> database. Please help me, Joerg.
>
> You're confusing bytes and characters. Frédéric has 10 characters, but
> it may have 10 or more bytes depending on the character encoding being
> used

Oops, the old counting issue crept up again... make that 8 characters ;-)
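
The character count and the byte counts can be checked side by side. A minimal sketch, assuming the name is spelled with precomposed é (U+00E9); the module and variable names are only illustrative.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CharsVersusBytes
    Sub Main()
        Dim word As String = "Fr" & ChrW(233) & "d" & ChrW(233) & "ric"

        ' Length counts characters (UTF-16 code units), not bytes.
        Console.WriteLine("Characters:   {0}", word.Length)                         ' 8

        ' The byte count depends on the encoding chosen.
        Console.WriteLine("UTF-8 bytes:  {0}", Encoding.UTF8.GetByteCount(word))    ' 10
        Console.WriteLine("UTF-16 bytes: {0}", Encoding.Unicode.GetByteCount(word)) ' 16
        Console.WriteLine("UTF-32 bytes: {0}", Encoding.UTF32.GetByteCount(word))   ' 32
    End Sub
End Module
```

With 8 characters, the UTF-32 figure comes to 32 bytes.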
--
Joerg Jooss
ne********@joergjooss.de

Mar 16 '06 #10

Jon Skeet [C# MVP]

tvin <tv**@discussions.microsoft.com> wrote:

> When I use another word instead of "frédéric" I don't have any problem.
> When I insert "frédéric" into SQL, it looks like this: "frédéric ".
>
> Well, I don't know what the problem is.

See http://www.pobox.com/~skeet/csharp/d...ngunicode.html

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Mar 16 '06 #11
