Bytes IT Community

Problem with Unicode

Hi all

I read a string from a .txt file that was saved as UTF-8.
In the .txt file I have the string "frédéric". My problem is that when I
read this .txt file, the bytes of é look like this: 101,180.

In the UTF-8 file, é has a length of 2. How can I change this length from 2 to 1?

My problem is that I want to use é as the single byte 233, not as the 2 bytes 101,180.

Please help me.

Can I correct this problem after reading the .txt file?

Mar 15 '06 #1
10 Replies


Thus wrote tvin,
Hi all

I read a string from a .txt file that was saved as UTF-8.
In the .txt file I have the string "frédéric". My problem is that when I
read this .txt file, the bytes of é look like this: 101,180.
In the UTF-8 file, é has a length of 2. How can I change this length from 2 to 1?
That's how UTF-8 works.
My problem is that I want to use é as the single byte 233, not as the 2 bytes
101,180.


Then use an 8-bit encoding like Windows-1252, ISO-8859-1 or ISO-8859-15.

But why on earth do you want *one* byte? It's not 1976 anymore, and no 8-bit
encoding on this planet has coverage similar to Unicode's. BTW, all functionality
in the BCL (.NET Base Class Library) for processing text files uses UTF-8 by default.

Cheers,
--
Joerg Jooss
ne********@joergjooss.de
Mar 15 '06 #2
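To make Joerg's answer concrete, here is a minimal VB.NET sketch (the module and function names are illustrative, not from the thread) that prints the bytes of é (U+00E9) under UTF-8 and under ISO-8859-1. It builds é with `ChrW(&HE9)` so the source file's own encoding cannot interfere, and it uses code page 28591 for ISO-8859-1. Windows-1252 also stores é as 233, but on .NET Core and later it requires registering `CodePagesEncodingProvider` first.

```vbnet
Imports System
Imports System.Text

Module EncodingBytesDemo
    ' Joins byte values as decimal numbers, e.g. "195,169".
    Public Function FormatBytes(bytes As Byte()) As String
        Dim parts(bytes.Length - 1) As String
        For i As Integer = 0 To bytes.Length - 1
            parts(i) = bytes(i).ToString()
        Next
        Return String.Join(",", parts)
    End Function

    Sub Main()
        ' U+00E9 LATIN SMALL LETTER E WITH ACUTE, built without relying on the source file's encoding.
        Dim eAcute As String = ChrW(&HE9)

        ' UTF-8 stores é as two bytes.
        Console.WriteLine("UTF-8:      " & FormatBytes(Encoding.UTF8.GetBytes(eAcute)))               ' 195,169

        ' ISO-8859-1 (code page 28591) is an 8-bit encoding, so é is the single byte 233.
        Console.WriteLine("ISO-8859-1: " & FormatBytes(Encoding.GetEncoding(28591).GetBytes(eAcute))) ' 233
    End Sub
End Module
```

The text is the same in both cases; only the byte representation chosen for storage differs.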

Joerg

I can use 2 bytes, 3 bytes... I don't have a problem with that,
but the length of frédéric is len(frédéric) = 10.

The length of frédéric should be 8 to insert it correctly into the SQL database.
Please help me, Joerg.
Mar 15 '06 #3

Thus wrote tvin,
I can use 2 bytes, 3 bytes... I don't have a problem with that,
but the length of frédéric is len(frédéric) = 10.
The length of frédéric should be 8 to insert it correctly into the SQL database.
Please help me, Joerg.


You're confusing bytes and characters. Frédéric has 10 characters, but it
may have 10 or more bytes depending on the character encoding being used
-- if you were using UTF-32, it would be a whopping 40 ;-)

Doesn't your database support the nvarchar type for Unicode characters?
--
Joerg Jooss
ne********@joergjooss.de
Mar 15 '06 #4
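Joerg's distinction between bytes and characters can be checked directly. The following minimal VB.NET sketch (illustrative names, not from the thread) counts the characters in "frédéric" and the bytes it needs under three Unicode encodings. As Joerg corrects later in the thread, the word has 8 characters; it is the UTF-8 byte count that comes to 10.

```vbnet
Imports System
Imports System.Text

Module CharsVersusBytes
    ' "frédéric" built from escapes so the source file's own encoding cannot interfere.
    Public ReadOnly Word As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric"

    Sub Main()
        ' String.Length counts UTF-16 code units; for this word that is the 8 characters.
        Console.WriteLine("Characters:   " & Word.Length.ToString())                          ' 8

        ' The byte count depends on the encoding used to store those same 8 characters.
        Console.WriteLine("UTF-8 bytes:  " & Encoding.UTF8.GetByteCount(Word).ToString())    ' 10 (6 x 1 + 2 x 2)
        Console.WriteLine("UTF-16 bytes: " & Encoding.Unicode.GetByteCount(Word).ToString()) ' 16 (8 x 2)
        Console.WriteLine("UTF-32 bytes: " & Encoding.UTF32.GetByteCount(Word).ToString())   ' 32 (8 x 4)
    End Sub
End Module
```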

Hi Joerg

My parameters are nvarchar in the SQL database, but Frédéric is inserted
into SQL like this: "Frédéric ". I used
trim(chr(31),chr(30))
but the result is the same.
What should I do to insert Frédéric into SQL like this: "Frédéric"?
I feel that the problem is when I open the .txt file to read Frédéric: I
read it as 10 characters, but it should be 8 to insert it correctly into the SQL
database.
Mar 16 '06 #5

tvin <tv**@discussions.microsoft.com> wrote:
My parameters are nvarchar in the SQL database, but Frédéric is inserted
into SQL like this: "Frédéric ". I used
trim(chr(31),chr(30))
but the result is the same.
What should I do to insert Frédéric into SQL like this: "Frédéric"?
I feel that the problem is when I open the .txt file to read Frédéric: I
read it as 10 characters, but it should be 8 to insert it correctly into the SQL
database.


When you say "when I open the .txt file" - what are you using to open
it? You need to use something that understands UTF-8.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 16 '06 #6
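Jon's point about using a reader that understands UTF-8 can be illustrated with a minimal sketch. It writes "frédéric" to a temporary file as UTF-8 without a byte order mark, then reads it back twice: once as UTF-8 and once as ISO-8859-1. The helper name `ReadBackAs` and the temporary file are illustrative assumptions. Decoding with the wrong 8-bit encoding turns each é into two characters, giving 10 instead of 8. Note that `StreamReader` already uses UTF-8 when no encoding is passed.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ReadUtf8File
    ' Illustrative helper: writes text to a temporary file as UTF-8 (no byte order mark),
    ' then reads it back using the given encoding.
    Public Function ReadBackAs(text As String, readEncoding As Encoding) As String
        Dim tempFile As String = Path.GetTempFileName()
        Try
            File.WriteAllText(tempFile, text, New UTF8Encoding(False))
            Using reader As New StreamReader(tempFile, readEncoding)
                Return reader.ReadToEnd()
            End Using
        Finally
            File.Delete(tempFile)
        End Try
    End Function

    Sub Main()
        Dim word As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric"

        ' Decoded as UTF-8, each pair of bytes 195,169 becomes one é.
        Console.WriteLine(ReadBackAs(word, Encoding.UTF8).Length.ToString())               ' 8

        ' Decoded as ISO-8859-1, each of those bytes becomes a character of its own.
        Console.WriteLine(ReadBackAs(word, Encoding.GetEncoding(28591)).Length.ToString()) ' 10
    End Sub
End Module
```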

Hi Jon

I am using this code to read the test.txt file, which was saved as UTF-8:
Dim file As New System.IO.StreamReader("c:\test.txt")
Dim words As String = file.ReadToEnd()
Console.WriteLine(words)
file.Close()
I think the problem is in reading the file, because the length of
"frédéric" is 10: len(words) = 10 after reading. I think "words" should have
length 8 (because it has 8 characters) to convert correctly to Unicode and to
insert correctly into the SQL database.

After reading test.txt, I convert this word to Unicode with this code:

Dim uni As New UnicodeEncoding()

Dim encodedBytes As Byte() = uni.GetBytes(words)

Dim decodedString As String = uni.GetString(encodedBytes)

Please help me.

Mar 16 '06 #7

tvin <tv**@discussions.microsoft.com> wrote:
When you say "when I open the .txt file" - what are you using to open
it? You need to use something that understands UTF-8.
I am using this code to read the test.txt file, which was saved as UTF-8:
Dim file As New System.IO.StreamReader("c:\test.txt")
Dim words As String = file.ReadToEnd()
Console.WriteLine(words)
file.Close()


Okay - that will be reading it as a UTF-8 file.
I think the problem is in reading the file, because the length of
"frédéric" is 10: len(words) = 10 after reading. I think "words" should have
length 8 (because it has 8 characters) to convert correctly to Unicode and to
insert correctly into the SQL database.
I suspect the problem isn't the accents at all - my guess is that
you've got a carriage return and line feed at the end of the file, and
*that's* what's making the length 10.
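Jon's carriage-return theory is a guess, but it is easy to test. This minimal sketch assumes the text read from the file ends with CR/LF and shows how that alone gives a length of 10. It also shows why the OP's `trim(chr(31),chr(30))` (presumably meant as `String.Trim`) would not help: `Chr(30)` and `Chr(31)` are the record and unit separator control characters, not CR (13) and LF (10). The module name is illustrative.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TrailingNewline
    Public ReadOnly Word As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric"

    Sub Main()
        ' Assumption: the text read from the file ends with a carriage return + line feed.
        Dim fileContents As String = Word & vbCrLf
        Console.WriteLine(fileContents.Length.ToString())                         ' 10 = 8 letters + CR + LF

        ' Chr(31)/Chr(30) are not line-break characters, so this trim removes nothing.
        Console.WriteLine(fileContents.Trim(Chr(31), Chr(30)).Length.ToString())  ' 10

        ' Trimming CR and LF from the end leaves just the word.
        Dim cleaned As String = fileContents.TrimEnd(ControlChars.Cr, ControlChars.Lf)
        Console.WriteLine(cleaned.Length.ToString())                              ' 8
    End Sub
End Module
```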
After reading test.txt, I convert this word to Unicode with this code:

Dim uni As New UnicodeEncoding()
Dim encodedBytes As Byte() = uni.GetBytes(words)
Dim decodedString As String = uni.GetString(encodedBytes)


No, you don't need to do that. The above is effectively a no-op - *all*
strings in .NET are in Unicode.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 16 '06 #8
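Jon's "no-op" remark can be confirmed with the OP's own three lines. This minimal sketch wraps them in an illustrative function: it encodes the string to UTF-16 bytes with `UnicodeEncoding` and decodes those bytes straight back. The result is the same string, because .NET strings are already sequences of UTF-16 code units.

```vbnet
Imports System
Imports System.Text

Module UnicodeRoundTrip
    ' The OP's conversion: encode to UTF-16 bytes, then decode those bytes straight back.
    Public Function EncodeThenDecode(words As String) As String
        Dim uni As New UnicodeEncoding()
        Dim encodedBytes As Byte() = uni.GetBytes(words)
        Return uni.GetString(encodedBytes)
    End Function

    Sub Main()
        Dim words As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric"
        Dim decodedString As String = EncodeThenDecode(words)

        ' Same characters in, same characters out: the round trip changes nothing.
        Console.WriteLine(String.Equals(words, decodedString, StringComparison.Ordinal)) ' True
        Console.WriteLine(decodedString.Length.ToString())                              ' 8
    End Sub
End Module
```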

Jon, thanks for helping me,
but
when I use another word instead of "frédéric", I don't have any problem.
When I insert "frédéric" into SQL, it looks like this: "frédéric ".

Well, I don't know what the problem is.

Thanks, Jon

Mar 16 '06 #9

Thus wrote Joerg,
Thus wrote tvin,
I can use 2 bytes, 3 bytes... I don't have a problem with that,
but the length of frédéric is len(frédéric) = 10.
The length of frédéric should be 8 to insert it correctly into the SQL
database. Please help me, Joerg.

You're confusing bytes and characters. Frédéric has 10 characters, but
it may have 10 or more bytes depending on the character encoding being
used


Oops, the old counting issue crept up again... make that 8 characters ;-)
--
Joerg Jooss
ne********@joergjooss.de
Mar 16 '06 #10

tvin <tv**@discussions.microsoft.com> wrote:
When I use another word instead of "frédéric", I don't have any problem.
When I insert "frédéric" into SQL, it looks like this: "frédéric ".

Well, I don't know what the problem is.


See http://www.pobox.com/~skeet/csharp/d...ngunicode.html

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Mar 16 '06 #11
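When a string looks right on screen but has the wrong length, printing the numeric value of every character usually reveals what is hiding in it. A minimal sketch of such a dump follows; the `Describe` function and the CR/LF-terminated sample input are illustrative assumptions, not taken from the thread or the linked page.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module DumpCharacters
    ' Lists every UTF-16 code unit as U+XXXX so invisible characters
    ' (line breaks, spaces, other control characters) show up.
    Public Function Describe(text As String) As String
        Dim parts(text.Length - 1) As String
        For i As Integer = 0 To text.Length - 1
            parts(i) = "U+" & AscW(text(i)).ToString("X4")
        Next
        Return String.Join(" ", parts)
    End Function

    Sub Main()
        ' Hypothetical input: the word as it might arrive from a file, with a trailing CR/LF.
        Dim sample As String = "fr" & ChrW(&HE9) & "d" & ChrW(&HE9) & "ric" & vbCrLf
        Console.WriteLine(Describe(sample))
        ' U+0066 U+0072 U+00E9 U+0064 U+00E9 U+0072 U+0069 U+0063 U+000D U+000A
    End Sub
End Module
```

The last two entries, U+000D and U+000A, are the carriage return and line feed that a visual check would miss.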
