Bytes | Software Development & Data Engineering Community

Regarding UTF16

Hi,

I would like information about the UTF-16 format and its disadvantages
compared with UTF-8.

TIA

Mohan
Feb 2 '06 #1
news.fe.internet.bosch.com wrote:
> Hi,
>
> I would like information about the UTF-16 format and its disadvantages
> compared with UTF-8.

This group (c.l.c) is the wrong place to ask.

I believe comp.software.international is (followup-to added).

Cheers

Vladimir

Feb 2 '06 #2
Try:

http://www.unicode.org/

Hope this helps.

Feb 2 '06 #3
In comp.lang.c I read:
> I would like information about the UTF-16 format and its disadvantages
> compared with UTF-8.


In C, the most notable difference would be that each UTF-16 code unit would
usually be composed of two bytes (in C a byte need not be 8 bits, but it very
commonly is), so fully half the code space has a byte whose value is 0. A
sequence of such code units cannot be treated as a string, because the C
string functions take the first 0 byte as the terminator. A UTF-8 sequence
can be treated as a normal string, since UTF-8 never produces a 0 byte except
for U+0000 itself, and these days it is a common form for an implementation's
multibyte character set (MBCS).

In C we also have wide characters and wide-character strings. There is no
requirement that their encoding be UTF-16: some implementations use it, some
do not, and these days I would expect UTF-32 (or UCS-4, yuck!) to be the more
common. With a wide-character string the embedded-null-byte pitfall is
avoided, but other effort is required to make wide strings work well.

--
a signature
Feb 5 '06 #4
On 2006-02-05, those who know me have no need of my name <no****************@usa.net> wrote:
> In comp.lang.c I read:
>> I would like information about the UTF-16 format and its disadvantages
>> compared with UTF-8.
>
> In C, the most notable difference would be that each UTF-16 code unit would
> usually be composed of two bytes (in C a byte need not be 8 bits, but it very
> commonly is), so fully half the code space has a byte whose value is 0.

Actually it's roughly one in 128. Of the set of 16-bit values:

There are a total of 65536 values. There are 510 that have exactly one 0
byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
contain a 0 byte.

However, the area with a first byte of 0 and a second byte between 32
and 126 is considered "the most important" for traditional reasons, and
this encompasses the entire basic execution character set.
Feb 5 '06 #5
In comp.lang.c I read:
> On 2006-02-05, those who know me have no need of my name
> <no****************@usa.net> wrote:
>> In C [...] each UTF-16 code unit would usually be composed of two bytes
>> [...] so fully half the code space has a byte whose value is 0.
>
> Actually it's roughly one in 128. Of the set of 16-bit values:
>
> There are a total of 65536 values. There are 510 that have exactly one 0
> byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
> contain a 0 byte.

Err, oops, thanks for the catch!

> However, the area with a first byte of 0 and a second byte between 32
> and 126 is considered "the most important" for traditional reasons, and
> this encompasses the entire basic execution character set.

Just where my mind was at, unfortunately.

Feb 12 '06 #6


Similar topics

Francis Lavoie: Hello I have some questions regarding webframework, I must say that I quite lost and these questions are basicly to help me understand the way it work. I have some knowledge with PHP and JSP....
Xah Lee: i have a bunch of files encoded in GB18030. Is there a way to convert them to utf16 with python? Xah xah@xahlee.org http://xahlee.org/PageTwo_dir/more.html
praba kar: Dear All, I am new to Python. I am in need of some sorting functions (eg) numerical sorting functions and alphapetical sorting functions. I have searched through net But I cannot find any...
John Perks and Sarah Mount: (My Python uses UTF16 natively; can someone with UTF32 Python let me know if that behaves differently?) >>> import codecs >>> u'\ud800' # part of surrogate pair u'\ud800'...
Chris Mullins: I'm implementing RFC 3491 in .NET, and running into a strange issue. Step 1 of RFC 3491 is performing a set of mappings dicated by tables B.1 and B.2. I'm having trouble with the following...
Mike: Hello, I have a few rather urgent questions that I hope someone can help with (I need to figure this out prior to a meeting tomorrow.) First, a bit of background: The company I work for is...
Fuzzyman: Hello all, I'm handling some text files where I don't (necessarily) know the encoding beforehand. Because I use regular expressions to parse the text I *must* decode UTF16 encoded text...
R Wood: Greetings - A recent Perl experiment hasn't turned out so well, which has piqued my interest in Python. The project is this: take a Vcard file exported from Apple's Addressbook and use a...
Server Applications: Hello I am trying to build a system where I can full-text index documents with UTF8 or UTF16 data using Oracle Text. I am doing the filtering in a third-party component outside the database, so...