Regarding UTF16

Hi,

I would like information about the UTF-16 format and its disadvantages
compared with the UTF-8 format.

TIA

Mohan
Feb 2 '06 #1
news.fe.internet.bosch.com wrote:
> Hi ,
>
> I wanted information about UTF16 format and what is disadvantage over UTF8
> format..

This group (c.l.c) is the wrong place to ask.

I believe comp.software.international is the right one (followup-to added).

Cheers

Vladimir

Feb 2 '06 #2
Try:

http://www.unicode.org/

Hope this helps.

Feb 2 '06 #3
in comp.lang.c i read:
> I wanted information about UTF16 format and what is disadvantage over UTF8
> format..

in c the most notable difference would be that utf-16 would usually be
composed of two bytes -- in c bytes need not be 8 bits, but it is very
common -- so then fully half the code space has a byte whose value is 0. a
sequence of such codes cannot be treated as a string. a utf-8 sequence can
be treated as a normal string, and these days it is a common form for an
implementation's mbcs.

in c we also have wide characters and wide character strings. there is no
requirement that the encoding be utf-16 -- some implementations use it,
some do not; these days i would expect utf-32 (or ucs-4 -- yuck!) the more
common. with a wide character string the embedded null byte pitfall is
avoided but there is other effort required to make them work well.

--
a signature
Feb 5 '06 #4
On 2006-02-05, those who know me have no need of my name <no****************@usa.net> wrote:
> in comp.lang.c i read:
>> I wanted information about UTF16 format and what is disadvantage over UTF8
>> format..
>
> in c the most notable difference would be that utf-16 would usually be
> composed of two bytes -- in c bytes need not be 8 bits, but it is very
> common -- so then fully half the code space has a byte whose value is 0.

Actually it's roughly one in 128. Of the set of 16-bit values:

There are a total of 65536 values. There are 510 that have exactly one 0
byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
contain a 0 byte.

However, the area with a first byte of 0 and a second byte between 32
and 126 is considered "the most important" for traditional reasons, and
this encompasses the entire basic execution character set.
Feb 5 '06 #5
in comp.lang.c i read:
> On 2006-02-05, those who know me have no need of my name
> <no****************@usa.net> wrote:
>> in c [...] utf-16 would usually be composed of two bytes [...] so then
>> fully half the code space has a byte whose value is 0.
>
> Actually it's roughly one in 128. Of the set of 16-bit values:
>
> There are a total of 65536 values. There are 510 that have exactly one 0
> byte, exactly 1 that has two 0 bytes, and 255*255=65025 that do not
> contain a 0 byte.

err, oops -- thanks for the catch!

> However, the area with a first byte of 0 and a second byte between 32
> and 126 are considered "the most important" for traditional reasons, and
> this encompasses the entire basic execution character set.

just where my mind was at, unfortunately.

--
a signature
Feb 12 '06 #6
