Unicode and stream

Hello.

I am using the BC Builder 6.0 compiler.

I have an example:

#include <strstrea.h>

int main () {
    wchar_t ff [10] = {'s', 'd', 'f', 'g', 't'};
    istrstream b1 (ff);
    return 0;
}

This example fails to compile.
Error message: Could not find a match for 'istrstream::istrstream(wchar_t *)'.

Questions:

1. Can I have a Unicode stream?
2. If that is impossible, can I work with Unicode without the OS tools?
I want to work with Unicode using only language tools.
3. Are there other compilers that support Unicode streams?
4. What does the standard say about Unicode streams?

Thanks, Basil
Jul 22 '05 #1
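
On question 1, a minimal sketch, assuming a compiler that ships the standard <sstream> header (not confirmed here for BC Builder 6.0): the standard library declares std::wistringstream, the wchar_t counterpart of the narrow string streams, and it can be built from a wide-character buffer where istrstream cannot. The helper names split_wide and check_split_wide are hypothetical, and whether the wchar_t values are Unicode depends on the implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a wide-character buffer into
// whitespace-separated tokens with std::wistringstream.
std::vector<std::wstring> split_wide(const wchar_t* text) {
    std::wistringstream in(text);  // built from wchar_t data, unlike istrstream
    std::vector<std::wstring> tokens;
    std::wstring token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}

// Usage: the characters from the question, with a space added so that
// the stream yields two tokens. Unset array elements are zero, so the
// buffer is null-terminated.
void check_split_wide() {
    const wchar_t ff[10] = {L's', L'd', L' ', L'f', L'g', L't'};
    std::vector<std::wstring> tokens = split_wide(ff);
    assert(tokens.size() == 2);
    assert(tokens[0] == L"sd");
    assert(tokens[1] == L"fgt");
}
```

Note the L prefix on the character literals; the original example initialised the wchar_t array from narrow literals, which compiles but hides the intent.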
Dietmar Kuehl wrote:

If a user
starts using e.g. a 'std::wstring' to hold Unicode characters, he is
probably in for a few surprises, even if 'wchar_t' is large enough to
accommodate UCS-32! For example, the 'size()' function no longer
counts the number of "glyphs" (what is normally considered to be a
character) because e.g. a u-umlaut (the second character of my last
name) is not necessarily represented by one character but possibly
encoded as the "u" character followed by the umlaut combining
character.


Unicode does not deal with glyphs. Just ask 'em! A 32 bit wide character
is large enough to hold all Unicode characters. All implementations of
Unicode have to deal with combining characters. This isn't a C++ issue.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
Jul 22 '05 #2
Pete Becker wrote:
Unicode does not deal with glyphs. Just ask 'em!

Effectively, a glyph is what a user wants to see at some point, and in the
description of combining characters (Unicode 4.0, section 2.10) they
definitely talk about glyphs. Also, whether they deal with them or not
is not really that relevant at all: for example, if you count the
"characters" in my name (correctly written; since enough programs get
it wrong, I use a common transformation in most electronic conversation)
you want to get four, independent of whether the "u-umlaut" Unicode
character or a "u" character and an "umlaut" combining character is
used.
If you used a 'std::wstring' to represent the Unicode characters, you
would get four or five depending on what some software chose to
represent the "u-umlaut".
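
The "four or five" count can be sketched as follows. This is an illustration only: the function names are hypothetical, and the code points are written as universal character names so the result does not depend on the source file encoding.

```cpp
#include <cassert>
#include <string>

// The name with the precomposed character U+00FC
// (LATIN SMALL LETTER U WITH DIAERESIS).
std::wstring precomposed_name() {
    return L"K\u00fchl";
}

// The same name with "u" followed by U+0308 (COMBINING DIAERESIS).
std::wstring decomposed_name() {
    return L"Ku\u0308hl";
}

// size() counts wchar_t elements, not what a reader sees as letters.
void check_name_lengths() {
    assert(precomposed_name().size() == 4);
    assert(decomposed_name().size() == 5);
    assert(precomposed_name() != decomposed_name());
}
```

Both strings would normally be displayed identically, yet they differ in length and compare unequal.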

A 32 bit wide character is large enough to hold all Unicode characters.

I didn't dispute this. However, some Unicode sequences don't make any
sense if you rip apart certain characters, notably the combination of
a Unicode character and a following combining character (which are two
Unicode characters if I got things right).

All implementations of Unicode have to deal with combining characters.
This isn't a C++ issue.

I didn't claim that it is an issue specific to C++. I just pointed out
that the C and C++ libraries do not provide any help in processing
Unicode. In particular, the view taken by these libraries with
respect to character processing (which does not include the code
conversion facilities, IMO, as these operate on bytes rather than on
characters) is that each character is a fixed sized unit, e.g. of
type 'char' or 'wchar_t' (these two character types are directly
supported; users might choose to use e.g. 'long' if their implementation
has chosen to use a 16 bit entity for 'wchar_t' but this would imply
that they provide a whole bunch of stuff, e.g. suitable facets) and
Unicode does not exactly fit this description, not even UCS-4
(I erroneously labeled UCS-4 "UCS-32" in an earlier article). ... and
I think it *is* a C++ issue that C++ has no real Unicode support. Of
course, this *is* also an issue for various other languages - despite
the claims of some proponents of such other languages that the language
has proper Unicode support.
--
<mailto:di***** ******@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.contendix.com> - Software Development & Consulting

Jul 22 '05 #3
Dietmar Kuehl wrote:

I didn't dispute this. However, some Unicode sequences don't make any
sense if you rip apart certain characters, notably the combination of
a Unicode character and a following combining character (which are two
Unicode characters if I got things right).


No, that makes perfect sense: it's two Unicode characters, the first
being, say, LATIN SMALL LETTER U (0x0075), and the second being
COMBINING DIAERESIS (0x0308). If you're concerned about keeping those
two Unicode characters together, replace them with the single character
LATIN SMALL LETTER U WITH DIAERESIS (0x00fc).
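
A minimal sketch of that replacement, assuming only this one pair needs handling. The function name is hypothetical, and this is not a general Unicode normalization routine, which would need the composition data from the Unicode standard.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces every LATIN SMALL LETTER U (0x0075)
// followed by COMBINING DIAERESIS (0x0308) with the single character
// LATIN SMALL LETTER U WITH DIAERESIS (0x00fc). All other characters,
// including other combining sequences, are copied unchanged.
std::wstring compose_u_diaeresis(const std::wstring& in) {
    const wchar_t small_u = static_cast<wchar_t>(0x0075);
    const wchar_t combining_diaeresis = static_cast<wchar_t>(0x0308);
    const wchar_t u_with_diaeresis = static_cast<wchar_t>(0x00fc);

    std::wstring out;
    out.reserve(in.size());
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == small_u && i + 1 < in.size() &&
            in[i + 1] == combining_diaeresis) {
            out.push_back(u_with_diaeresis);
            ++i;  // skip the combining character that was just consumed
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Usage: the five-element spelling collapses to the four-element one.
void check_compose() {
    assert(compose_u_diaeresis(L"Ku\u0308hl") == L"K\u00fchl");
    assert(compose_u_diaeresis(L"Ku\u0308hl").size() == 4);
    assert(compose_u_diaeresis(L"uu") == L"uu");
}
```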

The point is that in Unicode every code point (i.e. valid numeric value
in a 32-bit representation) always means the same thing; you don't have
to look at context to figure out what it means. That's the basic
requirement for wchar_t, as well. It's not the case for char, though,
because the meaning of a single code point can depend on what comes
after it (first byte in a multi-byte character) or what came before it
(with shift encodings and with the second or subsequent bytes in a
multi-byte character).
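
The context dependence of char can be illustrated with a multi-byte encoding. The sketch below assumes UTF-8, which the post does not name, and assumes the input is well formed; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 byte string by skipping continuation
// bytes (bit pattern 10xxxxxx). Assumes well-formed input; no validation.
std::size_t count_utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if ((b & 0xC0) != 0x80) {
            ++count;  // a single-byte character or a lead byte
        }
    }
    return count;
}

// Usage: U+00FC occupies two char elements (0xC3 0xBC) in UTF-8, so the
// meaning of 0xBC depends on the byte that precedes it.
void check_utf8_count() {
    const std::string name = "K\xC3\xBC" "hl";
    assert(name.size() == 5);
    assert(count_utf8_code_points(name) == 4);
}
```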

As to glyphs, they involve a great deal more than what we might call a
"letter". From the Unicode standard:

The difference between identifying a code value and rendering it
on screen or paper is crucial to understanding the Unicode
Standard's role in text processing. The character identified by
a Unicode value is an abstract entity, such as "LATIN CAPITAL
LETTER A" or "BENGALI DIGIT 5". The mark made on screen or paper,
called a glyph, is a visual representation of the character.

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
Jul 22 '05 #4
Pete Becker wrote:

As to glyphs, they involve a great deal more than what we might call a
"letter". From the Unicode standard:

The difference between identifying a code value and rendering it
on screen or paper is crucial to understanding the Unicode
Standard's role in text processing. The character identified by
a Unicode value is an abstract entity, such as "LATIN CAPITAL
LETTER A" or "BENGALI DIGIT 5". The mark made on screen or paper,
called a glyph, is a visual representation of the character.


Sorry, thinking too slowly today. I was trying to suggest that we use
different terminology, because "glyph" really isn't what you're talking
about. That's why I said "letter". I think it gets at what we're talking
about: 'u-umlaut', whether it's represented by two Unicode characters or
one, is a single letter, and it's not 'u'. At least, most of the time
it's not. <g>

--

Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
Jul 22 '05 #5
