Problems with UTF-8 on Windows

Hello Friends,

I am working on adding internationalization support to an
existing project.

While adding UTF-8 support I ran into a problem in a proof of concept (POC).

I have a C string, declared as
const char* utf8buf = "Bienvenue à l'anglais ";

I want to support UTF-8 for I/O and use wchar_t strings for internal
manipulation, so I call setlocale(LC_CTYPE,"UTF8")
before the main string-handling code.

Then I use MultiByteToWideChar (with CP_UTF8 as the code page) to
convert it to a wstring.

Before output I convert the string back to UTF-8
using WideCharToMultiByte.

The problem is that when I print the UTF-8 string obtained from this
round trip, I get "Bienvenue
l'anglais" as output, which is not the same as the input utf8buf.

Does the C++ string class support UTF-8?

In the real environment, we plan to get the UTF-8 strings from a
MySQL database.

How can I correct this?

Is there any other way in C/C++ to represent UTF-8 strings?

Thanks,
Aman

Jan 11 '07 #1

amandeep.bhat...@gmail.com wrote:
> Hello Friends,
>
> I am working on adding internationalization support to an
> existing project.
>
> While adding UTF-8 support I ran into a problem in a proof of concept (POC).
>
> I have a C string, declared as
> const char* utf8buf = "Bienvenue à l'anglais ";
The above is not necessarily valid UTF-8: which bytes end up in the
literal for "à" depends on the encoding the source file was saved in.
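
A minimal sketch of how one could check this, not taken from the thread: the function below tests whether a byte string is well-formed UTF-8. The names is_valid_utf8, kLatin1Text and kUtf8Text are hypothetical. The two constants spell the bytes out explicitly, on the assumption that "à" is stored as the single byte 0xE0 when the source file is saved as Latin-1/Windows-1252 and as the two bytes 0xC3 0xA0 when it is saved as UTF-8.

```cpp
#include <cassert>  // only needed for the usage checks shown after the block
#include <cstddef>
#include <string>

// Hypothetical test data: the same text with "à" written out byte by byte.
const std::string kLatin1Text = "Bienvenue \xE0 l'anglais";     // à as one byte
const std::string kUtf8Text   = "Bienvenue \xC3\xA0 l'anglais"; // à as two bytes

// Returns true if `s` is well-formed UTF-8. Rejects stray or missing
// continuation bytes, overlong encodings, surrogates, and code points
// above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // continuation byte or invalid lead byte

        if (i + len > n) return false;  // sequence is cut off at the end

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < kMinForLength[len]) return false;       // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

With these definitions, assert(is_valid_utf8(kUtf8Text)) and assert(!is_valid_utf8(kLatin1Text)) both hold, so running such a check on utf8buf before any conversion would show which of the two cases applies.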
>
> I want to support UTF-8 for I/O and use wchar_t strings for internal
> manipulation, so I call setlocale(LC_CTYPE,"UTF8")
> before the main string-handling code.
Now we enter implementation-defined territory: the locale names that
setlocale accepts differ between platforms.
>
> Then I use MultiByteToWideChar (with CP_UTF8 as the code page) to
> convert it to a wstring.
And this is not C++ but the Windows API, and thus off-topic here.
>
> Before output I convert the string back to UTF-8
> using WideCharToMultiByte.
Once again off-topic.
>
> The problem is that when I print the UTF-8 string obtained from this
> round trip, I get "Bienvenue
> l'anglais" as output, which is not the same as the input utf8buf.
>
> Does the C++ string class support UTF-8?
Well... the short answer is no. You will have no problem storing a
UTF-8 buffer in a std::string, but access to individual characters is
off: string[n] might be a whole character, but it could also be just one
byte of a multi-byte sequence.
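
To illustrate the point about string[n], here is a small sketch (my own, not from the thread) that counts characters in a UTF-8 std::string by skipping continuation bytes. The name count_code_points is hypothetical, and the function assumes its input is already well-formed UTF-8; it counts code points, which is not always the same as user-visible characters.

```cpp
#include <cassert>  // only needed for the usage checks shown after the block
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by counting every byte that
// is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Assumes `utf8` is well-formed; no validation is done here.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "\xC3\xA0" (the UTF-8 encoding of à), size() is 2 but count_code_points returns 1, and neither s[0] nor s[1] is a complete character on its own.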
>
> In the real environment, we plan to get the UTF-8 strings from a
> MySQL database.
There is no problem getting UTF-8 from a MySQL database, but I doubt
that there is much reason to keep it in a std::string (although doing so
will not lead to an incorrect program).
>
> How can I correct this?
Correct what? The problem with the missing à above could very well be
related to the fact that the string above may not be valid UTF-8, but you
should go to a platform-specific group (perhaps something like
microsoft.public.internationalization?) for that part.
>
> Is there any other way in C/C++ to represent UTF-8 strings?
You can store it in a variety of ways. The most natural way for many
applications is to convert at the API boundaries, for instance at the point
where you get the data from your database. If you expect to keep a large
number of strings in memory and think UTF-8 would be a smart
internal format, you should look for a UTF-8 string class. Most probably
there are already some nice classes out there, and I vaguely
remember having read something about UTF-8 strings in Boost (which is
always the first place I look).
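
As a rough sketch of what converting at the boundary could look like in portable C++: the two functions below are hand-rolled and hypothetical (utf8_to_utf32 and utf32_to_utf8 are my names, not a library API), and they use std::u32string as the internal format rather than the wchar_t strings and Win32 calls from the question. Treat it as an illustration under those assumptions, not as a replacement for a tested library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into one char32_t per code point.
// Throws std::invalid_argument if the input is not well-formed UTF-8.
std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t kMinForLength[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < kMinForLength[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range code point");

        out.push_back(cp);
        i += len;
    }
    return out;
}

// Encodes code points back to UTF-8. The caller must pass valid Unicode
// scalar values (checked with assert in debug builds).
std::string utf32_to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

The idea would be to call utf8_to_utf32 once when data arrives (for example from the database) and utf32_to_utf8 once just before output. Because the decoder throws on malformed input, a string that was never UTF-8 in the first place fails loudly at the boundary instead of silently losing a character.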

/Peter

Jan 11 '07 #2