Problem with UTF-8

I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

I guess this is because UTF-8 format adds some extra info in the
header of the file. Do you know how I could use UTF-8 with my
application? Other than that, do some of you use C++ and FastCGI? What
do you think? So far I've been really pleased with the low resource
usage and with the outstanding speed. Thanks.

Charles.

Nov 5 '07 #1
On Nov 5, 12:21 pm, Charles <landema...@gmail.com> wrote:
> I'm designing a C++ application for the web (with FastCGI) and it has
> to use UTF-8 because there will be users who will type Asian glyphs.
> When I compile the application, if I use ANSI, no problem, it compiles
> properly. But if I save the files as UTF-8, I get this error message:
>
> %g++ -o cgi-bin/test.fcgi test.cpp
> test.csp.cpp:1: error: stray '\239' in program
> test.csp.cpp:1: error: stray '\187' in program
> test.csp.cpp:1: error: stray '\191' in program
> test.csp.cpp:1: error: invalid token
> test.csp.cpp:1: error: expected constructor, destructor, or type
> conversion before '<' token
> test.csp.cpp: In function `int main()':
> test.csp.cpp:5: error: `cout' was not declared in this scope
> test.csp.cpp:5: error: `endl' was not declared in this scope
> %
>
> I guess this is because UTF-8 format adds some extra info in the
> header of the file. Do you know how I could use UTF-8 with my
> application?
You should process UTF-8 encoded data without a need to save your
source files in that encoding. For instance, take a look at
http://utfcpp.sourceforge.net/
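
A minimal sketch of that idea, using only the standard library rather than the library linked above: the source file stays pure ASCII, and the UTF-8 data is handled as plain bytes at run time. The names count_code_points and sample are made up for illustration, and the function assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 byte string by counting every byte
// that is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// Assumption: the input is already valid UTF-8; nothing is validated here.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// The source file itself stays pure ASCII: the two CJK characters
// U+65E5 U+672C are spelled out as their UTF-8 bytes with hex escapes.
const std::string sample = "\xE6\x97\xA5\xE6\x9C\xAC";
```

Here sample holds six bytes but only two code points. In a FastCGI program the bytes would normally come from the request rather than from a literal, so the encoding of the .cpp file never enters into it.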

Nov 5 '07 #2
On Nov 5, 2:55 pm, Nemanja Trifunovic <ntrifuno...@hotmail.com> wrote:
> You should process UTF-8 encoded data without a need to save your
> source files in that encoding. For instance, take a look at http://utfcpp.sourceforge.net/
Nice, thanks.

Charles.

Nov 5 '07 #3
On Nov 5, 6:21 pm, Charles <landema...@gmail.com> wrote:
> I'm designing a C++ application for the web (with FastCGI) and it has
> to use UTF-8 because there will be users who will type Asian glyphs.
> When I compile the application, if I use ANSI, no problem, it compiles
> properly. But if I save the files as UTF-8, I get this error message:
> %g++ -o cgi-bin/test.fcgi test.cpp
> test.csp.cpp:1: error: stray '\239' in program
> test.csp.cpp:1: error: stray '\187' in program
> test.csp.cpp:1: error: stray '\191' in program
> test.csp.cpp:1: error: invalid token
> test.csp.cpp:1: error: expected constructor, destructor, or type
> conversion before '<' token
> test.csp.cpp: In function `int main()':
> test.csp.cpp:5: error: `cout' was not declared in this scope
> test.csp.cpp:5: error: `endl' was not declared in this scope
> %
Something funny is going on. First, of course, if the file only
contains characters in the basic source character set, whether
it is UTF-8 or ASCII shouldn't make a difference---all of the
characters in the basic source character set are identical in
the two encodings. Even stranger, however, are the error
messages: g++ normally displays the uninterpretable character in
*octal*. But octal with an 8 or 9 in it? Something is very
strange about your g++.
> I guess this is because UTF-8 format adds some extra info in
> the header of the file.
It shouldn't.
> Do you know how I could use UTF-8 with my application?
My editor at home is configured to use UTF-8, and it saves my
C++ files in "UTF-8". And I've never had any problems. (When I
write the comments in French, they look funny on my machine at
work, because it doesn't have any UTF-8 fonts installed, but
other than that, the compiler doesn't complain.)

Before anything else, however, I'd try to find out why your
installation of g++ is inserting 8's and 9's into its octal.
Then I'd write a very, very simple program (hello, world) with
my editor, and look at a hex dump of it, to see what it is
actually writing to the file---if the editor automatically
inserts junk you didn't insert, it may not be usable for program
development.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 6 '07 #4
On Nov 6, 6:18 am, James Kanze <james.ka...@gmail.com> wrote:
> Before anything else, however, I'd try to find out why your
> installation of g++ is inserting 8's and 9's into its octal.
> Then I'd write a very, very simple program (hello, world) with
> my editor, and look at a hex dump of it, to see what it is
> actually writing to the file---if the editor automatically
> inserts junk you didn't insert, it may not be usable for program
> development.

Thanks James, will do.

Charles.

Nov 6 '07 #5
Charles wrote:
> %g++ -o cgi-bin/test.fcgi test.cpp
> test.csp.cpp:1: error: stray '\239' in program
> test.csp.cpp:1: error: stray '\187' in program
> test.csp.cpp:1: error: stray '\191' in program
> test.csp.cpp:1: error: invalid token
> test.csp.cpp:1: error: expected constructor, destructor, or type
> conversion before '<' token
> test.csp.cpp: In function `int main()':
> test.csp.cpp:5: error: `cout' was not declared in this scope
> test.csp.cpp:5: error: `endl' was not declared in this scope
> %
>
> I guess this is because UTF-8 format adds some extra info in the
> header of the file. Do you know how I could use UTF-8 with my
> application? Other than that, do some of you use C++ and FastCGI? What
> do you think? So far I've been really pleased with the low resource
> usage and with the outstanding speed. Thanks.

The character set of the execution is INDEPENDENT of the character
set the program is written in. C++ only has barely adequate half-assed
wide character support. You must make sure that you have no characters
not in the basic set in the source file (outside of string/character
literals).

It looks like the first line has some cruft in it. Delete it and
retype it being careful not to use any characters not in the basic
set. You may need to use a different text editor.
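
One rough way to find such characters is to scan the source text for any byte with the high bit set. This is only an approximation: 7-bit ASCII is a little larger than the basic source character set, and the check does not exempt string or character literals. The function name first_non_ascii is invented for this sketch.

```cpp
#include <string>

// Returns the offset of the first byte with the high bit set, or
// std::string::npos if every byte is 7-bit ASCII.
// Note: this flags bytes inside string literals and comments too.
std::string::size_type first_non_ascii(const std::string& text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (static_cast<unsigned char>(text[i]) > 0x7F) {
            return i;
        }
    }
    return std::string::npos;
}
```

An offset of 0 means the cruft sits at the very start of the file, before the first character that was typed.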
Nov 6 '07 #6

Charles <la********@gmail.com> wrote:
> I'm designing a C++ application for the web (with FastCGI) and it has
> to use UTF-8 because there will be users who will type Asian glyphs.
> When I compile the application, if I use ANSI, no problem, it compiles
> properly. But if I save the files as UTF-8, I get this error message:
>
> %g++ -o cgi-bin/test.fcgi test.cpp
> test.csp.cpp:1: error: stray '\239' in program
> test.csp.cpp:1: error: stray '\187' in program
> test.csp.cpp:1: error: stray '\191' in program
Notepad inserts these very bytes in front of all utf-8 files. See

http://en.wikipedia.org/wiki/Utf-8#Windows
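
For text the program reads at run time (as opposed to the source file, which is best fixed in the editor), the three-byte UTF-8 byte order mark EF BB BF can be dropped before further processing. A minimal sketch, with strip_utf8_bom as a made-up helper name:

```cpp
#include <string>

// Removes a leading UTF-8 byte order mark (EF BB BF) if present.
// Returns true if a BOM was found and removed, false otherwise.
bool strip_utf8_bom(std::string& text)
{
    static const char bom[] = "\xEF\xBB\xBF";
    // compare() clamps the substring to the text's length, so inputs
    // shorter than three bytes simply fail to match.
    if (text.compare(0, 3, bom) == 0) {
        text.erase(0, 3);
        return true;
    }
    return false;
}
```

Only a BOM at the very start is touched. The same three bytes appearing later in the text are left alone.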
Nov 7 '07 #7
On Nov 7, 1:15 am, "Ole Nielsby"
<ole.niel...@tekare-you-spamminglogisk.dk> wrote:
> Charles <landema...@gmail.com> wrote:
> > I'm designing a C++ application for the web (with FastCGI) and it has
> > to use UTF-8 because there will be users who will type Asian glyphs.
> > When I compile the application, if I use ANSI, no problem, it compiles
> > properly. But if I save the files as UTF-8, I get this error message:
> > %g++ -o cgi-bin/test.fcgi test.cpp
> > test.csp.cpp:1: error: stray '\239' in program
> > test.csp.cpp:1: error: stray '\187' in program
> > test.csp.cpp:1: error: stray '\191' in program
> Notepad inserts these very bytes in front of all utf-8 files. See
> http://en.wikipedia.org/wiki/Utf-8#Windows
Interesting. However, in that case, I would expect to see
'\357', '\273' and '\277' as the stray bytes, rather than the
rather weird values he saw. (I wonder: is g++ assigning these
to a signed char, and doing the conversion to octal without
noticing that it's dealing with a negative value? But I see the
correct values when I try it with g++.)
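
For what it's worth, 239, 187 and 191 are exactly the BOM bytes 0xEF, 0xBB and 0xBF written in decimal, and 357, 273 and 277 are the same bytes in octal. That would be consistent with the g++ build in question printing the stray byte in decimal rather than mangling a signed value. A small sketch to reproduce both spellings (escape_byte is a hypothetical helper, not anything taken from g++):

```cpp
#include <sstream>
#include <string>

// Formats a byte as a backslash escape in base 8 or base 10.
// Any `base` other than 8 falls back to decimal.
std::string escape_byte(unsigned char byte, int base)
{
    std::ostringstream out;
    out << '\\';
    if (base == 8) {
        out << std::oct;
    }
    // Widen to unsigned int so the stream prints a number, not a character.
    out << static_cast<unsigned int>(byte);
    return out.str();
}
```

Calling it with 0xEF gives \239 in base 10 and \357 in base 8, matching the two sets of values discussed here.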

As far as I can tell, even if the compiler processed the file as
UTF-8, a BOM is illegal in a C++ program, unless the compiler
were to simply eliminate it in phase 1 (where it maps the
physical source file characters to the basic source character
set---in an implementation defined manner). It might be worth
modifying the standard to require a few more characters to be
recognized as white space: requiring '\r' and the BOM would make
life a lot easier in practice.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 7 '07 #8
