Problem with UTF-8

Charles

I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

I guess this is because UTF-8 format adds some extra info in the
header of the file. Do you know how I could use UTF-8 with my
application? Other than that, do some of you use C++ and FastCGI? What
do you think? So far I've been really pleased with the low resource
usage and with the outstanding speed. Thanks.

Charles.

Nov 5 '07 #1

Nemanja Trifunovic

On Nov 5, 12:21 pm, Charles <landema...@gmail.com> wrote:

I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

I guess this is because UTF-8 format adds some extra info in the
header of the file. Do you know how I could use UTF-8 with my
application?

You should process UTF-8 encoded data without a need to save your
source files in that encoding. For instance, take a look at
http://utfcpp.sourceforge.net/
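
To illustrate the point, here is a minimal sketch that handles UTF-8 data while the source file itself stays plain ASCII. It does not use the library linked above; `utf8_length` and `sample` are hypothetical names made up for this example, and the function assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8;
// no validation is attempted.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}

// Three CJK characters (U+65E5, U+672C, U+8A9E) spelled with hex escapes,
// so this source file contains only basic-source-set characters.
const std::string sample = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
```

With these definitions, `sample.size()` is 9 (bytes) while `utf8_length(sample)` is 3 (code points). Text typed by users arrives at run time anyway, so the encoding of the .cpp file does not limit what the program can process.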

Nov 5 '07 #2

Charles

On Nov 5, 2:55 pm, Nemanja Trifunovic <ntrifuno...@hotmail.com> wrote:

You should process UTF-8 encoded data without a need to save your
source files in that encoding. For instance, take a look at
http://utfcpp.sourceforge.net/

Nice, thanks.

Charles.

Nov 5 '07 #3

James Kanze

On Nov 5, 6:21 pm, Charles <landema...@gmail.com> wrote:

I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

Something funny is going on. First, of course, if the file only
contains characters in the basic source character set, whether
it is UTF-8 or ASCII shouldn't make a difference---all of the
characters in the basic source character set are identical in
the two encodings. Even stranger, however, are the error
messages: g++ normally displays the uninterpretable character in
*octal*. But octal with an 8 or 9 in it? Something is very
strange about your g++.

I guess this is because UTF-8 format adds some extra info in
the header of the file.

It shouldn't.

Do you know how I could use UTF-8 with my application?

My editor at home is configured to use UTF-8, and it saves my
C++ files in "UTF-8". And I've never had any problems. (When I
write the comments in French, they look funny on my machine at
work, because it doesn't have any UTF-8 fonts installed, but
other than that, the compiler doesn't complain.)

Before anything else, however, I'd try to find out why your
installation of g++ is inserting 8's and 9's into its octal.
Then I'd write a very, very simple program (hello, world) with
my editor, and look at a hex dump of it, to see what it is
actually writing to the file---if the editor automatically
inserts junk you didn't insert, it may not be usable for program
development.
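
If no hex dump tool is at hand (on Unix, `od -t x1` does the job), a few lines of C++ can stand in for one. This is only a sketch; `hex_prefix` is a hypothetical helper name, not part of any library.

```cpp
#include <cstddef>
#include <iomanip>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Return the first `count` bytes of a stream as space-separated,
// two-digit lowercase hex values. Stops early at end of input.
std::string hex_prefix(std::istream& in, std::size_t count)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    char byte;
    for (std::size_t i = 0; i < count && in.get(byte); ++i) {
        if (i != 0) {
            out << ' ';
        }
        // Go through unsigned char so bytes >= 0x80 do not sign-extend.
        out << std::setw(2)
            << static_cast<unsigned>(static_cast<unsigned char>(byte));
    }
    return out.str();
}
```

To inspect a source file, open it with `std::ifstream` in `std::ios::binary` mode and print `hex_prefix(file, 16)`. A file that starts with `#include` should begin with `23 69 6e 63`; anything in front of that was put there by the editor.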

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 6 '07 #4

Charles

On Nov 6, 6:18 am, James Kanze <james.ka...@gmail.com> wrote:

Before anything else, however, I'd try to find out why your
installation of g++ is inserting 8's and 9's into its octal.
Then I'd write a very, very simple program (hello, world) with
my editor, and look at a hex dump of it, to see what it is
actually writing to the file---if the editor automatically
inserts junk you didn't insert, it may not be usable for program
development.

Thanks James, will do.

Charles.

Nov 6 '07 #5

Ron Natalie

Charles wrote:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

I guess this is because UTF-8 format adds some extra info in the
header of the file. Do you know how I could use UTF-8 with my
application? Other than that, do some of you use C++ and FastCGI? What
do you think? So far I've been really pleased with the low resource
usage and with the outstanding speed. Thanks.

The character set of the execution is INDEPENDENT of the character
set the program is written in. C++ only has barely adequate half-assed
wide character support. You must make sure that you have no characters
not in the basic set in the source file (outside of string/character
literals).

It looks like the first line has some cruft in it. Delete it and
retype it being careful not to use any characters not in the basic
set. You may need to use a different text editor.

Nov 6 '07 #6

Ole Nielsby

Charles <la********@gmail.com> wrote:

I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program

Notepad inserts these very bytes in front of all utf-8 files. See

http://en.wikipedia.org/wiki/Utf-8#Windows

Nov 7 '07 #7

James Kanze

On Nov 7, 1:15 am, "Ole Nielsby"
<ole.niel...@tekare-you-spamminglogisk.dk> wrote:

Charles <landema...@gmail.com> wrote:
I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program

Notepad inserts these very bytes in front of all utf-8 files. See

http://en.wikipedia.org/wiki/Utf-8#Windows

Interesting. However, in that case, I would expect to see
'\357', '\273' and '\277' as the stray bytes, rather than the
rather weird values he saw. (I wonder: is g++ assigning these
to a signed char, and doing the conversion to octal without
noticing that it's dealing with a negative value? But I see the
correct values when I try it with g++.)
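
One reading that fits the numbers without any signed char trouble: 239, 187 and 191 are exactly the decimal values of the bytes EF, BB and BF, so the messages may simply have been printed in decimal rather than octal. That is an assumption about his compiler, not something I have verified, but the arithmetic is easy to check. A sketch, with `escape_byte` as a hypothetical helper:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Format a byte as a backslash escape in the given base.
// Only base 8 (octal) and base 10 (decimal) are handled;
// any base other than 8 falls back to decimal.
std::string escape_byte(unsigned char byte, int base)
{
    std::ostringstream out;
    out << '\\';
    if (base == 8) {
        out << std::oct;
    }
    out << static_cast<unsigned>(byte);
    return out.str();
}
```

For the byte 0xEF this gives `\357` in base 8 and `\239` in base 10; 0xBB gives `\273` and `\187`, and 0xBF gives `\277` and `\191`.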

As far as I can tell, even if the compiler processed the file as
UTF-8, a BOM is illegal in a C++ program, unless the compiler
were to simply eliminate it in phase 1 (where it maps the
physical source file characters to the basic source character
set---in an implementation defined manner). It might be worth
modifying the standard to require a few more characters to be
recognized as white space: requiring '\r' and the BOM would make
life a lot easier in practice.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Nov 7 '07 #8
