
French characters not recognised in C?

Hi,

In the debugger at run time, characters like é are not recognised by
their normal ASCII number, but something like -8615722... . I've seen
this number before, it means "rubbish" right?

So how can I possibly modify my program so that French characters get
recognised?

Thanks in advance,
Ehsan.
Nov 14 '05 #1

On Thu, 1 Apr 2004, Ess355 wrote:

> In the debugger at run time, characters like é are not recognised by
> their normal ASCII number, but something like -8615722... .

That doesn't make a whole lot of sense. What do you mean, "characters
...are not recognized by their normal ASCII number"? First of all,
é doesn't *have* an ASCII number. Second, assuming you've
picked an encoding somehow and you're expecting to see é displayed
correctly, what's going wrong?
Do you type é at the keyboard and your program doesn't recognize
it?
Do you type é in your source code and it doesn't display
correctly?
Do you type é in your source code and it refuses to compile at
all?

In general, the C programming language only deals with a very restricted
"basic character set," which doesn't contain things like é. If
you want to display or process that sort of input or output, you'll need
to either find a compiler with nice language support; find a library that
handles your national encoding(s) or Unicode; or roll your own library.
'wchar_t' and the wchar functions might be useful to you, too; read the
manpages for them or Google 'wchar_t manpage' for details.

> So how can I possibly modify my program so that French characters get
> recognised?


Depending on what exactly your problem is, you might try:

* Posting to fr.comp.lang.c or another French-language group.
* Getting a better compiler.
* Using 'wchar_t' in place of 'char'.
* Using a translation library that can convert between French encodings
and a useful ASCII encoding of the same text, e.g.: é -> \'e
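
As a rough illustration of the 'wchar_t' route, the sketch below converts a
multibyte string to wide characters with the standard mbstowcs function. The
helper name to_wide is made up for this example, and the result depends on
the locale currently in effect, so treat it as a starting point under those
assumptions and not as a fix for any particular setup.

```c
#include <stdlib.h>   /* mbstowcs */
#include <wchar.h>    /* wchar_t, wcslen */

/*
 * Hypothetical helper: convert the multibyte string src into dst, which has
 * room for cap wide characters including the terminator. Returns the number
 * of wide characters written (excluding the terminator), or (size_t)-1 if
 * src is not valid in the current locale or cap is 0.
 */
size_t to_wide(const char *src, wchar_t *dst, size_t cap)
{
    size_t n;

    if (cap == 0)
        return (size_t)-1;
    n = mbstowcs(dst, src, cap - 1);
    if (n == (size_t)-1)
        return n;
    dst[n] = L'\0';  /* mbstowcs does not terminate when the buffer fills up */
    return n;
}
```

If the input is in the user's own encoding, a call to setlocale(LC_CTYPE, "")
before the conversion is probably needed; in the default "C" locale, bytes
above 127 may be rejected.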

If you post a complete, compilable, minimal program that demonstrates
the problem, someone here might be able to help you more. But
fr.comp.lang.c sounds like a better bet to me.

HTH,
-Arthur

Nov 14 '05 #2
On Thu, 01 Apr 2004 21:21:06 -0500, Ess355 wrote:
> In the debugger at run time, characters like é are not recognised by
> their normal ASCII number, but something like -8615722... . I've seen
> this number before, it means "rubbish" right?
>
> So how can I possibly modify my program so that French characters get
> recognised?


By default, most platforms (all?) will execute programs in the "C"
locale which only supports ASCII. ASCII is a 7-bit encoding/charset that
does not support European characters. You might try adding a call to
setlocale like:

    setlocale(LC_CTYPE, "");

This will check some environment variables to determine the locale
you're running in. You can force a specific locale like setlocale(LC_ALL,
"fr_FR") but you may or may not want to do that depending on the source
of the characters.
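
A minimal sketch of that call with its return value checked, since setlocale
returns a null pointer when the requested locale is not available. The
function name select_locale is invented for this example.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: try to switch LC_CTYPE to the named locale.
 * Pass "" to pick up the user's environment. Returns 1 on success,
 * 0 if the locale is unavailable (the current locale is then unchanged).
 */
int select_locale(const char *name)
{
    return setlocale(LC_CTYPE, name) != NULL;
}
```

Only the names "C" and "" are specified by the C standard; anything else,
such as "fr_FR", is platform-specific, so checking the result seems prudent.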

Or you might need to run the debugger in a different locale. For example
on Unix systems a very simple way to run a program in a different locale
is by preceding the command with an environment variable like:

$ LC_CTYPE=fr_CA dbug ./myproggie

Mike
Nov 14 '05 #3
In <pa*********************************@ioplex.com> Michael B Allen <mb*****@ioplex.com> writes:
On Thu, 01 Apr 2004 21:21:06 -0500, Ess355 wrote:
>> In the debugger at run time, characters like é are not recognised by
>> their normal ASCII number, but something like -8615722... . I've seen
>> this number before, it means "rubbish" right?
>>
>> So how can I possibly modify my program so that French characters get
>> recognised?

> By default, most platforms (all?) will execute programs in the "C"
> locale which only supports ASCII.


Nope. By default most platforms will use one 8-bit extension to ASCII or
another in the "C" locale. The others will use one EBCDIC flavour (code
page) or another. In principle, one could attach a KSR-33 to a serial
port (and figure out how to set the speed of that port to 110 bps), just
to prove me wrong ;-)

This can be easily tested with a trivial program like this:

    #include <stdio.h>

    int main()
    {
        printf("\376\375\374 Hello world\n");
        return 0;
    }

> ASCII is a 7-bit encoding/charset that
> does not support European characters. You might try adding a call to
> setlocale like:
>
> setlocale(LC_CTYPE, "");

You're really naive if you believe that this will change the character
set used by the implementation. It will merely change the behaviour of
certain functions that are affected by the current locale.

In practice, it is the user's job to select a character set suitable for
his locale and to set the default native locale accordingly.

> This will check some environment variables to determine the locale
> you're running in. You can force a specific locale like setlocale(LC_ALL,
> "fr_FR") but you may or may not want to do that depending on the source
> of the characters.

1. Where did you get the idea that "fr_FR" is a valid locale name from?
May I have the chapter and verse?

2. If the user has a Russian terminal, selecting a French locale won't
make Latin-1 characters appear as intended.

> Or you might need to run the debugger in a different locale. For example
> on Unix systems a very simple way to run a program in a different locale
> is by preceding the command with an environment variable like:
>
> $ LC_CTYPE=fr_CA dbug ./myproggie


Let's see:

fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
LC_CTYPE=fr_CA: Command not found.

Doesn't Linux count as a Unix system any more? ;-)

The issue is very simple in practice, but extremely difficult to describe
in terms of what the C standard actually says. Each new C programmer
should do a bit of experimenting, using programs like the one shown above,
to see what happens when values above 127 (and, for pragmatic reasons,
the range 128 - 159 should be avoided) are used as (unsigned) character
values.
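
One such experiment, sketched here under the assumption of 8-bit chars and a
Latin-1 style code for é (octal 351, decimal 233): where plain char is
signed, that value reads back as a negative number unless it is converted to
unsigned char first, which may be one reason a debugger shows a negative
value for an accented character. The function names are made up for the
example.

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Value of c promoted to int: negative for codes above 127 when char is signed. */
int raw_value(char c)
{
    return c;
}

/* Value of c after conversion to unsigned char: always in 0..UCHAR_MAX. */
int char_code(char c)
{
    return (unsigned char)c;
}
```

Comparing raw_value('\351') with char_code('\351') shows the difference on
an implementation with signed plain char; with unsigned plain char the two
agree.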

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #4
On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:
>>> In the debugger at run time, characters like é are not recognised by
>>> their normal ASCII number, but something like -8615722... . I've seen
>>> this number before, it means "rubbish" right?
>>>
>>> So how can I possibly modify my program so that French characters get
>>> recognised?

>> By default, most platforms (all?) will execute programs in the "C"
>> locale which only supports ASCII.

> Nope. By default most platforms will use one 8-bit extension to ASCII
> or another in the "C" locale. The others will use one EBCDIC flavour
> (code page) or another. In principle, one could attach a KSR-33 to a
> serial port (and figure out how to set the speed of that port to 110
> bps), just to prove me wrong ;-)
>
> This can be easily tested with a trivial program like this:
>
> #include <stdio.h>
>
> int main()
> {
>     printf("\376\375\374 Hello world\n");
>     return 0;
> }


Why do you think this will give you the default behavior? If you run
this on a fancy machine with extravagant libraries and locales available
it will likely give you different results depending on what the default
locale is. On my system this will print Latin1.

>> ASCII is a 7-bit encoding/charset that does not support European
>> characters. You might try adding a call to setlocale like:
>>
>> setlocale(LC_CTYPE, "");

> You're really naive if you believe that this will change the character
> set used by the implementation. It will merely change the behaviour of
> certain functions that are affected by the current locale.


What do you mean by "used by the implementation"? The OP said "at run
time". On my system if I do:

$ LANG=en_US.UTF-8 ./myproggie

it indeed changes the behavior of how characters are interpreted
at runtime. I said nothing about the charset or encoding used by the
compiler or how string literals are stored in binaries.
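
To make "how characters are interpreted" concrete, here is a small sketch
using the standard mblen function, whose result depends on the LC_CTYPE
category of the current locale. The helper name first_char_bytes is
hypothetical, and the sketch assumes an encoding without shift states.

```c
#include <stdlib.h>  /* mblen */
#include <string.h>  /* strlen */

/*
 * Hypothetical helper: number of bytes making up the first character of s
 * under the current locale, 0 for an empty string, or -1 if the bytes are
 * not a valid character in that locale.
 */
int first_char_bytes(const char *s)
{
    /* strlen(s) + 1 lets mblen see the terminator but nothing beyond it. */
    return mblen(s, strlen(s) + 1);
}
```

For the two bytes "\303\251" (é in UTF-8), a UTF-8 locale would be expected
to report 2, while other locales may report 1 or -1. The bytes themselves
are the same in every case.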

> Let's see:
>
> fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
> LC_CTYPE=fr_CA: Command not found.
>
> Doesn't Linux count as a Unix system any more? ;-)


Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
not get too pedantic about it. You've embarrassed yourself enough by
acknowledging you use C shell :->

Mike
Nov 14 '05 #5
Michael B Allen <mb*****@ioplex.com> wrote:
On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:
[ Quoting was buggered up-stream; the next bit is by Michael B Allen. ]
>>> By default, most platforms (all?) will execute programs in the "C"
>>> locale which only supports ASCII.

>> Nope. By default most platforms will use one 8-bit extension to ASCII
>> or another in the "C" locale. The others will use one EBCDIC flavour
>> (code page) or another. In principle, one could attach a KSR-33 to a
>> serial port (and figure out how to set the speed of that port to 110
>> bps), just to prove me wrong ;-)
>>
>> This can be easily tested with a trivial program like this:
>>
>> #include <stdio.h>
>>
>> int main()
>> {
>>     printf("\376\375\374 Hello world\n");
>>     return 0;
>> }

> Why do you think this will give you the default behavior?


It must, if compiled in ISO C mode. All programs start in the "C"
locale. Even so...

> If you run this on a fancy machine with extravagant libraries and
> locales available it will likely give you different results depending
> on what the default locale is. On my system this will print Latin1.


...even so, the char types must be at least 8-bit, which means that
plain ASCII, being 7-bit, is out of the race from the start. Your
default character set _must_ be either an (at least 8-bit) extension to
ASCII, or something else entirely (most usually EBCDIC, which itself is
rare enough, but not entirely unheard of).

IOW, Dan's '\376' et al. must specify a valid member of the character
set, even though they are not part of ASCII.
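
The 8-bit minimum can be checked from <limits.h>. A minimal sketch (the
function name fits_in_uchar is invented here):

```c
#include <limits.h>

/* CHAR_BIT is required to be at least 8, so UCHAR_MAX is at least 255. */
#if CHAR_BIT < 8
#error "not a conforming C implementation"
#endif

/* Returns 1 if code fits in an unsigned char on this implementation. */
int fits_in_uchar(unsigned long code)
{
    return code <= UCHAR_MAX;
}
```

With this, fits_in_uchar(0376) holds on any conforming implementation, even
though 0376 (decimal 254) lies outside the 7-bit ASCII range.
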

>> fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
>> LC_CTYPE=fr_CA: Command not found.
>>
>> Doesn't Linux count as a Unix system any more? ;-)

> Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
> not get too pedantic about it. You've embarrassed yourself enough by
> acknowledging you use C shell :->


And what other shell did you expect to see used in _this_ newsgroup,
then <g>?

Richard
Nov 14 '05 #6
In <pa**********************************@ioplex.com> Michael B Allen <mb*****@ioplex.com> writes:
On Fri, 02 Apr 2004 07:20:32 -0500, Dan Pop wrote:
>>>> In the debugger at run time, characters like é are not recognised by
>>>> their normal ASCII number, but something like -8615722... . I've seen
>>>> this number before, it means "rubbish" right?
>>>>
>>>> So how can I possibly modify my program so that French characters get
>>>> recognised?

>>> By default, most platforms (all?) will execute programs in the "C"
>>> locale which only supports ASCII.

>> Nope. By default most platforms will use one 8-bit extension to ASCII
>> or another in the "C" locale. The others will use one EBCDIC flavour
>> (code page) or another. In principle, one could attach a KSR-33 to a
>> serial port (and figure out how to set the speed of that port to 110
>> bps), just to prove me wrong ;-)
>>
>> This can be easily tested with a trivial program like this:
>>
>> #include <stdio.h>
>>
>> int main()
>> {
>>     printf("\376\375\374 Hello world\n");
>>     return 0;
>> }

> Why do you think this will give you the default behavior? If you run
> this on a fancy machine with extravagant libraries and locales available
> it will likely give you different results depending on what the default
> locale is.


Because this program runs in the "C" locale, regardless of what the
default locale is. It's the default font/character set that will
determine its output, not the default locale. I can set the default
locale to an English locale using Latin1, but if the font currently
used by the terminal where the program generates its output is Latin2,
I'm not going to see Latin1 output.

> On my system this will print Latin1.

More likely, it will simply output some character codes and let an entity
external to the implementation decide what character set to use.

On my system, I can switch between Latin1 and Latin2 fonts in an
xterm window with the mouse. Therefore, I can alter the program output
even *after* running the program, by selecting another font for that
window. The only invariant is the character codes output by the program.
This is *not* a locale issue at all.
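
To look at those character codes directly, instead of at whatever glyphs the
terminal font assigns to them, something like the following sketch can help.
The function name format_codes and the space-separated decimal format are
choices made for this example; snprintf requires C99.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the codes of the bytes in s into out as
 * space-separated decimal numbers. Returns 0 on success, -1 if out
 * (of size cap) is too small.
 */
int format_codes(const char *s, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (; *s != '\0'; s++) {
        /* Convert through unsigned char so codes above 127 stay positive. */
        int n = snprintf(out + used, cap - used, "%s%u",
                         used ? " " : "", (unsigned)(unsigned char)*s);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Applied to the string "\376\375\374" from the test program above, this gives
the same numbers whichever font or locale is in use, assuming 8-bit chars.
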

>>> ASCII is a 7-bit encoding/charset that does not support European
>>> characters. You might try adding a call to setlocale like:
>>>
>>> setlocale(LC_CTYPE, "");

>> You're really naive if you believe that this will change the character
>> set used by the implementation. It will merely change the behaviour of
>> certain functions that are affected by the current locale.

> What do you mean by "used by the implementation"? The OP said "at run
> time". On my system if I do:
>
> $ LANG=en_US.UTF-8 ./myproggie
>
> it indeed changes the behavior of how characters are interpreted
> at runtime.


But does it have *any* effect on what appears on your screen?

> I said nothing about the charset or encoding used by the
> compiler or how string literals are stored in binaries.

>> Let's see:
>>
>> fangorn:~ 171> LC_CTYPE=fr_CA dbug ./myproggie
>> LC_CTYPE=fr_CA: Command not found.
>>
>> Doesn't Linux count as a Unix system any more? ;-)

> Actually I meant LANG=fr_CA but this is clearly a shell feature so let's
> not get too pedantic about it.


Confusing Unix features and shell features is quite embarrassing, for a
Unix user...

> You've embarrassed yourself enough by acknowledging you use C shell :->

I am NOT using C shell ;-)

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 14 '05 #7
