ansi c compiler character encoding

Hi!

Does the C standard guarantee that a compiler always encodes characters
with the same character encoding? If, for example, the functions Foo and
Bar are compiled by different compilers, is it unambiguous how to
interpret the character string in Bar?

Does string.h expect a specific string format?

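void Bar(char *inp);    /* declared before use */

void Foo(void)
{
    /* 10 characters plus '\0', assuming one byte per character */
    char myTextString[11] = "stuvxyzåäö";
    Bar(myTextString);
}

void Bar(char *inp)
{
    /* What character set to expect? */
}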
Aug 18 '08 #1
Andreas Lundgren wrote:
> Hi!
>
> Is it determined that the C standard compiler always encodes characters
> with the same character encoding?

No.
Aug 18 '08 #2
Andreas Lundgren <d9****@efd.lth.se> writes:
> Is it determined that the C standard compiler always encodes characters
> with the same character encoding? If for example the functions Foo and
> Bar are compiled by different compilers, is it unambiguous how to
> interpret the character string in Bar?
>
> Does string.h expect a specific string format?
>
> void Foo(void)
> {
> char myTextString[11] = "stuvxyzåäö";
> Bar(myTextString);
> }
>
> void Bar(char* inp)
> {
> What character set to expect?
> }

No.

But if the two compilers are being used on the same system, it's very
likely that they'll use the same encoding. Since you're calling one
function from the other, presumably you're using the compilers on the
same system and linking the resulting code into a single executable or
equivalent.

Typically a given operating system will impose representations for
certain things. Though this is outside the scope of the C standard,
it's in the best interest of compiler writers to make their generated
code work and play well with that of other compilers. (For example, a
C compiler for Linux that generates code that's incompatible with code
generated by gcc wouldn't be very useful.)

This goes far beyond character set issues and includes things like
integer and floating-point type representations and function calling
conventions.

Your later followup suggests that you're concerned about some
real-world situation, presumably on some specific system. You should
ask in a newsgroup that deals with that system.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
Aug 18 '08 #3
Keith Thompson <ks***@mib.org> wrote:
> Andreas Lundgren <d9****@efd.lth.se> writes:
>> Is it determined that the C standard compiler always encodes characters
>> with the same character encoding? If for example the functions Foo and
>> Bar are compiled by different compilers, is it unambiguous how to
>> interpret the character string in Bar?
>>
>> Does string.h expect a specific string format?
>>
>> void Foo(void)
>> {
>> char myTextString[11] = "stuvxyzåäö";
>> Bar(myTextString);
>> }
>>
>> void Bar(char* inp)
>> {
>> What character set to expect?
>> }
>
> No.
>
> But if the two compilers are being used on the same system, it's very
> likely that they'll use the same encoding. Since you're calling one
> function from the other, presumably you're using the compilers on the
> same system and linking the resulting code into a single executable or
> equivalent.

Is it actually a question about the compiler at all? As far as
I can see the compiler will happily create a string literal with
whatever there is in the string, not caring a bit about the
encoding of the string. I guess the problem is much more one of
how the source files are generated and the expectations of the
output medium.

Consider the case of using one editor for the first file, set
to output files in e.g. one of the different (and incompatible)
Russian extended ASCII code pages, and the second file generated
with another editor, set to output in a different encoding.
Even if you use the same compiler this should lead to trouble.
And if the terminal that receives the output of the program
is then set to a third encoding, it becomes a complete mess ;-)
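
A minimal sketch of the editor scenario above, assuming two editors that save the same visible letter in different encodings. The hex escapes spell out the bytes each editor would put between the quotation marks; the array and function names are made up for illustration.

```c
#include <string.h>

/* "ö" as an editor saving ISO-8859-1 would write it: one byte, 0xF6. */
static const char oe_latin1[] = "\xF6";

/* "ö" as an editor saving UTF-8 would write it: two bytes, 0xC3 0xB6. */
static const char oe_utf8[] = "\xC3\xB6";

/* Returns 1 when the two byte strings are identical, 0 otherwise. */
int same_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Both arrays look like the same letter in their respective editors, yet same_bytes(oe_latin1, oe_utf8) reports a mismatch, and even strlen() disagrees about the length.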

Regards, Jens
--
  \   Jens Thoms Toerring  ___  jt@toerring.de
   \__________________________  http://toerring.de
Aug 18 '08 #4
Jens Thoms Toerring wrote:
> Keith Thompson <ks***@mib.org> wrote:
>> [...]
>> But if the two compilers are being used on the same system, it's very
>> likely that they'll use the same encoding. Since you're calling one
>> function from the other, presumably you're using the compilers on the
>> same system and linking the resulting code into a single executable or
>> equivalent.
>
> Is it actually a question about the compiler at all? As far as
> I can see the compiler will happily create a string literal with
> whatever there is in the string, not caring a bit about the
> encoding of the string. I guess the problem is much more one of
> how the source files are generated and the expectations of the
> output medium.

A crucial point here is that the encoding of characters in the
C source files need have nothing to do with the encoding of
characters in the execution environment. The compiler generates
execution-encoded strings from source-encoded string literals, and
the transformation is not necessarily the identity mapping. For
example, consider a compiler that reads ASCII-encoded source and
produces a program for an EBCDIC environment: An X in a source
literal would go into the compiler as the value 88, but produce a
character with the value 231 in the executed program.
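
As a rough illustration of that ASCII-to-EBCDIC case, here is a sketch of the mapping such a cross-compiler would have to apply to uppercase letters. The function name is hypothetical, and a real compiler would use a full translation table rather than three ranges.

```c
/*
 * Hypothetical helper: map an ASCII uppercase letter (codes 65..90) to its
 * EBCDIC code point. The EBCDIC letters are not contiguous:
 * A-I are 0xC1-0xC9, J-R are 0xD1-0xD9, S-Z are 0xE2-0xE9.
 * Returns -1 for any other input.
 */
int ascii_upper_to_ebcdic(int ascii)
{
    if (ascii >= 65 && ascii <= 73)      /* A..I */
        return 0xC1 + (ascii - 65);
    if (ascii >= 74 && ascii <= 82)      /* J..R */
        return 0xD1 + (ascii - 74);
    if (ascii >= 83 && ascii <= 90)      /* S..Z */
        return 0xE2 + (ascii - 83);
    return -1;
}
```

Calling ascii_upper_to_ebcdic(88) yields 231, matching the X example: the source value and the execution value differ although both mean the same letter.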

The fact that the source-to-execution mapping might not be
a simple copy is surprising, but it really shouldn't be. There are
plenty of other non-copy steps in the manufacture of an execution
string from a source literal: Escapes (hex, octal, and symbolic)
are translated, adjacent literals are spliced, the quotation marks
vanish, a trailing zero appears out of thin air -- in light of all
the other things that happen to a source character on its way into
the executable program, why should we imagine that the encoding of
an 'X' would be immune to change?
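
The other translation steps listed above can be observed directly with sizeof. This is only a small sketch; the array names are arbitrary.

```c
/* Escapes: hex, octal, and symbolic forms each become a single character. */
static const char escapes[] = "\x41\101\n";   /* three characters + '\0' */

/* Adjacent literals are spliced into one string. */
static const char spliced[] = "ab" "cd";      /* same as "abcd" */

/* The quotation marks vanish and a trailing zero appears. */
static const char empty[] = "";               /* one byte: '\0' */
```

Here sizeof escapes is 4, sizeof spliced is 5, and sizeof empty is 1, none of which equals the number of characters typed between the quotation marks in the source.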

--
Er*********@sun.com
Aug 18 '08 #5
On Aug 18, 7:48 am, Andreas Lundgren <d99...@efd.lth.se> wrote:
> Hi!
>
> Is it determined that the C standard compiler always encodes characters
> with the same character encoding? If for example the functions Foo and
> Bar are compiled by different compilers, is it unambiguous how to
> interpret the character string in Bar?

No, it does not depend on the compiler...

> Does string.h expect a specific string format?
>
> void Foo(void)
> {
> char myTextString[11] = "stuvxyzåäö";

Here, instead of char, try wchar_t and the
related functions if you are using Unicode
for your messages and your .c files.

> Bar(myTextString);
> }
>
> void Bar(char* inp)
> {
> What character set to expect?

That depends on the user environment, but if the
user environment is using Unicode, you can expect no
more than an array of bytes; the other case is with
wchar_t and related functions...

> }

Regards,
DMW
Aug 18 '08 #6
Daniel Molina Wegener wrote:
> On Aug 18, 7:48 am, Andreas Lundgren <d99...@efd.lth.se> wrote:
>> Hi!
>>
>> Is it determined that the C standard compiler always encodes characters
>> with the same character encoding? If for example the functions Foo and
>> Bar are compiled by different compilers, is it unambiguous how to
>> interpret the character string in Bar?
>
> No, it does not depend on the compiler...
>
>> Does string.h expect a specific string format?
>>
>> void Foo(void)
>> {
>> char myTextString[11] = "stuvxyzåäö";
>
> Here, instead of char, try wchar_t and the
> related functions if you are using Unicode
> for your messages and your .c files.

Whether or not wchar_t has anything to do with Unicode depends upon
the compiler; the standard makes no such requirement. When it does,
the way in which you can take advantage of that fact depends upon the
compiler as well.
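
One way to ask the implementation about this, sketched under the assumption of a C99 or later compiler: the predefined macro __STDC_ISO_10646__ is defined only when wchar_t values are ISO/IEC 10646 (Unicode) code points. The function name is invented for the example.

```c
#include <wchar.h>

/*
 * Returns 1 if the implementation promises that wchar_t values are
 * ISO/IEC 10646 (Unicode) code points, 0 if it makes no such promise.
 */
int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

A result of 0 does not mean wchar_t is unusable for Unicode text on that system, only that the standard macro gives no guarantee.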
Aug 18 '08 #7
Daniel Molina Wegener wrote, On 18/08/08 18:29:
> On Aug 18, 7:48 am, Andreas Lundgren <d99...@efd.lth.se> wrote:
>> Hi!
>>
>> Is it determined that the C standard compiler always encodes characters
>> with the same character encoding? If for example the functions Foo and
>> Bar are compiled by different compilers, is it unambiguous how to
>> interpret the character string in Bar?
>
> No, it does not depend on the compiler...

You are wrong. See the replies others posted before you for details.

>> Does string.h expect a specific string format?
>>
>> void Foo(void)
>> {
>> char myTextString[11] = "stuvxyzåäö";
>
> Here, instead of char, try wchar_t and the
> related functions if you are using Unicode
> for your messages and your .c files.
>
>> Bar(myTextString);
>> }
>>
>> void Bar(char* inp)
>> {
>> What character set to expect?
>
> That depends on the user environment,

Wrong. It depends on what the function is written to expect and
(assuming the function expects a simple C string, which is likely) on
the encoding the implementation expects.

Actually, the expected encodings for standard C library functions which
handle strings and characters can be changed at run-time using the
setlocale() function, so it could also depend on what the program has
done before calling this function.
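
A small sketch of that run-time dependence, using only setlocale() and isalpha() from the standard library. The helper name is hypothetical, and which locale names exist beyond "C" is system-specific.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper: ask whether `byte` counts as a letter under the
 * named locale. Returns 1 or 0, or -1 if the locale is unavailable.
 * Note that it changes the program's LC_CTYPE setting as a side effect.
 */
int is_letter_in_locale(const char *locale_name, unsigned char byte)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return isalpha(byte) != 0;
}
```

In the "C" locale the byte 0xD6 is not a letter, whereas a Latin-1 locale, if the system provides one, would be expected to classify the same byte as the letter Ö.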
> but if the
> user environment is using Unicode, you can expect no
> more than an array of bytes,

Not necessarily.

> other case is with
> wchar_t and related functions...

For a start, an array of wchar_t is not simply an array of bytes.

>> }
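
To make the remark about wchar_t arrays concrete, here is a sketch comparing a narrow and a wide array holding the same ASCII text. The names are arbitrary, and the size of wchar_t is implementation-defined (commonly 2 or 4 bytes on current systems).

```c
#include <stddef.h>
#include <wchar.h>

static const char    narrow[] = "stuvxyz";
static const wchar_t wide[]   = L"stuvxyz";

/* Number of elements, including the terminating zero, in each array. */
enum {
    NARROW_COUNT = sizeof narrow / sizeof narrow[0],
    WIDE_COUNT   = sizeof wide / sizeof wide[0]
};
```

Both arrays have 8 elements, but sizeof wide is 8 * sizeof(wchar_t), so code that walks the wide array one byte at a time would not see one character per step.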
--
Flash Gordon
Aug 18 '08 #8
Many inputs and some disagreement.

A simple example may be the letter Ö that in ASCII is represented by
the number 153, but in ISO-8859-1 and Unicode is represented by the
number 214.

From what I have read, I have to specify to customers that a
specific method takes as input a city name _coded with ISO-8859-1_ in
a char pointer. Otherwise 'Göthenborg' stored in ISO-8859-1 encoding
will not match a search for 'Göthenborg' provided in ASCII format.
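
A sketch of why the encoding has to be part of the interface, assuming the caller's bytes come from code page 437 (where ö is 148 and Ö is 153) while the stored names are ISO-8859-1. The helper names are hypothetical and only the six Swedish letters are handled.

```c
#include <string.h>

/* The literal is split after the escape so that no following character
 * can be read as part of the hex escape. */
static const char city_latin1[] = "G\xF6" "thenborg";   /* ö = 0xF6 */

/* Returns 1 when the two names are byte-for-byte equal. */
int names_match(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Convert the six Swedish letters from code page 437 to ISO-8859-1
 * in place; all other bytes are left untouched. */
void cp437_swedish_to_latin1(char *s)
{
    for (; *s != '\0'; ++s) {
        switch ((unsigned char)*s) {
        case 0x86: *s = (char)0xE5; break;  /* å */
        case 0x84: *s = (char)0xE4; break;  /* ä */
        case 0x94: *s = (char)0xF6; break;  /* ö */
        case 0x8F: *s = (char)0xC5; break;  /* Å */
        case 0x8E: *s = (char)0xC4; break;  /* Ä */
        case 0x99: *s = (char)0xD6; break;  /* Ö */
        default: break;
        }
    }
}
```

A search key typed as "G\x94" "thenborg" fails names_match() against city_latin1 until it has been passed through cp437_swedish_to_latin1().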

Best Regards,
Andreas Lundgren
Aug 20 '08 #9
In article <d8**********************************@26g2000hsk.googlegroups.com>,
Andreas Lundgren <d9****@efd.lth.se> wrote:
> A simple example may be the letter Ö that in ASCII is represented by
> the number 153

That's not ASCII. It's a Microsoft extension of ASCII called "code
page 437". ASCII has only 128 characters.
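
Since ASCII stops at 127, a check for pure ASCII input is short. This is a minimal sketch with an invented function name.

```c
/* Returns 1 if every byte of s is in the 7-bit ASCII range 0..127. */
int is_ascii(const char *s)
{
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s > 127)
            return 0;
    }
    return 1;
}
```

Any string containing the byte 153 fails this test, whatever letter that byte is meant to represent.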

-- Richard
--
Please remember to mention me / in tapes you leave behind.
Aug 20 '08 #10
