FR accent characters

Hello,

I am struggling with an extended stored procedure DLL coded in C. The
SQL Server database contains French accented characters. The DLL drops
them as it passes on the data to managed code.

I'm trying to wrap my mind around the wchar_t data type.

I tried doing a simple EXE which compiles and runs on Visual Studio
.NET 2003:

#include <string.h>

void main()
{
    char xml[100] = "hello";
    strcat(xml," world");
    printf("%s",xml );
}

I wanted to do the same with wide characters. The following compiles
but chokes on the second line when run:

#include <string.h>
#include <wchar.h>

void main()
{
    wchar_t * xml = "hello";
    wcscat(xml," là mondé");
    // The 2 accented characters above might show up wrong when posted;
    // on a web page, the special HTML characters are &agrave; and &eacute;
    sprintf("%s",xml);
}

What would be the working code? I scoured the refs (K&R, C Unleashed);
they were unhelpful.

URLs would be welcome.

Thanks in advance!
Bert
Nov 14 '05 #1
Bert Szoghy wrote:
Hello,

I am struggling with an extended stored procedure DLL coded in C. The
SQL Server database contains French accented characters. The DLL drops
them as it passes on the data to managed code.
DLL, SQL, etc. are not C and are off-topic. They have nothing to do
with your problem, either, so we'll ignore them.
I'm trying to wrap my mind around the wchar_t data type.

I tried doing a simple EXE which compiles and runs on Visual Studio
.NET 2003:
Then turn your diagnostics on ...
#include <string.h>

void main()
^^^^
This marks the coder as incompetent.
{
    char xml[100] = "hello";
    strcat(xml," world");
    printf("%s",xml );
}

I wanted to do the same with wide characters. The following compiles
but chokes on the second line when run:

#include <string.h>
#include <wchar.h>

void main()
{
    wchar_t * xml = "hello";
    wcscat(xml," là mondé");
    // The 2 accented characters above might show up wrong when posted;
    // on a web page, the special HTML characters are &agrave; and &eacute;
    sprintf("%s",xml);
}

What would be the working code? I scoured the refs (K&R, C Unleashed);
they were unhelpful.


Try the following. Notice the differences in the form of the constant
strings and in the output format. What actually is produced is
implementation- and locale-specific.

#include <stdio.h>
#include <string.h>
#include <wchar.h>

void first_proc(void)
{
    char xml[100] = "hello";
    strcat(xml, " world");
    printf("%s\n", xml);
}
void second_proc(void)
{
    wchar_t *xml = L"hello";
    wcscat(xml, L" là mondé");
    printf("%ls\n", xml);
}

int main()
{
    first_proc();
    second_proc();
    return 0;
}

Nov 14 '05 #2
On Wed, 27 Apr 2005 01:16:53 GMT, Martin Ambuhl
<ma*****@earthlink.net> wrote:
Try the following. Notice the differences in the form of the constant
strings and in the output format. What actually is produced is
implementation- and locale-specific.
As in core dumps and other nasal demons...
#include <stdio.h>
#include <string.h>
#include <wchar.h>

void first_proc(void)
{
    char xml[100] = "hello";
    strcat(xml, " world");
    printf("%s\n", xml);
}
void second_proc(void)
{
    wchar_t *xml = L"hello";
    wcscat(xml, L" là mondé");
Splat! You've just tried to write past the end of a string which is a
pointer to a wide string literal with 6 characters. Undefined
behaviour twice (writing to a string literal and writing off the end of
it)...

If you declare xml as wchar_t xml[100] (to match the first function), it
works rather better.

However, putting accented characters in source is horribly undefined and
non-portable. Indeed, the wide character functions do not produce
anything portable, which is why most people using Unicode have their own
code to handle it. The functions in the C Standard aren't guaranteed to
do anything at all usable unless __STDC_ISO_10646__ is defined, which it
needn't be (if the "supported locales" for the implementation are
limited, then wchar_t can be an 8-bit type).
    printf("%ls\n", xml);
}

int main()
{
    first_proc();
    second_proc();
    return 0;
}


The C library on my system doesn't support %ls, unfortunately, so I used
the debugger to verify that the wcscat worked...

Chris C
Nov 14 '05 #3
Chris Croughton wrote:
[..]
As in core dumps and other nasal demons...
wchar_t *xml = L"hello";
wcscat(xml, L" là mondé");

Splat! You've just tried to write past the end of a string which is a
pointer to a wide string literal with 6 characters.


Thank you; I made the grievous error of correcting as little as possible
of the OP's code without noticing that errors were on almost every line
rather than only half of them.
Undefined
behaviour twice (writing to a string literal and writing off the end of
it)...


It's always good for regulars to be occasionally humiliated by someone
who only showed up last November 15th.
Nov 14 '05 #4
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)

#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
    wchar_t xml[100] = L"hello";
    wcscat(xml, L" là mondé");
    printf("%ls", xml);
}

But the printf gives this (on Windows XP) at the command prompt:
"hello lÓ mondÚ"

The code I am looking for will never be ported to anything other than Windows.

Best regards,
Bert
Nov 14 '05 #5
Bertrand Szoghy wrote:
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term requires an exclamation point)

#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
    wchar_t xml[100] = L"hello";
    wcscat(xml, L" là mondé");
    printf("%ls", xml);
}

But the printf gives this (on Windows XP) at the command prompt: "hello lÓ mondÚ"

The code I am looking for will never be ported to anything other than Windows.
Best regards,
Bert


void main() is bogus, try int main(void) and returning something.
And a newline after your printf is required to guarantee anything
is displayed. But neither of those is your issue.

You probably need some help from a Windows group at this point, not
comp.lang.c. How characters actually display in your command window
on Windows is off topic here and may depend on the language setup of
your system. <OT> I seem to remember you can change this by changing
the code page with chcp, as in chcp 1252. That gives you a starting
point at least. But that is old, suspect knowledge; ask in a Windows
group for better info. </OT>

-David

Nov 14 '05 #6
On Wed, 27 Apr 2005 13:34:52 -0400, in comp.lang.c, "Bertrand Szoghy"
<we*******@quadmore.com> wrote:
void main()
This is still wrong. Please read Martin's original post.
But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"


This is probably entirely dependent on the codepage that the terminal
device uses for printing output. You'll need to ask Windows experts
about that part, as it's nothing to do with C.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Nov 14 '05 #7
On Wed, 27 Apr 2005 13:34:52 -0400, Bertrand Szoghy
<we*******@quadmore.com> wrote:
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)
And a sentence requires a period, question mark or exclamation mark at
the end <g>...
#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
    wchar_t xml[100] = L"hello";
    wcscat(xml, L" là mondé");
    printf("%ls", xml);
}

But the printf gives this (on Windows XP) at the command prompt:
"hello lÓ mondÚ"
Quite likely. As I said, putting accented characters into the code is
non-portable. So are wide characters in general: there is no guarantee
that what is printed will resemble what was compiled in, because it
depends on the locale set, the setup of the terminal on which it is
output (or, if output to a file, what program you use to read the file),
etc.
The code I am looking for will never be ported to anything other than Windows.


I suspect that what you need are the Windows-specific conversion
interfaces, for those you will need to ask on a Windows-specific
newsgroup. But almost certainly you won't easily be able to just write
the characters in the literal strings.

I have some functions to convert from UCS-2 or UCS-4 to UTF-8 (and the
reverse) if you would find those useful. I wrote them from the relevant
RFCs; they are open source (non-contaminating licence based on the zlib
licence). They don't do any locale conversion, though...

Chris C
Nov 14 '05 #8
Hello Chris,

Yes, please, I would like to look at your routines. I know I will learn
something if I do. What is the URL?

My feeling about the subject is that wchar_t and its associated functions,
libraries, and so on are part of a recent C standard. The wchar_t data type
is mentioned in passing in the second edition of K&R as "it's new", and
Petzold also mentions it in passing in Chapter 2 of the 5th edition of
"Programming Windows" without any example code. So overall this is a really
good subject for a C newsgroup discussion. It's not off topic. It's dead on
topic, right there in the grey zone of compilers handling something new in a
myriad of ways.

Portability is a good thing, but as I said this question is for a project
that will never be ported away from Windows. Reading specifications is also
a good thing, fellow programmers.

Some of you mentioned that I shouldn't concatenate accented characters, and
I agree; in fact I never intended to. In the actual system, we were thinking
of concatenating XML (without accents) in order to target the DLL toward
another programming language. The code I provided was reduced to the
smallest example I could muster. A good hint of that is that "hello world"
code is unlikely to be part of a large system's source.

Thank you all for responding and have a nice day,
Bert Szoghy
Nov 14 '05 #9
On Mon, 2 May 2005 08:31:33 -0400, Bertrand Szoghy
<we*******@quadmore.com> wrote:
Hello Chris,

Yes, please, I would like to look at your routines. I know I will learn
something if I do. What is the URL?
Well, it wasn't actually on the web until you asked; I hadn't gotten
round to putting it there <g>.

The code is at

http://www.keristor.net/stuff/xutfstr.c

Documentation (produced by Doxygen) is at

http://www.keristor.net/stuff/xutfstr.html

(Note that the documentation has broken links to other pages, ignore
those...)
My feeling about the subject is that wchar_t and its associated functions,
libraries, and so on are part of a recent C standard. The wchar_t data type
is mentioned in passing in the second edition of K&R as "it's new", and
Petzold also mentions it in passing in Chapter 2 of the 5th edition of
"Programming Windows" without any example code. So overall this is a really
good subject for a C newsgroup discussion. It's not off topic. It's dead on
topic, right there in the grey zone of compilers handling something new in a
myriad of ways.
Certainly wchar_t and associated types and variables are on-topic for
comp.lang.c, but what they do on specific systems isn't. In other
words, noting that their action is implementation-defined is on topic,
but asking about how to use them on Windows isn't.
Portability is a good thing, but as I said this question is for a project
that will never be ported away from Windows. Reading specifications is also
a good thing, fellow programmers.
Reading specifications is indeed a good thing, but you need to read
those specifications relevant to what you are doing. The C standard does
not say anything about what locales are supported on specific operating
systems, or what size wchar_t should be (except that it's at least as
big as unsigned char), for that you need to go to the specifications of
your system and compiler.

By the way, top-posting (replying at the top of the text to which you
are responding) is frowned on in comp.lang.c, it makes things harder to
read:

Terrible! How does he smell?
My dog has no nose.


Chris C
Nov 14 '05 #10
