FR accent characters

Hello,

I am struggling with an extended stored procedure DLL coded in C. The
SQL Server database contains French accented characters. The DLL drops
them as it passes on the data to managed code.

I'm trying to wrap my mind around the wchar_t data type.

I tried doing a simple EXE which compiles and runs on Visual Studio
.NET 2003:

#include <string.h>

void main()
{
char xml[100] = "hello";
strcat(xml," world");
printf("%s",xml);
}

I wanted to do the same with wide characters. The following compiles
but chokes on the second line when run:

#include <string.h>
#include <wchar.h>

void main()
{
wchar_t * xml = "hello";
wcscat(xml," là mondé");
// The 2 accented characters above might show up wrong when posted
// on a web page;
// the special HTML characters are &agrave; and &eacute;
sprintf("%s",xml);
}

What would be the working code? I scoured the refs (K&R, C Unleashed),
but they were unhelpful.

URLs would be welcome.

Thanks in advance!
Bert
Nov 14 '05 #1
Bert Szoghy wrote:
Hello,

I am struggling with an extended stored procedure DLL coded in C. The
SQL Server database contains French accented characters. The DLL drops
them as it passes on the data to managed code.
DLL, SQL, etc. are not C and are off-topic. They have nothing to do
with your problem, either, so we'll ignore them.
I'm trying to wrap my mind around the wchar_t data type.

I tried doing a simple EXE which compiles and runs on Visual Studio
.NET 2003:
Then turn your diagnostics on ...
#include <string.h>

void main()
^^^^
This marks the coder as incompetent.
{
char xml[100] = "hello";
strcat(xml," world");
printf("%s",xml);
}

I wanted to do the same with wide characters. The following compiles
but chokes on the second line when run:

#include <string.h>
#include <wchar.h>

void main()
{
wchar_t * xml = "hello";
wcscat(xml," là mondé");
// The 2 accented characters above might show up wrong when posted
// on a web page;
// the special HTML characters are &agrave; and &eacute;
sprintf("%s",xml);
}

What would be the working code? Scoured the refs (K&R, C Unleashed)
they were unhelpful.


Try the following. Notice the differences in the form of the constant
strings and in the output format. What actually is produced is
implementation- and locale-specific.

#include <stdio.h>
#include <string.h>
#include <wchar.h>

void first_proc(void)
{
char xml[100] = "hello";
strcat(xml, " world");
printf("%s\n", xml);
}
void second_proc(void)
{
wchar_t *xml = L"hello";
wcscat(xml, L" là mondé");
printf("%ls\n", xml);
}

int main()
{
first_proc();
second_proc();
return 0;
}

Nov 14 '05 #2
On Wed, 27 Apr 2005 01:16:53 GMT, Martin Ambuhl
<ma*****@earthlink.net> wrote:
Try the following. Notice the differences in the form of the constant
strings and in the output format. What actually is produced is
implementation- and locale-specific.
As in core dumps and other nasal demons...
#include <stdio.h>
#include <string.h>
#include <wchar.h>

void first_proc(void)
{
char xml[100] = "hello";
strcat(xml, " world");
printf("%s\n", xml);
}
void second_proc(void)
{
wchar_t *xml = L"hello";
wcscat(xml, L" là mondé");
Splat! You've just tried to write past the end of a string which is a
pointer to a wide string literal with 6 characters. Undefined
behaviour twice (writing to a string literal and writing off the end of
it)...

If you make the declaration of xml wchar_t xml[100] (to match the first
one) it works rather better.

However, putting accented characters in source is implementation-defined
and horribly non-portable. Indeed, the wide character functions do not
produce anything portable, which is why most people using Unicode have
their own code to handle it. The functions in the C Standard aren't
guaranteed to do anything at all usable unless __STDC_ISO_10646__ is
defined, which it needn't be (if the "supported locales" for the
implementation are limited, then wchar_t can be an 8-bit type).
printf("%ls\n", xml);
}

int main()
{
first_proc();
second_proc();
return 0;
}


The C library on my system doesn't support %ls, unfortunately; I used
the debugger to verify that the wcscat worked...

Chris C
Nov 14 '05 #3
Chris Croughton wrote:
[..]
As in core dumps and other nasal demons...
wchar_t *xml = L"hello";
wcscat(xml, L" là mondé");

Splat! You've just tried to write past the end of a string which is a
pointer to a wide string literal with 6 characters.


Thank you; I made the grievous error of correcting as little as possible
of the OP's code without noticing that errors were on almost every line
rather than only half of them.
Undefined
behaviour twice (writing to a string literal and writing off the end of
it)...


It's always good for regulars to be occasionally humiliated by someone
who only showed up last November 15th.
Nov 14 '05 #4
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)

#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But on Windows XP, the printf output at the command prompt is
"hello lÓ mondÚ"

The code I am looking for will never be ported to anything other than Windows.

Best regards,
Bert
Nov 14 '05 #5
Bertrand Szoghy wrote:
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term requires an exclamation point)

#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But the result of the printf will give (on Windows XP) on the command prompt "hello lÓ mondÚ"

The code I am looking for will never be ported to anything other than Windows.
Best regards,
Bert


void main() is bogus, try int main(void) and returning something.
And a newline after your printf is required to guarantee anything
is displayed. But neither of those is your issue.

You probably need some help from a windows group at this point, not
comp.lang.c. How characters actually display in your command window
on windows is off topic here and may depend on the language setup of
your system. <OT> I seem to remember you can change this stuff by
changing the code page with chcp, as in chcp 1252. That gives you a
starting point at least. But that is old, suspect knowledge; ask in a
Windows group for better info. </OT>

-David

Nov 14 '05 #6
On Wed, 27 Apr 2005 13:34:52 -0400, in comp.lang.c , "Bertrand Szoghy"
<we*******@quadmore.com> wrote:
void main()
This is still wrong. Please read Martin's original post.
But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"


This is probably entirely dependent on the code page that the terminal
device uses for printing output. You'll need to ask Windows experts
about that part, as it's nothing to do with C.

--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
Nov 14 '05 #7
On Wed, 27 Apr 2005 13:34:52 -0400, Bertrand Szoghy
<we*******@quadmore.com> wrote:
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)
And a sentence requires a period, question mark or exclamation mark at
the end <g>...
#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"
Quite likely; as I said, putting accented characters into the code is
non-portable. So are wide characters in general: there is no guarantee
that what is printed will resemble what was compiled in, because it
depends on the locale set, the setup of the terminal on which it is
output (or, if output to a file, what program you use to read the file),
etc.
The code I am looking for will never be ported to anything other than Windows.


I suspect that what you need are the Windows-specific conversion
interfaces; for those you will need to ask on a Windows-specific
newsgroup. But almost certainly you won't easily be able to just write
the characters in the literal strings.

I have some functions to convert from UCS2 or UCS4 to UTF8 (and the
reverse) if you would find those useful. I wrote them from the relevant
RFCs; they are open source (non-contaminating licence based on the zlib
licence). They don't do any locale conversion, though...

Chris C
Nov 14 '05 #8
Hello Chris,

Yes, please, I would like to look at your routines. I know I will learn
something if I do. What is the URL?

My feeling about the subject is that wchar_t and its associated functions,
libraries, and so on are part of a recent C standard. The wchar_t datatype
is mentioned in passing in the second edition of K&R as "it's new", and
Petzold also mentions it in passing in Chapter 2 of his 5th Edition
"Programming Windows" without any example code. So overall this is a really
good subject for a C newsgroup discussion. It's not off topic. It's dead on
topic, right there in the grey zone of compilers handling something new in
a myriad of ways.

Portability is a good thing, but as I said this question is for a project
that will never be ported away from Windows. Reading specifications is also
a good thing, fellow programmers.

Some of you mentioned that I shouldn't concatenate accented characters, and
I agree; in fact, I never intended to. In the actual system, we were
thinking of concatenating XML (without accents) in order to target the DLL
toward another programming language. The code I provided was reduced to the
smallest example I could muster. A good hint of that is that "hello world"
code is likely not part of a large system's source.

Thank you all for responding and have a nice day,
Bert Szoghy
"Chris Croughton" <ch***@keristor.net> wrote in message news:
sl******************@ccserver.keris.net...
On Wed, 27 Apr 2005 13:34:52 -0400, Bertrand Szoghy
<we*******@quadmore.com> wrote:
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)


And a sentence requires a period, question mark or exclamation mark at
the end <g>...
#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But the result of the printf will give (on Windows XP) on the command
prompt
"hello lÓ mondÚ"


Quite likely, as I said putting accented characters into the code is
non-portable. So are wide characters in general, there is no guarantee
that what is printed will resemble what was compiled in because it
depends on the locale set, the setup of the terminal on which it is
output (or if output to a file what program you use to read the file),
etc.
The code I am looking for will never be ported to anything other than
Windows.


I suspect that what you need are the Windows-specific conversion
interfaces, for those you will need to ask on a Windows-specific
newsgroup. But almost certainly you won't easily be able to just write
the characters in the literal strings.

I have some functions to convert from UCS2 or UCS4 to UTF8 (and the
reverse) if you would find those useful. I wrote them from the relevant
RFCs, they are open source (non-contaminating licence based on the zlib
licence). They don't do any locale conversion, though...

Chris C


Nov 14 '05 #9
On Mon, 2 May 2005 08:31:33 -0400, Bertrand Szoghy
<we*******@quadmore.com> wrote:
Hello Chris,

Yes, please, I would like to look at your routines. I know I will learn
something if I do. What is the URL?
Well, it wasn't actually on the web until you asked; I hadn't gotten
round to putting it there <g>.

The code is at

http://www.keristor.net/stuff/xutfstr.c

Documentation (produced by Doxygen) is at

http://www.keristor.net/stuff/xutfstr.html

(Note that the documentation has broken links to other pages; ignore
those...)
My feeling about the subject is that wchar_t and its associated functions,
libraries, and so on are part of a recent C standard. The wchar_t datatype
is mentioned in passing in the second edition of K&R as "it's new", and
Petzold also mentions it in passing in Chapter 2 of his 5th Edition
"Programming Windows" without any example code. So overall this is a really
good subject for a C newsgroup discussion. It's not off topic. It's dead on
topic, right there in the grey zone of compilers handling something new in
a myriad of ways.
Certainly wchar_t and associated types and variables are on-topic for
comp.lang.c, but what they do on specific systems isn't. In other
words, noting that their action is implementation defined is on topic,
but asking about how to use them on Windows isn't.
Portability is a good thing, but as I said this question is for a project
that will never be ported away from Windows. Reading specifications is also
a good thing, fellow programmers.
Reading specifications is indeed a good thing, but you need to read
those specifications relevant to what you are doing. The C standard does
not say anything about what locales are supported on specific operating
systems, or what size wchar_t should be (except that it's at least as
big as unsigned char); for that you need to go to the specifications of
your system and compiler.

By the way, top-posting (replying at the top of the text to which you
are responding) is frowned on in comp.lang.c; it makes things harder to
read:

Terrible!
How does he smell?
My dog has no nose.


Chris C
Nov 14 '05 #10
