Hello,
I am struggling with an extended stored procedure DLL coded in C. The
SQL Server database contains French accented characters. The DLL drops
them as it passes on the data to managed code.
I'm trying to wrap my mind around the wchar_t data type.
I tried doing a simple EXE which compiles and runs on Visual Studio
.NET 2003:
#include <string.h>
void main()
{
char xml[100] = "hello";
strcat(xml," world");
printf("%s",xml);
}
I wanted to do the same with wide characters. The following compiles
but chokes on the second line when run:
#include <string.h>
#include <wchar.h>
void main()
{
wchar_t * xml = "hello";
wcscat(xml," là mondé");
// The 2 accented characters above might show up wrong when posted
// on a web page; the special HTML characters are à and é
sprintf("%s",xml);
}
What would be the working code? Scoured the refs (K&R, C Unleashed)
they were unhelpful.
URLs would be welcome.
Thanks in advance!
Bert
Bert Szoghy wrote:
> Hello,
> I am struggling with an extended stored procedure DLL coded in C. The
> SQL Server database contains French accented characters. The DLL drops
> them as it passes on the data to managed code.

DLL, SQL, etc. are not C and are off-topic. They have nothing to do
with your problem, either, so we'll ignore them.

> I'm trying to wrap my mind around the wchar_t data type.
> I tried doing a simple EXE which compiles and runs on Visual Studio
> .NET 2003:

Then turn your diagnostics on ...

> #include <string.h>
> void main()
  ^^^^
This marks the coder as incompetent.

> {
> char xml[100] = "hello";
> strcat(xml," world");
> printf("%s",xml);
> }
> I wanted to do the same with wide characters. The following compiles
> but chokes on the second line when run:
> #include <string.h>
> #include <wchar.h>
> void main()
> {
> wchar_t * xml = "hello";
> wcscat(xml," là mondé");
> // The 2 accented characters above might show up wrong when posted
> // on a web page; the special HTML characters are à and é
> sprintf("%s",xml);
> }
> What would be the working code? Scoured the refs (K&R, C Unleashed)
> they were unhelpful.

Try the following. Notice the differences in the form of the constant
strings and in the output format. What actually is produced is
implementation- and locale-specific.
#include <stdio.h>
#include <string.h>
#include <wchar.h>
void first_proc(void)
{
char xml[100] = "hello";
strcat(xml, " world");
printf("%s\n", xml);
}
void second_proc(void)
{
wchar_t *xml = L"hello";
wcscat(xml, L" là mondé");
printf("%ls\n", xml);
}
int main()
{
first_proc();
second_proc();
return 0;
}
On Wed, 27 Apr 2005 01:16:53 GMT, Martin Ambuhl
<ma*****@earthlink.net> wrote:
> Try the following. Notice the differences in the form of the constant
> strings and in the output format. What actually is produced is
> implementation- and locale-specific.

As in core dumps and other nasal demons...

> #include <stdio.h>
> #include <string.h>
> #include <wchar.h>
> void first_proc(void)
> {
> char xml[100] = "hello";
> strcat(xml, " world");
> printf("%s\n", xml);
> }
> void second_proc(void)
> {
> wchar_t *xml = L"hello";
> wcscat(xml, L" là mondé");

Splat! You've just tried to write past the end of a string which is a
pointer to a wide string literal with 6 characters. Undefined
behaviour twice (writing to a string literal and writing off the end of
it)...
If you make the declaration of xml wchar_t xml[100] (to match the first
one) it works rather better.
However, putting accented characters in source is horribly undefined and
non-portable. Indeed, the wide character functions do not produce
anything portable, which is why most people using Unicode have their own
code to handle it. The functions in the C Standard aren't guaranteed to
do anything at all usable unless __STDC_ISO_10646__ is defined, which it
needn't be (if the "supported locales" for the implementation are
limited then wchar_t can be an 8 bit type).

> printf("%ls\n", xml);
> }
> int main()
> {
> first_proc();
> second_proc();
> return 0;
> }

The C library on my system doesn't support %ls, unfortunately, I used
the debugger to verify that the wcscat worked...
Chris C
Chris Croughton wrote:
[..]
> As in core dumps and other nasal demons...
>> wchar_t *xml = L"hello";
>> wcscat(xml, L" là mondé");
> Splat! You've just tried to write past the end of a string which is a
> pointer to a wide string literal with 6 characters.

Thank you; I made the grievous error of correcting as little as possible
of the OP's code without noticing that errors were on almost every line
rather than only half of them.

> Undefined behaviour twice (writing to a string literal and writing off
> the end of it)...

It's always good for regulars to be occasionally humiliated by someone
who only showed up last November 15th.

Hello Martin and Chris,
Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)
#include <string.h>
#include <wchar.h>
#include <stdio.h>
void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}
But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"
The code I am looking for will never be ported to anything other than Windows.
Best regards,
Bert
Bertrand Szoghy wrote:
> Hello Martin and Chris,
> Indeed the following code does not go Splat! (the correct technical
> term requires an exclamation point)
> #include <string.h>
> #include <wchar.h>
> #include <stdio.h>
> void main()
> {
> wchar_t xml[100] = L"hello";
> wcscat(xml, L" là mondé");
> printf("%ls", xml);
> }
> But the result of the printf will give (on Windows XP) on the command
> prompt "hello lÓ mondÚ"
> The code I am looking for will never be ported to anything other than
> Windows.
> Best regards,
> Bert

void main() is bogus, try int main(void) and returning something.
And a newline after your printf is required to guarantee anything
is displayed. But neither of those is your issue.
You probably need some help from a windows group at this point, not
comp.lang.c. How characters actually display in your command window
on windows is off topic here and may depend on the language setup of
your system. <OT> I seem to remember you can change this stuff by
changing the code page with chcp, as in chcp 1252. Gives you a
starting point at least. But that is old, suspect knowledge; ask in a
Windows group for better info. </OT>
-David
On Wed, 27 Apr 2005 13:34:52 -0400, in comp.lang.c , "Bertrand Szoghy"
<we*******@quadmore.com> wrote:
> void main()

this is still wrong. Please read Martin's original post.

> But the result of the printf will give (on Windows XP) on the command
> prompt "hello lÓ mondÚ"

This is probably entirely dependent on the codepage that the terminal
device uses for printing output. You'll need to ask Windows experts
about that part, as it's nothing to do with C.
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.ungerhu.com/jxh/clc.welcome.txt>
On Wed, 27 Apr 2005 13:34:52 -0400, Bertrand Szoghy
<we*******@quadmore.com> wrote:
> Hello Martin and Chris,
> Indeed the following code does not go Splat! (the correct technical term
> requires an exclamation point)

And a sentence requires a period, question mark or exclamation mark at
the end <g>...

> #include <string.h>
> #include <wchar.h>
> #include <stdio.h>
> void main()
> {
> wchar_t xml[100] = L"hello";
> wcscat(xml, L" là mondé");
> printf("%ls", xml);
> }
> But the result of the printf will give (on Windows XP) on the command
> prompt "hello lÓ mondÚ"

Quite likely, as I said putting accented characters into the code is
non-portable. So are wide characters in general, there is no guarantee
that what is printed will resemble what was compiled in because it
depends on the locale set, the setup of the terminal on which it is
output (or if output to a file what program you use to read the file),
etc.

> The code I am looking for will never be ported to anything other than
> Windows.

I suspect that what you need are the Windows-specific conversion
interfaces, for those you will need to ask on a Windows-specific
newsgroup. But almost certainly you won't easily be able to just write
the characters in the literal strings.
I have some functions to convert from UCS2 or UCS4 to UTF8 (and the
reverse) if you would find those useful. I wrote them from the relevant
RFCs, they are open source (non-contaminating licence based on the zlib
licence). They don't do any locale conversion, though...
Chris C
Hello Chris,
Yes, please, I would like to look at your routines. I know I will learn
something if I do. What is the URL?
My feeling about the subject is, wchar_t and associated functions,
libraries, and so on, are part of a recent C standard, the wchar_t datatype
is mentioned in passing in the second edition of K&R as "it's new", Petzold
also mentions it in passing in Chapter 2 of his 5th Edition "Programming for
Windows" without any example code, so overall this is a really good subject
for a C newsgroup discussion. It's not off topic. It's dead on topic, right
there in the grey zone of compilers handling something new in a myriad of
ways.
Portability is a good thing, but as I said this question is for a project
that will never be ported away from Windows. Reading specifications is also
a good thing, fellow programmers.
Some of you mentioned that I shouldn't concatenate accented characters, and
I agree, and in fact I never intended to. In the actual system, we were
thinking of concatenating XML (without accents) in order to target the DLL
toward another programming language. The code I provided was reduced to the
smallest example I could muster. A good hint of that is that "hello world"
code is likely not part of a large system's source.
Thank you all for responding and have a nice day,
Bert Szoghy
On Mon, 2 May 2005 08:31:33 -0400, Bertrand Szoghy
<we*******@quadmore.com> wrote:
> Hello Chris,
> Yes, please, I would like to look at your routines. I know I will learn
> something if I do. What is the URL?

Well, it wasn't actually on the web until you asked, I hadn't gotten
round to putting it there <g>.
The code is at http://www.keristor.net/stuff/xutfstr.c
Documentation (produced by Doxygen) is at http://www.keristor.net/stuff/xutfstr.html
(Note that the documentation has broken links to other pages, ignore
those...)

> My feeling about the subject is, wchar_t and associated functions,
> libraries, and so on, are part of a recent C standard, the wchar_t datatype
> is mentioned in passing in the second edition of K&R as "it's new", Petzold
> also mentions it in passing in Chapter 2 of his 5th Edition "Programming for
> Windows" without any example code, so overall this is a really good subject
> for a C newsgroup discussion. It's not off topic. It's dead on topic, right
> there in the grey zone of compilers handling something new in a myriad of
> ways.

Certainly wchar_t and associated types and variables are on-topic for
comp.lang.c, but what they do on specific systems isn't. In other
words, noting that their action is implementation defined is on topic,
but asking about how to use them on Windows isn't.

> Portability is a good thing, but as I said this question is for a project
> that will never be ported away from Windows. Reading specifications is also
> a good thing, fellow programmers.

Reading specifications is indeed a good thing, but you need to read
those specifications relevant to what you are doing. The C standard does
not say anything about what locales are supported on specific operating
systems, or what size wchar_t should be (except that it's at least as
big as unsigned char), for that you need to go to the specifications of
your system and compiler.
By the way, top-posting (replying at the top of the text to which you
are responding) is frowned on in comp.lang.c, it makes things harder to
read:
Terrible!
> How does he smell?
>> My dog has no nose.
Chris C