Question: Unicode <-> HEX conversion in C source file?

^_^
conversion from:

a="a";

to

a=0x????;

If there are many unicode strings to convert, how can I do batch-conversion?
Nov 14 '05 #1
^_^ wrote:
> conversion from:
>
> a="a";
>
> to
>
> a=0x????;
>
> If there are many unicode strings to convert, how can I do batch-conversion?


If you really want help, then

1) Stop cross-posting wildly.
2) Stop re-posting similar messages over and over.
3) Phrase your question in a way that we can understand it.

Try posting ONE message to ONE relevant group that explains your problem
in sufficient detail, then wait for a reply (which may take several
hours). Otherwise you are likely to be ignored, flamed, and/or killfiled.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.
Nov 14 '05 #2
> If you really want help, then

> 1) Stop cross-posting wildly.
> 2) Stop re-posting similar messages over and over.
> 3) Phrase your question in a way that we can understand it.
>
> Try posting ONE message to ONE relevant group that explains your problem
> in sufficient detail, then wait for a reply (which may take several
> hours). Otherwise you are likely to be ignored, flamed, and/or killfiled.


I don't think this guy speaks English that well, it is a foreign language to
him, hence the cryptic messages.
Probably Chinese.

Stephen Howe
Nov 14 '05 #3
Stephen Howe wrote:
> I don't think this guy speaks English that well, it is a
> foreign language to him, hence the cryptic messages. Probably
> Chinese.


[Reading in news:comp.lang.c]

No need to guess. From cleansugar's header:

Organization: Korea Telecom
Message-ID: <bt**********@news1.kornet.net>

I think the OP wants a tool that can be used to convert string
literals to unicode equivalents in C and/or C++ source files.

Can someone who knows more about this than I either redirect or
provide help?
--
Morris Dovey
West Des Moines, Iowa USA
C links at http://www.iedu.com/c
Read my lips: The apple doesn't fall far from the tree.

Nov 14 '05 #4
"^_^" <cl********@hotmail.com> wrote in message
news:bt**********@news1.kornet.net...
> conversion from:
>
> a="a";
>
> to
>
> a=0x????;
>
> If there are many unicode strings to convert, how can I do batch-conversion?


You can try the NCBI C++ Toolkit. It is portable and free.
http://www.ncbi.nlm.nih.gov/IEB/Tool...DOC/index.html

It contains, among other things, some utility functions for converting
characters and strings from ASCII to Unicode.
http://www.ncbi.nih.gov/IEB/ToolBox/.../util/utf8.hpp

HTH
Tom
Nov 14 '05 #5
^_^
I'm sorry that I was rude to speak unpolite broken English.

It's my fault. I am not an English speaker

Though, I can speak more correct expression, I was neglect.

Sorry.
What I want is to convert Unicode characters in source code to 0x??? format.

Then it is going to be saved as ASCII format a documents.

Written in not-Latin format Unicode characters in source code cause that
English OS users can not read it without fonts.

If source code's format were saved as UTF8, compiler reads it automatically.

But I don't want this method.

I want to know, either, that convert decimal format numbers to hexademical
format.

For example, I'll show an source.

example.cpp:

#define MAX 16777215
void main(){
if (MAX==a) printf("wrong\n";);
}

example_I_wanted.cpp
#define MAX 0xFFFFFF <------*this part*
void main(){
if (MAX==a) printf("wrong\n";);
}

To do so, C or C++ source parsing->converting DEC to HEX->saving CPP file
with converted characters are needed.

I don't know detailed methods.

If gurus like you give me some good ways, I will follow your wisdom.

Thank you and I'm sorry again.
Nov 14 '05 #6
^_^ <cl********@hotmail.com> scribbled the following
on comp.lang.c:
> What I want is to convert Unicode characters in source code to 0x??? format.
> Then it is going to be saved as ASCII format a documents.
> Written in not-Latin format Unicode characters in source code cause that
> English OS users can not read it without fonts.
> If source code's format were saved as UTF8, compiler reads it automatically.
> But I don't want this method.
> I want to know, either, that convert decimal format numbers to hexademical
> format.
> For example, I'll show an source.
>
> example.cpp:
>
> #define MAX 16777215
> void main(){
> if (MAX==a) printf("wrong\n";);
> }
>
> example_I_wanted.cpp
> #define MAX 0xFFFFFF <------*this part*
> void main(){
> if (MAX==a) printf("wrong\n";);
> }


You don't *HAVE* to do this. As numbers, 16777215 and 0xFFFFFF are
completely interchangeable within a C or C++ program. The runtime
program will only see them as a pattern of bits anyway.

And void main() is an illegal form of main(). Use int main().

So, the answer to your question is: your programs should work fine as
they are.
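
To make the point about interchangeable spellings concrete, here is a minimal C sketch that is not part of the original post. The names MAX_DEC, MAX_HEX, max_spellings_agree and to_hex are invented for illustration. One caveat: for values too large for int, an unsuffixed hexadecimal constant may take an unsigned type where the decimal spelling would not, so the two forms always agree in value but not necessarily in type.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_DEC 16777215
#define MAX_HEX 0xFFFFFF

/* Compile-time check: the array size becomes -1 (a compile error)
 * if the two spellings ever denote different values. */
typedef char max_spellings_agree[(MAX_DEC == MAX_HEX) ? 1 : -1];

/* Formats v in uppercase hex with a 0x prefix into buf and returns buf.
 * Hypothetical helper: shows that the base is only a display choice. */
char *to_hex(unsigned long v, char *buf, size_t bufsz)
{
    int w = snprintf(buf, bufsz, "0x%lX", v);
    assert(w > 0 && (size_t)w < bufsz);  /* output must fit in buf */
    return buf;
}
```

Calling to_hex(MAX_DEC, buf, sizeof buf) with a buffer of a few dozen bytes should yield the text 0xFFFFFF, which is the spelling the OP wants to see in the source.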

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"Life without ostriches is like coffee with milk."
- Mika P. Nieminen
Nov 14 '05 #7
^_^ wrote:
> I'm sorry that I was rude to speak unpolite broken English.
> It's my fault. I am not an English speaker

That's OK, it wasn't rude, nor was your English unpolite in any way. (By
the way, the normal English word is "impolite." "Unpolite" is perfectly
logical and understandable, but it disappeared from normal English use in
the early 18th century.) The problem is that you didn't give us a question
that we could understand. Many people who *are* native English speakers
fail to do this.

Posting to both C and C++ newsgroups is likely an error. C and C++ are
different languages, and, even when the languages admit the same forms of
code, the normal idioms in the two languages are different. It makes sense
to post to both _only_ when the question has the same answer in both
languages. You can't know that in advance (if you could, you would already
know the answer), so it is best to post to a newsgroup for the language you
are using.

> Though, I can speak more correct expression, I was neglect.

As a side note, you might consider comp.usage.english as another newsgroup
you might post in, if improving your English is important to you. The
above line, for example, might more idiomatically be written, "However, I
can express myself better. I was negligent [or neglectful]."

> What I want is to convert Unicode characters in source code to 0x??? format.

If you can read the Unicode characters into a buffer, you can convert those
chars into an integer, as long as the total number of bytes in a character
is no greater than the sizeof the integer (best unsigned) type that you use.
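
The buffer-to-integer idea can be sketched in C roughly as follows. This is an illustration only, and it assumes the source text is UTF-8 encoded, which the thread never states. The function names utf8_decode and utf8_to_hex are hypothetical, and the decoder is deliberately simple: it does not reject overlong encodings or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decodes one UTF-8 sequence at s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if malformed/truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else return 0;                  /* stray continuation or bad lead byte */

    if (len > n) return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Writes the code points of the UTF-8 string in as space-separated
 * 0xXXXX values into out. Returns 0 on success, -1 on bad input or
 * if out is too small. */
int utf8_to_hex(const char *in, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = strlen(in), used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (n > 0) {
        uint32_t cp;
        size_t len = utf8_decode(s, n, &cp);
        int w;

        if (len == 0) return -1;
        w = snprintf(out + used, outsz - used, "%s0x%04lX",
                     used ? " " : "", (unsigned long)cp);
        if (w < 0 || (size_t)w >= outsz - used) return -1;
        used += (size_t)w;
        s += len;
        n -= len;
    }
    return 0;
}
```

A 32-bit unsigned type is used for the code point because a UTF-8 sequence is at most four bytes long, which matches the size condition described above.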

> I want to know, either, that convert decimal format numbers to hexademical
> format.

Numbers are stored simply as binary; the base matters only when they are
displayed for humans. Suppose you have an unsigned int
unsigned int a = 263;
We can display this as octal
printf("%#o\n",a); /* displays 0407 */
or hex
printf("%#x\n",a); /* displays 0x107 */
or decimal
printf("%u\n",a); /* displays 263 */

> For example, I'll show an source.
>
> example.cpp:
>
> #define MAX 16777215
> void main(){

main always returns an int. "void" is wrong. Don't do this.

> if (MAX==a) printf("wrong\n";);

Even though this is an example of an input file, it is best not to post
hopeless code.
The variable 'a' is undeclared.
The C++ people may object that "printf" is too un-C++-like and complain
that <cstdio> is not #included.
The C people might complain that <stdio.h> is not #included. People
using compilers without C99 conformance (almost all) may complain
that main should actually return a value; 0 is common for successful
completion and EXIT_SUCCESS and EXIT_FAILURE are available if
<stdlib.h> is #included.

> }
>
> example_I_wanted.cpp
> #define MAX 0xFFFFFF <------*this part*
> void main(){

main always returns an int. "void" is wrong. Don't do this.

> if (MAX==a) printf("wrong\n";);
> }
>
> To do so, C or C++ source parsing->converting DEC to HEX->saving CPP file
> with converted characters are needed.


To parse an input file containing a C program is probably beyond you at the
moment. You will need to detect sequences of characters that might be an
integer, determine that each really is one (this requires examining its
context), and probably check its use for signedness.
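
As a rough illustration of the detection steps just listed, the following C sketch rewrites plain decimal integer constants on one source line as hexadecimal. It is not a real C parser and is not from the original post. The names put, is_exp and dec_to_hex_line are made up, comments are not skipped, and constants with a suffix (such as 10UL) or a leading 0 are left untouched rather than converted.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Appends n bytes of s to out, keeping it NUL-terminated; -1 if too small. */
static int put(char *out, size_t outsz, size_t *used, const char *s, size_t n)
{
    if (n >= outsz - *used)
        return -1;
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/* True for the exponent letters that may be followed by a sign. */
static int is_exp(char c)
{
    return c == 'e' || c == 'E' || c == 'p' || c == 'P';
}

/* Copies one line from in to out, rewriting unsuffixed decimal integer
 * constants as hex. Returns 0, or -1 if out is too small. */
int dec_to_hex_line(const char *in, char *out, size_t outsz)
{
    size_t used = 0, i = 0;
    char prev = '\0';

    assert(in != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    while (in[i] != '\0') {
        size_t start = i;

        if (in[i] == '"' || in[i] == '\'') {
            /* String or character literal: copy it through untouched. */
            char quote = in[i];
            for (i++; in[i] != '\0' && in[i] != quote; i++)
                if (in[i] == '\\' && in[i + 1] != '\0')
                    i++;
            if (in[i] != '\0')
                i++;
        } else if (isdigit((unsigned char)in[i]) &&
                   !isalnum((unsigned char)prev) && prev != '_' && prev != '.') {
            /* Collect a whole number-like token: 14, 0x10A9, 010, 1e+5. */
            while (isalnum((unsigned char)in[i]) || in[i] == '_' || in[i] == '.' ||
                   ((in[i] == '+' || in[i] == '-') && is_exp(in[i - 1])))
                i++;
            /* Convert only all-digit tokens that do not start with 0. */
            if (in[start] != '0' &&
                strspn(in + start, "0123456789") == i - start) {
                char hex[32];
                unsigned long v;

                errno = 0;
                v = strtoul(in + start, NULL, 10);
                if (errno == 0) {
                    int w = sprintf(hex, "0x%lX", v);
                    if (put(out, outsz, &used, hex, (size_t)w) != 0)
                        return -1;
                    prev = in[i - 1];
                    continue;
                }
            }
        } else {
            i++;  /* any other character is copied as-is */
        }
        if (put(out, outsz, &used, in + start, i - start) != 0)
            return -1;
        prev = in[i - 1];
    }
    return 0;
}
```

Checking the character before a digit is what keeps an identifier such as y17 intact, and copying quoted text verbatim keeps digits inside string literals unchanged. The results would still need a manual review, since numbers inside comments are rewritten too.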

It is probably better for you to edit these files by hand. It is largely
because of the occurrences of "void main()" that I presume that your computing
skills are not up to writing such a program. If I am in error, I apologize.

--
Martin Ambuhl
Nov 14 '05 #8
^_^ <cl********@hotmail.com> scribbled the following
on comp.lang.c:
> Why I want 0x???? is easy reading.


Oh, now I see. Well, I don't have any ready-made solution for changing
the decimal values to hexadecimal ones. Sorry for wasting your time
answering the wrong question.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"The question of copying music from the Internet is like a two-barreled sword."
- Finnish rap artist Ezkimo
Nov 14 '05 #9
^_^
Why I want 0x???? is easy reading.
Nov 14 '05 #10
^_^
Thank you very much.
Nov 14 '05 #11
"^_^" <cl********@hotmail.com> writes:
> I want to know, either, that convert decimal format numbers to hexademical
> format.


Several of us have pointed out that you probably don't want to do
this and that it will be difficult even if you do. However, the
difficulty mainly stems from a desire to get the result
completely correct. If you're not concerned with complete
correctness, but would be willing to look over the results and
fix any mistakes (which would probably be rare), then I'd bet you
could write a fairly simple script in Perl or another scripting
language to do your translation; e.g., something like this, which
I have not tested at all and may contain bugs or simply be one
big bug:
#! /usr/bin/perl -p
while (/(?<!0x)([0-9]+)/) {
$_ = $` . sprintf("0x%x", $1) . $';
}
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
Nov 14 '05 #12
"Ben Pfaff" <bl*@cs.stanford.edu> wrote:
"^_^" <cl********@hotmail.com> writes:
>> I want to know, either, that convert decimal format numbers to hexademical
>> format.
>
> Several of us have pointed out that you probably don't want to do
> this and that it will be difficult even if you do. However, the
> difficulty mainly stems from a desire to get the result
> completely correct. If you're not concerned with complete
> correctness, but would be willing to look over the results and
> fix any mistakes (which would probably be rare), then I'd bet you
> could write a fairly simple script in Perl or another scripting
> language to do your translation; e.g., something like this, which
> I have not tested at all and may contain bugs or simply be one
> big bug:
> #! /usr/bin/perl -p
> while (/(?<!0x)([0-9]+)/) {
> $_ = $` . sprintf("0x%x", $1) . $';
> }


Unfortunately, it actually hangs on the first line that _does_ contain a
number :-). It would also try to change

int data14;

to

int data0xe;

which probably isn't desirable (only slightly less desirable than 14 variables
named 'data'). The following is only slightly more robust, but the OP can run
with it if he feels inclined:
[~/perl: 137]% cat numbers
#define MAX 16777215

int main(void){
int x = 14;
int y17 = 9;
int z = 0x10A9;
return 0;
}
[~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers
#define MAX 0xFFFFFF

int main(void){
int x = 0xE;
int y17 = 0x9;
int z = 0x10A9;
return 0x0;
}
[~/perl: 139]%
Good luck,

Brandan L.
--
bclennox AT eos DOT ncsu DOT edu
Nov 14 '05 #13
"LaDainian Tomlinson" <go@away.spam> writes:
"Ben Pfaff" <bl*@cs.stanford.edu> wrote:
"^_^" <cl********@hotmail.com> writes:
>>> I want to know, either, that convert decimal format numbers to hexademical
>>> format.
>>
>> while (/(?<!0x)([0-9]+)/) {
>> $_ = $` . sprintf("0x%x", $1) . $';
>> }
>
> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers


Ah, I'd forgotten about the `e' flag, thanks.
--
"The way I see it, an intelligent person who disagrees with me is
probably the most important person I'll interact with on any given
day."
--Billy Chambless
Nov 14 '05 #14
LaDainian Tomlinson wrote:
> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers


$ perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' <<< 010
0xA


Nov 14 '05 #15
Jeremy Yallop wrote:
> LaDainian Tomlinson wrote:
>> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers
>
> $ perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' <<< 010
> 0xA

Right. Numbers in octal base should be excluded because the conversion
would change the value.

Changing it to
$ perl -pe 's/\b([1-9][0-9]*)\b/sprintf "0x\U%x", $1/ge'
should help.
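
For readers unsure why a leading zero matters, here is a small C sketch (not from the original posts) that contrasts the two readings of the same digits. The function names value_as_c_constant and value_as_decimal are hypothetical. It relies on strtoul with base 0, which selects octal for a leading 0 and hexadecimal for a leading 0x, much as a C compiler reads an integer constant.

```c
#include <assert.h>
#include <stdlib.h>

/* Reads text the way C reads an integer constant: base 0 makes
 * strtoul treat a leading 0 as octal and a leading 0x as hex. */
unsigned long value_as_c_constant(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 0);
}

/* Reads the same digits as plain decimal, which is effectively what
 * the one-line substitution does before printing them in hex. */
unsigned long value_as_decimal(const char *text)
{
    assert(text != NULL);
    return strtoul(text, NULL, 10);
}
```

With the input 010, the first function gives 8 and the second gives 10, so rewriting 010 as 0xA would silently change the program.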
--
Regards,
Christof Krueger

Nov 14 '05 #16
test..
Nov 14 '05 #17
