
Question: Unicode <-> HEX conversion in C source file?

P: n/a
^_^
conversion from:

a="a";

to

a=0x????;

If there are many unicode strings to convert, how can I do batch-conversion?
Nov 14 '05 #1
16 Replies


P: n/a
^_^ wrote:
> conversion from:
>
> a="a";
>
> to
>
> a=0x????;
>
> If there are many unicode strings to convert, how can I do batch-conversion?


If you really want help, then

1) Stop cross-posting wildly.
2) Stop re-posting similar messages over and over.
3) Phrase your question in a way that we can understand it.

Try posting ONE message to ONE relevant group that explains your problem
in sufficient detail, then wait for a reply (which may take several
hours). Otherwise you are likely to be ignored, flamed, and/or killfiled.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.
Nov 14 '05 #2

P: n/a
> If you really want help, then
>
> 1) Stop cross-posting wildly.
> 2) Stop re-posting similar messages over and over.
> 3) Phrase your question in a way that we can understand it.
>
> Try posting ONE message to ONE relevant group that explains your problem
> in sufficient detail, then wait for a reply (which may take several
> hours). Otherwise you are likely to be ignored, flamed, and/or killfiled.


I don't think this guy speaks English that well, it is a foreign language to
him, hence the cryptic messages.
Probably Chinese.

Stephen Howe
Nov 14 '05 #3

P: n/a
Stephen Howe wrote:
> I don't think this guy speaks English that well, it is a
> foreign language to him, hence the cryptic messages. Probably
> Chinese.


[Reading in news:comp.lang.c]

No need to guess. From cleansugar's header:

Organization: Korea Telecom
Message-ID: <bt**********@news1.kornet.net>

I think the OP wants a tool that can be used to convert string
literals to unicode equivalents in C and/or C++ source files.

Can someone who knows more about this than I either redirect or
provide help?
--
Morris Dovey
West Des Moines, Iowa USA
C links at http://www.iedu.com/c
Read my lips: The apple doesn't fall far from the tree.

Nov 14 '05 #4

P: n/a
"^_^" <cl********@hotmail.com> wrote in message
news:bt**********@news1.kornet.net...
> conversion from:
>
> a="a";
>
> to
>
> a=0x????;
>
> If there are many unicode strings to convert, how can I do batch-conversion?


You can try the NCBI C++ Toolkit. It is portable and free.
http://www.ncbi.nlm.nih.gov/IEB/Tool...DOC/index.html

It contains, among other things, some utility functions for converting
characters and strings from ascii to unicode.
http://www.ncbi.nih.gov/IEB/ToolBox/.../util/utf8.hpp

HTH
Tom
Nov 14 '05 #5

P: n/a
^_^
I'm sorry that I was rude to speak unpolite broken English.

It's my fault. I am not an English speaker

Though, I can speak more correct expression, I was neglect.

Sorry.
What I want is to convert Unicode characters in source code to 0x??? format.

Then it is going to be saved as ASCII format a documents.

Written in not-Latin format Unicode characters in source code cause that
English OS users can not read it without fonts.

If source code's format were saved as UTF8, compiler reads it automatically.

But I don't want this method.

I want to know, either, that convert decimal format numbers to hexadecimal
format.

For example, I'll show an source.

example.cpp:

#define MAX 16777215
void main(){
if (MAX==a) printf("wrong\n");
}

example_I_wanted.cpp
#define MAX 0xFFFFFF <------*this part*
void main(){
if (MAX==a) printf("wrong\n");
}

To do so, C or C++ source parsing->converting DEC to HEX->saving CPP file
with converted characters are needed.

I don't know detailed methods.

If gurus like you give me some good ways, I will follow your wisdom.

Thank you and I'm sorry again.

Nov 14 '05 #6

P: n/a
^_^ <cl********@hotmail.com> scribbled the following
on comp.lang.c:
> What I want is to convert Unicode characters in source code to 0x??? format.
> Then it is going to be saved as ASCII format a documents.
> Written in not-Latin format Unicode characters in source code cause that
> English OS users can not read it without fonts.
> If source code's format were saved as UTF8, compiler reads it automatically.
> But I don't want this method.
> I want to know, either, that convert decimal format numbers to hexadecimal
> format.
> For example, I'll show an source.
>
> example.cpp:
> #define MAX 16777215
> void main(){
> if (MAX==a) printf("wrong\n");
> }
>
> example_I_wanted.cpp
> #define MAX 0xFFFFFF <------*this part*
> void main(){
> if (MAX==a) printf("wrong\n");
> }


You don't *HAVE* to do this. As numbers, 16777215 and 0xFFFFFF are
completely interchangeable within a C or C++ program. The runtime
program will only see them as a pattern of bits anyway.

And void main() is an illegal form of main(). Use int main().

So, the answer to your question is: your programs should work fine as
they are.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"Life without ostriches is like coffee with milk."
- Mika P. Nieminen
Nov 14 '05 #7

P: n/a
^_^ wrote:
> I'm sorry that I was rude to speak unpolite broken English.
> It's my fault. I am not an English speaker

That's OK, it wasn't rude, nor was your English unpolite in any way. (By
the way, the normal English word is "impolite." "Unpolite" is perfectly
logical and understandable, but it disappeared from normal English use in
the early 18th century.) The problem is that you didn't give us a question
that we could understand. Many people who *are* native English speakers
fail to do this.

Posting to both C and C++ newsgroups is likely an error. C and C++ are
different languages, and, even when the languages admit the same forms of
code, the normal idioms in the two languages are different. It makes sense
to post to both _only_ when the question has the same answers in both
languages. Since you can't know this without already knowing the answer,
it is best to post to a newsgroup for the language you are using.

> Though, I can speak more correct expression, I was neglect.

As a side note, you might consider comp.usage.english as another newsgroup
you might post in, if improving your English is important to you. The
above line, for example, might more idiomatically be written, "However, I
can express myself better. I was negligent [or neglectful]."

> What I want is to convert Unicode characters in source code to 0x??? format.

If you can read the Unicode characters into a buffer, you can convert those
chars into an integer, as long as the total number of bytes in a character
does not exceed the sizeof the integer (best unsigned) type that you use.

> I want to know, either, that convert decimal format numbers to hexadecimal
> format.

Numbers as stored are simply binary, interpreted for humans in some base.
Suppose you have an unsigned int
    unsigned int a = 263;
We can display this as octal
    printf("%#o\n", a); /* displays 0407 */
or hex
    printf("%#x\n", a); /* displays 0x107 */
or decimal
    printf("%u\n", a); /* displays 263 */

> For example, I'll show an source.
>
> example.cpp:
>
> #define MAX 16777215
> void main(){

main always returns an int. "void" is wrong. Don't do this.

> if (MAX==a) printf("wrong\n");

Even though this is an example of an input file, it is best not to post
hopeless code.
The variable 'a' is undeclared.
The C++ people may object that "printf" is too un-C++-like and complain
that <cstdio> is not #included.
The C people might complain that <stdio.h> is not #included. People
using compilers without C99 conformance (almost all) may complain
that main should actually return a value; 0 is common for successful
completion and EXIT_SUCCESS and EXIT_FAILURE are available if
<stdlib.h> is #included.

> }
>
> example_I_wanted.cpp
> #define MAX 0xFFFFFF <------*this part*
> void main(){

main always returns an int. "void" is wrong. Don't do this.

> if (MAX==a) printf("wrong\n");
> }
>
> To do so, C or C++ source parsing->converting DEC to HEX->saving CPP file
> with converted characters are needed.

To parse an input file containing a C program is probably beyond you at the
moment. You will need to detect sequences of characters that might be an
integer, determine that each is one (this requires examining its context),
and probably check the use for signedness.

It is probably better for you to edit these files by hand. It is largely
because of the occurrences of "void main()" that I presume that your computing
skills are not up to writing such a program. If I am in error, I apologize.

--
Martin Ambuhl
Nov 14 '05 #8

P: n/a
^_^ <cl********@hotmail.com> scribbled the following
on comp.lang.c:
> Why I want 0x???? is easy reading.


Oh, now I see. Well, I don't have any ready-made solution for changing
the decimal values to hexadecimal ones. Sorry for wasting your time
answering the wrong question.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"The question of copying music from the Internet is like a two-barreled sword."
- Finnish rap artist Ezkimo
Nov 14 '05 #9

P: n/a
^_^
Why I want 0x???? is easy reading.

Nov 14 '05 #10

P: n/a
^_^
Thank you very much.

Nov 14 '05 #11

P: n/a
"^_^" <cl********@hotmail.com> writes:
> I want to know, either, that convert decimal format numbers to hexadecimal
> format.

Several of us have pointed out that you probably don't want to do
this and that it will be difficult even if you do. However, the
difficulty mainly stems from a desire to get the result
completely correct. If you're not concerned with complete
correctness, but would be willing to look over the results and
fix any mistakes (which would probably be rare), then I'd bet you
could write a fairly simple script in Perl or another scripting
language to do your translation; e.g., something like this, which
I have not tested at all and may contain bugs or simply be one
big bug:

    #! /usr/bin/perl -p
    while (/(?<!0x)([0-9]+)/) {
        $_ = $` . sprintf("0x%x", $1) . $';
    }
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}
Nov 14 '05 #12

P: n/a
"Ben Pfaff" <bl*@cs.stanford.edu> wrote:
> "^_^" <cl********@hotmail.com> writes:
>> I want to know, either, that convert decimal format numbers to hexadecimal
>> format.
>
> Several of us have pointed out that you probably don't want to do
> this and that it will be difficult even if you do. However, the
> difficulty mainly stems from a desire to get the result
> completely correct. If you're not concerned with complete
> correctness, but would be willing to look over the results and
> fix any mistakes (which would probably be rare), then I'd bet you
> could write a fairly simple script in Perl or another scripting
> language to do your translation; e.g., something like this, which
> I have not tested at all and may contain bugs or simply be one
> big bug:
> #! /usr/bin/perl -p
> while (/(?<!0x)([0-9]+)/) {
> $_ = $` . sprintf("0x%x", $1) . $';
> }


Unfortunately, it actually hangs on the first line that _does_ contain a
number :-). It would also try to change

int data14;

to

int data0xe;

which probably isn't desirable (only slightly less desirable than 14 variables
named 'data'). The following is only slightly more robust, but the OP can run
with it if he feels inclined:
[~/perl: 137]% cat numbers
#define MAX 16777215

int main(void){
int x = 14;
int y17 = 9;
int z = 0x10A9;
return 0;
}
[~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers
#define MAX 0xFFFFFF

int main(void){
int x = 0xE;
int y17 = 0x9;
int z = 0x10A9;
return 0x0;
}
[~/perl: 139]%
Good luck,

Brandan L.
--
bclennox AT eos DOT ncsu DOT edu
Nov 14 '05 #13

P: n/a
"LaDainian Tomlinson" <go@away.spam> writes:
> "Ben Pfaff" <bl*@cs.stanford.edu> wrote:
>> "^_^" <cl********@hotmail.com> writes:
>>> I want to know, either, that convert decimal format numbers to hexadecimal
>>> format.
>>
>> while (/(?<!0x)([0-9]+)/) {
>> $_ = $` . sprintf("0x%x", $1) . $';
>> }
>
> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers


Ah, I'd forgotten about the `e' flag, thanks.
--
"The way I see it, an intelligent person who disagrees with me is
probably the most important person I'll interact with on any given
day."
--Billy Chambless
Nov 14 '05 #14

P: n/a
LaDainian Tomlinson wrote:
> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers


$ perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' <<< 010
0xA


Nov 14 '05 #15

P: n/a
Jeremy Yallop wrote:
> LaDainian Tomlinson wrote:
>> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers
>
> $ perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' <<< 010
> 0xA

Right. Numbers in octal base should be excluded because the conversion
would change the value.

Changing it to
$ perl -pe 's/\b([1-9][0-9]+)\b/sprintf "0x\U%x", $1/ge'
should help.
--
Regards,
Christof Krueger

Nov 14 '05 #16

P: n/a
test..

Nov 14 '05 #17
