Question: Unicode <-> HEX conversion in C source file?

^_^

conversion from:

a="a";

to

a=0x????;

If there are many unicode strings to convert, how can I do batch-conversion?

Nov 14 '05 #1


Kevin Goodsell

^_^ wrote:

> conversion from:
>
> a="a";
>
> to
>
> a=0x????;
>
> If there are many unicode strings to convert, how can I do batch-conversion?

If you really want help, then

1) Stop cross-posting wildly.
2) Stop re-posting similar messages over and over.
3) Phrase your question in a way that we can understand it.

Try posting ONE message to ONE relevant group that explains your problem
in sufficient detail, then wait for a reply (which may take several
hours). Otherwise you are likely to be ignored, flamed, and/or killfiled.

-Kevin
--
My email address is valid, but changes periodically.
To contact me please use the address from a recent posting.

Nov 14 '05 #2

Stephen Howe

> If you really want help, then
>
> 1) Stop cross-posting wildly.
> 2) Stop re-posting similar messages over and over.
> 3) Phrase your question in a way that we can understand it.
>
> Try posting ONE message to ONE relevant group that explains your problem
> in sufficient detail, then wait for a reply (which may take several
> hours). Otherwise you are likely to be ignored, flamed, and/or killfiled.

I don't think this guy speaks English that well, it is a foreign language to
him, hence the cryptic messages.
Probably Chinese.

Stephen Howe

Nov 14 '05 #3

Morris Dovey

Stephen Howe wrote:

> I don't think this guy speaks English that well, it is a
> foreign language to him, hence the cryptic messages. Probably
> Chinese.

[Reading in news:comp.lang.c]

No need to guess. From cleansugar's header:

Organization: Korea Telecom
Message-ID: <bt**********@news1.kornet.net>

I think the OP wants a tool that can be used to convert string
literals to unicode equivalents in C and/or C++ source files.

Can someone who knows more about this than I either redirect or
provide help?
--
Morris Dovey
West Des Moines, Iowa USA
C links at http://www.iedu.com/c
Read my lips: The apple doesn't fall far from the tree.

Nov 14 '05 #4

Thomas Wintschel

"^_^" <cl********@hotmail.com> wrote in message
news:bt**********@news1.kornet.net...

> conversion from:
>
> a="a";
>
> to
>
> a=0x????;
>
> If there are many unicode strings to convert, how can I do batch-conversion?

You can try the NCBI C++ Toolkit. It is portable and free.
http://www.ncbi.nlm.nih.gov/IEB/Tool...DOC/index.html

It contains, among other things, some utility functions for converting
characters and strings from ASCII to Unicode.
http://www.ncbi.nih.gov/IEB/ToolBox/.../util/utf8.hpp

HTH
Tom

Nov 14 '05 #5

^_^

I'm sorry that I was rude to speak unpolite broken English.

It's my fault. I am not an English speaker

Though, I can speak more correct expression, I was neglect.

Sorry.
What I want is to convert Unicode characters in source code to 0x??? format.

The file would then be saved as a plain ASCII document.

Non-Latin Unicode characters written in source code cannot be read by users
of an English OS unless they have the right fonts.

If the source file were saved as UTF-8, the compiler would read it automatically.

But I don't want to use this method.

I also want to know how to convert decimal numbers to hexadecimal
format.

For example, here is a source file.

example.cpp:

#define MAX 16777215
void main(){
    if (MAX==a) printf("wrong\n");
}

example_I_wanted.cpp:

#define MAX 0xFFFFFF <------*this part*
void main(){
    if (MAX==a) printf("wrong\n");
}

To do this, I need to parse the C or C++ source, convert DEC to HEX, and save
the CPP file with the converted characters.

I don't know the detailed methods.

If gurus like you can suggest some good ways, I will follow your wisdom.

Thank you and I'm sorry again.

Nov 14 '05 #6

Joona I Palaste

^_^ <cl********@hotmail.com> scribbled the following
on comp.lang.c:

> What I want is to convert Unicode characters in source code to 0x??? format.
> The file would then be saved as a plain ASCII document.
> Non-Latin Unicode characters written in source code cannot be read by users
> of an English OS unless they have the right fonts.
> If the source file were saved as UTF-8, the compiler would read it automatically.
> But I don't want to use this method.
> I also want to know how to convert decimal numbers to hexadecimal
> format.
> For example, here is a source file.
>
> example.cpp:
>
> #define MAX 16777215
> void main(){
>     if (MAX==a) printf("wrong\n");
> }
>
> example_I_wanted.cpp:
>
> #define MAX 0xFFFFFF <------*this part*
> void main(){
>     if (MAX==a) printf("wrong\n");
> }

You don't *HAVE* to do this. As numbers, 16777215 and 0xFFFFFF are
completely interchangeable within a C or C++ program. The runtime
program will only see them as a pattern of bits anyway.

And void main() is an illegal form of main(). Use int main().

So, the answer to your question is: your programs should work fine as
they are.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"Life without ostriches is like coffee with milk."
- Mika P. Nieminen

Nov 14 '05 #7

Martin Ambuhl

^_^ wrote:

> I'm sorry that I was rude to speak unpolite broken English.
> It's my fault. I am not an English speaker

That's OK, it wasn't rude, nor was your English unpolite in any way. (By
the way, the normal English word is "impolite." "Unpolite" is perfectly
logical and understandable, but it disappeared from normal English use in
the early 18th century.) The problem is that you didn't give us a question
that we could understand. Many people who *are* native English speakers
fail to do this.

Posting to both C and C++ newsgroups is likely an error. C and C++ are
different languages, and, even when the languages admit the same forms of
code, the normal idioms in the two languages are different. It makes sense
to post to both _only_ when the question has the same answers in both
languages. Since you can't know this, since you would then already know
the answer, it is best to post to a newsgroup for the language you are using.

> Though, I can speak more correct expression, I was neglect.

As a side note, you might consider comp.usage.english as another newsgroup
you might post in, if improving your English is important to you. The
above line, for example, might more idiomatically be written, "However, I
can express myself better. I was negligent [or neglectful]."

> What I want is to convert Unicode characters in source code to 0x??? format.

If you can read the Unicode characters into a buffer, you can convert those
chars into an integer, as long as the total number of bytes in a character
is no greater than the sizeof the integer (best unsigned) type that you use.

> I also want to know how to convert decimal numbers to hexadecimal
> format.

Numbers as stored are simply binary, interpreted for humans in some base.
Suppose you have an unsigned int
    unsigned int a = 263;
We can display this as octal
    printf("%#o\n",a); /* displays 0407 */
or hex
    printf("%#x\n",a); /* displays 0x107 */
or decimal
    printf("%u\n",a); /* displays 263 */

> For example, here is a source file.
>
> example.cpp:
>
> #define MAX 16777215
> void main(){

main always returns an int. "void" is wrong. Don't do this.

>     if (MAX==a) printf("wrong\n");

Even though this is an example of an input file, it is best not to post
hopeless code.
The variable 'a' is undeclared.
The C++ people may object that "printf" is too un-C++-like and complain
that <cstdio> is not #included.
The C people might complain that <stdio.h> is not #included. People
using compilers without C99 conformance (almost all), may complain
that main should actually return a value; 0 is common for successful
completion and EXIT_SUCCESS and EXIT_FAILURE are available if
<stdlib.h> is #included.

> }
>
> example_I_wanted.cpp:
>
> #define MAX 0xFFFFFF <------*this part*
> void main(){

main always returns an int. "void" is wrong. Don't do this.

>     if (MAX==a) printf("wrong\n");
> }

> To do this, I need to parse the C or C++ source, convert DEC to HEX, and save
> the CPP file with the converted characters.

To parse an input file containing a C program is probably beyond you at the
moment. You will need to detect each sequence of characters that might be an
integer, determine that it is one (this requires examining its context),
and probably check its use for signedness.

It is probably better for you to edit these files by hand. It is largely
because of the occurrences of "void main()" that I presume that your computing
skills are not up to writing such a program. If I am in error, I apologize.

--
Martin Ambuhl

Nov 14 '05 #8

Joona I Palaste

^_^ <cl********@hotmail.com> scribbled the following
on comp.lang.c:

> Why I want 0x???? is easy reading.

Oh, now I see. Well, I don't have any ready-made solution for changing
the decimal values to hexadecimal ones. Sorry for wasting your time
answering the wrong question.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"The question of copying music from the Internet is like a two-barreled sword."
- Finnish rap artist Ezkimo

Nov 14 '05 #9

^_^

Why I want 0x???? is easy reading.

Nov 14 '05 #10

^_^

Thank you very much.

Nov 14 '05 #11

Ben Pfaff

"^_^" <cl********@hotmail.com> writes:

> I also want to know how to convert decimal numbers to hexadecimal
> format.

Several of us have pointed out that you probably don't want to do
this and that it will be difficult even if you do. However, the
difficulty mainly stems from a desire to get the result
completely correct. If you're not concerned with complete
correctness, but would be willing to look over the results and
fix any mistakes (which would probably be rare), then I'd bet you
could write a fairly simple script in Perl or another scripting
language to do your translation; e.g., something like this, which
I have not tested at all and may contain bugs or simply be one
big bug:
#! /usr/bin/perl -p
while (/(?<!0x)([0-9]+)/) {
$_ = $` . sprintf("0x%x", $1) . $';
}
--
char a[]="\n .CJacehknorstu";int putchar(int);int main(void){unsigned long b[]
={0x67dffdff,0x9aa9aa6a,0xa77ffda9,0x7da6aa6a,0xa67f6aaa,0xaa9aa9f6,0x1f6},*p=
b,x,i=24;for(;p+=!*p;*p/=4)switch(x=*p&3)case 0:{return 0;for(p--;i--;i--)case
2:{i++;if(1)break;else default:continue;if(0)case 1:putchar(a[i&15]);break;}}}

Nov 14 '05 #12

LaDainian Tomlinson

"Ben Pfaff" <bl*@cs.stanford.edu> wrote:

> "^_^" <cl********@hotmail.com> writes:
>> I also want to know how to convert decimal numbers to hexadecimal
>> format.
>
> Several of us have pointed out that you probably don't want to do
> this and that it will be difficult even if you do. However, the
> difficulty mainly stems from a desire to get the result
> completely correct. If you're not concerned with complete
> correctness, but would be willing to look over the results and
> fix any mistakes (which would probably be rare), then I'd bet you
> could write a fairly simple script in Perl or another scripting
> language to do your translation; e.g., something like this, which
> I have not tested at all and may contain bugs or simply be one
> big bug:
>
> #! /usr/bin/perl -p
> while (/(?<!0x)([0-9]+)/) {
>     $_ = $` . sprintf("0x%x", $1) . $';
> }

Unfortunately, it actually hangs on the first line that _does_ contain a
number :-). It would also try to change

int data14;

to

int data0xe;

which probably isn't desirable (only slightly less desirable than 14 variables
named 'data'). The following is only slightly more robust, but the OP can run
with it if he feels inclined:
[~/perl: 137]% cat numbers
#define MAX 16777215

int main(void){
int x = 14;
int y17 = 9;
int z = 0x10A9;
return 0;
}
[~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers
#define MAX 0xFFFFFF

int main(void){
int x = 0xE;
int y17 = 0x9;
int z = 0x10A9;
return 0x0;
}
[~/perl: 139]%
Good luck,

Brandan L.
--
bclennox AT eos DOT ncsu DOT edu

Nov 14 '05 #13

Ben Pfaff

"LaDainian Tomlinson" <go@away.spam> writes:

> "Ben Pfaff" <bl*@cs.stanford.edu> wrote:
>> "^_^" <cl********@hotmail.com> writes:
>>> I also want to know how to convert decimal numbers to hexadecimal
>>> format.
>>
>> while (/(?<!0x)([0-9]+)/) {
>>     $_ = $` . sprintf("0x%x", $1) . $';
>> }
>
> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers

Ah, I'd forgotten about the `e' flag, thanks.
--
"The way I see it, an intelligent person who disagrees with me is
probably the most important person I'll interact with on any given
day."
--Billy Chambless

Nov 14 '05 #14

Jeremy Yallop

LaDainian Tomlinson wrote:

> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers

$ perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' <<< 010
0xA

Nov 14 '05 #15

Christof Krueger

Jeremy Yallop wrote:

> LaDainian Tomlinson wrote:
>> [~/perl: 138]% perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' numbers
>
> $ perl -pe 's/\b([0-9]+)\b/sprintf "0x\U%x", $1/ge' <<< 010
> 0xA

Right. Numbers in octal base should be excluded because the conversion
would change the value.

Changing it to
$ perl -pe 's/\b([1-9][0-9]+)\b/sprintf "0x\U%x", $1/ge'
should help
--
Regards,
Christof Krueger

Nov 14 '05 #16

허성욱

test..

Nov 14 '05 #17
