character byte str[i] treated as signed, I need unsigned

Susan Rice

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

Nov 2 '06 #1


Ian Collins

Susan Rice wrote:

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

By defining them as unsigned or, if all else fails, casting them to
unsigned. You haven't shown the definition of str1 or str2.

--
Ian Collins.

Nov 2 '06 #2

Walter Roberson

In article <FZ*****************@newsfe10.phx>,
Susan Rice <sr****@cox.net> wrote:

>I'm comparing characters via
>
>return(str1[i] - str2[i]);
>
>and I'm having problems with 8-bit characters being treated as signed
>instead of unsigned integers. The disassembly is using
>
>movsx eax,byte ptr[edx]
>
>to load my character in to EAX register. I need it to use movzx.
>How can I recode this to treat my characters as unsigned instead of signed?

Anything down at the assembly level is out of scope for this newsgroup,
which does not deal with implementation specifics.

Fortunately, you do not need to go down to that level. Try just

return( (unsigned char)str1[i] - (unsigned char)str2[i] );
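
A minimal sketch of how those casts might sit inside a complete comparison routine. The function name cmp_bytes and its signature are assumptions, since the original function was never posted, and the sketch assumes int can represent every unsigned char value (true wherever int is wider than char).

```c
#include <stddef.h>

/* Hypothetical helper: compares two NUL-terminated strings byte by byte,
   treating every byte as an unsigned char. */
int cmp_bytes(const char *str1, const char *str2)
{
    size_t i = 0;

    /* Advance while the bytes match and neither string has ended. */
    while (str1[i] != '\0' && str1[i] == str2[i])
        i++;

    /* Each cast maps the byte to 0..UCHAR_MAX before the promotion to int,
       so the sign of the result does not depend on whether plain char
       is signed on this implementation. */
    return (unsigned char)str1[i] - (unsigned char)str2[i];
}
```

With this sketch, a byte such as 0xE9 sorts after 'a' whether or not plain char is signed.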

--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers

Nov 2 '06 #3

Peter Nilsson

Susan Rice wrote:

I'm comparing characters via

return(str1[i] - str2[i]);

How are str1, str2 and i declared? What's the rest of the function?
How is the result meant to be used?

and I'm having problems with 8-bit characters

Why do you care how many bits in a character there are?

being treated as signed instead of unsigned integers.

You're telling us what you _think_ the problem is, rather than
explaining the problem itself, e.g. "I input this, the output
I got was this, the output I wanted was this, here is my code
and what it is meant to do."

[In other words, don't tell us the sign difference is your problem,
tell us _why_ it's a problem.]

You should know that knee-jerk "this'll fix it" responses may not be
addressing other important issues of your code. For instance, your
methodology is not guaranteed to yield alphabetical ordering.

[In other words, your minimalist presentation may mean you only get
a superficial (and possibly broken) solution to your problem, whilst
deeper issues with your code are left uncorrected.]

The disassembly is using

movsx eax,byte ptr[edx]

Learning C by examining the disassembly is the WORST thing you can
do. When you change architectures you may find that there are an awful
lot of assumptions on your part that you'll have to unlearn.

--
Peter

Nov 3 '06 #4

Old Wolf

Walter Roberson wrote:

Susan Rice <sr****@cox.net> wrote:
I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers.

return( (unsigned char)str1[i] - (unsigned char)str2[i] );

A note to the OP: this will still return a negative value for
cases such as '1' - '2'. If the intent is to return a positive
value mod 256 in all cases then write:

return (unsigned int)(str1[i] - str2[i]) % 256;

(note that the return statement does not need brackets around
its expression).
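
A small sketch of that expression wrapped in a function, to show the value it yields. The name diff_mod256 and the unsigned int return type are assumptions made for illustration.

```c
/* Hypothetical helper: the difference of two chars reduced modulo 256,
   so the result always lies in 0..255. */
unsigned int diff_mod256(char a, char b)
{
    /* a - b is computed as int and may be negative. Converting it to
       unsigned int wraps it modulo UINT_MAX + 1, which is a multiple
       of 256, so the remainder is the non-negative difference mod 256. */
    return (unsigned int)(a - b) % 256;
}
```

Here diff_mod256('1', '2') gives 255 rather than -1, because the digit characters are consecutive and the negative difference wraps around.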

Nov 3 '06 #5

Susan Rice

Here's the real problem I was unaware of, as explained by
Kernighan & Ritchie (whom you probably know as K&R):

"There is one subtle point about the conversion of characters to
integers. The language does not specify whether variables of type
char are signed or unsigned quantities. When a char is converted
to an int, can it ever produce a negative result? The answer varies
from machine to machine, reflecting differences in architecture.
On some machines a char whose leftmost bit is 1 will be converted
to a negative integer ("sign extension"). On others, a char is
promoted to an int by adding zeros at the left end, and thus is
always positive."
--Kernighan & Ritchie: "The C Programming Language"
(K&R, the inventors of the language.)
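
The behaviour K&R describe can be checked from within C. The following is a rough sketch, not a definitive test suite; the three function names are hypothetical, and it relies only on CHAR_MIN from <limits.h>.

```c
#include <limits.h>

/* Returns 1 when plain char is a signed type on this implementation,
   0 otherwise. CHAR_MIN is 0 for an unsigned char type and negative
   for a signed one. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Widening a plain char to int: the result is negative for bytes with
   the top bit set only when plain char is signed. */
int widen_plain(char c)
{
    return c;
}

/* Converting to unsigned char first gives a non-negative result
   regardless of the signedness of plain char. */
int widen_unsigned(char c)
{
    return (unsigned char)c;
}
```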

Susan Rice wrote:

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

Nov 3 '06 #6

Nils O. Selåsdal

Susan Rice wrote:

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

By declaring your array to hold unsigned chars.
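
A sketch of what that might look like, assuming the comparison lives in a function of its own. The name cmp_ubytes and the parameter types are assumptions, as the original declarations were not shown.

```c
#include <stddef.h>

/* Hypothetical variant: the parameters are declared as pointers to
   unsigned char, so no cast is needed at the subtraction. */
int cmp_ubytes(const unsigned char *str1, const unsigned char *str2)
{
    size_t i = 0;

    /* Advance while the bytes match and neither string has ended. */
    while (str1[i] != 0 && str1[i] == str2[i])
        i++;

    /* Both operands are unsigned char values promoted to int
       (assuming int can hold every unsigned char value). */
    return str1[i] - str2[i];
}
```

Callers that hold their data in plain char arrays would still need a pointer cast at the call site, which is why declaring the arrays themselves as unsigned char is the cleaner route.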

Nov 3 '06 #7

Frederick Gotham

Susan Rice:

I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as signed
instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead of signed?

I can't say for sure without knowing exactly what you're trying to do (e.g.
do you want roll-around, etc.), but here's something simple:

return (char unsigned)( (unsigned)str1[i] - str2[i] );

Somebody else offered something akin to the following:

return (char unsigned)str1[i] - (char unsigned)str2[i];

, but the casts are redundant, as both operands will be promoted to either
"signed int" or "unsigned int" before the subtraction takes place.

Of course, I don't know what you're trying to do, but at first glance, it
looks like you're going the wrong way about it (e.g. why are you using
plain char in the first place?)

--

Frederick Gotham

Nov 3 '06 #8

Walter Roberson

In article <2G*******************@news.indigo.ie>,
Frederick Gotham <fg*******@SPAM.com> wrote:

>I can't say for sure without knowing exactly what you're trying to do (e.g.
>do you want roll-around, etc.), but here's something simple:
>
>return (char unsigned)( (unsigned)str1[i] - str2[i] );
>
>Somebody else offered something akin to the following:
>
>return (char unsigned)str1[i] - (char unsigned)str2[i];
>
>, but the casts are redundant, as both operands will be promoted to either
>"signed int" or "unsigned int" before the subtraction takes place.

No, I used (unsigned char), not (char unsigned).

You are being inconsistent in your reasoning for using (char unsigned).
Your stated reasons have to do with your usage of Irish, which
(you have said) puts the most important information first. In this
case, the part that is most important is not the size of the item
but rather the unsigned-ness, so unsigned would go first in your
reasoning.

(You might, I suppose, argue that it is quite important in the cast
operation to know that you are casting to an integral type rather than
a floating type, and that on that basis the char should go first.
However, there are no unsigned floating types, so the appearance
of unsigned already tells you that you cannot be working
with a floating type, so using unsigned first already provides
the "This will be an integral type" hint.)
--
There are some ideas so wrong that only a very intelligent person
could believe in them. -- George Orwell

Nov 3 '06 #9

Frederick Gotham

Walter Roberson:

No, I used (unsigned char) not (char unsigned).

ARE YOU BRAIN DEAD ?

If I misquote you as using "int const" rather than "const int", will you
roar from a mountain top that I got it wrong?

You are being inconsistent in your reasoning for using (char unsigned).
Your stated reasons have to do with your usage of Irish, which
(you have said) puts the most important information first. In this
case, the part that is most important is not the size of the item
but rather the unsigned-ness, so unsigned would go first in your
reasoning.

Have you drilled a hole into my skull and had a look at my brain?

Don't pretend to know how I think.

(You might, I suppose, argue that it is quite important in the cast
operation to know that you are casting to an integral type rather than
a floating type, and that on that basis the char should go first.
However, there are no unsigned floating types, so the appearance
of unsigned already tells you that you cannot be working
with a floating type, so using unsigned first already provides
the "This will be an integral type" hint.)

How about you spend more time focusing on the functionality of the code
rather than whether the pretty ribbons are green or yellow, and whether
they curl clockwise or anticlockwise.

--

Frederick Gotham

Nov 3 '06 #10

Walter Roberson

In article <5g*******************@news.indigo.ie>,
Frederick Gotham <fg*******@SPAM.com> wrote:

>How about you spend more time focusing on the functionality of the code
>rather than whether the pretty ribbons are green or yellow, and whether
>they curl clockwise or anticlockwise.

I would point out that your offering was functionally equivalent to
mine (the one that used explicit casts in both locations), so -you-
were the one worrying about prettiness, not functionality.

You were commenting on elements of my code that did not affect
the functionality but did affect the readability, so it was completely
fair for me to comment on the elements of your code that did not
affect the functionality but did affect the readability.

>>You are being inconsistent in your reasoning for using (char unsigned).
>>Your stated reasons have to do with your usage of Irish, which
>>(you have said) puts the most important information first. In this
>>case, the part that is most important is not the size of the item
>>but rather the unsigned-ness, so unsigned would go first in your
>>reasoning.
>
>Have you drilled a hole into my skull and had a look at my brain?

Do I need to locate and cite your previous articles in which
you explain your choice of syntactical order? You *did* make such
an explanation, and your most recent usage was contrary to that
explanation. You did not apply the reasoning that you had earlier
stated. We must therefore conclude that you apply your
previously-stated reasons inconsistently; or that your previously
stated reasons were not your real reasons; or that your previously
stated reasons were not your -complete- reasons.

>Don't pretend to know how I think.

You are correct that I made a misstatement. I should not have
said that,
"You are being inconsistent in your reasoning for using (char unsigned)",
I should have said,
"You are being inconsistent with your stated reasoning for using
(char unsigned)".

This allows for a possibility that I did not allow for earlier,
namely that your actual reasoning might be quite consistent but that
your actual reasoning does not match your statements about your
reasoning.

--
"It is important to remember that when it comes to law, computers
never make copies, only human beings make copies. Computers are given
commands, not permission. Only people can be given permission."
-- Brad Templeton

Nov 3 '06 #11

CBFalconer

Susan Rice wrote:

Susan Rice wrote:

>I'm comparing characters via

return(str1[i] - str2[i]);

and I'm having problems with 8-bit characters being treated as
signed instead of unsigned integers. The disassembly is using

movsx eax,byte ptr[edx]

to load my character in to EAX register. I need it to use movzx.
How can I recode this to treat my characters as unsigned instead
of signed?

Here's the real problem I was unaware of, as explained by
Kernighan & Ritchie (whom you probably know as K&R):

"There is one subtle point about the conversion of characters to
integers. The language does not specify whether variables of type
char are signed or unsigned quantities. When a char is converted
to an int, can it ever produce a negative result? The answer varies
from machine to machine, reflecting differences in architecture.
On some machines a char whose leftmost bit is 1 will be converted
to a negative integer ("sign extension"). On others, a char is
promoted to an int by adding zeros at the left end, and thus is
always positive."
--Kernighan & Ritchie: "The C Programming Language"
(K&R, the inventors of the language.)

Please don't top-post. Your answer belongs after (or intermixed
with) the material you quote, after snipping portions irrelevant to
your reply. I fixed this one.

As others have said, simply use unsigned chars.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

Nov 3 '06 #12

Frederick Gotham

Walter Roberson:

I would point out that your offering was functionally equivalent to
mine (the one that used explicit casts in both locations), so -you-
were the one worrying about prettiness, not functionality.

Actually, my intent was to point out a flaw. Let's start off with two
chars:

char a,b;

Let's say we want to add the two of these together, and for the result to
be unsigned. All we need do is:

(unsigned)a + b;

However, what _you_ proposed was:

(char unsigned)a + (char unsigned)b;

Which might be equivalent to:

(int)(char unsigned)a + (int)(char unsigned)b;

, depending on whether "char unsigned" promotes to "int" or "unsigned". On
the majority of implementations, it promotes to "int". On such systems, the
result will therefore be a signed int.
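
The two forms can be compared directly. This is only a sketch under stated assumptions: both function names are hypothetical, signed char stands in for a plain char that happens to be signed, CHAR_BIT is assumed to be 8, and int is assumed to hold every unsigned char value.

```c
/* (unsigned)a + b: a is converted to unsigned int first, so a negative
   value wraps around; b is then converted to unsigned int by the usual
   arithmetic conversions and the sum is an unsigned int. */
unsigned int add_as_unsigned(signed char a, signed char b)
{
    return (unsigned)a + b;
}

/* Each operand is first reduced to 0..255, then promoted to int,
   so the sum is a signed int in the range 0..510. */
int add_as_uchar(signed char a, signed char b)
{
    return (unsigned char)a + (unsigned char)b;
}
```

For non-negative inputs the two agree. For a = -23 (the byte 0xE9) and b = 97, the first yields 74 after wrap-around while the second yields 330, so the forms differ in value as well as in result type once a byte with the top bit set is involved.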

You were commenting on elements of my code that did not affect
the functionality but did affect the readability, so it was completely
fair for me to comment on the elements of your code that did not
affect the functionality but did affect the readability.

I pointed out the flaw. At times though, I also point out redundancies. If
I see:

double a;
long b,c;

a = (double)b / (double)c;

, then I'd point out that only one cast is required:

a = (double)b/c;

However I tend not to comment on things like:

int const Vs const int
i++ Vs ++i

>>Have you drilled a hole into my skull and had a look at my brain?

Do I need to locate and cite your previous articles in which
you explain your choice of syntactical order?

You suggested that my word order would change because of the context.

You *did* make such an explanation, and your most recent usage was
contrary to that explanation.

_You_ think so, because of the context. Perhaps my reasoning doesn't go so
far as to take the context into account, but rather picks one syntax that
should be used throughout. Who knows?! I stopped thinking about it a long
time ago and I just go with the flow now.

You did not apply the reasoning that you had earlier
stated. We must therefore conclude that you apply your
previously-stated reasons inconsistently; or that your previously
stated reasons were not your real reasons; or that your previously
stated reasons were not your -complete- reasons.

Or you could conclude that you do not understand my thinking, or that my
thinking takes into account the probability that Alaska will suffer
flash floods on account of Global Warming.

This allows for a possibility that I did not allow for earlier,
namely that your actual reasoning might be quite consistent but that
your actual reasoning does not match your statements about your
reasoning.

I am done explaining why I like red ribbons that turn clockwise on my
bicycle. Please see past the ribbons and look at the actual bicycle, as
I've had my fill of explaining my preference.

--

Frederick Gotham

Nov 3 '06 #13

Richard Heathfield

Frederick Gotham said:

If I see:

double a;
long b,c;

a = (double)b / (double)c;

, then I'd point out that only one cast is required:

a = (double)b/c;

...and then I'd point out that *no* cast is required:

a = b;
a /= c;
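
A short sketch of why the cast-free form still performs floating-point division. The function names are hypothetical and exist only to put the two variants side by side.

```c
/* No cast: the assignment converts b to double, and the compound
   assignment converts c to double before dividing. */
double ratio_no_cast(long b, long c)
{
    double a;

    a = b;
    a /= c;
    return a;
}

/* One cast: b becomes a double, and c is converted to double by the
   usual arithmetic conversions. */
double ratio_one_cast(long b, long c)
{
    return (double)b / c;
}
```

Both return 0.5 for b = 1 and c = 2, whereas the plain integer expression b / c would give 0.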

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 3 '06 #14

Frederick Gotham

Richard Heathfield:

...and then I'd point out that *no* cast is required:

a = b;
a /= c;

In performing an assignment, you give the idea that you need to store a
value. For instance, consider:

a = (Type)b+c;

in place of:

a = b;
a += c;

The latter version may result in less efficient code than the former
version, because when a compiler sees an assignment statement, its first
thought will be "hmm, I have to store a value".

The former version explicitly demonstrates that both the value of b and c
can be discarded, leaving the door wide open for the compiler to do
whatever it likes (e.g. make use of CPU registers).

Of course, I'm sure you can find an optimiser which will make the same
machine code for both of them.

Both of our methods work. Perhaps you prefer _your_ method. Perhaps _I_
prefer _my_ method. Let's not argue over whether pretty green anticlockwise
ribbons are better than pretty red clockwise ribbons.

--

Frederick Gotham

Nov 3 '06 #15

Richard Heathfield

Frederick Gotham said:

<snip>

Both of our methods work. Perhaps you prefer _your_ method. Perhaps _I_
prefer _my_ method. Let's not argue over whether pretty green
anticlockwise ribbons are better than pretty red clockwise ribbons.

This isn't a matter of preference, but of fact. You claimed that one cast is
*required*. I merely demonstrated that your claim is false.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Nov 3 '06 #16

Ian Collins

Frederick Gotham wrote:

Richard Heathfield:

>>...and then I'd point out that *no* cast is required:

a = b;
a /= c;

In performing an assignment, you give the idea that you need to store a
value. For instance, consider:

a = (Type)b+c;

in place of:

a = b;
a += c;

The latter version may result in less efficient code than the former
version, because when a compiler sees an assignment statement, its first
thought will be "hmm, I have to store a value".

Or it may not (and probably won't); it could even impede the optimiser,
making the code less efficient. Don't get so hung up on speculative
micro-optimisations; let the compiler do its job.

--
Ian Collins.

Nov 4 '06 #17

Old Wolf

Frederick Gotham wrote:

>Have you drilled a hole into my skull and had a look at my brain?
>Don't pretend to know how I think.

Do you know something about neuroscience that the rest
of us don't ?

Nov 5 '06 #18

Frederick Gotham

Old Wolf:

>>Have you drilled a hole into my skull and had a look at my brain?
>>Don't pretend to know how I think.
>
>Do you know something about neuroscience that the rest
>of us don't ?

No, but Wikipedia is your friend:

http://en.wikipedia.org/wiki/Neuroscience

--

Frederick Gotham

Nov 5 '06 #19
