In article <16******************@x.x> ben <x@x.x> wrote:
i have a bit of code, that works absolutely fine as is, but seems over
complicated/long winded. is there anyway to shorten/simplify it?
Certainly.
(See original for longer description and original code and variable
names -- the idea is to copy 4-bit units from C99-style 8-bit
uint8_t storage units, from source to destination, not necessarily
at the same 4-bit offset. The mask is apparently 0x0f, then 0xf0,
then 0x0f, then 0xf0, etc., based on nybble offset of 0, 1, 2, etc.,
as counted by "st" and "st2".)
// 'st' is an int for adding to datatocopy to step along it (the number
// of 4bit blocks from the left (starting at 0)).
while( ((*(datatocopy + ((st % 2 ? st-1 : st)/2 ))) >> (st % 2 ? 0 :
4) & 0xf) != 0 ) { // steps along and reads 4bits at a time
Note that if st is always nonnegative, there is no need to test
"st % 2" before dividing by 2, as 0/2 is 0, 1/2 is 0, 2/2 is 1, 3/2
is 1, and so on. Just making use of that fact would simplify the
code quite a bit.
If the loop is expected to run for relatively many 4-bit "nybbles",
though, I would compute an initial set of loop invariants, e.g.:
s: points to the uint8_t that contains the source nybble (4-bit unit)
d: points to the uint8_t that contains the destination nybble
sm: source mask, 0x0f or 0xf0 based on whether st is even or odd
dm: destination mask, 0x0f or 0xf0, based on st2 respectively
s = &datatocopy[st / 2];
d = &tocopyto[st2 / 2];
sm = st % 2 ? 0xf0 : 0x0f;
dm = st2 % 2 ? 0xf0 : 0x0f;
At this point you have a choice of "compact code" vs "fast code".
There are two possibilities: sm == dm, or sm != dm, and if you
expect the loop to run quite a lot, you might want to check. If
the two masks are the same, you can copy 8-bit units at a time
with no masking at all (since uint8_t is exactly 8 bits) until
you hit the first 8-bit byte that contains a 4-bit nybble that
is zero:
if (sm == dm) {
... take care of first upper nybble if required
(but note that this might be 0!) ...
/* then do whole bytes while possible */
if (did not already stop)
while ((*s | (*s >> 4)) & 0x0f) {
*d++ = *s++;
st += 2;
st2 += 2;
}
... then take care of last nybble if required ...
} else {
...
}
Even when the masks differ you can still read and copy quickly,
although this gets into "diminishing returns" territory, by reading
two source 8-bit-bytes, composing a destination byte, storing that,
and then reading one new source byte and reusing the "left over"
nybble in the other previously-read source byte.
Assuming you go for "compact code" instead, though, we just set up
the loop invariants as shown above, then do this:
while ((bits = *s & sm) != 0) {
/*
* We have some nonzero bits to copy, saved up in "bits".
* Move them to whichever half is required...
*/
if (sm != dm) {
if (sm & 0xf0) /* same as "if st % 2" */
bits >>= 4;
else
bits <<= 4;
}
*d = (*d & ~dm) | bits;
st++;
st2++;
#if 0
/* restore invariants -- this is one way, but not as compact */
if (sm & 0x0f) /* same as "if st%2 == 0" */
sm = 0xf0;
else {
sm = 0x0f;
s++;
}
if (dm & 0x0f)
dm = 0xf0;
else {
dm = 0x0f;
d++;
}
#else
/*
* The other way is to just recompute them from scratch,
* same as at the top of the loop. It is likely that the
* compiler will re-use that code (and/or we could compute just
* s and sm here, and compute d and dm just after the "while").
*/
s = &datatocopy[st / 2];
d = &tocopyto[st2 / 2];
sm = st % 2 ? 0xf0 : 0x0f; /* or see below */
dm = st2 % 2 ? 0xf0 : 0x0f;
#endif
}
Finally, it is worth considering just doing one "offsets do not
match" special case, because it removes a number of inside-the-loop
tests even when copying 4 bits at a time. We also get to use some
special properties: in 8 bits, ~0x0f is 0xf0, and vice versa; and
since st and st2 are (presumably) never negative, st%2 and st&1
produce the same values. (The compiler should optimize this away
for you, of course, if st and st2 have unsigned types, so it just
becomes a question of which you think is more readable.)
s = &datatocopy[st / 2];
d = &tocopyto[st2 / 2];
mask = st & 1 ? 0xf0 : 0x0f;
if ((st & 1) == (st2 & 1)) {
/*
* Nybble masks match, so do it the "easy way" (no shifting).
* Note that we want to increment the 8-bit pointers only when
* the values in st and st2 are odd, which is also easy to do
* by an arithmetic trick (provided we add to s and d *before*
* increment st and st2).
*/
while ((bits = *s & mask) != 0) {
*d = (*d & ~mask) | bits;
s += st & 1; /* +1 if odd, +0 if even */
d += st2 & 1;
st++;
st2++;
mask = ~mask; /* 0x0f => 0xf0; 0xf0 => 0x0f */
}
} else {
/*
* Nybble masks do not match; we have to alternate shifting
* left and right by 4. We can either unroll the loop a bit,
* or just test either st or st2 inside it (as here).
*/
while ((bits = *s & mask) != 0) {
if (st & 1) {
/*
* Source is odd (upper) nybble so move down.
* We know that (source) mask == 0xf0 here so
* dest mask should be 0x0f. Might as well
* increment s (and not d) here too.
*/
bits >>= 4;
*d = (*d & ~0x0f) | bits;
s++;
} else {
/*
* Source is even (lower) nybble so move up.
* Again could use constants...
*/
bits <<= 4;
*d = (*d & ~0xf0) | bits;
d++;
}
st++;
st2++;
mask = ~mask;
}
}
Unrolling the "do not mask" loop is ugly, but just for illustration,
here is a version using a goto (the other way is to have two separate
unrolled loops!):
} else {
if (st & 1)
goto opposite_mask_st_odd;
/* unrolled loop, in which st is even and st2 is odd at top: */
for (;;) {
/* here st is even and st2 is odd */
if ((bits = *s & 0x0f) == 0)
break;
*d = (*d & ~0xf0) | (bits << 4);
st++, st2++, d++;
opposite_mask_st_odd:
/* here st is odd and st2 is even */
if ((bits = *s & 0xf0) == 0)
break;
*d = (*d & ~0x0f) | (bits >> 4);
st++, st2++, s++;
}
}
Unrolling actually eliminates the "mask" variable entirely since
we just do each half, and eliminates the even/odd test inside the
loop -- so it winds up being about as long, code-wise.
Note that one can unroll even the "mask is the same" loop, but in
that case it is not quite as clever -- it just lets you use constants
for the masks, and eliminate s++ and d++ in one half of the unrolled
loop. If this were assembly, I probably would do it; in C... well,
*maybe* if profiling showed a lot of time going here.
(Incidentally, I will add here that I spent a lot of time optimizing
assembly-language memcpy() on SPARCV9, where you use some special
registers and instructions that require 64-*byte* alignment. This
needs a lot of leading and trailing mopping-up work, but is
fundamentally the same problem.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it
http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.