Hi, just trying to avoid wheel reinvention. I have need of an unsigned 32 bit
arithmetic type to carry out a checksum operation and wondered if anyone had
already defined such a beast.
Our current code works with 32-bit CPUs, but its comparisons fail on 64-bit
ones; it's clearly wrong, as we are comparing a number with a negated
number. The bits might drop off in 32 bits, but not in 64.
--
Robin Becker
Robin Becker wrote:
> Hi, just trying to avoid wheel reinvention. I have need of an unsigned
> 32 bit arithmetic type to carry out a checksum operation and wondered if
> anyone had already defined such a beast.
> Our current code works with 32-bit CPUs, but its comparisons fail on
> 64-bit ones; it's clearly wrong, as we are comparing a number with a
> negated number. The bits might drop off in 32 bits, but not in 64.
Not sure what operations you are doing: In Python, bits never drop off
(at least not in recent versions).
If you need to drop bits, you need to do so explicitly, by using the
bit mask operations. I could tell you more if you'd tell us what
the specific operations are.
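As a minimal sketch of what dropping bits explicitly can look like (the names `MASK32` and `add_u32` are made up for illustration, and the syntax assumes a current Python rather than the 2.x of this thread):

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits of a value

def add_u32(x, y):
    """Add two integers and wrap the result into the range 0..2**32-1."""
    return (x + y) & MASK32

print(0xFFFFFFFF + 1)          # 4294967296: without a mask no bits are dropped
print(add_u32(0xFFFFFFFF, 1))  # 0: the mask discards the carry out of bit 31
```

Negative inputs are also folded into the unsigned range, so `add_u32(6, -10)` gives `0xFFFFFFFC` rather than `-4`.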
Regards,
Martin
Martin v. Löwis wrote:
> Robin Becker wrote:
>> Hi, just trying to avoid wheel reinvention. I have need of an unsigned 32 bit
>> arithmetic type to carry out a checksum operation and wondered if anyone had
>> already defined such a beast.
>> Our current code works with 32-bit CPUs, but its comparisons fail on 64-bit
>> ones; it's clearly wrong, as we are comparing a number with a negated
>> number. The bits might drop off in 32 bits, but not in 64.
> Not sure what operations you are doing: In Python, bits never drop off
> (at least not in recent versions).
> If you need to drop bits, you need to do so explicitly, by using the
> bit mask operations. I could tell you more if you'd tell us what
> the specific operations are.
This code is in a contribution to the reportlab toolkit that handles TTF fonts.
The fonts contain checksums computed using 32-bit arithmetic. The original
C definition is as follows:
ULONG CalcTableChecksum(ULONG *Table, ULONG Length)
{
    ULONG Sum = 0L;
    ULONG *EndPtr = Table + ((Length + 3) & ~3) / sizeof(ULONG);
    while (Table < EndPtr)
        Sum += *Table++;
    return Sum;
}
so effectively we're doing only additions and letting bits roll off the end.
Of course the actual semantics depend on what C unsigned arithmetic does,
so we're relying on that being the same everywhere.
This algorithm was pretty simple in Python until 2.3, when shifts over the end of
ints started going wrong. For some reason we didn't do the obvious thing, which is
to do everything in longs and mask off the upper bits. Instead (probably
my fault) we seem to have accumulated code like
# Python 2 code as posted; the imports are added for completeness
import sys
from struct import pack, unpack
def _L2U32(L):
    '''convert a long to u32'''
    return unpack('l',pack('L',L))[0]
if sys.hexversion>=0x02030000:
    def add32(x, y):
        "Calculate (x + y) modulo 2**32"
        return _L2U32((long(x)+y) & 0xffffffffL)
else:
    def add32(x, y):
        "Calculate (x + y) modulo 2**32"
        lo = (x & 0xFFFF) + (y & 0xFFFF)
        hi = (x >> 16) + (y >> 16) + (lo >> 16)
        return (hi << 16) | (lo & 0xFFFF)
def calcChecksum(data):
    """Calculates TTF-style checksums"""
    if len(data)&3: data = data + (4-(len(data)&3))*"\0"
    sum = 0
    for n in unpack(">%dl" % (len(data)>>2), data):
        sum = add32(sum,n)
    return sum
and also silly stuff like
def testAdd32(self):
    "Test add32"
    self.assertEquals(add32(10, -6), 4)
    self.assertEquals(add32(6, -10), -4)
    self.assertEquals(add32(_L2U32(0x80000000L), -1), 0x7FFFFFFF)
    self.assertEquals(add32(0x7FFFFFFF, 1), _L2U32(0x80000000L))
def testChecksum(self):
    "Test calcChecksum function"
    self.assertEquals(calcChecksum(""), 0)
    self.assertEquals(calcChecksum("\1"), 0x01000000)
    self.assertEquals(calcChecksum("\x01\x02\x03\x04\x10\x20\x30\x40"), 0x11223344)
    self.assertEquals(calcChecksum("\x81"), _L2U32(0x81000000L))
    _L2U32(0x80000000L))
While it might be reasonable to do testing, these tests don't seem very
sensible; e.g. what is -6 doing in a u32 test? This stuff just about works on a
32-bit machine, but is failing miserably on a 64-bit AMD. As far as I can see I
just need to use masked longs throughout.
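A rough sketch of the masked approach, assuming a current Python where `int` is unbounded and the data is a `bytes` object; `calc_checksum` and `MASK32` are illustrative names, not the reportlab code:

```python
import struct

MASK32 = 0xFFFFFFFF

def calc_checksum(data):
    """Sum the big-endian unsigned 32-bit words of `data`, modulo 2**32."""
    # Pad with zero bytes to a multiple of four, as the posted code does.
    if len(data) & 3:
        data = data + b"\0" * (4 - (len(data) & 3))
    total = 0
    # ">%dL" reads unsigned 4-byte big-endian words on every platform,
    # so no negative values ever enter the sum.
    for word in struct.unpack(">%dL" % (len(data) >> 2), data):
        total = (total + word) & MASK32
    return total

print(hex(calc_checksum(b"\x01\x02\x03\x04\x10\x20\x30\x40")))  # 0x11223344
```

Because the words are unpacked as unsigned and the running total is masked, the result is always in the range 0..2**32-1 and does not depend on the width of the machine's native integers.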
In a C extension I can still do the computation efficiently on a 32-bit machine,
but I need to do masking for a 64-bit machine.
--
Robin Becker
Robin Becker wrote:
> Of course the actual semantics is dependent on what C unsigned
> arithmetic does so we're relying on that being the same everywhere.
Assuming that ULONG has the same width on all systems, the outcome
is actually mandated by the C standard: unsigned arithmetic is
defined to operate modulo (max_uint+1) (even if that is not a power
of two).
> This algorithm was pretty simple in Python until 2.3 when shifts over
> the end of ints started going wrong.
Actually, they started going *right* :-) Addition of two positive numbers
never gives a negative result, in mathematics.
> While it might be reasonable to do testing, these tests don't seem very
> sensible; e.g. what is -6 doing in a u32 test? This stuff just about
> works on a 32-bit machine, but is failing miserably on a 64-bit AMD.
> As far as I can see I just need to use masked longs throughout.
Exactly.
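One plausible contributor to the 64-bit failures, offered as an assumption rather than a diagnosis: the `_L2U32` helper quoted earlier packs with the native `'L'` and `'l'` codes, whose size follows the platform's C `long` and is typically 8 bytes on 64-bit Linux. A sketch of a width-independent conversion (the name `to_signed32` is hypothetical) uses the `=` prefix to request standard sizes:

```python
import struct

def to_signed32(value):
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    # "=" selects standard sizes, so "L" and "l" are 4 bytes everywhere.
    return struct.unpack("=l", struct.pack("=L", value & 0xFFFFFFFF))[0]

print(struct.calcsize("l"))     # native size: 4 or 8 depending on the platform
print(struct.calcsize("=l"))    # standard size: 4
print(to_signed32(0x80000000))  # -2147483648
```

With masked unsigned values used throughout, this kind of signed reinterpretation should not be needed at all; it is shown only to make the platform dependence visible.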
> In a C extension I can still do the computation efficiently on a 32-bit
> machine, but I need to do masking for a 64-bit machine.
Well, no. You just need to find a 32-bit unsigned integer type on the
64-bit machine. Typically, "unsigned int" should work fine (with
only the Cray being a notable exception, AFAIK). IOW, replace ULONG
with uint32_t wherever you really mean an unsigned 32-bit type,
then use stdint.h where available, else define it to unsigned int
(with a build-time or run-time test whether sizeof(unsigned int)==4).
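On the Python side, a comparable fixed-width type is available as `ctypes.c_uint32` where the `ctypes` module is present. The sketch below assumes that module is usable on the target platform; `wrap_u32` is an illustrative helper name:

```python
import ctypes

# c_uint32 is 4 bytes wide regardless of the platform's native int size.
print(ctypes.sizeof(ctypes.c_uint32))         # 4

# ctypes integer types do no overflow checking: values wrap modulo 2**32.
print(ctypes.c_uint32(0xFFFFFFFF + 1).value)  # 0
print(ctypes.c_uint32(-1).value)              # 4294967295

def wrap_u32(x):
    """Reduce an integer to an unsigned 32-bit value via ctypes."""
    return ctypes.c_uint32(x).value
```

For a pure-Python checksum, masking with `& 0xFFFFFFFF` achieves the same result without the extra object creation.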
Regards,
Martin
Robin Becker wrote:
> ULONG CalcTableChecksum(ULONG *Table, ULONG Length)
> {
>     ULONG Sum = 0L;
>     ULONG *EndPtr = Table + ((Length + 3) & ~3) / sizeof(ULONG);
>     while (Table < EndPtr)
>         Sum += *Table++;
>     return Sum;
> }
Is this what you want?
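import numpy
def CalcTableChecksum(Table, Length=None):
    tmp = numpy.array(Table, dtype=numpy.uint32)
    # Length is a byte count, as in the C version; default to the whole table
    if Length is None: Length = tmp.nbytes
    endptr = ((Length+3) & ~3) // 4
    # force a uint32 accumulator so the sum wraps modulo 2**32
    return tmp[0:endptr].sum(dtype=numpy.uint32)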
import numpy as nx
type(nx.array([1,2,3],dtype=nx.uint32)[0])
sturlamolden wrote:
> import numpy
> def CalcTableChecksum(Table, Length=None):
>     tmp = numpy.array(Table, dtype=numpy.uint32)
>     if Length is None: Length = tmp.nbytes
>     endptr = ((Length+3) & ~3) // 4
>     return tmp[0:endptr].sum(dtype=numpy.uint32)
it's probably wonderful, but I don't think I can ask people to add numpy to the
list of requirements for reportlab :)
I used to love its predecessor Numeric, but it was quite large.
--
Robin Becker
Robin Becker wrote:
> it's probably wonderful, but I don't think I can ask people to add numpy to the
> list of requirements for reportlab :)
Maybe NumPy makes it into the core Python tree one day. At some point
Python users other than die-hard scientists and mathematicians will
realise that for and while loops are the root of all evil when doing
CPU-bound operations in an interpreted language. Array slicing and
vectorised statements can be faster by astronomical proportions.
Here is one example: http://tinyurl.com/y79zhc
This statement that required twenty seconds to execute
dim = size(infocbcr);
image = zeros(dim(1), dim(2));
for i = 1:dim(1)
    for j = 1:dim(2)
        cb = double(infocbcr(i,j,2));
        cr = double(infocbcr(i,j,3));
        x = [(cb-media_b); (cr-media_r)];
        %this gives a mult of 1*2 * 2*2 * 2*1
        image(i,j) = exp(-0.5*x'*inv(brcov)*x);
    end
end
could be replaced with an equivalent condensed statement that only
required a fraction of a second:
image = reshape(exp(-0.5*sum(((chol(brcov)')\ ...
    ((reshape(double(infocbcr(:,:,2:3)),dim(1)*dim(2),2)')...
    -repmat([media_b;media_r],1,dim(1)*dim(2)))).^2)'),dim(1),dim(2));
This was Matlab, but the same holds for Python and NumPy. The overhead
in the first code snippet comes from calling the interpreter inside a
tight loop. That is why loops are the root of all evil when doing
CPU-bound tasks in an interpreted language. I would think that 9 out of 10
tasks that most Python users think require a C extension are actually more
easily solved with NumPy. This is old knowledge from the Matlab
community: even if you think you need a "MEX file" (that is, a C
extension for Matlab), you probably don't. Vectorize and it will be
fast enough.
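The same principle can be sketched for the checksum in this thread without NumPy, by letting the built-in `sum()` run the loop in C. This is only an illustration under stated assumptions: the function names are made up, the syntax is current Python, and the input length is assumed to be a multiple of four bytes.

```python
import struct

def checksum_loop(data):
    """Explicit Python loop, masking after every addition."""
    total = 0
    for word in struct.unpack(">%dL" % (len(data) // 4), data):
        total = (total + word) & 0xFFFFFFFF
    return total

def checksum_builtin(data):
    """Let the built-in sum() iterate in C, then mask once at the end."""
    words = struct.unpack(">%dL" % (len(data) // 4), data)
    return sum(words) & 0xFFFFFFFF

data = b"\xff\xff\xff\xff" * 1000 + b"\x00\x00\x00\x02"
print(checksum_loop(data) == checksum_builtin(data))  # True
```

Masking once at the end gives the same answer because Python integers never overflow, so reducing modulo 2**32 after the additions is equivalent to reducing after each one.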
sturlamolden wrote:
> Robin Becker wrote:
>> it's probably wonderful, but I don't think I can ask people to add numpy to the list of requirements for reportlab :)
> .........
> This was Matlab, but the same holds for Python and NumPy. The overhead
> in the first code snippet comes from calling the interpreter inside a
> tight loop. That is why loops are the root of all evil when doing
> CPU-bound tasks in an interpreted language. I would think that 9 out of 10
> tasks that most Python users think require a C extension are actually more
> easily solved with NumPy. This is old knowledge from the Matlab
> community: even if you think you need a "MEX file" (that is, a C
> extension for Matlab), you probably don't. Vectorize and it will be
> fast enough.
I think you're preaching to the converted. The very first serious thing I did in
Python involved a generational accounting model calculation that was translated
from Matlab into Numeric/Python. It ran about 10 times faster than Matlab and
about 5 times faster than a Matlab compiler.
--
Robin Becker