473,765 Members | 2,061 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

speed of int vs bool for large matrix

raj
Hi,
I saw it mentioned that "int" is the fastest data-type for use in C
,that is data storage/retrieval would be the fastest if I use int among
the following 4 situations in a 32 bit machine with 4-byte ints:

int m[1000][1000];
bool m[1000][1000]; // assuming I use C++
char m[1000][1000];
unsigned char m[1000][1000];

I need to know if this statement is true. Also, if int is indeed the
fastest, in practice, how slower would the bool matrix be ? What if I
had to dynamically allocate all the 4 cases above - would the same
statements still hold?

Actually I need a large matrix for storing just a binary value (so
bool would be most appropriate) - but speed is my utmost concern, space
required comes next.

Some pointers ?

Thanks in advance.
Rajorshi

Nov 14 '05 #1
16 4693
raj wrote:

Hi,
I saw it mentioned that "int" is the fastest data-type for use in C I need to know if this statement is true.


It's almost true.
This is what the rules of C actually say:

N869
6.2.5 Types
[#5]
A ``plain'' int object has the natural size suggested by the
architecture of the execution environment (large enough to
contain any value in the range INT_MIN to INT_MAX as defined
in the header <limits.h>).

--
pete
Nov 14 '05 #2
raj wrote:
Hi,
I saw it mentioned that "int" is the fastest data-type for use in C
,that is data storage/retrieval would be the fastest if I use int among
the following 4 situations in a 32 bit machine with 4-byte ints:

int m[1000][1000];
bool m[1000][1000]; // assuming I use C++
char m[1000][1000];
unsigned char m[1000][1000];

I need to know if this statement is true. Also, if int is indeed the
fastest, in practice, how slower would the bool matrix be ? What if I
had to dynamically allocate all the 4 cases above - would the same
statements still hold?

Actually I need a large matrix for storing just a binary value (so
bool would be most appropriate) - but speed is my utmost concern, space
required comes next.

Some pointers ?

Thanks in advance.
Rajorshi


If you declare your table as a table of unsigned chars, memory
requitrements will be 1 000 000 bytes, in the other case will be
4 times as much (supposing a 32 bit machine), i.e. there is 4 times
as much data that will fit the Level 1 cache.

Input output requirements will be much more reduced and the
program will be surely much faster.

If you have a purely boolean matrix, I would use a packed
representation with 8 bits/byte, making your matrix just
125K, i.e. 8 times less than if you use the int data type.

Note that reducing space requirements increases speed, since
filling the cache from RAM introduces an enormous amount of wait
states. Memory access go through the bus at 300 MHZ, with a CPU
running at 3 000 MHZ, i.e. a memory access can be 10 times slower
when it doesn't hit the L1 cache.

Of course this analysis implies a sequential access to each
element of the matrix. In other cases where you just access
randomly elements of the matrix, an integer representation would
beat all others since you are NOT limited by I/O.

MAIN POINT
----------
Speculating about speed will never replace actual measurements.

1: Build a test matrix.
2: Simulate the access pattern of your application.
3: Time it using unsigned char, int.

Then you will know for sure.

jacob
Nov 14 '05 #3
"raj" <ra******@fastm ail.fm> wrote:
# Hi,
# I saw it mentioned that "int" is the fastest data-type for use in C

That's wrong. The correct answer is MU. Suppose you're solving PDEs; the fastest
data type is float or double. You can do it in int, using various integers to
hold the exponents and fractions, but it's going to be a lot faster using what
is encoded in the hardware and math libraries. What is fastest depends on the
operations performed on what data in what order on what hardware with what caches
and what kind of virtual memory (if any) management. There's is no universally
correct answer.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
Who's leading this mob?
Nov 14 '05 #4
jacob navia <ja***@jacob.re mcomp.fr> writes:
raj wrote:
I saw it mentioned that "int" is the fastest data-type for use in C
,that is data storage/retrieval would be the fastest if I use int among
the following 4 situations in a 32 bit machine with 4-byte ints:

int m[1000][1000];
bool m[1000][1000]; // assuming I use C++
char m[1000][1000];
unsigned char m[1000][1000];

I need to know if this statement is true. Also, if int is indeed the
fastest, in practice, how slower would the bool matrix be ? What if I
had to dynamically allocate all the 4 cases above - would the same
statements still hold?

Actually I need a large matrix for storing just a binary value (so
bool would be most appropriate) - but speed is my utmost concern, space
required comes next.
[...]
If you declare your table as a table of unsigned chars, memory
requitrements will be 1 000 000 bytes, in the other case will be
4 times as much (supposing a 32 bit machine), i.e. there is 4 times
as much data that will fit the Level 1 cache.
You're assuming there is a "Level 1 cache".
Input output requirements will be much more reduced and the
program will be surely much faster.
What input output requirements? The OP didn't indicate that the
matrix would be read from or written to an external file.
If you have a purely boolean matrix, I would use a packed
representation with 8 bits/byte, making your matrix just
125K, i.e. 8 times less than if you use the int data type.
Which will likely make access to individual elements slower.
Note that reducing space requirements increases speed, since
filling the cache from RAM introduces an enormous amount of wait
states. Memory access go through the bus at 300 MHZ, with a CPU
running at 3 000 MHZ, i.e. a memory access can be 10 times slower
when it doesn't hit the L1 cache.
That's all very system-specific.
Of course this analysis implies a sequential access to each
element of the matrix. In other cases where you just access
randomly elements of the matrix, an integer representation would
beat all others since you are NOT limited by I/O.

MAIN POINT
----------
Speculating about speed will never replace actual measurements.

1: Build a test matrix.
2: Simulate the access pattern of your application.
3: Time it using unsigned char, int.

Then you will know for sure.


You could have safely skipped everything before "MAIN POINT".

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #5
In article <ch********@odb k17.prod.google .com>,
"raj" <ra******@fastm ail.fm> wrote:
Hi,
I saw it mentioned that "int" is the fastest data-type for use in C
,that is data storage/retrieval would be the fastest if I use int among
the following 4 situations in a 32 bit machine with 4-byte ints:

int m[1000][1000];
bool m[1000][1000]; // assuming I use C++
char m[1000][1000];
unsigned char m[1000][1000];

I need to know if this statement is true. Also, if int is indeed the
fastest, in practice, how slower would the bool matrix be ? What if I
had to dynamically allocate all the 4 cases above - would the same
statements still hold?

Actually I need a large matrix for storing just a binary value (so
bool would be most appropriate) - but speed is my utmost concern, space
required comes next.


Why don't you just try it out. Write four functions, one with each kind
of matrix, and then set every element to zero, then to one, and repeat
this thousand times. Measure the time. Observe the difference.
Nov 14 '05 #6
Keith Thompson wrote:
jacob navia <ja***@jacob.re mcomp.fr> writes:

If you declare your table as a table of unsigned chars, memory
requitremen ts will be 1 000 000 bytes, in the other case will be
4 times as much (supposing a 32 bit machine), i.e. there is 4 times
as much data that will fit the Level 1 cache.

You're assuming there is a "Level 1 cache".


I am assuming of course a modern CPU, either a PPC, x86 or similar.
Please lets be realistic: handling 1-4MB matrices in a DSP
with 4K of RAM would be ridiculous...
Input output requirements will be much more reduced and the
program will be surely much faster.

What input output requirements? The OP didn't indicate that the
matrix would be read from or written to an external file.


Input output to/from the CPU of course!
I am not speaking about file I/O but RAM I/O.
If you have a purely boolean matrix, I would use a packed
representatio n with 8 bits/byte, making your matrix just
125K, i.e. 8 times less than if you use the int data type.

Which will likely make access to individual elements slower.


No, this is my point. If you have to read from RAM 8 times less
data, each read producing at least 10-20 wait states, the few
more instructions to access the individual bits would be
compensated by the smaller I/O.
Note that reducing space requirements increases speed, since
filling the cache from RAM introduces an enormous amount of wait
states. Memory access go through the bus at 300 MHZ, with a CPU
running at 3 000 MHZ, i.e. a memory access can be 10 times slower
when it doesn't hit the L1 cache.

That's all very system-specific.


Not really. As the CPUs of modern micro computers go to
ever higher clock speeds, memory doesn't catch up at all,
and I/O from RAM becomes the limiting factor.
Of course this analysis implies a sequential access to each
element of the matrix. In other cases where you just access
randomly elements of the matrix, an integer representation would
beat all others since you are NOT limited by I/O.

MAIN POINT
----------
Speculating about speed will never replace actual measurements.

1: Build a test matrix.
2: Simulate the access pattern of your application.
3: Time it using unsigned char, int.

Then you will know for sure.

You could have safely skipped everything before "MAIN POINT".


No, I wanted to describe alternatives and leave the OP the choice
but of an informed choice.
Nov 14 '05 #7
raj wrote:
int m[1000][1000];
bool m[1000][1000]; // assuming I use C++
char m[1000][1000];
unsigned char m[1000][1000];


Like many people, you make the mistake of assuming that bools occupy a
single bit (also note that C99 includes a bool type much like the C++
one.) There is a difficult-to-predict performance tradeoff here that
will vary from machine to machine. Here's a comparison of the advantages
and disadvantages:

int m[1000][1000];

This stores a single value in each word, consuming at least one million
words. Random access requires only the address calculation plus a single
read/write operation on most machines, with no need for bit operations.
That makes this alternative good for random access on machines with weak
bit-level capabilities. If the array is going to be traversed
sequentially on a machine with a data cache though, it will produce many
more cache misses than other approaches.

char m[1000][1000];

This stores a single value in each byte, so that you only need about
million bytes. If you traverse the array in order on a machine with a
data cache, you'll get a lot less cache misses with this version, and on
byte-addressable machines with instructions to fetch bytes, no bit
operations are needed on top of address calculation. On word-addressable
machines with no special instructions for loading bytes, however, this
may turn out to compile into something as slow as the bit array below.
It also still takes 8 times as much space as the bit array, and has
about 8 times as many cache misses when traversed in order on machines
with data caches.

#define INT_BITS (sizeof(int)*8)
#define CEIL_DIV(x,y) ((x+y-1)/y)
int m[1000][CEIL_DIV(1000,I NT_BITS)];
#define M_GET(r,c) (m[(r)][(c)/INT_BITS] & (1 << ((c) % INT_BITS)))
#define M_SET(r,c) (m[(r)][(c)/INT_BITS] |= (1 << ((c) % INT_BITS)))
#define M_CLEAR(r,c) (m[(r)][(c)/INT_BITS] &= ~(1 << ((c) % INT_BITS)))

This creates an array of bits and some macros for accessing it. Although
the division and modulus operations will probably compile to bit
operations, each random access to a non-constant index pair will require
on most machines at least two ANDs and two shifts that the word version
doesn't. Accesses to fixed or partially fixed indexes are faster, since
it can precompute the index and bit mask (shown here for an INT_BITS of 32):

M_GET(40,65)
(m[(40)][(65)/INT_BITS] & (1 << ((65) % INT_BITS)))
m[40][65/32] & (1 << (65 % 32))
m[40][2] & (1 << 1)
m[40][2] & 2

Sequential access actually can yield very good performance, since you
can do that with a loop like this:

unsigned int current_word, i, j, mask;
for(i=0; i<1000; i++) {
for(j=0; j<CEIL_DIV(1000 , INT_BITS); j++) {
current_word = m[i][j];
for(mask = 1; mask != 0; mask <<= 1) {
int current_bit = current_word & mask; /*get current bit*/
current_word |= mask; /* set the current bit */
current_word &= ~mask; /* clear the current bit */
}
m[i][j] = current_word;
}
}

This amounts to two bit operations and 2/INT_BITS word accesses per
element on average, which is not only often faster than two word
accesses but cache misses on machines with caches are reduced by about
INT_BIT times.

Even if you only do random access, considering that you reduce your
space requirements by INT_BIT times, a few extra cycles per access is
justified in many applications.
--
Derrick Coetzee
I grant this newsgroup posting into the public domain. I disclaim all
express or implied warranty and all liability. I am not a professional.
Nov 14 '05 #8
In article <ch**********@n ews-reader5.wanadoo .fr>, ja***@jacob.rem comp.fr
says...
I am assuming of course a modern CPU, either a PPC, x86 or similar.
Intel is STILL selling Celerons. *sigh*
No, this is my point. If you have to read from RAM 8 times less
data, each read producing at least 10-20 wait states, the few
more instructions to access the individual bits would be
compensated by the smaller I/O.


I don't know that's a given. Hypertransport 4-way Opteron boxes
can move almost 8GB/s to/from memory with current DDR modules when
spinning through 64GB of total memory, using 4GB block move/compares.

I'm not at all sure that 8GB/s (being close to 4X what a Xeon 4-way can
do) is going to care much for typical application read/write sizes
whether you use char's, int's, etc. As usual, it depends on the
specifics, the hardware, etc. There is no general rule for this sort of
thing that isn't so general as to be practically worthless. Measure,
then you'll know.

--
Randy Howard
To reply, remove FOOBAR.
Nov 14 '05 #9
jacob navia <ja***@jacob.re mcomp.fr> writes:
Keith Thompson wrote:
jacob navia <ja***@jacob.re mcomp.fr> writes:
If you declare your table as a table of unsigned chars, memory
requitrements will be 1 000 000 bytes, in the other case will be
4 times as much (supposing a 32 bit machine), i.e. there is 4 times
as much data that will fit the Level 1 cache.

You're assuming there is a "Level 1 cache".


I am assuming of course a modern CPU, either a PPC, x86 or similar.
Please lets be realistic: handling 1-4MB matrices in a DSP
with 4K of RAM would be ridiculous...
Input output requirements will be much more reduced and the
program will be surely much faster.

What input output requirements? The OP didn't indicate that the
matrix would be read from or written to an external file.


Input output to/from the CPU of course!
I am not speaking about file I/O but RAM I/O.


Hmm. I don't think that's what most people mean by "I/O".
If you have a purely boolean matrix, I would use a packed
representati on with 8 bits/byte, making your matrix just
125K, i.e. 8 times less than if you use the int data type.

Which will likely make access to individual elements slower.


No, this is my point. If you have to read from RAM 8 times less
data, each read producing at least 10-20 wait states, the few
more instructions to access the individual bits would be
compensated by the smaller I/O.


If the matrix is read randomly, the code will still have to read
memory for each access, *plus* the overhead of extracting the desired
bit (and similarly for write access).

[...]
MAIN POINT
----------
Speculatin g about speed will never replace actual measurements.

1: Build a test matrix.
2: Simulate the access pattern of your application.
3: Time it using unsigned char, int.

Then you will know for sure.

You could have safely skipped everything before "MAIN POINT".


No, I wanted to describe alternatives and leave the OP the choice
but of an informed choice.


I may have been slightly too harsh. My real complaint is that you
didn't qualify your statements. The amount of cache, if any, is still
system-specific, even if it happens that most modern systems have at
least some particular amount. If you had made that clearer, I
wouldn't have minded as much.

--
Keith Thompson (The_Other_Keit h) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

7
7588
by: VijaKhara | last post by:
Hi all, Is there any method which can implememt the matrix multiplication faster than using the formula as we often do by hand? I am writing the following code and my matrice: one is 3x40000 and the other one 40000x3. the program runs too slow: /*multiply two matrice: xyz_trans * xyz , the output is w 3x3*/
14
7128
by: James Stroud | last post by:
Hello All, I'm using numpy to calculate determinants of matrices that look like this (13x13):
8
1802
by: pieceofcake | last post by:
Dear all, I am new to Perl. I have a large matrix. The easiest way to describe is it is like a calender where there are days along the top, numbers of weeks down the side, and the numbers of dates in the rest of the table. Imagine this calendar to be populated with different data but with the same concept, but more like 10,000s across and 10,000s down. How would I parse this huge dataset into a hash. I know how to do the smaller...
3
3570
by: Amber | last post by:
We are using 8.2.9 Windows 64 edtion, in one of our projects we need to recreate a few lager tables which have many millions of rows each, we have used concurrent Java threads to read from data source and do batch inserts to the target tables concurrently, to speed up the inserts we have used the following configuration policies: 1. Use large page size (32K) and extend size (256) on the target db, 2. Use large buffer pools for the...
0
9568
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
10163
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10007
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
9957
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
1
7379
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6649
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5276
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
1
3924
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
3532
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.