Note up front: the issues here, while they crop up on any system, have
very little to do with C per se, so you would do better to take this to a
group dedicated to programming under Debian (note that it's a name, thus
the capital letter; I'm also assuming you are using Debian/Linux).
Also, asking about C++ in a C newsgroup shows that you didn't take the
time to get familiar with the Usenet custom of first finding out what a
group is about and what is considered on-topic there (search for "Usenet
etiquette"). Please do that before posting to Usenet again.
Xiaoning He wrote:
> Hi, currently I'm using a crawler called larbin to get some pages; it
> hashes each URL to an integer. [...] Currently I set the hash value to be
> a 31-bit integer, which is used as the index into a bit string in
> memory, and thus needs 256MB of memory.
Okay, so you have one bit for every possible 31-bit hash value, telling you
whether e.g. a URL with that hash was already visited. This only works as
long as there are no hash collisions: when two different URLs hash to the
same value, the second one is wrongly reported as already visited.
Depending on how the result is used, a hash map (or even just a std::map<>,
but that requires C++, which is not the topic here) stores the URLs
themselves and therefore works correctly even with collisions, though at a
higher cost in memory and time. I'd use that until my requirements actually
say that the overhead is too large and that occasional collisions don't
matter.
Now for a completely unrelated topic...
> I have 4GB+ free memory, so my question is: can I allocate 4GB of memory
> in my program in C++? Or sixteen 256MB arrays? What about 8GB??
The amount of memory you can allocate depends on the available virtual
address space. On 32-bit Linux systems that is at most 2 or 3 GiB per
process (I'm not sure which; I think it depends on settings the kernel was
compiled with), so 4GB or 8GB is out of the question there; on 64-bit
systems it is much larger. Note that this memory need not all be backed by
RAM, so the amount of available RAM does not decide whether the allocation
succeeds (rather, RAM plus swap set the limit), but it does affect how well
the program performs. Lastly, the largest contiguous block you can get
might be limited by fragmentation of the virtual address space, so sixteen
separate 256MB arrays may succeed where one 4GB array fails. None of this
depends on the programming language; it depends on the operating system,
so it's off-topic here.
Uli