Are there any techniques I can use to strip a dictionary data
structure down to the smallest memory overhead possible?
I'm working on a project where my available RAM is limited to 2 GB,
and I would like to use very large dictionaries instead of a traditional
database.
Background: I'm trying to identify duplicate records in very
large text-based transaction logs. I'm detecting duplicate
records by computing a SHA1 checksum of each record and using this
checksum as a dictionary key. This works great except for several
files whose size is such that their associated checksum
dictionaries are too big for my workstation's 2 GB of RAM.

You could use a different hash algorithm yielding a smaller value (crc32,
for example, fits in an integer), at the expense of more collisions
and more processing time to check those possible duplicates.
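A minimal sketch of that idea, assuming one record per line and a log file opened in binary mode. The function name find_duplicate_records is hypothetical. The dictionary is keyed by zlib.crc32 (a 32-bit int) instead of a 20-byte SHA1 digest, and it stores each record's byte offset rather than the record itself. Because two different records can share a CRC32, every CRC hit is confirmed by seeking back and comparing the actual lines; that re-read is the extra processing time mentioned above.

```python
import io
import zlib
from typing import BinaryIO, Dict, List, Tuple


def find_duplicate_records(f: BinaryIO) -> List[Tuple[int, int]]:
    """Return (first_offset, duplicate_offset) pairs for repeated lines.

    Hypothetical helper: assumes one record per line in a seekable binary file.
    """
    first: Dict[int, int] = {}        # crc32 -> offset of first record with that CRC
    extra: Dict[int, List[int]] = {}  # crc32 -> offsets of *different* records sharing it
    duplicates: List[Tuple[int, int]] = []
    offset = 0
    while True:
        f.seek(offset)
        record = f.readline()
        if not record:  # end of file
            break
        key = zlib.crc32(record)
        if key not in first:
            first[key] = offset
        else:
            # A CRC match may be a collision, so compare the real records.
            match = None
            for prev in [first[key]] + extra.get(key, []):
                f.seek(prev)
                if f.readline() == record:
                    match = prev
                    break
            if match is None:
                extra.setdefault(key, []).append(offset)  # true collision
            else:
                duplicates.append((match, offset))
        offset += len(record)
    return duplicates


if __name__ == "__main__":
    # Usage with an in-memory "log"; for a real file use open(path, "rb").
    log = io.BytesIO(b"a\nb\na\nc\nb\n")
    print(find_duplicate_records(log))
```

Keeping the colliding offsets in a separate, normally tiny dict means the common case stores a single int per distinct record instead of a list, which keeps the per-entry overhead down.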
--
Gabriel Genellina