Martin v. Löwis wrote:
> ... 0 the ideal hash :) can't be argued with ...
>
> So: what are your input data, and what is the
> distribution among them?
>
> Regards,
> Martin
I'm trying to create UniqueIDs for dynamic PostScript fonts. According to my
resources we don't actually need to use these, but if they are required by a
particular PostScript program (perhaps to make a print run efficient) then the
private range of these IDs is 4000000 <= UID <= 4999999, i.e. a range of one million.
Since 2**20 is about a million, I probably really need a 20-bit hash.
The data going into the font consists of

  fontBBox   '[-415 -431 2014 2033]'
  charmaps   ['dup (\000) 0 get /C0 put', ...]
  metrics    ['/C0 1251 def', ...]
  bboxes     ['/C29 [0 0 512 0] def', ...]
  chardefs   ['/C0 {newpath 224 418 m 234 336 ... def}', ...]

i.e. a bunch of lists of strings which are eventually joined together and written
out with a template to make the PostScript definition.
The UniqueID is used by PostScript interpreters to avoid recreating particular
glyphs, so ideally I would number these fonts sequentially using a global count;
in practice, though, several processes separated by application and time can
produce PostScript which eventually gets merged back together.
If the UIDs clash then the printer produces very strange output.
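To get a feel for how risky random or hashed UIDs are in a million-slot range,
here's a small sketch using the standard birthday-problem approximation (the
function name and the exponential approximation are my own choices, not anything
from the thread):

```python
import math

def clash_probability(n_fonts, n_slots=1_000_000):
    """Estimate the probability that at least two of n_fonts
    independently chosen UIDs collide within n_slots slots,
    using the usual birthday-problem approximation
    P(collision) ~ 1 - exp(-n*(n-1) / (2*slots))."""
    return 1.0 - math.exp(-n_fonts * (n_fonts - 1) / (2.0 * n_slots))

# With a million slots, clashes become likely once you have on the
# order of sqrt(1e6) ~ 1000 fonts in circulation.
```

So a purely random scheme is fine for a handful of fonts per job, but degrades
quickly as merged documents accumulate fonts.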
I'm fairly sure there's no obvious Python way to ensure the separated processes
can communicate except via the printer. So either I use a Python-based scheme
which reduces the risk of clashes, i.e. random numbers or some data-based hash
scheme, or I attempt a PostScript solution such as looking for a private global
sequence number.
I'm not sure my PostScript is really good enough to do the latter, so I hoped to
pursue a Python-based approach which has a low probability of busting.
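A data-based hash has the nice property that two processes which generate the
same font get the same UID, so merging their output is harmless. A minimal
sketch of that idea, assuming the five lists described above and using md5 from
the standard library (the function name and the choice of md5 are illustrative,
not a settled design):

```python
import hashlib

def font_uid(fontBBox, charmaps, metrics, bboxes, chardefs):
    """Derive a UniqueID in the private range 4000000..4999999 by
    hashing the data that defines the font. Identical font data
    yields an identical UID in any process, at any time."""
    # Join all the strings that go into the PostScript definition.
    data = '\n'.join([fontBBox] + charmaps + metrics + bboxes + chardefs)
    digest = hashlib.md5(data.encode('latin-1')).digest()
    # Fold the 128-bit digest into the one-million-wide private range.
    return 4000000 + int.from_bytes(digest, 'big') % 1_000_000
```

Two fonts with different data can still collide, of course, but only with the
birthday-problem probability of any million-slot scheme; unlike random UIDs,
re-generating the *same* font never introduces a new clash.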
Originally I thought the range was a 16-bit number, which is why I started with
16-bit hashes.
--
Robin Becker