Generating a Hash

Is it possible to hash a 100-byte string to an integer? I found a few .NET
classes for that, such as SHA1Managed.ComputeHash, but they return bytes.
I am just not sure about the idea of converting 100 bytes to four or eight
without losing uniqueness.

The issue has come up because I am storing bills with customers in a database
and I would like to reuse customers, so that not every bill has its own
customer. In order to do that I need to make some sort of unique code for
each customer based on name, address, city, state, zip. I want to use the
whole customer name, because very often there are customers in the same city
whose names differ only in the last few characters.

Thanks,

-Stan
Jul 21 '05 #1
Stan <no****@yahoo.com> wrote:
> Is it possible to hash a 100-byte string to an integer? I found a few .NET
> classes for that, such as SHA1Managed.ComputeHash, but they return bytes.

Sure, but you can convert 4 bytes into an integer or 8 bytes into a
long pretty easily.

> I am just not sure about the idea of converting 100 bytes to four or
> eight without losing uniqueness.

Well obviously you can't do that - there are far more sequences of 100
bytes than there are of 4 or 8.
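
Both points can be sketched in a few lines. This is a Python illustration rather than the .NET code from the question, and the names `hash_to_int` and `find_collision` and the `customer-N` strings are made up for the example. It takes the leading bytes of a SHA-1 digest as an integer, then shows that distinct inputs must eventually share a value. The search uses a 2-byte hash so it finishes instantly; the same counting argument applies to 4 or 8 bytes, just with a larger space.

```python
import hashlib

def hash_to_int(text: str, nbytes: int = 4) -> int:
    """Return the first `nbytes` bytes of the SHA-1 digest as an unsigned integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()  # always 20 bytes
    return int.from_bytes(digest[:nbytes], "big")

def find_collision(nbytes: int = 2):
    """Try "customer-0", "customer-1", ... until two inputs share a hash.

    With 2 bytes there are only 65,536 possible values, so a repeat is
    guaranteed within 65,537 inputs (pigeonhole principle).
    """
    seen = {}
    i = 0
    while True:
        text = f"customer-{i}"
        h = hash_to_int(text, nbytes)
        if h in seen:
            return seen[h], text, h
        seen[h] = text
        i += 1

if __name__ == "__main__":
    print(hash_to_int("John Smith, 1 Main St, Springfield, IL, 62701"))
    first, second, h = find_collision(2)
    print(f"{first!r} and {second!r} both hash to {h}")
```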

> The issue has come up because I am storing bills with customers in a database
> and I would like to reuse customers, so that not every bill has its own
> customer. In order to do that I need to make some sort of unique code for
> each customer based on name, address, city, state, zip. I want to use the
> whole customer name, because very often there are customers in the same city
> whose names differ only in the last few characters.

Rather than assume the hash code itself is identical, just assign each
customer a unique ID and look it up based on name, address, city, state
and zip when you need to retrieve it.
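
A minimal sketch of that approach, using Python's built-in sqlite3 module purely for illustration (the thread does not say which database is in use, and the table layout and the function names `open_db` and `get_or_create_customer` are assumptions). The database assigns the ID, and the UNIQUE constraint both prevents duplicate customers and gives the multi-column lookup an index to use.

```python
import sqlite3

def open_db():
    """Create an in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE customers (
            id      INTEGER PRIMARY KEY,   -- surrogate key assigned by the database
            name    TEXT NOT NULL,
            address TEXT NOT NULL,
            city    TEXT NOT NULL,
            state   TEXT NOT NULL,
            zip     TEXT NOT NULL,
            UNIQUE (name, address, city, state, zip)  -- also creates an index
        )
        """
    )
    return conn

def get_or_create_customer(conn, name, address, city, state, zip_code):
    """Return the id of the matching customer, inserting a row if none exists."""
    fields = (name, address, city, state, zip_code)
    row = conn.execute(
        "SELECT id FROM customers "
        "WHERE name = ? AND address = ? AND city = ? AND state = ? AND zip = ?",
        fields,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO customers (name, address, city, state, zip) "
        "VALUES (?, ?, ?, ?, ?)",
        fields,
    )
    conn.commit()
    return cur.lastrowid
```

Each bill would then store only the returned id, so two bills for the same customer point at the same row.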

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #2
> Rather than assume the hash code itself is identical, just assign each
> customer a unique ID and look it up based on name, address, city, state
> and zip when you need to retrieve it.

Then I will have this query:

select * ..... from ... where name = @name and address = @address
and city = @city and state = @state and zip = @zip

It is by far more efficient to have

select * ..... from ... where code = @code

Jul 21 '05 #3
In article <uG**************@TK2MSFTNGP11.phx.gbl>, no****@yahoo.com
says...
> select * ..... from ... where name = @name and address = @address
> and city = @city and state = @state and zip = @zip
>
> It is by far more efficient to have
>
> select * ..... from ... where code = @code


I don't think you quite understand what a hash is, Stan. Hashes are not
guaranteed to be unique. They're just a way of localizing sparse data.
You *always* have to check for collisions with a hash.

As Jon mentioned, how could you possibly generate unique 8 (or 4) byte
values for each possible value of a 100-byte string? Think about it.

Why not look up "hashing with linear probing" to see a possible solution
for your problem?
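
For reference, a small in-memory sketch of hashing with linear probing in Python. The class name `LinearProbingTable` and its fixed capacity are choices made for this illustration, and deletion is left out to keep it short. The point to notice is that a lookup compares the full key at every probed slot, so two keys that land on the same slot never get confused.

```python
class LinearProbingTable:
    """Fixed-capacity hash table using open addressing with linear probing."""

    def __init__(self, capacity: int = 8):
        self._slots = [None] * capacity  # each slot is None or a (key, value) pair

    def _probe(self, key):
        """Yield slot indexes starting at the key's home slot, wrapping around."""
        n = len(self._slots)
        start = hash(key) % n
        for step in range(n):
            yield (start + step) % n

    def put(self, key, value):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:  # empty slot, or same key: store here
                self._slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                break  # an empty slot ends the probe sequence: key is absent
            if slot[0] == key:  # compare the full key, not just the hash
                return slot[1]
        raise KeyError(key)
```

With a capacity of 8, the integer keys 1, 9 and 17 all start at slot 1, so the second and third are stored in the next free slots and are still found by `get`.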

-- Rick

Jul 21 '05 #4
Stan <no****@yahoo.com> wrote:
>> Rather than assume the hash code itself is identical, just assign each
>> customer a unique ID and look it up based on name, address, city, state
>> and zip when you need to retrieve it.
>
> Then I will have this query:
>
> select * ..... from ... where name = @name and address = @address
> and city = @city and state = @state and zip = @zip
>
> It is by far more efficient to have
>
> select * ..... from ... where code = @code

Sure - if you don't mind the fact that your code won't necessarily be
unique...

Of course, it's *unlikely* that you'll get a hash collision, if you
only have a few thousand entries - but that may not be good enough.
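
To put a rough number on "unlikely", the usual birthday approximation can be evaluated directly. The sketch below is in Python, assumes hash values are uniformly distributed, and uses entry counts picked only as examples; `collision_probability` is a name invented here.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)),
    which assumes hash values are uniformly distributed.
    """
    space = 2 ** hash_bits
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))

if __name__ == "__main__":
    for n in (5_000, 100_000):
        for bits in (32, 64):
            p = collision_probability(n, bits)
            print(f"{n:>7} items, {bits}-bit hash: {p:.3g}")
```

Under those assumptions, 5,000 entries with a 32-bit hash give a collision chance of roughly 0.3%, while 100,000 entries push it above 50%. A 64-bit hash makes both cases vanishingly unlikely, but never impossible.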

(What you could do is search by hash and then verify each field
separately, of course.)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #5
> I don't think you quite understand what a hash is, Stan. Hashes are not
> guaranteed to be unique. They're just a way of localizing sparse data.
> You *always* have to check for collisions with a hash.


Yes, I thought a hash was guaranteed to be unique - similar to when NT encrypts
users' passwords and stores them as hashes...

What I probably need is not hashing but compressing or compacting
name+address+city+state+zip. Even without spaces I end up with 100-150
characters... There have got to be some algorithms that do that (similar to
ZIP, ARJ, etc.)...
Jul 21 '05 #6
Stan <no****@yahoo.com> wrote:
>> I don't think you quite understand what a hash is, Stan. Hashes are not
>> guaranteed to be unique. They're just a way of localizing sparse data.
>> You *always* have to check for collisions with a hash.
>
> Yes, I thought a hash was guaranteed to be unique - similar to when NT
> encrypts users' passwords and stores them as hashes...

That doesn't guarantee it to be unique, I rather suspect. One-way
hashes like that are basically used so that an attacker has a *very,
very small* chance of getting access without having the right password,
and the password itself doesn't need to be stored in plain text.

> What I probably need is not hashing but compressing or compacting
> name+address+city+state+zip. Even without spaces I end up with 100-150
> characters... There have got to be some algorithms that do that (similar to
> ZIP, ARJ, etc.)...

Well, hashing would be a good start, if you wanted something small to
search on: write a hash into your database (and make sure it's up to
date!) but having retrieved results by hashcode, check that you get the
right record (by the individual fields) before doing anything else.
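
A sketch of that search-by-hash-then-verify pattern, again in Python with sqlite3 as a stand-in for whatever database is really in use. The column name `code` follows the query earlier in the thread; the table layout, the function names, and the choice of the first 8 bytes of a SHA-1 digest are assumptions made for this example.

```python
import hashlib
import sqlite3

def customer_hash(name, address, city, state, zip_code) -> int:
    """Derive a signed 64-bit integer from the fields (SQLite integers are signed)."""
    # A separator character keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join((name, address, city, state, zip_code))
    digest = hashlib.sha1(joined.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, code INTEGER NOT NULL, "
        "name TEXT, address TEXT, city TEXT, state TEXT, zip TEXT)"
    )
    # Deliberately not UNIQUE: two different customers may share a code.
    conn.execute("CREATE INDEX idx_customers_code ON customers (code)")
    return conn

def add_customer(conn, *fields):
    cur = conn.execute(
        "INSERT INTO customers (code, name, address, city, state, zip) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (customer_hash(*fields), *fields),
    )
    return cur.lastrowid

def find_customer(conn, *fields):
    """Look up candidates by hash code, then confirm every field matches."""
    rows = conn.execute(
        "SELECT id, name, address, city, state, zip FROM customers WHERE code = ?",
        (customer_hash(*fields),),
    )
    for row in rows:
        if tuple(row[1:]) == tuple(fields):  # guards against hash collisions
            return row[0]
    return None
```

The stored code has to be recomputed whenever any of the five fields changes, which is the "make sure it's up to date" caveat.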

Note that although compression algorithms like zip etc will *usually*
save space, there's no guarantee that they will - and there *can't* be,
for exactly the same reason you can't get a unique hash when you're
going from x bytes to y bytes and y is smaller than x.
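
This is easy to see with a general-purpose compressor. The Python sketch below uses the standard zlib module (the same DEFLATE family as zip); the sample inputs and the helper name `compressed_size` are made up for illustration.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Return the length of the zlib-compressed form of `data` at maximum level."""
    return len(zlib.compress(data, 9))

if __name__ == "__main__":
    repetitive = b"Main Street " * 50   # highly redundant: compresses well
    short = b"Stan"                     # too short: format overhead dominates
    for label, data in (("repetitive", repetitive), ("short", short)):
        print(f"{label}: {len(data)} bytes -> {compressed_size(data)} bytes")
```

The repetitive input shrinks a great deal, while the 4-byte input comes out larger than it went in, because the zlib header and checksum alone take 6 bytes.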

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #7
