Bytes | Software Development & Data Engineering Community

how to generate unique Hash Code for string

Since MSDN gives no guarantee that hash codes are unique, could anyone
suggest how to generate a hash code that is always the same for the same
string, and different for different strings?

Dec 21 '07 #1

"Ashish Khandelwal" <AK***************@gmail.com> wrote in message
news:e5**********************************@x29g2000 prg.googlegroups.com...
Since MSDN gives no guarantee that hash codes are unique, could anyone
suggest how to generate a hash code that is always the same for the same
string, and different for different strings?
MSDN can't guarantee this, because it is not possible to give such a
guarantee.

Consider that the hash is an int, and therefore only 32 bits (System.Int32
is 32 bits on both 32-bit and 64-bit systems). This means that the contents
of a string must be "squeezed" into those bits, which in turn means a lot of
data is lost. Because there are far more possible strings than the 2^32
possible hash values, total uniqueness cannot be guaranteed. I have never run
into any problems with GetHashCode(), but if you have, consider implementing
your own MD5 (or whatever you feel appropriate.)
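A minimal sketch of the MD5 suggestion, using the System.Security.Cryptography.MD5 class rather than a hand-written implementation. The `Md5Demo.Md5Hex` helper is hypothetical, and encoding the string as UTF-8 first is an assumption (another encoding, such as UTF-16, gives a different digest). The result is a 128-bit value instead of GetHashCode()'s 32 bits, so collisions become far rarer, but they are still possible.

```csharp
using System.Security.Cryptography;
using System.Text;

static class Md5Demo
{
    // Hypothetical helper: hashes a string with MD5 and returns the
    // 128-bit digest as 32 lowercase hex characters.
    // Assumption: the string is encoded as UTF-8 before hashing.
    public static string Md5Hex(string s)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.UTF8.GetBytes(s));
            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
                sb.Append(b.ToString("x2")); // two hex digits per byte
            return sb.ToString();
        }
    }
}
```

For example, `Md5Demo.Md5Hex("abc")` returns `900150983cd24fb0d6963f7d28e17f72`, the MD5 test value given in RFC 1321.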
Dec 21 '07 #2
Thanks for the reply.
>but if you have, consider implementing your own MD5 (or whatever you feel
>appropriate.)
Yes, I am facing exactly that problem. For example, the strings "blair" and
"brainlessness" both return the same hash code from GetHashCode(). Since you
suggested MD5, could you give me some input on it: can I rely on its
output? Also, you said "or whatever you feel appropriate", and that was
exactly my question: what is the way to get a unique hash code? Any idea?
Dec 21 '07 #3
Ashish Khandelwal <AK***************@gmail.com> wrote:
Since MSDN gives no guarantee that hash codes are unique, could anyone
suggest how to generate a hash code that is always the same for the same
string, and different for different strings?
Sure, so long as you've got an infinite range of numbers... My guess is
that you haven't, though.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk
Dec 21 '07 #4
Hi Ashish,

Use the managed MD5 or SHA1 class to make a unique
hash of your string. See this tool; it works with file streams,
but the technique is the same:

http://download.chip.eu/de/KHash-Tools-1.0_1317168.html

I wrote this because I needed a very fast, console-based
file hashing tool. There was no simple one on the web, so I wrote this.
It is open source.

Hashes generated with MD5 and or SHA1 are always unique!

Regards

Kerem
P.S.: Where does your name come from? I am just curious.

--
-----------------------
Beste Grüsse / Best regards / Votre bien devoue
Kerem Gümrükcü
Microsoft Live Space: http://kerem-g.spaces.live.com/
Latest Open-Source Projects: http://entwicklung.junetz.de
-----------------------
"This reply is provided as is, without warranty express or implied."
Dec 21 '07 #5
Kerem Gümrükcü wrote:
Hi Ashish,

Use the managed MD5 or SHA1 class to make a unique
hash of your string. See this tool; it works with file streams,
but the technique is the same:

http://download.chip.eu/de/KHash-Tools-1.0_1317168.html

I wrote this because I needed a very fast, console-based
file hashing tool. There was no simple one on the web, so I wrote this.
It is open source.

Hashes generated with MD5 and or SHA1 are always unique!
No, they're not

It's just very improbable that you can either find two files with
different content and the same hash, or manage to change a file and keep
its original hash.

But they're not unique.

The only guaranteed way to produce a unique hash for a stream of X bytes
is just to copy the entire stream.
--
Lasse Vågsæther Karlsen
mailto:la***@vkarlsen.no
http://presentationmode.blogspot.com/
Dec 21 '07 #6
Lasse Vågsæther Karlsen wrote:
Kerem Gümrükcü wrote:
>Hashes generated with MD5 and or SHA1 are always unique!

No, they're not

It's just very improbable that you can either find two files with
different content and the same hash, or manage to change a file and keep
its original hash.

But they're not unique.

The only guaranteed way to produce a unique hash for a stream of X bytes
is just to copy the entire stream.

And of course then it wouldn't be named a "hash".

Dec 21 '07 #7
Hi Lasse,
>The only guaranteed way to produce a unique hash for a stream of X bytes is
just to copy the entire stream.
And that's what I assumed here, and why I wrote this!

Regards

Kerem

Dec 21 '07 #8

"Kerem Gümrükcü" <ka*******@hotmail.com> wrote in message
news:Of**************@TK2MSFTNGP04.phx.gbl...
Hi Ashish,

Use the managed MD5 or SHA1 class to make a unique
hash of your string. See this tool; it works with file streams,
but the technique is the same:

http://download.chip.eu/de/KHash-Tools-1.0_1317168.html

I wrote this because I needed a very fast, console-based
file hashing tool. There was no simple one on the web, so I wrote this.
It is open source.

Hashes generated with MD5 and or SHA1 are always unique!
No, they are not always unique. They are "one-way hash algorithms": they
are not reversible, because data is lost. The problem still exists, but
becomes less frequent (since MD5, for example, uses 128 bits instead of
GetHashCode()'s 32 bits).
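To put "less frequent" into numbers, here is the standard birthday-bound approximation. It assumes, as an idealization, that each of n distinct inputs receives an independent, uniformly random b-bit hash value (b = 32 for GetHashCode(), b = 128 for MD5). It says nothing about deliberately constructed collisions, which are a separate problem for MD5.

```latex
p(n) \approx 1 - e^{-n(n-1)/2^{b+1}}, \qquad
n_{50\%} \approx \sqrt{2 \ln 2 \cdot 2^{b}} \approx 1.18 \cdot 2^{b/2}
```

With b = 32, a collision becomes more likely than not after roughly 77,000 distinct strings; with b = 128, that point is around 2.2 × 10^19 strings.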

The reason for hash collisions is that n bits cannot be fit into x bits
where n > x. Thus, when you hash a string of, say, 256 bits into 32 bits, 224
bits get lost because there is (obviously) no way to fit 256 bits into 32.
What you can do, however, is combine the 256 bits into the 32 bits in a
certain way (i.e., the algorithm) to make it more likely that a unique
bit pattern is produced. This is what hashing algorithms are all about.
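The n > x argument is the pigeonhole principle, and it can be demonstrated directly. The sketch below uses a hypothetical `PigeonholeDemo` class that keeps only the first 16 bits of an MD5 digest, so the output space (65,536 values) is small enough to exhaust quickly; the same reasoning applies to GetHashCode()'s 32 bits, only with more inputs needed. Among any 65,537 distinct strings, at least two must share a 16-bit value, so the search is guaranteed to stop.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

static class PigeonholeDemo
{
    // Keeps only the first 16 bits of the MD5 digest, giving just
    // 65,536 possible "hash codes".
    public static int Md5Truncated16(MD5 md5, string s)
    {
        byte[] d = md5.ComputeHash(Encoding.UTF8.GetBytes(s));
        return (d[0] << 8) | d[1];
    }

    // Hashes "s0", "s1", "s2", ... until two distinct strings share a
    // truncated hash. By the pigeonhole principle this returns by i == 65536.
    public static Tuple<string, string> FindCollision()
    {
        var seen = new Dictionary<int, string>();
        using (MD5 md5 = MD5.Create())
        {
            for (int i = 0; ; i++)
            {
                string s = "s" + i;
                int h = Md5Truncated16(md5, s);
                string earlier;
                if (seen.TryGetValue(h, out earlier))
                    return Tuple.Create(earlier, s); // two inputs, one hash
                seen.Add(h, s);
            }
        }
    }
}
```

In practice the birthday effect means a collision usually turns up after a few hundred strings, long before the pigeonhole limit.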

http://en.wikipedia.org/wiki/MD5
Dec 21 '07 #9
Hi Mannen,
>What you can do, however, is combine the 256 bits into the 32 bits in a
certain way (ie, the algorithm) to make it more likely that a unique
bitpattern is produced. This is what the hashing algorithms are all about.
Sure, then you have to implement your own hashing class that works with
MD5 as its base. I am familiar with MD5 and SHA1; I know their "pitfalls".

Regards

Kerem

Dec 21 '07 #10
Kerem Gümrükcü <ka*******@hotmail.com> wrote:
The only guaranteed way to produce a unique hash for a stream of X bytes is
just to copy the entire stream.
And that's what I assumed here, and why I wrote this!
But that goes completely against your other statement of:

"Hashes generated with MD5 and or SHA1 are always unique!"

which is entirely false.

Dec 21 '07 #11
On Dec 21, 1:22 pm, Lasse Vågsæther Karlsen <la...@vkarlsen.no> wrote:
Kerem Gümrükcü wrote:
Hi Ashish,
Use the managed MD5 or SHA1 class to make a unique
hash of your string. See this tool; it works with file streams,
but the technique is the same:
http://download.chip.eu/de/KHash-Tools-1.0_1317168.html
I wrote this because I needed a very fast, console-based
file hashing tool. There was no simple one on the web, so I wrote this.
It is open source.
Hashes generated with MD5 and or SHA1 are always unique!

No, they're not

It's just very improbable that you can either find two files with
different content and the same hash, or manage to change a file and keep
its original hash.
Unfortunately it's becoming more and more "probable" as time goes
on...

http://www.cits.rub.de/MD5Collisions/

...but this sort of meddling is unlikely to significantly change the
OP's best-fit solution.
Dec 21 '07 #12
On Fri, 21 Dec 2007 05:22:59 -0800, Lasse Vågsæther Karlsen
<la***@vkarlsen.no> wrote:
[...]
The only guaranteed way to produce a unique hash for a stream of X bytes
is just to copy the entire stream.
Pedantically speaking, that's not exactly true. There are a number of
algorithms that generate unique representations of the same data.
Unique, that is, in the sense that using _a given_ algorithm, different
input data will always produce different output data.

That's why lossless file compression works. In fact, compressing a file is
one obvious way to generate a "unique hash for a stream of X bytes" that is
not in fact a copy of the entire stream.

Of course, neither a compressed version of the file nor an exact copy of
the file is really what we'd call a hash anyway. But if you're going to
accept an exact representation of the file as a hash, you have to accept
any mapping of that representation to some other representation as a hash
as well. :)
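The compression point can be illustrated with GZipStream from System.IO.Compression; the `CompressionDemo` wrapper is hypothetical. Because decompressing always returns the exact input, two different inputs can never yield the same compressed bytes. Unlike a hash, though, the output length varies with the input, and no lossless scheme can make every input shorter: a short string such as "blair" actually comes out longer because of the gzip header and trailer.

```csharp
using System.IO;
using System.IO.Compression;

static class CompressionDemo
{
    // Lossless, so the mapping is injective: Decompress(Compress(x)) == x,
    // hence different inputs always give different compressed bytes.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the final block into 'output'
            return output.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }

    // Recovers the original bytes exactly.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```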

Pete
Dec 21 '07 #13
Thanks, all, for the replies.

Now, on the same issue.

See the code below:
string str = "blair";
string strValue = "Ashish";
string str1 = "brainlessness";
string strValue1 = "Khandelwal";
int hash = str.GetHashCode() ; // Returns 175803953
int hash1 = str1.GetHashCode(); // Returns 175803953
Hashtable ht = new Hashtable();
ht.Add(hash ,strValue);
ht.Add(hash1, strValue1); // ****ERROR**** ArgumentException: duplicate key
string strTmp = (string) ht[hash]; // the key is the int hash, not the string
string strTmp1 = (string) ht[hash1];

In the code above, calling GetHashCode() on both str and str1 returns the
same hash code, 175803953, so adding both hashes to the Hashtable throws an
exception, which is expected (I know the same key cannot be added twice).
Now see the code below:
string str = "blair";
string strValue = "Ashish";
string str1 = "brainlessness";
string strValue1 = "Khandelwal";
Hashtable ht = new Hashtable();
ht.Add(str,strValue);
ht.Add(str1,strValue1);

The code above runs without any error. Here is what I want to understand:
Hashtable calls GetHashCode() to get the hash code of the key passed in,
and the first example shows that both strings produce the same hash code,
so why is there no exception in the second example?

Does Hashtable use some other algorithm to generate the hash code of the
key? If so, I think it is always better to use the object itself as the
key instead of first generating its hash code and using that as the key.

(My main concern is strings as keys.)

Please help me understand.
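The behaviour in the second example can be explained with a toy model. The sketch below is a hypothetical separate-chaining table (`ChainedTable` is not a .NET class, and the real System.Collections.Hashtable organizes its buckets differently internally), but it follows the same rule: the key's GetHashCode() only narrows down where to look, and Equals() decides whether two keys are the same. In the first example the keys were the int hash codes themselves, so the two equal ints really are the same key and Add throws; in the second the keys are the strings, which share a hash code but are not equal, so both entries coexist.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy hash table using separate chaining (not a .NET class).
// Keys must be non-null.
class ChainedTable<TKey, TValue>
{
    private readonly List<KeyValuePair<TKey, TValue>>[] buckets;

    public ChainedTable(int bucketCount)
    {
        buckets = new List<KeyValuePair<TKey, TValue>>[bucketCount];
    }

    // The hash code is used only to pick a bucket.
    // Clearing the sign bit keeps the index non-negative.
    private int IndexFor(TKey key)
    {
        return (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
    }

    public void Add(TKey key, TValue value)
    {
        int i = IndexFor(key);
        if (buckets[i] == null)
            buckets[i] = new List<KeyValuePair<TKey, TValue>>();

        // A shared hash code is fine; only an Equals-equal key is a duplicate.
        foreach (var pair in buckets[i])
            if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                throw new ArgumentException("An item with the same key has already been added.");

        buckets[i].Add(new KeyValuePair<TKey, TValue>(key, value));
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        var bucket = buckets[IndexFor(key)];
        if (bucket != null)
        {
            // Walk the bucket and compare keys with Equals, not hash codes.
            foreach (var pair in bucket)
            {
                if (EqualityComparer<TKey>.Default.Equals(pair.Key, key))
                {
                    value = pair.Value;
                    return true;
                }
            }
        }
        value = default(TValue);
        return false;
    }
}
```

Constructing it with a single bucket, as in `new ChainedTable<string, string>(1)`, forces every key to collide, yet "blair" and "brainlessness" are still stored separately, and only adding "blair" a second time throws. This is also why storing GetHashCode() results as keys is a poor idea: besides losing the original key, string hash codes are not guaranteed to stay the same across .NET versions (and in .NET Core they are randomized per process).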
Dec 24 '07 #14
Peter Duniho wrote:
On Fri, 21 Dec 2007 05:22:59 -0800, Lasse Vågsæther Karlsen
<la***@vkarlsen.no> wrote:
>[...]
The only guaranteed way to produce a unique hash for a stream of X
bytes is just to copy the entire stream.

Pedantically speaking, that's not exactly true. There are a number of
algorithms that generate unique representations of the same data.
Unique, that is, in the sense that using _a given_ algorithm, different
input data will always produce different output data.

That's why file compression works. In fact, compressing a file is one
<snip>

You are right of course. Keeping information entropy but reducing the
total size is a good way to produce a smaller, unique, version of the
original data.

Though, by definition, a hash function typically produces a finite and
fixed range of output bits for an infinite (or rather, unfixed) amount of
input bits. As such, compression is not really hashing, but yes, it can
produce something that is unique yet occupies fewer bits than the original
data.

Jan 3 '08 #15
On Thu, 03 Jan 2008 02:38:17 -0800, Lasse Vågsæther Karlsen
<la***@vkarlsen.no> wrote:
Peter Duniho wrote:
>On Fri, 21 Dec 2007 05:22:59 -0800, Lasse Vågsæther Karlsen
<la***@vkarlsen.no> wrote:
>>[...]
The only guaranteed way to produce a unique hash for a stream of X
bytes is just to copy the entire stream.

[...]
That's why file compression works. In fact, compressing a file is one
<snip>

[...]
Though, the definition of a hash function is that it typically produces
a finite and fixed range of output bits for an infinite (or rather,
unfixed) amount of input bits. As such, compression is not really hashing
Neither is "to copy the entire stream", per your suggestion. :)

Which is why I wrote "if you're going to accept an exact representation of
the file as a hash, you have to accept any mapping of that representation
to some other representation as a hash as well". Suggesting that
compression is a valid hash algorithm is only correct in a context where
one assumes your suggestion is valid as well; it's not actually a context
I personally think is worth considering, but I was happy to accept it on
your behalf for the sake of discussion.

I'm more than happy to agree that neither "copy the entire stream" (your
suggestion) nor "compress the stream" (my alternative suggestion) is
actually a hash algorithm. :)

Pete
Jan 3 '08 #16

This thread has been closed and replies have been disabled. Please start a new discussion.
