Calculating CRC32 for uploaded files

Hi,

I'm working on a file upload script. I need to calculate the CRC32 of the
file(s) that are successfully uploaded. How can I do this? PHP only has a
CRC32 function for strings. However, the uploaded files are mostly
binaries, and are assumed to be large (5-12 MB per file).
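A minimal sketch of one way to get a file's CRC32 entirely in PHP, assuming PHP 7 or later with the hash extension (bundled since PHP 5.1.2). The 'crc32b' algorithm of hash_file()/hash_init() gives the same checksum as crc32(), as eight lowercase hex digits, and the file is read from disk in pieces instead of being passed in as one big string. The function names are hypothetical.

```php
<?php
// One-call version: hash_file() opens and reads the file itself.
function file_crc32_hex(string $path): string
{
    $hex = hash_file('crc32b', $path);   // 'crc32b' = the checksum crc32() computes
    if ($hex === false) {
        throw new RuntimeException("Cannot read file: $path");
    }
    return $hex;                          // e.g. "cbf43926"
}

// Explicit streaming version: feed the file to the hash in fixed-size chunks,
// so memory use stays around $chunkSize bytes whatever the file size.
function file_crc32_hex_chunked(string $path, int $chunkSize = 8192): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open file: $path");
    }
    $ctx = hash_init('crc32b');
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if ($chunk === false) {
            fclose($fh);
            throw new RuntimeException("Read error: $path");
        }
        hash_update($ctx, $chunk);
    }
    fclose($fh);
    return hash_final($ctx);
}
```

For an upload, $path would be the tmp_name entry of the relevant $_FILES element.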

Are there any ways other than CRC32 (supported by a default PHP
installation) to generate a unique hash of an arbitrary file? (The users
of my script are assumed to have no knowledge of unique file hashes, so I
can't depend on them to generate one prior to upload.)

TIA
Jul 17 '05 #1


Ricky Romaya wrote:

> Are there any ways other than CRC32 (supported by a default PHP
> installation) to generate a unique hash of an arbitrary file? (The users
> of my script are assumed to have no knowledge of unique file hashes, so I
> can't depend on them to generate one prior to upload.)

There are other hash functions, but for files of this size you'd be better
off shelling out and running a program specifically designed for the job.

CRC32 is hardly the cutting edge of file hashes. MD5 works quite well and is
supported by most systems (and free source code is available).
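Where shelling out isn't possible, PHP's hash extension already exposes MD5, SHA-1, SHA-256 and others, and hash_algos() lists what a given build supports. A minimal sketch, assuming PHP 7+ with that extension, that reads a file once and computes several digests in the same pass. Practical collision attacks have since been published against both MD5 and SHA-1, so SHA-256 is the conservative pick when deliberate tampering matters; the function name and default list are hypothetical.

```php
<?php
function file_digests(string $path, array $algos = ['crc32b', 'md5', 'sha256'], int $chunkSize = 65536): array
{
    $contexts = [];
    foreach ($algos as $algo) {
        // Refuse algorithms this PHP build does not provide.
        if (!in_array($algo, hash_algos(), true)) {
            throw new InvalidArgumentException("Unsupported hash algorithm: $algo");
        }
        $contexts[$algo] = hash_init($algo);
    }

    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open file: $path");
    }
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if ($chunk === false) {
            fclose($fh);
            throw new RuntimeException("Read error: $path");
        }
        foreach ($contexts as $ctx) {
            hash_update($ctx, $chunk);    // same bytes go to every digest
        }
    }
    fclose($fh);

    $digests = [];
    foreach ($contexts as $algo => $ctx) {
        $digests[$algo] = hash_final($ctx);   // lowercase hex string
    }
    return $digests;
}
```

Reading the file once matters more than the choice of algorithm for 5-12 MB uploads, since disk I/O is shared across all the digests.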

HTH

C.

Jul 17 '05 #2

Colin McKinnon <co**************@andthis.mms3.com> wrote in
news:ck*******************@news.demon.co.uk:
> There are other hash functions, but for files of this size you'd be
> better off shelling out and running a program specifically designed for
> the job.

Uh, the problem is I don't own the server; I use a web hosting service
instead. If only I could shell out and run a file hashing program, my life
would be easier.

> CRC32 is hardly the cutting edge of file hashes. MD5 works quite well
> and is supported by most systems (and free source code is available).

Hmm, care to elaborate on what the 'cutting edge' file hash
algorithms are?

TIA
Jul 17 '05 #3

"Colin McKinnon" <co**************@andthis.mms3.com> wrote in message
news:ck*******************@news.demon.co.uk...
> CRC32 is hardly the cutting edge of file hashes. MD5 works quite well and is supported by most systems (and free source code is available).

Plus there's md5_file(), so you don't have to load the entire file into
memory to calculate the hash.
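Applied to the original upload script, md5_file() needs only the temporary path PHP assigns to the upload. A minimal sketch, assuming PHP 7.1+; the function name and the 'userfile' field name are hypothetical, and a real handler should also check is_uploaded_file() or use move_uploaded_file() before trusting tmp_name.

```php
<?php
// $file is one entry of $_FILES, e.g. $_FILES['userfile'] (hypothetical field name).
function upload_md5(array $file): ?string
{
    if (($file['error'] ?? UPLOAD_ERR_NO_FILE) !== UPLOAD_ERR_OK) {
        return null;                      // upload failed or no file was sent
    }
    $md5 = md5_file($file['tmp_name']);   // reads the file from disk, not from a string
    return $md5 === false ? null : $md5;  // 32 lowercase hex digits on success
}

// Usage inside an upload script:
// $md5 = upload_md5($_FILES['userfile']);
```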
Jul 17 '05 #4

In article <Xn********************************@66.250.146.159 >,
Ricky Romaya <so*******@somewhere.com> wrote:
> [snip]
> Hmm, care to elaborate on what the 'cutting edge' file hash
> algorithms are?


Aren't we a lazy-ass bum this afternoon...

Doing a simple Google search on CRC32 and MD5 gives some choice hits:

http://us4.php.net/crc32 (this is what the OP originally asked for)
http://www.freesoft.org/CIE/RFC/1510/78.htm
http://www.kb.cert.org/vuls/id/945216

http://us4.php.net/md5 (used to calculate the MD5 of a file)
http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html
http://www.faqs.org/rfcs/rfc1321.html

Basically, crc32 hashes aren't unique while md5 hashes are. SUN offers
md5 checksums of all the files in the Solaris distributions as a
'fingerprint' to verify if a file is authentic. That way a sysadmin can
verify if the "ls" or "ps" they're using is the original from SUN.
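The Sun-style check described here (compare a file against a published checksum) can be sketched in PHP as follows, assuming PHP 7+ (hash_equals() needs 5.6+). The function name is hypothetical; the algorithm defaults to MD5 but should be whatever the publisher actually listed.

```php
<?php
function matches_published_checksum(string $path, string $publishedHex, string $algo = 'md5'): bool
{
    $actual = hash_file($algo, $path);
    if ($actual === false) {
        return false;                     // unreadable file counts as "no match"
    }
    // Published lists may use upper case or carry stray whitespace.
    $expected = strtolower(trim($publishedHex));
    return hash_equals($actual, $expected);
}
```

The comparison only proves the file matches the list; the list itself still has to come from a trusted source.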

--
DeeDee, don't press that button! DeeDee! NO! Dee...

Jul 17 '05 #5


Michael Vilain wrote:

[snip]
> Basically, crc32 hashes aren't unique while md5 hashes are. SUN
> offers md5 checksums of all the files in the Solaris distributions
> as a 'fingerprint' to verify if a file is authentic. That way a
> sysadmin can verify if the "ls" or "ps" they're using is the original
> from SUN.


Hi,
I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
long; therefore, for any input length > 128 bits, there must be at
*least* two possible inputs which produce the same output. For the
given file lengths measured in megabytes, there would be an immense
number of possible inputs that give the same output: the only thing
is, it's relatively difficult to arbitrarily *find* another file with
the same MD5 as a given input. They do exist, however, as a little
math demonstrates:

Number of possible MD5 hashes=2^128=3.4028236692093846e+38
Number of possible 1 kilobit files=2^1024=1.7976931348623159e+308

where ^ means "to the power of"
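The same counting can be applied to the smallest upload size in the original question, 5 MB (taking 1 MB as 2^20 bytes and counting only files of exactly that length). A short worked version: by the pigeonhole principle, some MD5 value must be shared by at least 2^(n-128) distinct n-bit files.

```latex
n = 5 \times 2^{20}\ \text{bytes} \times 8\ \text{bits/byte} = 41\,943\,040\ \text{bits}

\frac{2^{n}}{2^{128}} = 2^{41\,943\,040 - 128} = 2^{41\,942\,912}
```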

As you see, if the input is only a kilobit long, there are *immensely*
more possible inputs than possible outputs. Since every possible
input is mapped to some output, obviously multiple inputs must be
mapped to the same output. This is called a "hash collision". As far
as I know, MD5 is not perfectly secure about this (these are just
news items I read recently, and I didn't look at the subject in detail);
however, a more secure hash, such as SHA-1, although obviously still
suffering from the *existence* of hash collisions, makes *looking*
for them very difficult (i.e. you just have to try every possible
input until you get a collision).
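The difference between collisions *existing* and being *easy to find* shows up clearly with CRC32, whose output is only 32 bits: by the birthday bound, a repeat is expected after something on the order of 2^16 inputs. A minimal sketch of such a search; the "file-N" strings are arbitrary inputs chosen only for illustration, and the function name is hypothetical.

```php
<?php
function find_crc32_collision(int $maxTries = 1000000): ?array
{
    $seen = [];                           // checksum => first input that produced it
    for ($i = 0; $i < $maxTries; $i++) {
        $input = "file-$i";
        $sum = crc32($input);
        if (isset($seen[$sum])) {
            return [$seen[$sum], $input]; // two different inputs, same CRC32
        }
        $seen[$sum] = $input;
    }
    return null;                          // none found within $maxTries (very unlikely)
}

$pair = find_crc32_collision();
if ($pair !== null) {
    printf("%s and %s share CRC32 %08x\n", $pair[0], $pair[1], crc32($pair[0]));
}
```

The same search against a 128-bit hash would need on the order of 2^64 inputs, which is the practical gap Chris is describing.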

Chris
Jul 17 '05 #6

Regarding this well-known quote, often attributed to Chris's famous "Fri,
08 Oct 2004 03:29:03 GMT" speech:
> Michael Vilain wrote:
>
> [snip]
>
> Hi,
> I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
> long; therefore, for any input length > 128 bits, there must be at
> *least* two possible inputs which produce the same output. For the
> given file lengths measured in megabytes, there would be an immense
> number of possible inputs that give the same output: the only thing
> is, it's relatively difficult to arbitrarily *find* another file with
> the same MD5 as a given input. They do exist, however, as a little
> math demonstrates:

(snipped: big files/small hashes--some will be the same)


But the idea, IIRC, is that although there may be collisions, the chance of
two *legible* inputs having the same MD5 is immensely small. Most collisions
will pair one intelligible value with one piece of unusable garbage. Hence
MD5's usefulness in checking file integrity (it would be very difficult,
and quite detectable, to inject malware into a file and keep the MD5), and
its dubious state as a password-security mechanism (since a password doesn't
need to be legible at all; any input that passes the MD5 check will do).
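For the password case, PHP 5.5 and later provide password_hash() and password_verify(), which store a salted, deliberately slow hash rather than a bare MD5. A minimal sketch, assuming PHP 7+; the wrapper names are hypothetical, and PASSWORD_DEFAULT selects whatever algorithm the running PHP version currently recommends (bcrypt at the time of writing), which may change in later versions.

```php
<?php
function make_password_record(string $password): string
{
    // The returned string embeds the algorithm, cost and random salt,
    // so it can be stored as-is and checked later.
    return password_hash($password, PASSWORD_DEFAULT);
}

function check_password(string $password, string $record): bool
{
    return password_verify($password, $record);
}
```

Because each call uses a fresh random salt, hashing the same password twice gives two different records that both verify.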

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #7


FLEB wrote:

<snip>
> But the idea, IIRC, is that although there may be collisions, the
> chance of two *legible* inputs having the same MD5 is immensely
> small. Most collisions will pair one intelligible value with one piece
> of unusable garbage. Hence MD5's usefulness in checking file
> integrity (it would be very difficult, and quite detectable, to
> inject malware into a file and keep the MD5), and its dubious state
> as a password-security mechanism (since a password doesn't need to
> be legible at all; any input that passes the MD5 check will do).


Agreed. Of course, as I read somewhere (and it makes a lot of sense), all
you *need* is gibberish that passes the MD5 test if it's an MD5
validating a BIOS flash. Kind of doesn't matter *what* you put there
if all you want to do is break the computer, does it? (I know BIOS
chips can be replaced; however, this would be a major undertaking for
many people who just might possibly consider flashing their BIOSes).

Of course, for uploading plain old files to a server, it's probably
excellent - and besides, the original purpose was to make sure the
file wasn't *accidentally* changed, for which it's excellent.
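For the accidental-change case, even CRC32 has a hard guarantee: any CRC whose generator polynomial has more than one term, CRC-32 included, detects every single-bit error. A minimal sketch that checks this by flipping each bit of a sample buffer in turn; the function name and sample data are hypothetical.

```php
<?php
// Count how many single-bit corruptions of $data leave the digest unchanged.
function undetected_single_bit_flips(string $data, string $algo = 'crc32b'): int
{
    $original = hash($algo, $data);
    $missed = 0;
    for ($i = 0, $n = strlen($data); $i < $n; $i++) {
        for ($bit = 0; $bit < 8; $bit++) {
            $corrupt = $data;                                  // independent copy
            $corrupt[$i] = chr(ord($data[$i]) ^ (1 << $bit));  // flip one bit
            if (hash($algo, $corrupt) === $original) {
                $missed++;
            }
        }
    }
    return $missed;
}

echo undetected_single_bit_flips(str_repeat("uploaded file data\n", 16)), "\n"; // prints 0
```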

Chris
Jul 17 '05 #8

"Chung Leong" <ch***********@hotmail.com> wrote in
news:UL********************@comcast.com:

> Plus there's md5_file(), so you don't have to load the entire file
> into memory to calculate the hash.

Finally, someone answers my original question. Thx for the pointer. Why
didn't I see it when scouring the manual?
Jul 17 '05 #9
