Calculating CRC32 for uploaded files

Ricky Romaya
Guest
 
Posts: n/a
#1: Jul 17 '05
Hi,

I'm working on a file upload script. I need to calculate the CRC32 of each
file that is successfully uploaded. How can I do this? PHP only has a
crc32() function for strings. However, the uploaded files are mostly
binaries and are expected to be large (5-12 MB per file).

Are there any ways other than CRC32 (which is supported by a default PHP
installation) to generate a unique hash of an arbitrary file? (The users
of my script are assumed to have no knowledge of file hashes, so I can't
depend on them to generate one prior to upload.)

TIA



Colin McKinnon
Guest
 
Posts: n/a
#2: Jul 17 '05

re: Calculating CRC32 for uploaded files


Ricky Romaya wrote:
>
> Are there any ways other than CRC32 (which supported by default PHP
> installation) to generate a unique hash of an arbitrary files? (the users
> of my script are assumed have no knowledge of unique file hash, so I can't
> depend on them to generate them prior upload)
>
There are other hash functions, but for files of this size you'd be better
off shelling out and running a program specifically designed for the job.

crc32 is hardly the cutting edge of file hashes. MD5 works quite well and is
supported by most systems (and free source code is available).

HTH

C.

Ricky Romaya
Guest
 
Posts: n/a
#3: Jul 17 '05

re: Calculating CRC32 for uploaded files


Colin McKinnon <colin.deletethis@andthis.mms3.com> wrote in
news:ck2uel$l5m$1$8300dec7@news.demon.co.uk:
> There are other hash functions, but for files of this size you'd be
> better shelling out and running a program specifically designed for
> the function.
>
Uh, the problem is that I don't own the server; I use a web hosting
service instead. If only I could shell out and run a file hashing program,
my life would be easier.

> crc32 is hardly the cutting edge of file hashes. MD5 works quite well
> and is supported by most systems (and free source code is available)
>
Hmm, care to elaborate on what the 'cutting edge' file hashing
algorithms are?

TIA

Chung Leong
Guest
 
Posts: n/a
#4: Jul 17 '05

re: Calculating CRC32 for uploaded files



"Colin McKinnon" <colin.deletethis@andthis.mms3.com> wrote in message
news:ck2uel$l5m$1$8300dec7@news.demon.co.uk...
> crc32 is hardly the cutting edge of file hashes. MD5 works quite well and is
> supported by most systems (and free source code is available)

Plus there's md5_file(), so you don't have to load the entire file into
memory to calculate the hash.
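
A minimal sketch of the md5_file() approach mentioned above, assuming PHP 7 or later for the type hints. The helper name `md5OfFile` and the form field name `userfile` in the usage comment are hypothetical and not taken from the thread.

```php
<?php
// Hypothetical helper: hash a file on disk with md5_file(). PHP reads the
// file itself, so the script never holds the whole contents in a string.
function md5OfFile(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException("Cannot read file: $path");
    }
    $digest = md5_file($path);
    if ($digest === false) {
        throw new RuntimeException("Could not hash file: $path");
    }
    return $digest; // 32 lowercase hexadecimal characters
}

// Possible use in an upload handler ('userfile' is an assumed field name):
// if (is_uploaded_file($_FILES['userfile']['tmp_name'])) {
//     $digest = md5OfFile($_FILES['userfile']['tmp_name']);
// }
```

The result should match md5() applied to the same bytes loaded as a string, which makes it easy to spot-check on a small file.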


Michael Vilain
Guest
 
Posts: n/a
#5: Jul 17 '05

re: Calculating CRC32 for uploaded files


In article <Xns957C30E996E19rickyralexandriacc@66.250.146.159 >,
Ricky Romaya <something@somewhere.com> wrote:
> Colin McKinnon <colin.deletethis@andthis.mms3.com> wrote in
> news:ck2uel$l5m$1$8300dec7@news.demon.co.uk:
>
> > There are other hash functions, but for files of this size you'd be
> > better shelling out and running a program specifically designed for
> > the function.
> >
> Uh, the problem is I don't own the server and used a webhosting instead. If
> only I could shell out and run a file hashing program, my life will be
> easier.
>
> > crc32 is hardly the cutting edge of file hashes. MD5 works quite well
> > and is supported by most systems (and free source code is available)
> >
> Hmm, care to elaborate about what is the 'cutting edge' file hashes
> algorithm?
>
> TIA

Aren't we a lazy-ass bum this afternoon...

Doing a simple Google search on CRC32 and MD5 gives some choice hits:

http://us4.php.net/crc32 (this is what the OP originally asked for)
http://www.freesoft.org/CIE/RFC/1510/78.htm
http://www.kb.cert.org/vuls/id/945216

http://us4.php.net/md5 (used to calculate an MD5 hash; md5_file() does the same for a file)
http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html
http://www.faqs.org/rfcs/rfc1321.html

Basically, crc32 hashes aren't unique while md5 hashes are. SUN offers
md5 checksums of all the files in the Solaris distributions as a
'fingerprint' to verify if a file is authentic. That way a sysadmin can
verify if the "ls" or "ps" they're using is the original from SUN.

--
DeeDee, don't press that button! DeeDee! NO! Dee...



Chris
Guest
 
Posts: n/a
#6: Jul 17 '05

re: Calculating CRC32 for uploaded files


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Vilain wrote:

[snip]
> Basically, crc32 hashes aren't unique while md5 hashes are. SUN
> offers md5 checksums of all the files in the Solaris distributions
> as a 'fingerprint' to verify if a file is authentic. That way a
> sysadmin can verify if the "ls" or "ps" they're using is the original
> from SUN.
>

Hi,
I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
long; therefore, for any input length > 128 bits, there must be at
*least* two possible inputs which produce the same output. For the
given file lengths measured in megabytes, there would be an immense
number of possible inputs that give the same output: the only thing
is, it's relatively difficult to arbitrarily *find* another file with
the same MD5 as a given input. They do exist, however, as a little
math demonstrates:

Number of possible MD5 hashes=2^128=3.4028236692093846e+38
Number of possible 1 kilobit files=2^1024=1.7976931348623159e+308

where ^ means "to the power of"

As you see, if the input is only a kilobit long, there are *immensely*
more possible inputs than possible outputs. Since every possible
input is mapped to some output, obviously multiple inputs must be
mapped to the same output. This is called a "hash collision". As far
as I know, MD5 is not perfectly secure about this (these are just
news items I read recently, I didn't look in detail at the subject);
however, a more secure hash, such as SHA-1, although obviously still
suffering from the *existence* of hash collisions, makes *looking*
for them very difficult (i.e. you just have to try every possible
input until you get a collision).
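
The pigeonhole argument can be made concrete by shrinking the hash. The sketch below keeps only the first 4 hex digits (16 bits) of an MD5 digest, so there are just 65,536 possible outputs and a brute-force search must hit a repeat within 65,537 distinct inputs. The function names and the "file-N" inputs are made up for illustration; this demonstrates that collisions must exist and is in no way an attack on full MD5.

```php
<?php
// Keep only the first $hexDigits hex characters of the MD5 digest.
// This is a deliberately weakened toy hash, assumed here for illustration.
function truncatedMd5(string $input, int $hexDigits = 4): string
{
    return substr(md5($input), 0, $hexDigits);
}

// Try inputs "file-0", "file-1", ... until two of them share a truncated
// hash. Returns [firstInput, secondInput, sharedHash].
function findCollision(int $hexDigits = 4): array
{
    $seen = []; // truncated hash => first input that produced it
    for ($i = 0; ; $i++) {
        $input = "file-$i";
        $hash = truncatedMd5($input, $hexDigits);
        if (isset($seen[$hash])) {
            return [$seen[$hash], $input, $hash];
        }
        $seen[$hash] = $input;
    }
}

list($first, $second, $shared) = findCollision();
echo "$first and $second both map to $shared\n";
```

With the full 128 bits the same loop would still be guaranteed to terminate in principle, but only after a number of attempts far beyond anything practical, which is the difference between collisions existing and collisions being findable.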

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBZgl/gxSrXuMbw1YRAtoWAJkBV342ESDMMhRmcJ28QX/wmUweUwCg+HI8
irJmD8Aelju4mJwxXN586Xo=
=d+rO
-----END PGP SIGNATURE-----

FLEB
Guest
 
Posts: n/a
#7: Jul 17 '05

re: Calculating CRC32 for uploaded files


Regarding this well-known quote, often attributed to Chris's famous "Fri,
08 Oct 2004 03:29:03 GMT" speech:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Michael Vilain wrote:
>
> [snip]
>> Basically, crc32 hashes aren't unique while md5 hashes are. SUN
>> offers md5 checksums of all the files in the Solaris distributions
>> as a 'fingerprint' to verify if a file is authentic. That way a
>> sysadmin can verify if the "ls" or "ps" they're using is the original
>> from SUN.
>>
>
> Hi,
> I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
> long; therefore, for any input length > 128 bits, there must be at
> *least* two possible inputs which produce the same output. For the
> given file lengths measured in megabytes, there would be an immense
> number of possible inputs that give the same output: the only thing
> is, it's relatively difficult to arbitrarily *find* another file with
> the same MD5 as a given input. They do exist, however, as a little
> math demonstrates:
>
> (snipped: big files/small hashes--some will be the same)

But the idea, IIRC, is that although there may be collisions, the chance of
two *legible* inputs having the same MD5 is extremely small. Most collisions
will pair one intelligible value with one that is unusable garbage. Hence
MD5's usefulness in checking file integrity (it would be very difficult,
and quite detectable, to inject malware into a file and keep the MD5), and
its dubious state as a password-security mechanism (since a password does
not need to be legible; it only has to pass the MD5 check).

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com

Chris
Guest
 
Posts: n/a
#8: Jul 17 '05

re: Calculating CRC32 for uploaded files


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

FLEB wrote:

<snip>
> But the idea, IIRC, is that although there may be collisions, the
> chance of two *legible* inputs with the same MD5 are immensely
> small. Most collisions will just be one intelligible value, and one
> with unusable garbage. Hence, MD5's usefulness in calculating file
> integrity (it would be very difficult, and quite detectable, to
> inject malware into a file and keep the MD5), and its dubious state
> as a password-security mechanism (since a password needs to be
> legible in no other way except to pass the MD5 check).
>

Agreed. Of course, as I read somewhere (and it makes a lot of sense), all
you *need* is gibberish that passes the MD5 test if the MD5 is
validating a BIOS flash. It doesn't much matter *what* you put there
if all you want to do is break the computer, does it? (I know BIOS
chips can be replaced; however, this would be a major undertaking for
many of the people who might consider flashing their BIOSes.)

Of course, for uploading plain old files to a server, it's probably
excellent - and besides, the original purpose was to make sure the
file wasn't *accidentally* changed, for which it's excellent.
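
A rough sketch of the accidental-change check described above: compare a file's MD5 against a checksum recorded earlier. The function name `fileMatchesMd5` is hypothetical, and hash_equals() assumes PHP 5.6 or later (the type hints assume PHP 7).

```php
<?php
// Hypothetical helper: true if the file's MD5 equals a previously recorded
// hex digest. Intended for catching accidental corruption, not for
// defending against deliberate tampering.
function fileMatchesMd5(string $path, string $expectedHex): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    $actual = md5_file($path);
    if ($actual === false) {
        return false;
    }
    // Normalise the recorded digest, since md5_file() returns lowercase hex.
    return hash_equals(strtolower(trim($expectedHex)), $actual);
}
```

The recorded digest would typically be stored when the upload first succeeds and checked again whenever the file is read back.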

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBZ5PxgxSrXuMbw1YRAiLKAKCfyvkn4pXWMhjCWWwc1K aNWqZi5wCggkTt
C8U8/ToYrvsL+6CHgq8JIz0=
=3vRO
-----END PGP SIGNATURE-----

Ricky Romaya
Guest
 
Posts: n/a
#9: Jul 17 '05

re: Calculating CRC32 for uploaded files


"Chung Leong" <chernyshevsky@hotmail.com> wrote in
news:ULWdnfxcneZdWvjcRVn-sA@comcast.com:
>
> Plus there's md5_file(), so you don't have to load the entire file
> into memory to calculate the hash.
>
Finally, someone answers my original question. Thanks for the pointer. Why
didn't I see it when scouring the manual?
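
For the CRC32 part of the original question, a possible sketch uses hash_file() with the 'crc32b' algorithm. This assumes the hash extension, which was bundled with PHP from 5.1.2 onward and so postdates this thread. The helper name `crc32OfFile` is hypothetical.

```php
<?php
// Hypothetical helper: CRC32 of a file as 8 hex digits, computed by
// hash_file() so the contents are not loaded into a PHP string first.
// 'crc32b' is the variant that corresponds to PHP's crc32() function.
function crc32OfFile(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException("Cannot read file: $path");
    }
    $digest = hash_file('crc32b', $path);
    if ($digest === false) {
        throw new RuntimeException("Could not hash file: $path");
    }
    return $digest;
}
```

On a small file the result can be cross-checked against sprintf('%08x', crc32(file_get_contents($path))).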