Calculating CRC32 for uploaded files

Hi,

I'm working on a file upload script. I need to calculate the CRC32 of each
file that is successfully uploaded. How can I do this? PHP only has a
CRC32 function for strings. However, the uploaded files are mostly
binaries and are assumed to be large (5-12 MB per file).

Are there any ways other than CRC32 (which is supported by a default PHP
installation) to generate a unique hash of an arbitrary file? (The users
of my script are assumed to have no knowledge of file hashes, so I can't
depend on them to generate one prior to upload.)

TIA
Jul 17 '05 #1
Ricky Romaya wrote:

Are there any ways other than CRC32 (which is supported by a default PHP
installation) to generate a unique hash of an arbitrary file? (The users
of my script are assumed to have no knowledge of file hashes, so I can't
depend on them to generate one prior to upload.)

There are other hash functions, but for files of this size you'd be better
off shelling out and running a program specifically designed for the job.

crc32 is hardly the cutting edge of file hashes. MD5 works quite well and is
supported by most systems (and free source code is available).

HTH

C.

Jul 17 '05 #2
Colin McKinnon <co**************@andthis.mms3.com> wrote in
news:ck*******************@news.demon.co.uk:
There are other hash functions, but for files of this size you'd be
better off shelling out and running a program specifically designed for
the job.
Uh, the problem is that I don't own the server and use a web hosting service
instead. If only I could shell out and run a file hashing program, my life
would be easier.
crc32 is hardly the cutting edge of file hashes. MD5 works quite well
and is supported by most systems (and free source code is available).

Hmm, care to elaborate on what the 'cutting edge' file hashing
algorithms are?

TIA
Jul 17 '05 #3

"Colin McKinnon" <co**************@andthis.mms3.com> wrote in message
news:ck*******************@news.demon.co.uk...
crc32 is hardly the cutting edge of file hashes. MD5 works quite well and is supported by most systems (and free source code is available).


Plus there's md5_file(), so you don't have to load the entire file into
memory to calculate the hash.
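
For the upload case in the original question, a minimal sketch of this might look like the following. It assumes PHP 5.1.2 or later, where hash_file() is available alongside md5_file(). The helper name file_checksums() and the form field name 'userfile' are made up for illustration and do not come from the thread.

```php
<?php
// Hypothetical helper: returns the CRC32 and MD5 of a file as hex strings,
// or false if the file cannot be read. Both functions read the file through
// a stream, so a 5-12 MB upload is not pulled into a PHP string first.
function file_checksums($path)
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    return array(
        // 'crc32b' is the variant that matches PHP's crc32() for strings.
        'crc32' => hash_file('crc32b', $path),
        'md5'   => md5_file($path),
    );
}

// Usage in an upload handler; 'userfile' is an assumed form field name.
if (isset($_FILES['userfile']) && is_uploaded_file($_FILES['userfile']['tmp_name'])) {
    $sums = file_checksums($_FILES['userfile']['tmp_name']);
    if ($sums !== false) {
        echo 'CRC32: ', $sums['crc32'], "\n";
        echo 'MD5:   ', $sums['md5'], "\n";
    }
}
```

Neither call needs shell access, so this should also work on shared hosting where running an external hashing program is not an option.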
Jul 17 '05 #4
In article <Xn********************************@66.250.146.159 >,
Ricky Romaya <so*******@somewhere.com> wrote:
Colin McKinnon <co**************@andthis.mms3.com> wrote in
news:ck*******************@news.demon.co.uk:
There are other hash functions, but for files of this size you'd be
better off shelling out and running a program specifically designed for
the job.

Uh, the problem is that I don't own the server and use a web hosting service
instead. If only I could shell out and run a file hashing program, my life
would be easier.
crc32 is hardly the cutting edge of file hashes. MD5 works quite well
and is supported by most systems (and free source code is available).

Hmm, care to elaborate on what the 'cutting edge' file hashing
algorithms are?

TIA


Aren't we a lazy-ass bum this afternoon...

Doing a simple Google search on CRC32 and MD5 gives some choice hits:

http://us4.php.net/crc32 (this is what the OP originally asked for)
http://www.freesoft.org/CIE/RFC/1510/78.htm
http://www.kb.cert.org/vuls/id/945216

http://us4.php.net/md5 (used to calculate md5 on a file)
http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html
http://www.faqs.org/rfcs/rfc1321.html

Basically, crc32 hashes aren't unique while md5 hashes are. SUN offers
md5 checksums of all the files in the Solaris distributions as a
'fingerprint' to verify if a file is authentic. That way a sysadmin can
verify if the "ls" or "ps" they're using is the original from SUN.
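
Checking a file against a published checksum like this could be sketched in PHP as below. The helper name matches_published_md5() is hypothetical, and the sketch assumes the published value is a 32-character hex string. A match only shows that the file has the same MD5 as the reference copy.

```php
<?php
// Hypothetical helper: true if the file's MD5 equals a published checksum.
function matches_published_md5($path, $publishedHex)
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    $actual = md5_file($path);          // lower-case hex string, or false on failure
    if ($actual === false) {
        return false;
    }
    // Published checksums are sometimes upper-case or padded with whitespace.
    return $actual === strtolower(trim($publishedHex));
}
```

A caller would pass the path of the local file and the checksum string copied from the vendor's list, then refuse to trust the file if the function returns false.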

--
DeeDee, don't press that button! DeeDee! NO! Dee...

Jul 17 '05 #5
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Vilain wrote:

[snip]
Basically, crc32 hashes aren't unique while md5 hashes are. SUN
offers md5 checksums of all the files in the Solaris distributions
as a
'fingerprint' to verify if a file is authentic. That way a sysadmin
can verify if the "ls" or "ps" they're using is the original from
SUN.


Hi,
I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
long; therefore, for any input length > 128 bits, there must be at
*least* two possible inputs which produce the same output. For the
given file lengths measured in megabytes, there would be an immense
number of possible inputs that give the same output: the only thing
is, it's relatively difficult to arbitrarily *find* another file with
the same MD5 as a given input. They do exist, however, as a little
math demonstrates:

Number of possible MD5 hashes=2^128=3.4028236692093846e+38
Number of possible 1 kilobit files=2^1024=1.7976931348623159e+308

where ^ means "to the power of"

As you see, if the input is only a kilobit long, there are *immensely*
more possible inputs than possible outputs. Since every possible
input is mapped to some output, obviously multiple inputs must be
mapped to the same output. This is called a "hash collision". As far
as I know, MD5 is not perfectly secure about this (these are just
news items I read recently, I didn't look in detail at the subject);
however, a more secure hash, such as SHA-1, although obviously still
suffering from the *existence* of hash collisions, makes *looking*
for them very difficult (i.e. you just have to try every possible
input until you get a collision).
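
The counting argument above can be tried out on a small scale. The sketch below is only an illustration: it truncates MD5 to its first 4 hex digits (16 bits), which is an assumption made so that the search finishes quickly, and the function name and the 'file-N' inputs are made up. It says nothing about how hard full 128-bit collisions are to find.

```php
<?php
// Hypothetical demo: find two different inputs whose MD5 hashes share the
// same first $hexDigits hex digits. With 4 hex digits there are only
// 16^4 = 65536 possible prefixes, so 65537 distinct inputs must collide.
function find_truncated_md5_collision($hexDigits = 4)
{
    $seen = array();                    // prefix => first input that produced it
    for ($i = 0; ; $i++) {
        $input  = 'file-' . $i;
        $prefix = substr(md5($input), 0, $hexDigits);
        if (isset($seen[$prefix])) {
            // Two distinct inputs, their shared prefix, and how many were tried.
            return array($seen[$prefix], $input, $prefix, $i + 1);
        }
        $seen[$prefix] = $input;
    }
}

list($a, $b, $prefix, $tried) = find_truncated_md5_collision(4);
echo "$a and $b both start with $prefix (found after $tried inputs)\n";
```

The loop is guaranteed to stop within 65537 inputs because there are more inputs than possible prefixes, which is the same reasoning applied to 2^1024 inputs and 2^128 hashes.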

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBZgl/gxSrXuMbw1YRAtoWAJkBV342ESDMMhRmcJ28QX/wmUweUwCg+HI8
irJmD8Aelju4mJwxXN586Xo=
=d+rO
-----END PGP SIGNATURE-----
Jul 17 '05 #6
Regarding this well-known quote, often attributed to Chris's famous "Fri,
08 Oct 2004 03:29:03 GMT" speech:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Vilain wrote:

[snip]
Basically, crc32 hashes aren't unique while md5 hashes are. SUN
offers md5 checksums of all the files in the Solaris distributions
as a
'fingerprint' to verify if a file is authentic. That way a sysadmin
can verify if the "ls" or "ps" they're using is the original from
SUN.


Hi,
I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
long; therefore, for any input length > 128 bits, there must be at
*least* two possible inputs which produce the same output. For the
given file lengths measured in megabytes, there would be an immense
number of possible inputs that give the same output: the only thing
is, it's relatively difficult to arbitrarily *find* another file with
the same MD5 as a given input. They do exist, however, as a little
math demonstrates:

(snipped: big files/small hashes--some will be the same)


But the idea, IIRC, is that although there may be collisions, the chance of
two *legible* inputs having the same MD5 is immensely small. Most collisions
will just be one intelligible value, and one with unusable garbage. Hence,
MD5's usefulness in calculating file integrity (it would be very difficult,
and quite detectable, to inject malware into a file and keep the MD5), and
its dubious state as a password-security mechanism (since a password needs
to be legible in no other way except to pass the MD5 check).

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #7
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

FLEB wrote:

<snip>
But the idea, IIRC, is that although there may be collisions, the
chance of two *legible* inputs having the same MD5 is immensely
small. Most collisions will just be one intelligible value, and one
with unusable garbage. Hence, MD5's usefulness in calculating file
integrity (it would be very difficult, and quite detectable, to
inject malware into a file and keep the MD5), and its dubious state
as a password-security mechanism (since a password needs to be
legible in no other way except to pass the MD5 check).


Agreed. Of course, as I read somewhere (and it makes a lot of sense), all
you *need* is gibberish that passes the MD5 test if it's an MD5
validating a BIOS flash. Kind of doesn't matter *what* you put there
if all you want to do is break the computer, does it? (I know BIOS
chips can be replaced; however, this would be a major undertaking for
many people who just might possibly consider flashing their BIOSes).

Of course, for uploading plain old files to a server, it's probably
excellent - and besides, the original purpose was to make sure the
file wasn't *accidentally* changed, for which it's excellent.

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBZ5PxgxSrXuMbw1YRAiLKAKCfyvkn4pXWMhjCWWwc1K aNWqZi5wCggkTt
C8U8/ToYrvsL+6CHgq8JIz0=
=3vRO
-----END PGP SIGNATURE-----
Jul 17 '05 #8
"Chung Leong" <ch***********@hotmail.com> wrote in
news:UL********************@comcast.com:

Plus there's md5_file(), so you don't have to load the entire file
into memory to calculate the hash.

Finally, someone answers my original question. Thx for the pointer. Why
didn't I see it when scouring the manual?
Jul 17 '05 #9
