Calculating CRC32 for uploaded files

Hi,

I'm working on a file upload script. I need to calculate the CRC32 of each
file that is successfully uploaded. How can I do this? PHP only has a
CRC32 function for strings. However, the uploaded files are mostly
binaries and are expected to be large (5-12 MB per file).

Are there any ways other than CRC32 (which is supported by a default PHP
installation) to generate a unique hash of an arbitrary file? (The users
of my script are assumed to have no knowledge of file hashes, so I can't
depend on them to generate one prior to upload.)

TIA
Jul 17 '05 #1
Ricky Romaya wrote:

> Are there any ways other than CRC32 (which is supported by a default PHP
> installation) to generate a unique hash of an arbitrary file? (The users
> of my script are assumed to have no knowledge of file hashes, so I can't
> depend on them to generate one prior to upload.)

There are other hash functions, but for files of this size you'd be better
off shelling out and running a program specifically designed for the job.

crc32 is hardly the cutting edge of file hashes. MD5 works quite well and is
supported by most systems (and free source code is available).

HTH

C.

Jul 17 '05 #2
Colin McKinnon <co**************@andthis.mms3.com> wrote in
news:ck*******************@news.demon.co.uk:
> There are other hash functions, but for files of this size you'd be
> better off shelling out and running a program specifically designed
> for the job.

Uh, the problem is I don't own the server; I use a web hosting service
instead. If only I could shell out and run a file hashing program, my
life would be easier.

> crc32 is hardly the cutting edge of file hashes. MD5 works quite well
> and is supported by most systems (and free source code is available).

Hmm, care to elaborate on what the 'cutting edge' file hashing
algorithms are?

TIA
Jul 17 '05 #3

"Colin McKinnon" <co**************@andthis.mms3.com> wrote in message
news:ck*******************@news.demon.co.uk...
> crc32 is hardly the cutting edge of file hashes. MD5 works quite well
> and is supported by most systems (and free source code is available).

Plus there's md5_file(), so you don't have to load the entire file into
memory to calculate the hash.
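
A minimal sketch of the md5_file() approach, assuming a reasonably modern
PHP (the type declarations need PHP 7.1 or later; md5_file() itself is much
older). The helper name file_fingerprint() and the temporary demo file are
hypothetical and only there to make the example self-contained:

```php
<?php
// Hypothetical helper: returns the MD5 hex digest of a file,
// or null if the file is missing or unreadable.
function file_fingerprint(string $path): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    // md5_file() reads the file itself and returns 32 hex characters,
    // or false on failure.
    $digest = md5_file($path);
    return $digest === false ? null : $digest;
}

// Demo with a throwaway file standing in for an uploaded one.
$tmp = tempnam(sys_get_temp_dir(), 'upl');
file_put_contents($tmp, "hello world");
echo file_fingerprint($tmp), "\n";
unlink($tmp);
```

In an upload handler the path would typically be the temporary file, for
example $_FILES['userfile']['tmp_name'] (the field name 'userfile' is an
assumption), checked with is_uploaded_file() before hashing.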
Jul 17 '05 #4
In article <Xn********************************@66.250.146.159>,
Ricky Romaya <so*******@somewhere.com> wrote:

> Colin McKinnon <co**************@andthis.mms3.com> wrote in
> news:ck*******************@news.demon.co.uk:
>> There are other hash functions, but for files of this size you'd be
>> better off shelling out and running a program specifically designed
>> for the job.
>
> Uh, the problem is I don't own the server; I use a web hosting service
> instead. If only I could shell out and run a file hashing program, my
> life would be easier.
>
>> crc32 is hardly the cutting edge of file hashes. MD5 works quite well
>> and is supported by most systems (and free source code is available).
>
> Hmm, care to elaborate on what the 'cutting edge' file hashing
> algorithms are?
>
> TIA


Aren't we a lazy-ass bum this afternoon...

Doing a simple Google search on CRC32 and MD5 gives some choice hits:

http://us4.php.net/crc32 (this is what the OP originally asked for)
http://www.freesoft.org/CIE/RFC/1510/78.htm
http://www.kb.cert.org/vuls/id/945216

http://us4.php.net/md5 (used to calculate an MD5 hash; md5_file() does the same for a file)
http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html
http://www.faqs.org/rfcs/rfc1321.html

Basically, crc32 hashes aren't unique while md5 hashes are. SUN offers
md5 checksums of all the files in the Solaris distributions as a
'fingerprint' to verify if a file is authentic. That way a sysadmin can
verify if the "ls" or "ps" they're using is the original from SUN.

--
DeeDee, don't press that button! DeeDee! NO! Dee...

Jul 17 '05 #5
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Vilain wrote:

[snip]
> Basically, crc32 hashes aren't unique while md5 hashes are. SUN
> offers md5 checksums of all the files in the Solaris distributions
> as a 'fingerprint' to verify if a file is authentic. That way a
> sysadmin can verify if the "ls" or "ps" they're using is the
> original from SUN.


Hi,
I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
long; therefore, for any input length > 128 bits, there must be at
*least* two possible inputs which produce the same output. For the
given file lengths measured in megabytes, there would be an immense
number of possible inputs that give the same output: the only thing
is, it's relatively difficult to arbitrarily *find* another file with
the same MD5 as a given input. They do exist, however, as a little
math demonstrates:

Number of possible MD5 hashes = 2^128 = 3.4028236692093846e+38
Number of possible 1 kilobit files = 2^1024 = 1.7976931348623159e+308

where ^ means "to the power of"

As you see, if the input is only a kilobit long, there are *immensely*
more possible inputs than possible outputs. Since every possible
input is mapped to some output, obviously multiple inputs must be
mapped to the same output. This is called a "hash collision". As far
as I know, MD5 is not perfectly secure about this (these are just
news items I read recently, I didn't look in detail at the subject);
however, a more secure hash, such as SHA-1, although obviously still
suffering from the *existence* of hash collisions, makes *looking*
for them very difficult (i.e. you just have to try every possible
input until you get a collision).

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBZgl/gxSrXuMbw1YRAtoWAJkBV342ESDMMhRmcJ28QX/wmUweUwCg+HI8
irJmD8Aelju4mJwxXN586Xo=
=d+rO
-----END PGP SIGNATURE-----
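
The counting argument in this post can be made concrete with a deliberately
tiny hash. A rough sketch, assuming we keep only the first 16 bits of an MD5
digest so that just 65,536 outputs exist; the names tiny_hash() and
find_collision() are made up for illustration:

```php
<?php
// Keep only the first 4 hex digits (16 bits) of the MD5 digest.
function tiny_hash(string $input): string
{
    return substr(md5($input), 0, 4);
}

// Hash "0", "1", "2", ... until two different inputs share a tiny hash.
// With 65,536 possible outputs, 65,537 distinct inputs must contain a repeat.
function find_collision(): array
{
    $seen = [];
    for ($i = 0; $i <= 65536; $i++) {
        $input = (string) $i;
        $h = tiny_hash($input);
        if (isset($seen[$h])) {
            // Earlier input, current input, and the hash they share.
            return [$seen[$h], $input, $h];
        }
        $seen[$h] = $input;
    }
    throw new LogicException('unreachable: a repeat is guaranteed');
}

[$a, $b, $h] = find_collision();
echo "\"$a\" and \"$b\" both hash to $h\n";
```

The same reasoning applies to the full 128-bit digest; the only difference
is that the output space is far too large for this kind of brute-force
search to finish.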
Jul 17 '05 #6
Regarding this well-known quote, often attributed to Chris's famous "Fri,
08 Oct 2004 03:29:03 GMT" speech:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Michael Vilain wrote:
>
> [snip]
>> Basically, crc32 hashes aren't unique while md5 hashes are. SUN
>> offers md5 checksums of all the files in the Solaris distributions
>> as a 'fingerprint' to verify if a file is authentic. That way a
>> sysadmin can verify if the "ls" or "ps" they're using is the
>> original from SUN.
>
> Hi,
> I'm sorry, but MD5 hashes are *not* unique. An MD5 hash is 128 bits
> long; therefore, for any input length > 128 bits, there must be at
> *least* two possible inputs which produce the same output. For the
> given file lengths measured in megabytes, there would be an immense
> number of possible inputs that give the same output: the only thing
> is, it's relatively difficult to arbitrarily *find* another file with
> the same MD5 as a given input. They do exist, however, as a little
> math demonstrates:

(snipped: big files/small hashes--some will be the same)


But the idea, IIRC, is that although there may be collisions, the chance of
two *legible* inputs having the same MD5 is vanishingly small. Most
collisions will pair one intelligible input with one that is unusable
garbage. Hence MD5's usefulness in checking file integrity (it would be very
difficult, and quite detectable, to inject malware into a file and keep the
MD5), and its dubious state as a password-security mechanism (since a
password doesn't need to be legible; it only needs to pass the MD5 check).

--
-- Rudy Fleminger
-- sp@mmers.and.evil.ones.will.bow-down-to.us
(put "Hey!" in the Subject line for priority processing!)
-- http://www.pixelsaredead.com
Jul 17 '05 #7
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

FLEB wrote:

<snip>
> But the idea, IIRC, is that although there may be collisions, the
> chance of two *legible* inputs having the same MD5 is vanishingly
> small. Most collisions will pair one intelligible input with one
> that is unusable garbage. Hence MD5's usefulness in checking file
> integrity (it would be very difficult, and quite detectable, to
> inject malware into a file and keep the MD5), and its dubious state
> as a password-security mechanism (since a password doesn't need to
> be legible; it only needs to pass the MD5 check).


Agreed. Of course, as I read somewhere (and it makes a lot of sense), all
you *need* is gibberish that passes the MD5 test if the MD5 is
validating a BIOS flash. It kind of doesn't matter *what* you put there
if all you want to do is break the computer, does it? (I know BIOS
chips can be replaced; however, this would be a major undertaking for
many people who just might consider flashing their BIOSes.)

Of course, for uploading plain old files to a server, it's probably
excellent - and besides, the original purpose was to make sure the
file wasn't *accidentally* changed, for which it's excellent.

Chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBZ5PxgxSrXuMbw1YRAiLKAKCfyvkn4pXWMhjCWWwc1KaNWqZi5wCggkTt
C8U8/ToYrvsL+6CHgq8JIz0=
=3vRO
-----END PGP SIGNATURE-----
Jul 17 '05 #8
"Chung Leong" <ch***********@hotmail.com> wrote in
news:UL********************@comcast.com:

> Plus there's md5_file(), so you don't have to load the entire file
> into memory to calculate the hash.

Finally, someone answers my original question. Thx for the pointer. Why
didn't I see it when scouring the manual?
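
For the CRC32 originally asked about, a file-based equivalent is available
where the hash extension is present (hash_file() was added in PHP 5.1.2, so
it postdates this thread). A minimal sketch under that assumption; the
helper name file_crc32() and the demo file are hypothetical:

```php
<?php
// Hypothetical helper: CRC32 of a file as 8 lowercase hex digits,
// or null if the file is missing or unreadable.
function file_crc32(string $path): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    // 'crc32b' is the variant that agrees with PHP's crc32() function.
    $digest = hash_file('crc32b', $path);
    return $digest === false ? null : $digest;
}

// Demo with a throwaway file standing in for an uploaded one.
$tmp = tempnam(sys_get_temp_dir(), 'crc');
file_put_contents($tmp, "123456789");
echo file_crc32($tmp), "\n";
// The same value computed the in-memory way, for comparison:
echo sprintf('%08x', crc32(file_get_contents($tmp))), "\n";
unlink($tmp);
```

On a host without hash_file(), running crc32() on the result of
file_get_contents() gives the same number but holds the whole 5-12 MB file
in memory while doing so.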
Jul 17 '05 #9
