Reading image dimensions with PIL

Will McGugan

Hi,

I'm writing an app that downloads images. It rejects images that are
under a certain size - without downloading them completely. I've
implemented this using PIL, by downloading the first K and trying to
create a PIL image with it. PIL raises an exception because the file is
incomplete, but the image object is initialised with the image
dimensions, which is what I need. It actually works well enough, but I'm
concerned about side-effects - since it seems an unconventional way of
working with PIL. Can anyone see any problems with doing this? Or a
better method?

Thanks,

Will McGugan
--
http://www.willmcgugan.com
"".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-84)%26) for c
in "jvyy*jvyyzpthtna^pbz" ] )

Jul 19 '05 #1

Dave Brueck

Will McGugan wrote:

I'm writing an app that downloads images. It rejects images that are
under a certain size - without downloading them completely. I've
implemented this using PIL, by downloading the first K and trying to
create a PIL image with it. PIL raises an exception because the file is
incomplete, but the image object is initialised with the image
dimensions, which is what I need. It actually works well enough, but I'm
concerned about side-effects - since it seems an unconventional way of
working with PIL. Can anyone see any problems with doing this? Or a
better method?

If you're tossing images that are too _small_, is there any benefit to not
downloading the whole image, checking it, and then throwing it away?

Checking just the first 1K probably won't save you too much time unless you're
over a modem. Are you using a byte-range HTTP request to pull down the images or
just a normal GET (via e.g. urllib)? If you're not using a byte-range request,
then all of the data is already on its way so maybe you could go ahead and get
it all.

But hey, if your current approach works... :) It _is_ a bit unconventional, so
to reduce the risk you could test it on a decent mix of image types (normal
JPEG, progressive JPEG, normal & progressive GIF, png, etc.) - just to make sure
PIL is able to handle partial data for all different types you might encounter.

Also, if PIL can't handle the partial data, can you reliably detect that
scenario? If so, you could detect that case and use the
download-it-all-and-check approach as a failsafe.

-Dave

Jul 19 '05 #2
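
For reference, the byte-range request Dave mentions can be sketched as
follows. This is a minimal illustration in present-day Python 3 rather
than the urllib of the thread's era; the function names, the example URL
and the 1024-byte default are assumptions made for the example. Range
requests are part of HTTP/1.1, but honouring them is optional for a
server, so the status code has to be checked.

```python
import urllib.request


def build_range_request(url, num_bytes=1024):
    """Build a GET request asking only for the first num_bytes bytes."""
    request = urllib.request.Request(url)
    # Byte ranges are inclusive, so 0-1023 covers the first 1024 bytes.
    request.add_header("Range", "bytes=0-%d" % (num_bytes - 1))
    return request


def fetch_prefix(url, num_bytes=1024):
    """Return (status, data) holding at most num_bytes bytes of the body."""
    with urllib.request.urlopen(build_range_request(url, num_bytes)) as response:
        # 206 means the server honoured the range; 200 means it ignored
        # the header and is sending the whole file, so cap the read anyway.
        return response.status, response.read(num_bytes)
```

Calling fetch_prefix("http://example.com/picture.jpg") (a placeholder
URL) would return the status and the leading bytes, which can then be
handed to whatever code inspects the image header.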

Will McGugan

Dave Brueck wrote:

If you're tossing images that are too _small_, is there any benefit to
not downloading the whole image, checking it, and then throwing it away?

It's a 'webscraper' app that downloads images based on search criteria.
The user may want only images above 640x480, although the general case
will be something like 200x200, to avoid downloading thumbnails.

Checking just the first 1K probably won't save you too much time unless
you're over a modem. Are you using a byte-range HTTP request to pull
down the images or just a normal GET (via e.g. urllib)? If you're not
using a byte-range request, then all of the data is already on its way
so maybe you could go ahead and get it all.

I'm not familiar with byte-range requests. Is this a standard feature of
web servers? I know there will be more than one K in the pipeline if I do
a read, but if I close the file object from urllib it will stop the
download if there is data remaining - won't it?

But hey, if your current approach works... :) It _is_ a bit
unconventional, so to reduce the risk you could test it on a decent mix
of image types (normal JPEG, progressive JPEG, normal & progressive GIF,
png, etc.) - just to make sure PIL is able to handle partial data for
all different types you might encounter.

Also, if PIL can't handle the partial data, can you reliably detect that
scenario? If so, you could detect that case and use the
download-it-all-and-check approach as a failsafe.

The PIL code worked with most of the images I threw at it (just JPEGs);
if there was no 'size' attribute then I just continued to download the
entire image. It may have caused a memory leak though: with this code in
place, memory usage increased continuously.

Actually, this may all be moot now. Originally I looked at reading the
image dimensions from the JPEG header, but that turned out to be
non-trivial and I gave up. Fortunately I found some Perl code that does
it, and converted it to Python (and I don't even know Perl!). Here's the
code if anyone is interested.

import struct

def GetJpegSize(data):

    idata = iter(data)

    width = None
    height = None

    try:

        # A JPEG stream starts with the SOI marker, FF D8.
        B1 = ord(idata.next())
        B2 = ord(idata.next())

        if B1 != 0xFF or B2 != 0xD8:
            return -1, -1

        while True:

            # Scan forward to the next marker, skipping any FF padding.
            byte = ord(idata.next())

            while byte != 0xFF:
                byte = ord(idata.next())

            while byte == 0xFF:
                byte = ord(idata.next())

            if byte >= 0xc0 and byte <= 0xc3:
                # SOF0-SOF3: skip the segment length (2 bytes) and the
                # sample precision (1 byte), then read height and width.
                idata.next()
                idata.next()
                idata.next()
                height, width = struct.unpack('>HH',
                    "".join(idata.next() for b in range(4)))
                break
            else:
                # Any other segment: read its length and skip over it.
                offset = struct.unpack('>H', idata.next() +
                    idata.next())[0] - 2
                for _ in xrange(offset):
                    idata.next()

    except StopIteration:
        # Ran out of data before finding the size.
        pass

    return width, height

if __name__ == "__main__":

    first_k = file("test.jpg", "rb").read(1024)

    print GetJpegSize(first_k)

Returns (-1, -1) for a non-JPEG, or (None, None) if the size wasn't
contained in the data supplied (some JPEGs have embedded thumbnails), or
(width, height) if the dimensions were found.

And the original source: http://wiki.tcl.tk/757
Thanks,

Will
--
http://www.willmcgugan.com
"".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-84)%26) for c
in "jvyy*jvyyzpthtna^pbz" ] )

Jul 19 '05 #3
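
The function posted above is Python 2 code (idata.next(), xrange, the
print statement). A rough Python 3 port might look like the sketch below.
The name get_jpeg_size and the index-based scanning are choices made for
this example, not part of the original post; the return convention is
kept the same.

```python
import struct


def get_jpeg_size(data):
    """Return (width, height) read from the leading bytes of a JPEG.

    Returns (-1, -1) if data does not start with the JPEG SOI marker and
    (None, None) if data ends before a SOF0-SOF3 segment is found.
    """
    if len(data) < 2:
        return None, None
    if data[:2] != b"\xff\xd8":
        return -1, -1
    pos = 2
    try:
        while True:
            # Scan forward to the next marker, skipping any FF padding.
            while data[pos] != 0xFF:
                pos += 1
            while data[pos] == 0xFF:
                pos += 1
            marker = data[pos]
            pos += 1
            if 0xC0 <= marker <= 0xC3:
                # Skip segment length (2 bytes) and sample precision
                # (1 byte); height and width follow as big-endian shorts.
                height, width = struct.unpack(">HH", data[pos + 3:pos + 7])
                return width, height
            # Any other segment: its length field counts itself, so
            # advancing by the length lands on the next marker.
            (length,) = struct.unpack(">H", data[pos:pos + 2])
            pos += length
    except (IndexError, struct.error):
        # Ran out of data before finding the size.
        return None, None
```

Indexing a bytes object yields integers in Python 3, so the ord() calls
are no longer needed, and running off the end of the data raises
IndexError or struct.error instead of StopIteration.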

Fredrik Lundh

Will McGugan wrote:

I'm writing an app that downloads images. It rejects images that are
under a certain size - without downloading them completely. I've
implemented this using PIL, by downloading the first K and trying to
create a PIL image with it. PIL raises an exception because the file is
incomplete, but the image object is initialised with the image
dimensions, which is what I need. It actually works well enough, but I'm
concerned about side-effects - since it seems an unconventional way of
working with PIL. Can anyone see any problems with doing this? Or a
better method?

the "right" way to do this is to use the ImageFile.Parser class. see the
last snippet on this page for an example:

http://effbot.org/zone/pil-image-size.htm

</F>

Jul 19 '05 #4
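
The linked snippet is not reproduced in this thread. As a rough idea of
what using ImageFile.Parser looks like, here is a minimal sketch written
against present-day Pillow; the function name get_image_size and the
"iterable of byte chunks" interface are assumptions made for this
example. The parser is fed data piece by piece and exposes an image
object as soon as it has seen enough of the header.

```python
from PIL import ImageFile


def get_image_size(chunks):
    """Return (width, height) from an iterable of byte chunks, or None."""
    parser = ImageFile.Parser()
    for chunk in chunks:
        parser.feed(chunk)
        # parser.image stays None until the header has been identified.
        if parser.image is not None:
            return parser.image.size
    # All data consumed without the format being recognised.
    return None
```

The chunks could come from repeated read(1024) calls on a urllib
response, stopping the download as soon as a size is returned. A None
result is the cue for the download-it-all-and-check fallback suggested
earlier in the thread.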

Will McGugan

Fredrik Lundh wrote:

the "right" way to do this is to use the ImageFile.Parser class. see the
last snippet on this page for an example:

http://effbot.org/zone/pil-image-size.htm

Excellent, thanks.

Will
--
http://www.willmcgugan.com
"".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-84)%26) for c
in "jvyy*jvyyzpthtna^pbz" ] )

Jul 19 '05 #5
