Hi,
I'm writing an app that downloads images. It rejects images that are
under a certain size - without downloading them completely. I've
implemented this using PIL, by downloading the first K and trying to
create a PIL image with it. PIL raises an exception because the file is
incomplete, but the image object is initialised with the image
dimensions, which is what I need. It actually works well enough, but I'm
concerned about side-effects - since it seems an unconventional way of
working with PIL. Can anyone see any problems with doing this? Or a
better method?
Thanks,
Will McGugan
-- http://www.willmcgugan.com
"".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-84)%26) for c
in "jvyy*jvyyzpthtna^pbz" ] )
Will McGugan wrote:
> I'm writing an app that downloads images. It rejects images that are
> under a certain size - without downloading them completely. I've
> implemented this using PIL, by downloading the first K and trying to
> create a PIL image with it. PIL raises an exception because the file is
> incomplete, but the image object is initialised with the image
> dimensions, which is what I need. It actually works well enough, but I'm
> concerned about side-effects - since it seems an unconventional way of
> working with PIL. Can anyone see any problems with doing this? Or a
> better method?
If you're tossing images that are too _small_, is there any benefit to not
downloading the whole image, checking it, and then throwing it away?
Checking just the first 1K probably won't save you too much time unless you're
over a modem. Are you using a byte-range HTTP request to pull down the images or
just a normal GET (via e.g. urllib)? If you're not using a byte-range request,
then all of the data is already on its way so maybe you could go ahead and get
it all.
But hey, if your current approach works... :) It _is_ a bit unconventional, so
to reduce the risk you could test it on a decent mix of image types (normal
JPEG, progressive JPEG, normal & progressive GIF, PNG, etc.) - just to make sure
PIL is able to handle partial data for all the different types you might encounter.
Also, if PIL can't handle the partial data, can you reliably detect that
scenario? If so, you could detect that case and use the
download-it-all-and-check approach as a failsafe.
-Dave
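For anyone unfamiliar with the byte-range requests mentioned above, here is a minimal sketch using only the Python 3 standard library. The `Range` header is part of HTTP/1.1, but a server is free to ignore it and send the whole file, so the read is capped as well. The helper names `build_range_request` and `fetch_prefix` are made up for this illustration, and the 1024-byte default simply mirrors the "first 1K" discussed in the thread.

```python
import urllib.request


def build_range_request(url, nbytes=1024):
    """Build a GET request asking only for the first `nbytes` bytes."""
    req = urllib.request.Request(url)
    # Byte ranges are inclusive on both ends, so bytes=0-1023 is the first 1K.
    req.add_header("Range", "bytes=0-%d" % (nbytes - 1))
    return req


def fetch_prefix(url, nbytes=1024):
    """Return at most `nbytes` leading bytes of the resource at `url`."""
    with urllib.request.urlopen(build_range_request(url, nbytes)) as resp:
        # A server that honours the range answers 206 Partial Content; one
        # that ignores it answers 200 and streams everything, so the read
        # is capped here as well.
        return resp.read(nbytes)
```

If the prefix turns out not to contain the dimensions, a second request with a larger range (or a plain GET) can serve as the fallback.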
Dave Brueck wrote:
> If you're tossing images that are too _small_, is there any benefit to not
> downloading the whole image, checking it, and then throwing it away?
It's a 'webscraper' app that downloads images based on search criteria.
The user may want only images above 640x480, although the general case
will be something like 200x200, to avoid downloading thumbnails.
> Checking just the first 1K probably won't save you too much time unless you're
> over a modem. Are you using a byte-range HTTP request to pull down the images or
> just a normal GET (via e.g. urllib)? If you're not using a byte-range request,
> then all of the data is already on its way so maybe you could go ahead and get
> it all.
I'm not familiar with byte-range requests. Is this a standard feature of
webservers? I know there will be more than one K in the pipeline if I do
a read, but if I close the file object from urllib it will stop the
download if there is data remaining - won't it?
> But hey, if your current approach works... :) It _is_ a bit unconventional, so
> to reduce the risk you could test it on a decent mix of image types (normal
> JPEG, progressive JPEG, normal & progressive GIF, png, etc.) - just to make sure
> PIL is able to handle partial data for all different types you might encounter.
> Also, if PIL can't handle the partial data, can you reliably detect that
> scenario? If so, you could detect that case and use the
> download-it-all-and-check approach as a failsafe.
The PIL code worked with most of the images I threw at it (just JPEGs);
if there was no 'size' attribute then I just continued to download the
entire image. It may have caused a memory leak though: with this code in
place, memory usage increased continuously.
Actually, this may all be moot now. Originally I looked at reading the
image dimensions from the JPEG header, but that turned out to be
non-trivial and I gave up. Fortunately I found some Perl code that does
it, and converted it to Python (and I don't even know Perl!). Here's the
code if anyone is interested.
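import struct

def GetJpegSize(data):
    """Return (width, height) read from the leading bytes of a JPEG file."""
    # Python 3: iterating over bytes yields ints, so no ord() is needed.
    idata = iter(data)
    width = None
    height = None
    try:
        # Every JPEG starts with the SOI marker FF D8.
        B1 = next(idata)
        B2 = next(idata)
        if B1 != 0xFF or B2 != 0xD8:
            return -1, -1
        while True:
            # Scan to the next marker: an 0xFF byte (possibly repeated as
            # fill) followed by the marker code.
            byte = next(idata)
            while byte != 0xFF:
                byte = next(idata)
            while byte == 0xFF:
                byte = next(idata)
            if 0xC0 <= byte <= 0xC3:
                # SOF0..SOF3: skip the 2-byte segment length and the
                # 1-byte sample precision, then read height and width.
                next(idata)
                next(idata)
                next(idata)
                height, width = struct.unpack(
                    '>HH', bytes([next(idata) for _ in range(4)]))
                break
            else:
                # Any other segment: its 2-byte length includes the length
                # field itself, so skip the remaining length - 2 bytes.
                offset = struct.unpack(
                    '>H', bytes([next(idata), next(idata)]))[0] - 2
                for _ in range(offset):
                    next(idata)
    except StopIteration:
        # Ran out of data before reaching a SOF segment.
        pass
    return width, height

if __name__ == "__main__":
    with open("test.jpg", "rb") as f:
        first_k = f.read(1024)
    print(GetJpegSize(first_k))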
Returns (-1, -1) for a non-jpeg, or (None, None) if the size wasn't
contained in the data supplied (some jpegs have embedded thumbnails), or
(width, height) if the dimensions were found.
And the original source: http://wiki.tcl.tk/757
Thanks,
Will
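The (None, None) case above suggests reading more data and trying again. Below is a rough sketch of that loop, assuming a `probe` callable that follows the same return convention as GetJpegSize. The name `probe_incrementally`, the 1024-byte chunk size and the 64K cap are illustrative choices, not anything from the original code.

```python
import io


def probe_incrementally(stream, probe, chunk_size=1024, max_bytes=65536):
    """Feed a growing prefix of `stream` to `probe` until it reports a size.

    `probe` takes bytes and returns (width, height), where (None, None)
    means "not enough data yet" and (-1, -1) means "not a JPEG".
    Returns the probe result together with the bytes read so far, so the
    caller can keep them if it decides to download the rest.
    """
    buf = b""
    while len(buf) < max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # stream ended before the size was found
        buf += chunk
        width, height = probe(buf)
        if width is not None:
            # Either real dimensions or the (-1, -1) "not a JPEG" answer.
            return (width, height), buf
    return (None, None), buf


if __name__ == "__main__":
    # Stand-in probe for demonstration: pretends the size shows up after 3000 bytes.
    demo_probe = lambda b: (640, 480) if len(b) >= 3000 else (None, None)
    size, seen = probe_incrementally(io.BytesIO(b"\0" * 10000), demo_probe)
    print(size, len(seen))
```

Re-parsing the whole prefix on every chunk is wasteful for large headers, but it keeps the probe function stateless; the cap bounds the cost.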
Will McGugan wrote:
> I'm writing an app that downloads images. It rejects images that are
> under a certain size - without downloading them completely. I've
> implemented this using PIL, by downloading the first K and trying to
> create a PIL image with it. PIL raises an exception because the file is
> incomplete, but the image object is initialised with the image
> dimensions, which is what I need. It actually works well enough, but I'm
> concerned about side-effects - since it seems an unconventional way of
> working with PIL. Can anyone see any problems with doing this? Or a
> better method?
the "right" way to do this is to use the ImageFile.Parser class. see the
last snippet on this page for an example: http://effbot.org/zone/pil-image-size.htm
</F>
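For readers who cannot reach the linked page, here is a minimal sketch of the incremental approach with ImageFile.Parser, assuming Pillow (the maintained PIL fork) is installed. It is written in the spirit of that suggestion rather than copied from the page. `size_from_chunks` is a made-up helper name; `chunks` can be any iterable of byte strings, for example successive reads from a urllib response.

```python
from PIL import ImageFile


def size_from_chunks(chunks):
    """Return (width, height) as soon as the parser can tell, else None."""
    parser = ImageFile.Parser()
    for chunk in chunks:
        parser.feed(chunk)
        # parser.image stays None until enough header data has arrived
        # for PIL to identify the format and read the dimensions.
        if parser.image is not None:
            return parser.image.size
    return None
```

Because the parser is fed incrementally, the caller can stop reading as soon as a size comes back, and no exception from a truncated file needs to be caught.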
Fredrik Lundh wrote:
> the "right" way to do this is to use the ImageFile.Parser class. see the
> last snippet on this page for an example:
> http://effbot.org/zone/pil-image-size.htm
Excellent, thanks.
Will