Max files in unix folder from PIL process

Hi. I am creating a Python application that uses PIL to generate
thumbnails and sized images. It is beginning to look like the volume of
images will be large, which has got me thinking: is there a limit to the
number of files Unix can handle in a single directory? I am using
FreeBSD 4.x at the moment. The number could be as high as 500,000 images
in a single directory, but is more likely in the range of 6,000 to 30,000
for most. I would rather not store these in Postgres. Will this pose a
problem for the filesystem? I realize this is less a Python issue, but I
thought someone on the list might have an idea.

Regards,
David.
Jul 18 '05 #1
I ran into a similar situation with a massive directory of PIL-generated
images (around 10k). There were no problems on the filesystem/Python
side of things, but other tools (most notably 'ls') don't cope very
well. As it happens, my data has natural groups, so I broke the big
directory into subdirectories to sidestep the problem.
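Where the data has no natural grouping, one common alternative (a different technique from the grouping described above) is to derive subdirectory names from a hash of the file name. A minimal sketch, assuming two levels of two hex characters each; `shard_path` and `save_bytes` are hypothetical helper names:

```python
import hashlib
import os


def shard_path(root, filename, levels=2, width=2):
    """Return a path under root that spreads files across subdirectories.

    Subdirectory names come from the MD5 hex digest of the file name,
    so the same name always maps to the same location.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def save_bytes(root, filename, data):
    """Write data to its sharded location, creating subdirectories as needed."""
    path = shard_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With these defaults there are 256 x 256 = 65,536 leaf directories, so even 500,000 files average fewer than 8 per directory. The path for a given name can always be recomputed, so no separate index is needed to find a thumbnail again.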

Jul 18 '05 #2
Kane wrote:
> I ran into a similar situation with a massive directory of PIL
> generated images (around 10k). No problems on the filesystem/Python
> side of things but other tools (most notably 'ls') don't cope very
> well.


My experience suggests that 'ls' either has a lousy sort routine or
takes a long time to get the metadata.

When I've had to deal with a huge number of files in a directory,
I could get the list very quickly in Python using os.listdir,
even though ls was slow. If you're in that situation again, see
whether the '-f' flag (unsorted) makes a difference, or use '-1'
to see whether it's all the stat calls.
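The same question (is it the sort or the per-file metadata?) can be probed from Python by timing each step separately. A rough sketch; `time_listing` is a hypothetical helper, and it only approximates what 'ls' does internally:

```python
import os
import time


def time_listing(path):
    """Roughly time the steps a directory listing may spend its effort on.

    Returns (timings, count), where timings maps:
      "names"  - seconds for os.listdir alone (no sort, no stat)
      "sorted" - names plus sorting them
      "stat"   - names plus one os.stat() call per entry
    """
    timings = {}

    start = time.perf_counter()
    names = os.listdir(path)
    timings["names"] = time.perf_counter() - start

    start = time.perf_counter()
    sorted(names)
    timings["sorted"] = timings["names"] + (time.perf_counter() - start)

    start = time.perf_counter()
    for name in names:
        os.stat(os.path.join(path, name))
    timings["stat"] = timings["names"] + (time.perf_counter() - start)

    return timings, len(names)
```

Call it as `timings, count = time_listing("/path/to/bigdir")`. Run it twice: the first pass may be dominated by disk reads, the second by cached metadata, and the gap between "names" and "stat" shows how much the stat calls cost.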

Andrew
da***@dalkescientific.com

Jul 18 '05 #3
David Pratt wrote:
> Is there a limit to the number of files Unix can handle in a single
> directory? I am using FreeBSD 4.x at the moment. The number could be
> as high as 500,000 images in a single directory, but is more likely in
> the range of 6,000 to 30,000 for most. [...] Will this pose a problem
> for the filesystem?


It all depends on the file system you are using, and somewhat on the
operations you are typically performing. I assume this is UFS/FFS, in
which case the directory is a linear list of all files.

This raises some performance concerns for access: to look up an
individual file, you may need to scan the entire directory. The size of
a directory entry depends on the length of the name. Assuming file names
of 10 characters, each entry is 20 bytes, so a directory holding 500,000
image file names requires 10MB on disk. Each directory lookup could then
require reading up to 10MB from disk, which might be noticeable. For
6,000 entries, the directory size is 120kB, which might not be noticeable.
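The arithmetic above can be reproduced in a few lines. This sketch assumes the 4.4BSD UFS entry layout (an 8-byte header for inode number, record length, type and name length, then the name plus a NUL terminator, rounded up to a multiple of 4 bytes); real directories also have some slack at the end of each directory block, so treat the result as a lower bound:

```python
def ufs_entry_size(name_len):
    """Bytes used by one UFS directory entry for a name of name_len chars.

    Assumes an 8-byte header followed by the name and a NUL terminator,
    rounded up to a multiple of 4 bytes.
    """
    return (8 + name_len + 1 + 3) & ~3


def directory_bytes(n_files, name_len):
    """Lower-bound estimate of directory size; ignores per-block slack."""
    return n_files * ufs_entry_size(name_len)


for n in (6_000, 30_000, 500_000):
    print(f"{n:>7,} files -> {directory_bytes(n, 10):>10,} bytes")
```

For 10-character names this gives 20 bytes per entry, hence 120,000, 600,000 and 10,000,000 bytes for the three directory sizes mentioned in the question.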

In FreeBSD 4.4 and later, there is a kernel compile-time option,
UFS_DIRHASH, which creates an in-memory hash table for directories,
speeding up lookups significantly. This requires, of course, enough
main memory to actually keep the hash table.
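As a toy model only (plain Python, not how the kernel implements either structure), the difference between scanning a linear directory and consulting an in-memory hash table looks like this; the file names are made up:

```python
def linear_lookup(entries, name):
    """Scan entries in order, as a lookup in a linear directory must.

    Returns (index or None, number of entries compared).
    """
    for i, entry in enumerate(entries):
        if entry == name:
            return i, i + 1
    return None, len(entries)


def build_hash_index(entries):
    """Build a name -> position table once, like a hash kept in memory."""
    return {entry: i for i, entry in enumerate(entries)}


entries = [f"img{i:06d}.jpg" for i in range(500_000)]
index = build_hash_index(entries)

# Worst case for the linear scan: a name at the very end of the directory.
print(linear_lookup(entries, "img499999.jpg"))  # (499999, 500000)
print(index.get("img499999.jpg"))               # 499999
```

The trade-off is the one noted above: the index makes each lookup cheap, but it occupies memory proportional to the number of entries.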

Regards,
Martin
Jul 18 '05 #4
Yes, I'm talking about Linux, not BSD, so with any luck you won't have
the same 'ls' issue; it doesn't crash, but it is painfully slow. The only
other issue I recall is that wildcards fail if they match too many files
(presumably hitting the maximum command-line length).
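One way around the wildcard problem from Python is to do the matching yourself instead of handing an expanded pattern to another program. A minimal sketch using `os.scandir` and `fnmatch`; `iter_matching` is a hypothetical helper name:

```python
import fnmatch
import os


def iter_matching(path, pattern):
    """Yield full paths of regular files in path whose names match pattern.

    Matching happens in Python, so no shell expansion (and no command-line
    length limit) is involved, and entries are produced one at a time.
    """
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
                yield entry.path


if hasattr(os, "sysconf"):
    # On Unix, the limit on exec() argument plus environment size.
    print("ARG_MAX:", os.sysconf("SC_ARG_MAX"))
```

Use it as `for p in iter_matching("/path/to/bigdir", "*.jpg"): ...`. Unlike the shell, `fnmatch` does not treat names starting with a dot specially, so `*` also matches hidden files.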

I would expect the various GUI file managers to give unpredictable
results, and I would not rely on remotely mounting the big directory
cross-platform.

Jul 18 '05 #5
