
Max files in unix folder from PIL process

Hi. I am creating a Python application that uses PIL to generate
thumbnails and sized images. It is beginning to look like the volume of
images will be large. This has got me thinking: is there a limit on the
number of files that Unix can handle in a single directory? I am using
FreeBSD 4.x at the moment. The number could be as high as 500,000 images
in a single directory, but more likely in the range of 6,000 to 30,000
for most. I did not want to store these in Postgres. Should this pose a
problem on the filesystem? I realize this is less of a Python issue
really, but I thought someone on the list might have an idea.

Regards,
David.
Jul 18 '05 #1
4 Replies


I ran into a similar situation with a massive directory of PIL
generated images (around 10k). No problems on the filesystem/Python
side of things but other tools (most notably 'ls') don't cope very
well. As it happens my data has natural groups so I broke the big
dir into subdirs to sidestep the problem.
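
For data without natural groups, one common substitute is to derive the subdirectory from a hash of the file name. The sketch below is not the layout described above; `shard_path`, `save_sharded`, and the two-hex-digit prefix are hypothetical choices made for illustration.

```python
import hashlib
import os
import tempfile


def shard_path(root, filename, width=2):
    """Return root/<prefix>/filename, where <prefix> is the first `width`
    hex digits of the MD5 of the file name (a hypothetical layout)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:width], filename)


def save_sharded(root, filename, data):
    """Write `data` (bytes) under its shard directory, creating it if needed."""
    path = shard_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # The same name always maps to the same subdirectory.
        print(save_sharded(root, "thumb_000123.jpg", b"fake image bytes"))
```

With a two-digit prefix there are at most 256 subdirectories, so 500,000 files would average roughly 2,000 per directory.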

Jul 18 '05 #2

Kane wrote:
I ran into a similar situation with a massive directory of PIL
generated images (around 10k). No problems on the filesystem/Python
side of things but other tools (most notably 'ls') don't cope very
well.


My experience suggests that either 'ls' has a lousy sort routine or
it takes a long time to get the metadata.

When I've had to deal with a huge number of files in a directory,
I could get the list very quickly in Python using os.listdir
even though ls was slow. If you're in that situation again, see
if the '-f' (unsorted) flag makes a difference, or use '-1'
to see whether the time is going into the stat calls.
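
The same comparison can be roughly reproduced from Python by timing the three steps separately. This is a minimal sketch, not a benchmark of 'ls' itself; `time_listing` and `demo` are hypothetical names, and the file count in `demo` is arbitrary.

```python
import os
import tempfile
import time


def time_listing(path):
    """Return (names, listdir seconds, sort seconds, stat seconds) for `path`."""
    t0 = time.perf_counter()
    names = os.listdir(path)          # raw, unsorted directory read
    t1 = time.perf_counter()
    sorted(names)                     # cost of sorting the names
    t2 = time.perf_counter()
    for name in names:                # cost of fetching per-file metadata
        os.stat(os.path.join(path, name))
    t3 = time.perf_counter()
    return names, t1 - t0, t2 - t1, t3 - t2


def demo(n=2000):
    """Create n empty files in a temporary directory and print the timings."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n):
            open(os.path.join(d, "img%06d.jpg" % i), "w").close()
        names, t_list, t_sort, t_stat = time_listing(d)
        print("%d files: listdir %.4fs, sort %.4fs, stat %.4fs"
              % (len(names), t_list, t_sort, t_stat))


if __name__ == "__main__":
    demo()
```

Whichever step dominates gives a hint about where a slow listing is spending its time on a given system.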

Andrew
da***@dalkescientific.com

Jul 18 '05 #3

David Pratt wrote:
Hi. I am creating a Python application that uses PIL to generate
thumbnails and sized images. It is beginning to look like the volume of
images will be large. This has got me thinking: is there a limit on the
number of files that Unix can handle in a single directory? I am using
FreeBSD 4.x at the moment. The number could be as high as 500,000 images
in a single directory, but more likely in the range of 6,000 to 30,000
for most. I did not want to store these in Postgres. Should this pose a
problem on the filesystem? I realize this is less of a Python issue
really, but I thought someone on the list might have an idea.


It all depends on the file system you are using, and somewhat on the
operations you are typically performing. I assume this is ufs/ffs, so
the directory is a linear list of all files.

This causes some performance concerns for access: to open an
individual file, you may need to scan the entire directory. The
size of a directory entry depends on the length of the name. Assuming
file names of 10 characters, in which case each entry is 20 bytes, a
directory with 500,000 image file names requires 10MB on disk. So
each directory lookup could require reading up to 10MB from
disk, which might be noticeable. For 6,000 entries, the directory
size is 120kB, which might not be noticeable.
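
The arithmetic above can be checked with a few lines of Python. The 20-byte entry size is the assumption stated in the paragraph (10-character names); `directory_bytes` is a hypothetical helper, and real entry sizes vary with name length and file system.

```python
ENTRY_BYTES = 20  # assumed size of one directory entry for a 10-character name


def directory_bytes(num_files, entry_bytes=ENTRY_BYTES):
    """Estimate the on-disk size of a linear directory with num_files entries."""
    return num_files * entry_bytes


if __name__ == "__main__":
    for n in (6000, 30000, 500000):
        size = directory_bytes(n)
        # 6,000 -> 120 kB; 30,000 -> 600 kB; 500,000 -> 10,000 kB (10 MB)
        print("%7d files: %8d bytes (%d kB)" % (n, size, size // 1000))
```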

In FreeBSD 4.4 and later, there is a kernel compile-time option, UFS_DIRHASH,
which causes creation of an in-memory hashtable for directories,
speeding up lookups significantly. This requires, of course, enough
main memory to actually keep the hashtable.

Regards,
Martin
Jul 18 '05 #4

Yes, I'm talking about Linux, not BSD, so with any luck you won't have
the same 'ls' issue; it does not crash, but it is painfully slow. The only
other issue I recall is that wildcards fail if they match too many files
(presumably a limit on the maximum command-line size).
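
One way to sidestep the wildcard problem from Python is to filter names inside the process instead of letting the shell expand the pattern into one long command line. A minimal sketch, assuming Python 3.6 or later; `iter_matching` is a hypothetical helper name.

```python
import fnmatch
import os
import tempfile


def iter_matching(path, pattern):
    """Yield names in `path` that match a shell-style pattern, one at a time,
    without building a command line."""
    with os.scandir(path) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.jpg", "b.jpg", "c.png"):
            open(os.path.join(d, name), "w").close()
        # Directory order is not guaranteed, so sort for display.
        print(sorted(iter_matching(d, "*.jpg")))
```

Because the names are produced lazily, the directory can hold far more files than would fit in a single shell command.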

I would expect the various GUI file managers may give unpredictable
results; I would also not rely on remotely mounting the bigdir
cross-platform.

Jul 18 '05 #5
