By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
445,813 Members | 1,257 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 445,813 IT Pros & Developers. It's quick & easy.

Storing 12 millions images

P: n/a
I am looking at options/ways to store 12 million gif/jpg images in a
database. Either we store a link to the file or we store the image
itself in the database. Images will range from 4k to 35k in size and
there will be 12 millions of them (at the beginning)... I expect a 8%
growth every year.

We will also have to perform some cleanup jobs to delete images that
are not longer referenced by the master table. We'll also have to
consider backup.

Anyone of you has the same problem/database or something similar? I'm
interested by your comments... Things to do, things not to do, pros
and cons, tips to backup, tips to cleanup, comments, etc...

We're on MySQL 3.23.58 and Red Hat. Our database currently has over
100 million records but we're planning on having images for the 12
million most important ones... :)

Waiting for you input.

Thanks!
Jul 20 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
Benoit St-Jean wrote:
I am looking at options/ways to store 12 million gif/jpg images in a
database. Either we store a link to the file or we store the image
itself in the database. Images will range from 4k to 35k in size and
there will be 12 millions of them (at the beginning)... I expect a 8%
growth every year.

We will also have to perform some cleanup jobs to delete images that
are not longer referenced by the master table. We'll also have to
consider backup.

Anyone of you has the same problem/database or something similar? I'm
interested by your comments... Things to do, things not to do, pros
and cons, tips to backup, tips to cleanup, comments, etc...

We're on MySQL 3.23.58 and Red Hat. Our database currently has over
100 million records but we're planning on having images for the 12
million most important ones... :)


I personally think putting images into a database is a really bad idea,
and that you're best to store a link to the filename in the database.

The problem when you're dealing with that many files is that you don't
want to clutter one directory with millions of files. Your operating
system / file system may have limits on the number of files in a
directory, and access to those files will be slow if there are so many.

To get around this, you can have multiple levels of directories based on
the first few characters of the filename. So for example if the file is
called foo3455.jpg for example, its file location might
be /images/f/o/o/foo3455.jpg which should leave you with only a few
hundred images in each third level directory, depending on file naming
conventions.

--
Chris Hope - The Electric Toolbox - http://www.electrictoolbox.com/
Jul 20 '05 #2

P: n/a
Chris Hope wrote:
To get around this, you can have multiple levels of directories based on
the first few characters of the filename. So for example if the file is
called foo3455.jpg for example, its file location might
be /images/f/o/o/foo3455.jpg which should leave you with only a few
hundred images in each third level directory, depending on file naming


Of if filenames are too similar for that, or you are naming the files by
the application side, you could just give id for each file and name the
file according to the id (1.jpg, 2.jpg, 3.jpg , etc.) And place the
files into numbered folders, like this:

1-1000000
|- 1-100000
|- 1-10000
|- 20001-30000
|- 30001-40000
|- ...
|- 100001-200000
|- 200001-300000
|- 300001-400000
|- ...
1000001-2000000
|- ...
2000002-3000000
|- ...
So folder named "1-1000000" would contain all sub folders and files that
contain images between 1 to 1000000. And sub folder named "1-100000"
would contain all sub folders and files between 1 to 100000 etc. The
example system would allow you to have exatctly 10000 images for each
folder, but you could expand it to have for example 100 images for each
folder if you like to. And while this system has only 10 sub folders
under each folder, you could instead have 100 or more. Like this:

1-1000000
|- 1-10000
|- 20001-30000
|- 30001-40000
|- ...
|- 980001-990000
|- 990001-1000000
1000001-2000000
|- 1000001-1010000
|- 1020001-1030000
|- ...
2000002-3000000
|- ...

10000 might be too much files under each folder, so you might want to
fix this a little more to something like 1000 sub folders under each
folder and 1000 images under each folder, or something like that.

Just a suggestion anyway ;)
Jul 20 '05 #3

P: n/a
Aggro wrote:
Chris Hope wrote:
To get around this, you can have multiple levels of directories based
on the first few characters of the filename. So for example if the
file is called foo3455.jpg for example, its file location might
be /images/f/o/o/foo3455.jpg which should leave you with only a few
hundred images in each third level directory, depending on file
naming


Of if filenames are too similar for that, or you are naming the files
by the application side, you could just give id for each file and name
the file according to the id (1.jpg, 2.jpg, 3.jpg , etc.) And place
the files into numbered folders, like this:

1-1000000
|- 1-100000
|- 1-10000
|- 20001-30000
|- 30001-40000
|- ...
|- 100001-200000
|- 200001-300000
|- 300001-400000
|- ...
1000001-2000000
|- ...
2000002-3000000
|- ...
So folder named "1-1000000" would contain all sub folders and files
that contain images between 1 to 1000000. And sub folder named
"1-100000" would contain all sub folders and files between 1 to 100000
etc. The example system would allow you to have exatctly 10000 images
for each folder, but you could expand it to have for example 100
images for each folder if you like to. And while this system has only
10 sub folders under each folder, you could instead have 100 or more.
Like this:

1-1000000
|- 1-10000
|- 20001-30000
|- 30001-40000
|- ...
|- 980001-990000
|- 990001-1000000
1000001-2000000
|- 1000001-1010000
|- 1020001-1030000
|- ...
2000002-3000000
|- ...

10000 might be too much files under each folder, so you might want to
fix this a little more to something like 1000 sub folders under each
folder and 1000 images under each folder, or something like that.

Just a suggestion anyway ;)


Good thinking too. My idea was just really a starting point because it
really all depends on what sort of naming exists for the filenames.

The first time I came across this method of storing data/files was when
I was studying and one of our lecturers was talking about a huge phone
number database. I'm not sure why they didn't just use a relational
database, but for whatever reason they stored the data on the file
system and used a directory structure like eg 1/2/3/5/5/5/7/8/90.dat
where the first three digits were the area code and the rest of them
were the start of the phone number etc.

--
Chris Hope - The Electric Toolbox - http://www.electrictoolbox.com/
Jul 20 '05 #4

P: n/a
Chris Hope wrote:
To get around this, you can have multiple levels of directories based on
the first few characters of the filename. So for example if the file is
called foo3455.jpg for example, its file location might
be /images/f/o/o/foo3455.jpg which should leave you with only a few
hundred images in each third level directory, depending on file naming
conventions.


I agree in principle with this plan, and I can think of another
advantage: because Linux filesystems can mount at any directory, you
can make /images/f be mounted from a separate hard drive partition from
/images/a, /images/b, /images/c, /images/d, etc. Or even do the mounts
at the next level down.

This gives you a lot more flexibility of mapping logical storage to
physical storage than MySQL table files generally give you.

Regards,
Bill K.
Jul 20 '05 #5

P: n/a
"Benoit St-Jean" <bs*****@yahoo.com> wrote in message
news:35**************************@posting.google.c om...
I am looking at options/ways to store 12 million gif/jpg images in a
database. Either we store a link to the file or we store the image
itself in the database. Images will range from 4k to 35k in size and
there will be 12 millions of them (at the beginning)... I expect a 8%
growth every year.

We will also have to perform some cleanup jobs to delete images that
are not longer referenced by the master table. We'll also have to
consider backup.

Anyone of you has the same problem/database or something similar? I'm
interested by your comments... Things to do, things not to do, pros
and cons, tips to backup, tips to cleanup, comments, etc...

We're on MySQL 3.23.58 and Red Hat. Our database currently has over
100 million records but we're planning on having images for the 12
million most important ones... :)

Waiting for you input.

Thanks!


I've seen the other posts and have to disagree. Keeping images (alot of
them) in the filesystem is too cumbersome. Storing them in the database is
actually a very simple matter and if you take the proper precautions you can
keep your database stable and backups can be easily done. I too have a
situation where I had to make the decision, and I chose database.
Norman
--
Avatar hosting at www.easyavatar.com

Jul 20 '05 #6

P: n/a
Norman Peelman wrote:
I've seen the other posts and have to disagree. Keeping images (alot of
them) in the filesystem is too cumbersome. Storing them in the database is
actually a very simple matter and if you take the proper precautions you can
keep your database stable and backups can be easily done. I too have a
situation where I had to make the decision, and I chose database.


That's a perfectly reasonable strategy. I put the images on the
filesystem in a system I designed in 1992, when database capacities were
smaller, and the programming means to store and fetch blobs were more
awkward than they are today.

But one should be aware of the limitations of MySQL tables even today,
when talking about millions of images. If the average image is about
100KB, and you need to store 12 million of them, that's 1.12 terabytes.

MySQL has docs on table size limitations and limitations imposed by
operating system filesystem types:
http://dev.mysql.com/doc/mysql/en/Table_size.html.

Also one can use MERGE tables to treat multiple tables as one table.

Regards,
Bill K.
Jul 20 '05 #7

P: n/a
"Bill Karwin" <bi**@karwin.com> wrote in message
news:co*********@enews4.newsguy.com...
Norman Peelman wrote:
I've seen the other posts and have to disagree. Keeping images (alot of them) in the filesystem is too cumbersome. Storing them in the database is actually a very simple matter and if you take the proper precautions you can keep your database stable and backups can be easily done. I too have a
situation where I had to make the decision, and I chose database.


That's a perfectly reasonable strategy. I put the images on the
filesystem in a system I designed in 1992, when database capacities were
smaller, and the programming means to store and fetch blobs were more
awkward than they are today.

But one should be aware of the limitations of MySQL tables even today,
when talking about millions of images. If the average image is about
100KB, and you need to store 12 million of them, that's 1.12 terabytes.

MySQL has docs on table size limitations and limitations imposed by
operating system filesystem types:
http://dev.mysql.com/doc/mysql/en/Table_size.html.

Also one can use MERGE tables to treat multiple tables as one table.

Regards,
Bill K.


I figure it this way, due to the way 'files' are stored in the filesystem
(cluster size) where a file takes a multiple of 4k (NTFS), you will always
have 'wasted' disk space and with users constantly uploading and deleting
files you get fragmented really fast - which can lead to problems/slowdowns.
I developed a way to circumvent this by using some PHP code and the
database. Works like a charm.

Norman
--
Avatar hosting at www.easyavatar.com

Jul 20 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.