364,085 Members | 5234 Browsing Online
Community for Developers & IT Professionals
Bytes IT Community

Storing 12 millions images

Benoit St-Jean
P: n/a
Benoit St-Jean
I am looking at options/ways to store 12 million gif/jpg images in a
database. Either we store a link to the file or we store the image
itself in the database. Images will range from 4k to 35k in size and
there will be 12 millions of them (at the beginning)... I expect a 8%
growth every year.

We will also have to perform some cleanup jobs to delete images that
are not longer referenced by the master table. We'll also have to
consider backup.

Anyone of you has the same problem/database or something similar? I'm
interested by your comments... Things to do, things not to do, pros
and cons, tips to backup, tips to cleanup, comments, etc...

We're on MySQL 3.23.58 and Red Hat. Our database currently has over
100 million records but we're planning on having images for the 12
million most important ones... :)

Waiting for you input.

Thanks!
Jul 20 '05 #1
Share this Question
Share on Google+
7 Replies


Chris Hope
P: n/a
Chris Hope
Benoit St-Jean wrote:
[color=blue]
> I am looking at options/ways to store 12 million gif/jpg images in a
> database. Either we store a link to the file or we store the image
> itself in the database. Images will range from 4k to 35k in size and
> there will be 12 millions of them (at the beginning)... I expect a 8%
> growth every year.
>
> We will also have to perform some cleanup jobs to delete images that
> are not longer referenced by the master table. We'll also have to
> consider backup.
>
> Anyone of you has the same problem/database or something similar? I'm
> interested by your comments... Things to do, things not to do, pros
> and cons, tips to backup, tips to cleanup, comments, etc...
>
> We're on MySQL 3.23.58 and Red Hat. Our database currently has over
> 100 million records but we're planning on having images for the 12
> million most important ones... :)[/color]

I personally think putting images into a database is a really bad idea,
and that you're best to store a link to the filename in the database.

The problem when you're dealing with that many files is that you don't
want to clutter one directory with millions of files. Your operating
system / file system may have limits on the number of files in a
directory, and access to those files will be slow if there are so many.

To get around this, you can have multiple levels of directories based on
the first few characters of the filename. So for example if the file is
called foo3455.jpg for example, its file location might
be /images/f/o/o/foo3455.jpg which should leave you with only a few
hundred images in each third level directory, depending on file naming
conventions.

--
Chris Hope - The Electric Toolbox - http://www.electrictoolbox.com/
Jul 20 '05 #2

Aggro
P: n/a
Aggro
Chris Hope wrote:
[color=blue]
> To get around this, you can have multiple levels of directories based on
> the first few characters of the filename. So for example if the file is
> called foo3455.jpg for example, its file location might
> be /images/f/o/o/foo3455.jpg which should leave you with only a few
> hundred images in each third level directory, depending on file naming[/color]

Of if filenames are too similar for that, or you are naming the files by
the application side, you could just give id for each file and name the
file according to the id (1.jpg, 2.jpg, 3.jpg , etc.) And place the
files into numbered folders, like this:

1-1000000
|- 1-100000
|- 1-10000
|- 20001-30000
|- 30001-40000
|- ...
|- 100001-200000
|- 200001-300000
|- 300001-400000
|- ...
1000001-2000000
|- ...
2000002-3000000
|- ...


So folder named "1-1000000" would contain all sub folders and files that
contain images between 1 to 1000000. And sub folder named "1-100000"
would contain all sub folders and files between 1 to 100000 etc. The
example system would allow you to have exatctly 10000 images for each
folder, but you could expand it to have for example 100 images for each
folder if you like to. And while this system has only 10 sub folders
under each folder, you could instead have 100 or more. Like this:

1-1000000
|- 1-10000
|- 20001-30000
|- 30001-40000
|- ...
|- 980001-990000
|- 990001-1000000
1000001-2000000
|- 1000001-1010000
|- 1020001-1030000
|- ...
2000002-3000000
|- ...

10000 might be too much files under each folder, so you might want to
fix this a little more to something like 1000 sub folders under each
folder and 1000 images under each folder, or something like that.

Just a suggestion anyway ;)
Jul 20 '05 #3

Chris Hope
P: n/a
Chris Hope
Aggro wrote:
[color=blue]
> Chris Hope wrote:
>[color=green]
>> To get around this, you can have multiple levels of directories based
>> on the first few characters of the filename. So for example if the
>> file is called foo3455.jpg for example, its file location might
>> be /images/f/o/o/foo3455.jpg which should leave you with only a few
>> hundred images in each third level directory, depending on file
>> naming[/color]
>
> Of if filenames are too similar for that, or you are naming the files
> by the application side, you could just give id for each file and name
> the file according to the id (1.jpg, 2.jpg, 3.jpg , etc.) And place
> the files into numbered folders, like this:
>
> 1-1000000
> |- 1-100000
> |- 1-10000
> |- 20001-30000
> |- 30001-40000
> |- ...
> |- 100001-200000
> |- 200001-300000
> |- 300001-400000
> |- ...
> 1000001-2000000
> |- ...
> 2000002-3000000
> |- ...
>
>
> So folder named "1-1000000" would contain all sub folders and files
> that contain images between 1 to 1000000. And sub folder named
> "1-100000" would contain all sub folders and files between 1 to 100000
> etc. The example system would allow you to have exatctly 10000 images
> for each folder, but you could expand it to have for example 100
> images for each folder if you like to. And while this system has only
> 10 sub folders under each folder, you could instead have 100 or more.
> Like this:
>
> 1-1000000
> |- 1-10000
> |- 20001-30000
> |- 30001-40000
> |- ...
> |- 980001-990000
> |- 990001-1000000
> 1000001-2000000
> |- 1000001-1010000
> |- 1020001-1030000
> |- ...
> 2000002-3000000
> |- ...
>
> 10000 might be too much files under each folder, so you might want to
> fix this a little more to something like 1000 sub folders under each
> folder and 1000 images under each folder, or something like that.
>
> Just a suggestion anyway ;)[/color]

Good thinking too. My idea was just really a starting point because it
really all depends on what sort of naming exists for the filenames.

The first time I came across this method of storing data/files was when
I was studying and one of our lecturers was talking about a huge phone
number database. I'm not sure why they didn't just use a relational
database, but for whatever reason they stored the data on the file
system and used a directory structure like eg 1/2/3/5/5/5/7/8/90.dat
where the first three digits were the area code and the rest of them
were the start of the phone number etc.

--
Chris Hope - The Electric Toolbox - http://www.electrictoolbox.com/
Jul 20 '05 #4

Bill Karwin
P: n/a
Bill Karwin
Chris Hope wrote:[color=blue]
> To get around this, you can have multiple levels of directories based on
> the first few characters of the filename. So for example if the file is
> called foo3455.jpg for example, its file location might
> be /images/f/o/o/foo3455.jpg which should leave you with only a few
> hundred images in each third level directory, depending on file naming
> conventions.[/color]

I agree in principle with this plan, and I can think of another
advantage: because Linux filesystems can mount at any directory, you
can make /images/f be mounted from a separate hard drive partition from
/images/a, /images/b, /images/c, /images/d, etc. Or even do the mounts
at the next level down.

This gives you a lot more flexibility of mapping logical storage to
physical storage than MySQL table files generally give you.

Regards,
Bill K.
Jul 20 '05 #5

Norman Peelman
P: n/a
Norman Peelman
"Benoit St-Jean" <bstjean@yahoo.com> wrote in message
news:3540ac3e.0411301147.68f810ed@posting.google.c om...[color=blue]
> I am looking at options/ways to store 12 million gif/jpg images in a
> database. Either we store a link to the file or we store the image
> itself in the database. Images will range from 4k to 35k in size and
> there will be 12 millions of them (at the beginning)... I expect a 8%
> growth every year.
>
> We will also have to perform some cleanup jobs to delete images that
> are not longer referenced by the master table. We'll also have to
> consider backup.
>
> Anyone of you has the same problem/database or something similar? I'm
> interested by your comments... Things to do, things not to do, pros
> and cons, tips to backup, tips to cleanup, comments, etc...
>
> We're on MySQL 3.23.58 and Red Hat. Our database currently has over
> 100 million records but we're planning on having images for the 12
> million most important ones... :)
>
> Waiting for you input.
>
> Thanks![/color]

I've seen the other posts and have to disagree. Keeping images (alot of
them) in the filesystem is too cumbersome. Storing them in the database is
actually a very simple matter and if you take the proper precautions you can
keep your database stable and backups can be easily done. I too have a
situation where I had to make the decision, and I chose database.


Norman
--
Avatar hosting at www.easyavatar.com



Jul 20 '05 #6

Bill Karwin
P: n/a
Bill Karwin
Norman Peelman wrote:[color=blue]
> I've seen the other posts and have to disagree. Keeping images (alot of
> them) in the filesystem is too cumbersome. Storing them in the database is
> actually a very simple matter and if you take the proper precautions you can
> keep your database stable and backups can be easily done. I too have a
> situation where I had to make the decision, and I chose database.[/color]

That's a perfectly reasonable strategy. I put the images on the
filesystem in a system I designed in 1992, when database capacities were
smaller, and the programming means to store and fetch blobs were more
awkward than they are today.

But one should be aware of the limitations of MySQL tables even today,
when talking about millions of images. If the average image is about
100KB, and you need to store 12 million of them, that's 1.12 terabytes.

MySQL has docs on table size limitations and limitations imposed by
operating system filesystem types:
http://dev.mysql.com/doc/mysql/en/Table_size.html.

Also one can use MERGE tables to treat multiple tables as one table.

Regards,
Bill K.
Jul 20 '05 #7

Norman Peelman
P: n/a
Norman Peelman
"Bill Karwin" <bill@karwin.com> wrote in message
news:cotg6s02at0@enews4.newsguy.com...[color=blue]
> Norman Peelman wrote:[color=green]
> > I've seen the other posts and have to disagree. Keeping images (alot[/color][/color]
of[color=blue][color=green]
> > them) in the filesystem is too cumbersome. Storing them in the database[/color][/color]
is[color=blue][color=green]
> > actually a very simple matter and if you take the proper precautions you[/color][/color]
can[color=blue][color=green]
> > keep your database stable and backups can be easily done. I too have a
> > situation where I had to make the decision, and I chose database.[/color]
>
> That's a perfectly reasonable strategy. I put the images on the
> filesystem in a system I designed in 1992, when database capacities were
> smaller, and the programming means to store and fetch blobs were more
> awkward than they are today.
>
> But one should be aware of the limitations of MySQL tables even today,
> when talking about millions of images. If the average image is about
> 100KB, and you need to store 12 million of them, that's 1.12 terabytes.
>
> MySQL has docs on table size limitations and limitations imposed by
> operating system filesystem types:
> http://dev.mysql.com/doc/mysql/en/Table_size.html.
>
> Also one can use MERGE tables to treat multiple tables as one table.
>
> Regards,
> Bill K.[/color]

I figure it this way, due to the way 'files' are stored in the filesystem
(cluster size) where a file takes a multiple of 4k (NTFS), you will always
have 'wasted' disk space and with users constantly uploading and deleting
files you get fragmented really fast - which can lead to problems/slowdowns.
I developed a way to circumvent this by using some PHP code and the
database. Works like a charm.

Norman


--
Avatar hosting at www.easyavatar.com



Jul 20 '05 #8

Post your reply

Help answer this question



Didn't find the answer to your MySQL Database question?

You can also browse similar questions: MySQL Database