Storing 12 million images

I am looking at options for storing 12 million GIF/JPG images in a
database. Either we store a link to the file, or we store the image
itself in the database. Images will range from 4k to 35k in size, and
there will be 12 million of them at the beginning. I expect 8%
growth every year.

We will also have to run cleanup jobs to delete images that
are no longer referenced by the master table. We'll also have to
consider backups.

Has anyone dealt with the same problem or a similar database? I'm
interested in your comments: things to do, things not to do, pros
and cons, backup tips, cleanup tips, etc.

We're on MySQL 3.23.58 and Red Hat. Our database currently has over
100 million records, but we're planning on having images for the 12
million most important ones... :)

Waiting for your input.

Thanks!
Jul 20 '05 #1
I personally think putting images into a database is a really bad idea,
and that you're better off storing a link to the file in the database.

The problem when you're dealing with that many files is that you don't
want to clutter one directory with millions of files. Your operating
system or file system may limit the number of files in a
directory, and access to those files will be slow if there are so many.

To get around this, you can have multiple levels of directories based on
the first few characters of the filename. For example, if the file is
called foo3455.jpg, its location might
be /images/f/o/o/foo3455.jpg, which should leave you with only a few
hundred images in each third-level directory, depending on file naming
conventions.
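
The prefix scheme described above could be sketched roughly like this in Python. The function name `sharded_path`, the `/images` root, and the fixed depth of three levels are illustrative assumptions, not something specified in the thread.

```python
from pathlib import PurePosixPath


def sharded_path(filename: str, root: str = "/images", depth: int = 3) -> PurePosixPath:
    """Build a path such as /images/f/o/o/foo3455.jpg.

    One directory level is created per leading character of the file's
    base name (the part before the extension). `root` and `depth` are
    hypothetical defaults chosen to match the example in the post.
    """
    stem = PurePosixPath(filename).stem
    if len(stem) < depth:
        raise ValueError(f"{filename!r} is too short for {depth} directory levels")
    levels = list(stem[:depth])
    return PurePosixPath(root, *levels, filename)


print(sharded_path("foo3455.jpg"))  # /images/f/o/o/foo3455.jpg
```

How evenly the files spread out depends entirely on the naming convention: if most names share the same first few characters, most files end up in the same directory.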

--
Chris Hope - The Electric Toolbox - http://www.electrictoolbox.com/
Jul 20 '05 #2
Chris Hope wrote:
To get around this, you can have multiple levels of directories based on
the first few characters of the filename. For example, if the file is
called foo3455.jpg, its location might
be /images/f/o/o/foo3455.jpg, which should leave you with only a few
hundred images in each third-level directory, depending on file naming
conventions.


Or, if the filenames are too similar for that, or the application is
naming the files, you could just give each file an id and name the
file according to the id (1.jpg, 2.jpg, 3.jpg, etc.), then place the
files into numbered folders, like this:

1-1000000
|- 1-100000
|  |- 1-10000
|  |- 10001-20000
|  |- 20001-30000
|  |- 30001-40000
|  |- ...
|- 100001-200000
|- 200001-300000
|- 300001-400000
|- ...
1000001-2000000
|- ...
2000001-3000000
|- ...

So the folder named "1-1000000" would contain all sub folders and files
for images 1 to 1000000, and the sub folder named "1-100000"
would contain all sub folders and files for images 1 to 100000, etc. The
example system gives you exactly 10000 images in each lowest-level
folder, but you could extend it to have, for example, 100 images per
folder if you like. And while this system has only 10 sub folders
under each folder, you could instead have 100 or more, like this:

1-1000000
|- 1-10000
|- 10001-20000
|- 20001-30000
|- 30001-40000
|- ...
|- 980001-990000
|- 990001-1000000
1000001-2000000
|- 1000001-1010000
|- 1010001-1020000
|- 1020001-1030000
|- ...
2000001-3000000
|- ...

10000 might be too many files under each folder, so you might want to
adjust this further, to something like 1000 sub folders under each
folder and 1000 images in each lowest-level folder, or something like that.
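
As a rough sketch, the numbered-folder layout can be computed from the id alone. The helper names `bucket` and `id_path`, the `/images` root, the `.jpg` extension, and the two bucket widths (1000000, then 10000) are assumptions picked to match the second layout above.

```python
from pathlib import PurePosixPath


def bucket(image_id: int, size: int) -> str:
    """Name of the 'low-high' folder of width `size` that holds image_id (ids start at 1)."""
    low = (image_id - 1) // size * size + 1
    return f"{low}-{low + size - 1}"


def id_path(image_id: int, root: str = "/images",
            sizes: tuple = (1_000_000, 10_000)) -> PurePosixPath:
    """Nest one folder per bucket width, widest first, then the file itself."""
    if image_id < 1:
        raise ValueError("image ids are assumed to start at 1")
    folders = [bucket(image_id, size) for size in sizes]
    return PurePosixPath(root, *folders, f"{image_id}.jpg")


print(id_path(25_000))     # /images/1-1000000/20001-30000/25000.jpg
print(id_path(1_000_001))  # /images/1000001-2000000/1000001-1010000/1000001.jpg
```

Changing `sizes` to something like `(1_000_000, 1_000)` gives 1000 sub folders with 1000 images each, as suggested at the end of the post.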

Just a suggestion anyway ;)
Jul 20 '05 #3
Good thinking too. My idea was really just a starting point, because it
all depends on what sort of naming exists for the filenames.

The first time I came across this method of storing data/files was when
I was studying, and one of our lecturers was talking about a huge phone
number database. I'm not sure why they didn't just use a relational
database, but for whatever reason they stored the data on the file
system and used a directory structure like 1/2/3/5/5/5/7/8/90.dat,
where the first three digits were the area code and the rest
were the start of the phone number, etc.

--
Chris Hope - The Electric Toolbox - http://www.electrictoolbox.com/
Jul 20 '05 #4
Chris Hope wrote:
To get around this, you can have multiple levels of directories based on
the first few characters of the filename. For example, if the file is
called foo3455.jpg, its location might
be /images/f/o/o/foo3455.jpg, which should leave you with only a few
hundred images in each third-level directory, depending on file naming
conventions.


I agree in principle with this plan, and I can think of another
advantage: because Linux filesystems can mount at any directory, you
can make /images/f be mounted from a separate hard drive partition from
/images/a, /images/b, /images/c, /images/d, etc. Or even do the mounts
at the next level down.

This gives you a lot more flexibility of mapping logical storage to
physical storage than MySQL table files generally give you.

Regards,
Bill K.
Jul 20 '05 #5
I've seen the other posts and have to disagree. Keeping a lot of images
in the filesystem is too cumbersome. Storing them in the database is
actually a very simple matter, and if you take the proper precautions you can
keep your database stable and backups can be done easily. I too had a
situation where I had to make the decision, and I chose the database.
Norman
--
Avatar hosting at www.easyavatar.com

Jul 20 '05 #6
Norman Peelman wrote:
I've seen the other posts and have to disagree. Keeping a lot of images
in the filesystem is too cumbersome. Storing them in the database is
actually a very simple matter, and if you take the proper precautions you can
keep your database stable and backups can be done easily. I too had a
situation where I had to make the decision, and I chose the database.


That's a perfectly reasonable strategy. I put the images on the
filesystem in a system I designed in 1992, when database capacities were
smaller, and the programming means to store and fetch blobs were more
awkward than they are today.

But one should be aware of the limitations of MySQL tables even today,
when talking about millions of images. If the average image is about
100KB, and you need to store 12 million of them, that's 1.12 terabytes.
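
The 1.12 figure works out if KB and terabytes are both read as binary units (1 KB = 1024 bytes). The short Python sketch below redoes that arithmetic and, for comparison, applies it to the 4k to 35k range quoted in the original question. The function and constant names are illustrative only.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def total_bytes(image_count: int, average_kb: int) -> int:
    """Raw payload size, counting 1 KB as 1024 bytes and ignoring any storage overhead."""
    return image_count * average_kb * 1024


# 100 KB is the average assumed in the post; 4 and 35 are the bounds from the question.
for average_kb in (4, 35, 100):
    size = total_bytes(12_000_000, average_kb)
    print(f"{average_kb:>3} KB average: {size / GIB:8.1f} GiB ({size / TIB:.2f} TiB)")
```

These are payload sizes only; row overhead, indexes, and filesystem allocation are not included.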

MySQL has docs on table size limitations and limitations imposed by
operating system filesystem types:
http://dev.mysql.com/doc/mysql/en/Table_size.html.

Also one can use MERGE tables to treat multiple tables as one table.

Regards,
Bill K.
Jul 20 '05 #7
I figure it this way: because of the way files are stored in the filesystem
(cluster size), where a file takes up a multiple of 4k (NTFS), you will always
have 'wasted' disk space, and with users constantly uploading and deleting
files you get fragmented really fast, which can lead to problems and slowdowns.
I developed a way to circumvent this by using some PHP code and the
database. Works like a charm.
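
To put a number on the 'wasted' space, here is a small Python sketch of the cluster rounding. The 4096-byte cluster is the figure from this post; the actual cluster or block size depends on the filesystem and how it was formatted, and the function name is hypothetical.

```python
def slack_bytes(file_size: int, cluster: int = 4096) -> int:
    """Bytes allocated but unused when a file is rounded up to whole clusters."""
    clusters_needed = -(-file_size // cluster)  # ceiling division on integers
    return clusters_needed * cluster - file_size


print(slack_bytes(5_000))      # 3192 (two clusters allocated for 5000 bytes)
print(slack_bytes(4_096))      # 0 (exactly one cluster)
print(slack_bytes(35 * 1024))  # 1024 (nine clusters allocated for 35 KB)
```

If file sizes are spread more or less evenly, the waste averages out to about half a cluster per file, so it matters most when the files are small relative to the cluster size.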

Norman
--
Avatar hosting at www.easyavatar.com

Jul 20 '05 #8
