Filesystem vs. Postgres for images

Hello,

I am working on a web portal that carries ads. We have about 200,000
ads. Every ad has its own directory, named after its ID, containing
5 subdirectories with various sizes of 5 images.

The filesystem is too slow, but I don't know whether performance will
improve if I store these images in Postgres.

My second question is what kind of hardware I need for storing them in the
DB. Right now I have an Intel(R) Pentium(R) 4 CPU 1.70GHz with 512MB RAM
and a 120GB HDD.

Thanks for any advice...

miso

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 23 '05 #1
Hello,

> The filesystem is too slow, but I don't know whether performance will
> improve if I store these images in Postgres.

But Postgres also stores its data on the filesystem.
Maybe switch to a better filesystem such as XFS (XFS is very nice and performs well);
IMHO other filesystems like Reiser have some version problems ;)
So storing images in Postgres as BLOBs is, IMHO, not as fast as XFS, but
you would have to run some performance tests to be sure.
You could also keep an index table for your images with paths and
filenames. If you haven't done so already, this should speed up your file search.
Having a lot of RAM (at least 1GB for such a big portal) and SCSI disks
is also a good idea. If you want to boost it further, take a dual-processor system with
SCSI RAID and a lot of RAM, though that costs a lot of money too :((.
Another tip is to enable a proxy or httpd cache and compression, or other
caching in PHP, Java, etc.

sorry for my broken english.
volker


Nov 23 '05 #2
Hi,
> IMHO other filesystems like Reiser have some version problems ;)

Oh please. Reiser is as unstable as Postgres is slow. In other words, both
suffer from prejudices that were true a loooong time ago. ;-)

With large directories, ext2/3 performs extremely badly (as in the original
post), so this guy will be better off with anything but ext2/3. That's why I
switched from ext2 to Reiser ~2 years ago (without any problems since).

With kind regards
Holger Klawitter
--
lists <at> klawitter <dot> de

Nov 23 '05 #3
On Tue, 2004-04-13 at 01:44, Michal Hlavac wrote:
> I am working on a web portal that carries ads. We have about 200,000
> ads. Every ad has its own directory, named after its ID, containing
> 5 subdirectories with various sizes of 5 images.
>
> The filesystem is too slow, but I don't know whether performance will
> improve if I store these images in Postgres.

Consider breaking your directories up, i.e.:

/ads/(ID % 1000)/ID

I use that for a system with several million images, works great. I
really don't think putting them in the database will do anything
positive for you. :)
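
A minimal Python sketch of the `/ads/(ID % 1000)/ID` layout, for anyone who wants to try it. The `/ads` root comes from the example above; the function name `ad_dir` and the `buckets` parameter are assumptions made for illustration only.

```python
from pathlib import PurePosixPath


def ad_dir(ad_id: int, root: str = "/ads", buckets: int = 1000) -> PurePosixPath:
    """Return the directory for an ad, bucketed by ad_id % buckets."""
    if ad_id < 0:
        raise ValueError("ad_id must be non-negative")
    # The bucket directory keeps any single directory from growing too large.
    return PurePosixPath(root) / str(ad_id % buckets) / str(ad_id)


print(ad_dir(123456))  # /ads/456/123456
```

With 200,000 ads spread over 1,000 buckets, each bucket directory would hold roughly 200 ad directories.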



Nov 23 '05 #4
There has got to be some sort of standard way to do this. We have the
same problem where I work: terabytes of images, and the question still
comes down to "BLOBs or files?" Our final decision was to use the
file system. We found that we didn't really gain anything by storing
the images in the DB, other than having one place to get the data from.
The file system approach is much easier to back up, because each image
can be archived separately as well as browsed by third-party tools.

-jj-

--
Jeremiah Jahn <je******@cs.earlham.edu>

Nov 23 '05 #5
Hello,

No standard way that I know of :). We tend to use BLOBs because we can
have associated tables with metadata about the images that can be
searched, etc. Of course you could do that with the filesystem as well,
but we find BLOBs easier.

I will say we tend to use BLOBs or bytea.

J

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL

Nov 23 '05 #6
On Apr 13, 2004, at 9:40 AM, Jeremiah Jahn wrote:
> There has got to be some sort of standard way to do this. We have the
> same problem where I work: terabytes of images, and the question still
> comes down to "BLOBs or files?" Our final decision was to use the
> file system. We found that we didn't really gain anything by storing
> the images in the DB, other than having one place to get the data from.
> The file system approach is much easier to back up, because each image
> can be archived separately as well as browsed by third-party tools.


This is a pretty classic problem of performance modeling. While it
wasn't images, I worked on a system that had several million small
files (5-100K) that needed to be stored. There were a couple of
performance bottlenecks in storing them in the FS (the bottlenecks are
similar in PostgreSQL):

1. Directory name lookups do not scale well, so keep the number of
files in a directory to a manageable number (100-500).
2. Retrieval time is limited not by disk bandwidth, but by I/O seek
performance. More spindles = more concurrent I/O in flight. This is
also where SCSI takes a massive lead with tagged command queuing.

In our case, we ended up using a three-tier directory structure, so
that we could manage the number of files per directory, and then
because load was relatively even across the top 20 "directories", we
split them onto 5 spindle-pairs (i.e. RAID-1). This is a place where
RAID-5 is your enemy. RAID-1, when implemented with read-balancing,
gives a substantial performance increase.
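
As a rough illustration of a three-tier layout, here is a Python sketch. The post does not say how the tiers were derived, so hashing the file name is an assumption of this sketch, as are the `/data` root and the `tiered_path` name. It does not reproduce the "top 20 directories" arrangement described above.

```python
import hashlib
from pathlib import PurePosixPath


def tiered_path(name: str, root: str = "/data") -> PurePosixPath:
    """Place a file three directory levels deep, keyed by a hash of its name."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Tier widths of 1, 1 and 2 hex characters give 16 * 16 * 256 = 65,536
    # leaf directories (an illustrative choice, not taken from the post).
    tier1, tier2, tier3 = digest[0], digest[1], digest[2:4]
    return PurePosixPath(root) / tier1 / tier2 / tier3 / name
```

With several million files spread evenly over 65,536 leaf directories, each leaf stays well below the 100-500 files per directory suggested above. Hashing also spreads load evenly across the top-level directories, which makes it easier to assign them to separate spindles.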

Hope this helps. Some of these things apply to PostgreSQL too, but
until there's better manageability of TABLESPACE, and the ability to
split tables across multiple spaces, it's going to be hard to hit those
numbers. This is a place where the "big databases" are better. But
then, that's the top 5% of installs. Tradeoffs.

Chris
--
| Christopher Petrilli
| petrilli (at) amber.org

Nov 23 '05 #7
> > IMHO other filesystems like Reiser have some version problems ;)
>
> Oh please. Reiser is as unstable as Postgres is slow. In other words, both
> suffer from prejudices that were true a loooong time ago. ;-)
>
> With large directories, ext2/3 performs extremely badly (as in the original
> post), so this guy will be better off with anything but ext2/3. That's why I
> switched from ext2 to Reiser ~2 years ago (without any problems since).
>
> With kind regards
> Holger Klawitter


I use ReiserFS, too. Large directories (hundreds of thousands of files) do not slow down file retrieval, and I have never had any problems with stability.

/Mattias

Nov 23 '05 #8
I tried the bytea type, but the parsing done by the system on insert
etc. was so bad that it made it unusable for me. Our solution is to keep
all of the metadata in the DB along with an ID, and then have a web
service that fetches the image from the FS.
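
A minimal sketch of that split, with metadata in the database and image bytes on disk. It uses Python's built-in `sqlite3` purely as a stand-in so the example is self-contained; the same idea would apply to a PostgreSQL table. The table name `images`, its columns, and the helper names are all hypothetical.

```python
import sqlite3
from pathlib import PurePosixPath
from typing import Optional


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory metadata catalog (stand-in for the real DB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images ("
        " id INTEGER PRIMARY KEY,"
        " ad_id INTEGER NOT NULL,"
        " variant TEXT NOT NULL,"   # e.g. 'thumb' or 'full' (hypothetical)
        " rel_path TEXT NOT NULL)"  # path relative to the image root
    )
    return conn


def image_path(conn: sqlite3.Connection, image_id: int,
               root: str = "/ads") -> Optional[PurePosixPath]:
    """Look up where an image lives on disk; None if the id is unknown."""
    row = conn.execute(
        "SELECT rel_path FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return PurePosixPath(root) / row[0] if row else None


conn = open_catalog()
conn.execute(
    "INSERT INTO images (id, ad_id, variant, rel_path) VALUES (?, ?, ?, ?)",
    (1, 123456, "thumb", "456/123456/thumb/1.jpg"),
)
print(image_path(conn, 1))  # /ads/456/123456/thumb/1.jpg
```

The web service then only has to resolve an ID to a path and stream the file, while searches run against the metadata table.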


--
Jeremiah Jahn <je******@cs.earlham.edu>

Nov 23 '05 #9
> I am working on a web portal that carries ads. We have about 200,000
> ads. Every ad has its own directory, named after its ID, containing
> 5 subdirectories with various sizes of 5 images.
>
> The filesystem is too slow, but I don't know whether performance will
> improve if I store these images in Postgres.


The problem you are experiencing is almost certainly because you have
200,000 directories, and directory lookups do not scale well.

I had a look at this a few weeks ago for an email storage application.
Using a filesystem with better directory lookup performance (XFS,
ReiserFS, JFS?) is one obvious solution, as is storing the data in the
database. If you want to use files in an ext2/3 filesystem, you need to
break up the directories into a hierarchy.

I did some web research trying to find numbers for how many entries per
directory you can get away with in an ext2/3 filesystem before the
lookup time starts to bite. I didn't find much useful data. The best
answer I got was "between 100 and 1000". Since my identifiers are
decimal numbers, I had a choice of breaking them up into groups of two
or three digits (i.e. 12/34/56/78 or 012/345/678). I went for groups of
two and it works well. Certainly this is not the limiting factor in the
system as a whole.
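
The digit grouping can be sketched in Python as follows. The function name `grouped_path` and the `width` padding parameter are assumptions for illustration; the two example outputs are the ones given above.

```python
def grouped_path(ident: int, group: int = 2, width: int = 8) -> str:
    """Split a decimal identifier into fixed-size digit groups."""
    digits = str(ident).zfill(width)
    if len(digits) % group:
        # Left-pad with zeros so the digits divide evenly into groups.
        digits = digits.zfill(len(digits) + group - len(digits) % group)
    return "/".join(digits[i:i + group] for i in range(0, len(digits), group))


print(grouped_path(12345678))                    # 12/34/56/78
print(grouped_path(12345678, group=3, width=9))  # 012/345/678
```

Groups of two cap each directory at 100 entries and groups of three at 1,000, which are the two ends of the "between 100 and 1000" range.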

Looking back, I wonder if I should have gone for groups of three. Is
the lookup time a function of the number of entries in the directory, or
the size (in bytes) of the directory? Since my directory names are
short in this scheme, I get more directory entries per disk block.

One other thing to mention: have you turned off access time (atime)
logging for the filesystem? (man mount)

--Phil.



Nov 23 '05 #10
