Bytes IT Community

How to secure documents on a server

Hello, can anyone suggest a solution?

I need to manage different types of documents (doc, xls, ppt, etc.) on a
server. I have a folder structure to maintain these documents: say
folder1 holds all the doc files, folder2 holds all the xls files, and so
on.

These documents should not be reachable through the URL by typing the
path directly. E.g. if I browse straight to
www.mywebsite.com/folder1/xyz.doc, the document opens right in the
browser. The documents should be accessible through our website only
after users log in. But at the moment, even without logging in, anyone
who knows the path can get these documents. How should I prevent that?

How can I secure these documents on the server?
Jul 18 '08 #1
46 Replies


On Jul 18, 11:05 am, RAZZ <rajat82.gu...@gmail.com> wrote:
[snip]
Depending on your web server, you should look at .htaccess for Apache or
httpd.ini for IIS...
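For the Apache case, a one-file deny in each document folder is enough to stop direct URL access. This is a generic sketch, not anything from this thread (Apache 2.4 syntax; 2.2 used `Order deny,allow` / `Deny from all` instead):

```apache
# .htaccess placed in folder1, folder2, ...
# Blocks every direct HTTP request to files in this folder;
# a server-side script can still read them from disk.
Require all denied
```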
Jul 18 '08 #2

RAJ
On Jul 18, 3:08 pm, GArlington <garling...@tiscali.co.uk> wrote:
[snip]
Well, we are using a Yahoo server, and it doesn't allow developers to
upload or modify .htaccess.
So is there any other way? I just want the doc or xls files to be
inaccessible unless the person has properly logged in.
Jul 18 '08 #3

On 18 Jul, 11:14, RAJ <rajat82.gu...@gmail.com> wrote:
[snip]
You're not going to be able to do much on a Yahoo server, I'm afraid. The
most common way to do this is to store the files outside of the web
root and use a PHP script to deliver the file.

I suggest you change hosts. There are much better-value ones out there.
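To illustrate the "outside the web root" approach: this is only a sketch, and the folder path, script name, and session-based login check are hypothetical placeholders, not anything from this thread.

```php
<?php
// download.php - sketch of serving a file that lives OUTSIDE the web
// root, and only to logged-in users. Paths and the session flag are
// made-up placeholders.

// Map a requested name onto a safe path inside the private folder,
// or return null if the name tries to escape it (e.g. "../").
function private_doc_path(string $name, string $dir): ?string
{
    if ($name === '' || $name !== basename($name)) {
        return null;
    }
    return rtrim($dir, '/') . '/' . $name;
}

if (PHP_SAPI !== 'cli') {           // web request: check login and serve
    session_start();
    if (empty($_SESSION['user_id'])) {   // hypothetical login flag
        header('HTTP/1.1 403 Forbidden');
        exit('Please log in first.');
    }
    $path = private_doc_path($_GET['file'] ?? '', '/home/site/private_docs');
    if ($path === null || !is_file($path)) {
        header('HTTP/1.1 404 Not Found');
        exit('No such document.');
    }
    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . basename($path) . '"');
    readfile($path);
}
```

The important part is that the private folder is not under the document root, so nothing in it can ever be fetched by URL; only the script can read it, and only after the login check passes.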
Jul 18 '08 #4

[snip]
Thank you for the response. Can you give me a bit more detail about
"storing files outside of the web root and using a PHP script to deliver
the file"?
Jul 18 '08 #5

On 18 Jul, 11:36, RAZZ <rajat82.gu...@gmail.com> wrote:
[snip]
Actually, another way to do it is to store the files in a BLOB field in
a database and deliver them from there. Here is a tutorial for that,
which you could adapt for the file-system version:
http://www.php-mysql-tutorial.com/php-mysql-upload.php
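A rough sketch of the BLOB approach with PDO. The `docs` table and its columns are made up, and SQLite is used only so the sketch is self-contained; the tutorial targets MySQL, where only the connection line differs.

```php
<?php
// Store documents in a BLOB column and read them back via PDO.
// Table and column names are illustrative only.
function store_doc(PDO $db, string $name, string $bytes): void
{
    $st = $db->prepare('INSERT INTO docs (name, content) VALUES (?, ?)');
    $st->execute([$name, $bytes]);
}

function fetch_doc(PDO $db, string $name): ?string
{
    $st = $db->prepare('SELECT content FROM docs WHERE name = ?');
    $st->execute([$name]);
    $bytes = $st->fetchColumn();
    return $bytes === false ? null : $bytes;
}

// In-memory SQLite keeps the example runnable anywhere.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE docs (name TEXT PRIMARY KEY, content BLOB)');
store_doc($db, 'xyz.doc', "fake \x00 binary bytes");
```

A download script would then call fetch_doc() after the login check and echo the bytes with the right Content-Type header, instead of using readfile().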
Jul 18 '08 #6

On Jul 18, 3:50 pm, Captain Paralytic <paul_laut...@yahoo.com> wrote:
[snip]
That was a really good option, but I have doc files which contain images
and tables. While downloading, the text is fine, but the images and
tables come through in what looks like some encrypted format. How do I
fix that?
Jul 18 '08 #7

On 18 Jul, 12:31, RAZZ <rajat82.gu...@gmail.com> wrote:
[snip]
I don't understand. What difference does it make what the document
contains? A binary file is a binary file is a binary file! It can
contain anything whatsoever.
Jul 18 '08 #8

RAZZ wrote:
[snip]
a) Set the directory permissions to 0600.
b) Send the files via a PHP script which checks the login first.
Like this:

<?php
// Check login here first...

// Send a PDF, for example:
header('Content-Type: application/pdf');
// If it should be a download:
header('Content-Disposition: attachment; filename="downloaded.pdf"');
// And fire off the file:
readfile('original.pdf');
// end
Jul 18 '08 #9

RAZZ wrote:
[snip]
Put ALL these documents as large BLOB objects in a database: that's one
easy place to store them, and only one access method is needed to
restrict access the way you want.

Jul 18 '08 #10

RAZZ wrote:
[snip]

As long as they are encapsulated IN the file, that doesn't matter. A
database will store any file.

Jul 18 '08 #11

Captain Paralytic wrote:
[snip]
http://www.php-mysql-tutorial.com/php-mysql-upload.php
I'm surprised this document doesn't mention how disastrous it can be
for the performance of a database. Only use it for tiny binary data and
a limited number of records, I'd say... I would even vote to dismiss
LONGBLOB; it often creates more problems than it solves.

--
Bart
Jul 18 '08 #12

I do not know the details of your provider or host, but if you can
store your documents outside of your document root, no one can access
your files directly. You can use PHP to store and retrieve them.
I store the filename and MIME type in a database (along with some other
information); the files are stored in a directory outside the document
root where Apache has read/write access (because users are allowed to
upload documents). In my case the documents are even stored on another
server with an NFS share. Once you have obtained the filename and MIME
type from the database, and the path from a config file:

header('Cache-Control: max-age=60');
header('Content-Type: ' . $filemime);
header('Content-Disposition: attachment; filename="' . $filename . '"');
readfile($filepath . $filename);

This not only downloads the file but also asks if you want to open it
with the associated program (MS Word or OO Writer for *.doc, ...).
Pugi!

On 18 jul, 12:05, RAZZ <rajat82.gu...@gmail.com> wrote:
[snip]
Jul 18 '08 #13

RAZZ wrote:
Hello, Can anyone suggest me solution?
Several people have already mentioned storing the files outside the web
server's "document root"; this is the most secure method (depending, of
course, on the security of the script/application that supplies the
file; in the worst case that script can endanger the security of the
whole server).

.htaccess and similar web server restrictions have the drawback that not
every host offers them, and they can be easy to get wrong when you are
inexperienced with web server configuration.

The idea of storing binary files in a database is quite good, but it
will affect the SQL server in a negative way, especially the larger the
binary files are.

A fourth method is to encrypt the files and store them in the "document
root", with a special download script that decrypts the file when
someone with access downloads it; this way, anyone fetching the file
directly can't use it. (This can be combined with all the other methods
too.)
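That fourth method might look like this. The cipher choice and key handling are my own illustration (AES via PHP's OpenSSL extension), and in practice the key itself must also live outside the document root:

```php
<?php
// Encrypt a document before writing it into the public folder; the
// download script decrypts it for authorised users. AES-256-CBC with
// a random IV prepended to the ciphertext; key handling is simplified.
function encrypt_doc(string $plain, string $key): string
{
    $iv = random_bytes(16);                      // fresh IV per file
    $ct = openssl_encrypt($plain, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    return $iv . $ct;                            // keep the IV with the data
}

function decrypt_doc(string $stored, string $key): string
{
    $iv = substr($stored, 0, 16);
    $ct = substr($stored, 16);
    return openssl_decrypt($ct, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
}
```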

A lot simpler way is to rename the files to something quite random
(md5-hash the name, and don't forget to salt it) and store the hashed
filename in a database table together with the original filename. The
download script then takes the original filename as an argument, looks
up the hashed name in the database, and serves that file to the user
(with a header that sends it under the original name). This way no one
can fetch a file by direct download unless they know its hashed name.
If you combine this with the previous method, you should have reasonably
good security on the files.
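The renaming scheme, sketched. The salt constant is a made-up placeholder, and md5 follows the post; a stronger hash works the same way:

```php
<?php
// Store each file on disk under a salted hash of its real name; the
// real name is kept only in the database. The salt here is a made-up
// placeholder - use a long random secret in practice.
const NAME_SALT = 'replace-with-a-long-random-secret';

// md5 as suggested in the post; sha256 etc. would also do.
function hashed_name(string $original): string
{
    return md5(NAME_SALT . $original);
}
```

The download script then takes the original filename, looks the hashed name up in the database, reads that file from disk, and sends it back under the original name via a Content-Disposition header.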

--

//Aho
Jul 18 '08 #14

J.O. Aho wrote:
>
The idea of storing binary files in a database is quite good, but it
will affect the sql server in a negative way, specially the larger the
binary files are.
Ok, why should it take longer to pull a large file out of one location
in a database than one location in a filesystem?

IME the things that slow databases down are not getting data out of
them, it's performing complex relational queries.
Jul 18 '08 #15

Bart Van der Donck wrote:
[snip]

I'm surprised this document doesn't mention how disastrous it can be
for the performance of a database.
It doesn't, because it isn't.
Only use for tiny binary data and a
limited amount of records, I'ld say... I would even vote to dismiss
LONGBLOB; it often creates more problems than it solves.
I usually chunk the files into BLOBs

Jul 18 '08 #16

The Natural Philosopher wrote:
[snip]
Ok, why should it take longer to pull a large file out of one location
in a database than one location in a filesystem?

IME the things that slow databases down are not getting data out of
them, it's performing complex relational queries.
I have tested this, and I have found it slightly slower to get files
from a database table than from the file system. Then again, it is
slightly slower building pages dynamically with PHP/MySQL than it is to
serve fixed HTML pages. So basically, when I find that storing files in
a database is the best way to handle the application I am writing,
that's the way I do it.
Jul 18 '08 #17

Bart Van der Donck wrote:
[snip]
You're just using the database for what it's made for - storing and
accessing data. It's not at all disastrous - in fact, if you get enough
files in the database, performance may actually improve over the file
system's.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jul 18 '08 #18

J.O. Aho wrote:
[snip]
The idea of storing binary files in a database is quite good, but it
will affect the sql server in a negative way, specially the larger the
binary files are.
A common misconception among those who haven't used databases for
storing large amounts of data. Properly configured, the database will
have excellent performance.
[snip]
A lot simpler way is to rename the files to something quite random (md5
hash the name, don't forget to salt it), store the hashed filename in a
database table where you have the original filename too. The download
script in this case will take an argument of the original filename, look
in the database for the hashed name, provides the file to the user (with
header you send it as the original name), this way you can't get the
file with direct download unless you know the hashed file name. If you
combine this one with the previous method, you should have a quite good
false security on the files.
Even worse performance than storing the data in the database in the
first place. More overhead for the scripting language, with no
significant savings on the database end.


Jul 18 '08 #19

Paul Lautman wrote:
[snip]

I have tested this and I have found it slightly slower to get files from a
database table than from the file system. Then again, it is slightly slower
building pages dynamically with php/MySQL than it is to serve fixed html
pages. So basically, when I find that storing files in a database is the
best way to handle the application I am writing, that's the way I do it.
Paul,

But try putting 100K files in a directory on the file system and see how
much it slows things down, whereas the database will hardly notice any
performance decrease.


Jul 18 '08 #20

On Jul 18, 8:58 pm, The Natural Philosopher <a...@b.c> wrote:
[snip]

Ok, why should it take longer to pull a large file out of one location
in a database than one location in a filesystem?
I think the point is that retrieving such a large data chunk from a db
might momentarily impact the performance of subsequent db operations;
think about what happens to the SQL database caches.

--Jorge.
Jul 18 '08 #21

Jerry Stuckle wrote:
[snip]

You're just using the database for what it's made for - storing and
accessing data. It's not at all disastrous - in fact, if you get enough
files in the database, performance may actually improve over the file
system's.
I would be interested to see some articles or benchmarks about this
issue. Got any? From my experience I've actually always encountered the
opposite: MySQL and MS Access performance decreases dramatically with
larger BLOBs. I'm working with many GBs of pictures, for which I store
nothing in tables (the ID of the record = the name of the picture; the
application ties pics to IDs). I've had good experiences with this
approach, even under heavy load. But I'm always interested to learn how
this strategy could be improved.

Jul 18 '08 #22

Jerry Stuckle wrote:
[snip]
Paul,

But try putting 100K files in a directory on the file system and see
how much it slows things down. Whereas the database will hardly
notice any performance decrease.
I have always found it slightly slower to get the equivalent file from
the database rather than from the file system. But as I say, it doesn't
bother me. If the application is generally better with the files in a
database, that's where they go. If the application is easier with them
on disk, then I put them there. Likewise, if something works better with
static HTML pages, I will use them. When it comes down to it, we have a
vast range of technologies at our disposal. I look upon my role as being
good at picking the right one for the right task. There is always a
balance to be struck between speed of processing, functionality, ease of
maintenance, ...
Jul 18 '08 #23

Bart Van der Donck wrote:
[snip]

I would be interested to see some articles or benchmarks about this
issue. Got any ? From my experience I've actually always encountered
the opposite (MySQL and MS Access) whose performance dramatically
decreases with larger BLOBS. I'm working with many GB's of pictures
for whom I store nothing in tables (ID of the record = name of the
picture / application ties pics to IDs). I've good experiences with
this approach, even under heavy load. But I'm always interested to
learn how this strategy could be improved.

Over 20 years of experience doing it, starting with DB2 on mainframes.

But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.

BTW - benchmarks tell you exactly one thing: how a database runs UNDER
THOSE CONDITIONS. Change the conditions and the benchmarks aren't valid
any more.

With that said, under live conditions I've seen virtually no slowdown
when accessing BLOB data in a database. And in some cases it actually
runs faster.


Jul 19 '08 #24

Paul Lautman wrote:
[snip]
Yes, but with that many files in a directory, even Linux slows down
quite a bit. It isn't made to handle that many different files.

But for a good database, that's just a starting point.


Jul 19 '08 #25

Jorge wrote:
[snip]

I think the point is that retrieving such a large data chunk from a db
might momentarily impact the performance of forthcoming db operations,
think about what happens to the sql database caches.

--Jorge.
Not at all, if the database is properly configured.


Jul 19 '08 #26

Paul Lautman wrote:
[snip]

Yes, exactly. The key is to not get religious about it... "the RIGHT way
is to..."

Advantages of the database:

- one-point backup of all data
- definitely not directly accessible via HTML
- much better indexing and searching than a flat file system in a
directory
- possibly simpler integration with other bits of data associated with
the file to be served (i.e. you MIGHT want a description of what it is)

On the downside, it's a few more machine cycles and possibly a lot more
RAM to serve it up.
HOWEVER, it is perfectly possible to have a separate database, on even a
separate machine, to do the serving, if it gets too onerous.
Jul 19 '08 #27

P: n/a
The Natural Philosopher wrote:
Advantages of the database...

- one point backup of all data
- definitely not directly accessible via HTML
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with
the file to be served (i.e. you MIGHT want a description of what it
is).
Also, and this is the bit I really like, when you delete the record the file
automatically goes with it.
Jul 19 '08 #28

P: n/a
Jerry Stuckle wrote:
Paul Lautman wrote:
>The Natural Philosopher wrote:
>>J.O. Aho wrote:
The idea of storing binary files in a database is quite good, but
it will affect the sql server in a negative way, especially the
larger the binary files are.

Ok, why should it take longer to pull a large file out of one
location in a database than one location in a filesystem?

IME the things that slow databases down are not getting data out of
them, it's performing complex relational queries.

I have tested this and I have found it slightly slower to get files
from a database table than from the file system. Then again, it is
slightly slower building pages dynamically with php/MySQL than it is
to serve fixed html pages. So basically, when I find that storing
files in a database is the best way to handle the application I am
writing, that's the way I do it.


Paul,

But try putting 100K files in a directory on the file system and see
how much it slows things down. Whereas the database will hardly
notice any performance decrease.
Actually I guess I ought to qualify my timings comment. I have no proof that
it is the database that was slowing things down per se. To serve the images
required invoking a load of script, which wasn't going to help and of course
the MySQL installation was on a shared server, so no opportunity to optimise
the settings for this task.
Jul 19 '08 #29

P: n/a
.oO(The Natural Philosopher)
>Yes. Exactly. The key is to not get religious about it ..."the RIGHT way
is to.."

Advantages of the database...

- one point backup of all data
- definitely not directly accessible via HTML
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with the
file to be served (i.e. you MIGHT want a description of what it is).

On the downside, it's a few more machine cycles and possibly a lot more
RAM to serve it up.
Some more pros and cons:

http://groups.google.com/group/alt.p...e4dd4f90eafa84

Micha
Jul 19 '08 #30

P: n/a
On Jul 19, 7:38 am, The Natural Philosopher <a...@b.c> wrote:
>
Yes. Exactly. The key is to not get religious about it ..."the RIGHT way
is to.."
In fact, a filesystem is a ~DBMS that handles just one type of data
(files). But the amount of metadata that a filesystem (easily) keeps/
provides about its data (the files) is limited, while there's no limit
to the amount of metadata that can be (easily) saved/retrieved in a
DBMS. Both are (most likely) equally well optimized to do their jobs
efficiently. The APIs to get to the data are completely different. One
is pretty familiar and the other is not so much. I love the idea of
single file backups (as in a DBMS). OTOH, the filesystem approach
is better suited to incremental backups.

--Jorge.
Jul 19 '08 #31

P: n/a
Michael Fesser wrote:
.oO(The Natural Philosopher)
>Yes. Exactly. The key is to not get religious about it ..."the RIGHT way
is to.."

Advantages of the database...

- one point backup of all data
- definitely not directly accessible via HTML
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with the
file to be served (i.e. you MIGHT want a description of what it is).

On the downside, it's a few more machine cycles and possibly a lot more
RAM to serve it up.

Some more pros and cons:

http://groups.google.com/group/alt.p...e4dd4f90eafa84

Micha
Which is not entirely accurate...

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jul 19 '08 #32

P: n/a
Paul Lautman wrote:
The Natural Philosopher wrote:
>Advantages of the database...

- one point backup of all data
- definitely not directly accessible via HTML
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with
the file to be served (i.e. you MIGHT want a description of what it
is).
Also, and this is the bit I really like, when you delete the record the file
automatically goes with it.

Good point.
Jul 19 '08 #33

P: n/a
Michael Fesser wrote:
.oO(The Natural Philosopher)
>Yes. Exactly. The key is to not get religious about it ..."the RIGHT way
is to.."

Advantages of the database...

- one point backup of all data
- definitely not directly accessible via HTML
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with the
file to be served (i.e. you MIGHT want a description of what it is).

On the downside, it's a few more machine cycles and possibly a lot more
RAM to serve it up.

Some more pros and cons:

http://groups.google.com/group/alt.p...e4dd4f90eafa84

Micha
Shows a lot of bias there and many unsupported assertions. Some of which
ARE wrong.
Jul 19 '08 #34

P: n/a
Jerry Stuckle wrote:
[...]
But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.
Not the real communism ?[*] I partly agree for MS Access [**], but I
have reasons to believe that my MySQL databases are set up properly.
This is not a thing I do myself, but sysadmins in one of the giant
datacenters who stick to one config for the entire park.
BTW - benchmarks tell exactly one thing - how a database runs UNDER
THOSE CONDITIONS. Change the conditions and benchmarks aren't valid any
more.

With that said, under live conditions, I've seen virtually no slowdown
when accessing blob data in a database. And in some cases it actually
runs faster.
I think the question is how BLOBs are handled. My situation is a
browser-based application that consists of many read actions (public
+intranet) and few update/delete actions (admin). Now suppose:

(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system-path (usr/my/path/to/
pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL-path instead of system-path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.

(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
- Output with <img>.

(3) Update & delete actions without BLOB:
- Update/delete instructions stay out of DB, affects file system only.

(4) Update & delete actions with BLOB:
- Update/delete instructions stay out of file system, affects DB only

It is my experience that (1) has huge memory benefits compared to
(2).

The difference between (3) and (4) is not so clear; especially because
MySQL probably optimizes this process. I think in practice you would
see that (3) is faster for environment A, and (4) for environment B;
but never with real considerable differences.

And (1) and (2) are much more important since they count for 99.x% of
the queries in my case.
[*] -"Communism is great." -"But look how things went in the USSR."
-"That was not the real communism."
[**] Many tendencies in MS Access are a good thermometer for general
database issues; MS Access is just the first that fails :-)

--
Bart
Jul 21 '08 #35

P: n/a
Bart Van der Donck wrote:
Jerry Stuckle wrote:
>[...]
But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.

Not the real communism ?[*] I partly agree for MS Access [**], but I
have reasons to believe that my MySQL databases are set up properly.
This is not a thing I do myself, but sysadmins in one of the giant
datacenters who stick to one config for the entire park.
>BTW - benchmarks tell exactly one thing - how a database runs UNDER
THOSE CONDITIONS. Change the conditions and benchmarks aren't valid any
more.

With that said, under live conditions, I've seen virtually no slowdown
when accessing blob data in a database. And in some cases it actually
runs faster.

I think the question is how BLOBs are handled. My situation is a
browser-based application that consists of many read actions (public
+intranet) and few update/delete actions (admin). Now suppose:

(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system-path (usr/my/path/to/
pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL-path instead of system-path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.

(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
Unnecessary: Just..
- Output with <img>.
...pointing to a second php script that loads the BLOB and spits it out.

>
(3) Update & delete actions without BLOB:
- Update/delete instructions stay out of DB, affects file system only.

(4) Update & delete actions with BLOB:
- Update/delete instructions stay out of file system, affects DB only

It is my experience that (1) has huge memory benefits compared to
(2).
Well the way you have it, it duplicates the file in its entirety, which
is inefficient.

The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though... Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes... and memory is cheap. Cheaper than CPU anyway.

Reading a record has to be something a database is highly optimised for.
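A minimal sketch of such a "second php script" gatekeeper, assuming mysqli and a made-up documents(id, mime, data) table; the point is that a login check runs before any bytes leave the database:

```php
<?php
// getdoc.php?id=123 - streams a BLOB to the browser instead of
// exposing a file URL. Only reached after a successful login.
session_start();
if (empty($_SESSION['user'])) {
    header('HTTP/1.1 403 Forbidden');
    exit('Please log in first');
}

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$stmt = $db->prepare('SELECT mime, data FROM documents WHERE id = ?');
$id = (int) $_GET['id'];
$stmt->bind_param('i', $id);
$stmt->execute();
$stmt->bind_result($mime, $data);

if ($stmt->fetch()) {
    header('Content-Type: ' . $mime);   // e.g. image/jpeg, application/msword
    echo $data;                         // the raw bytes go straight out
} else {
    header('HTTP/1.1 404 Not Found');
}
?>
```

A page then refers to `<img src="getdoc.php?id=123">` (or links to it for downloads), and the real storage location is never exposed.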
Jul 21 '08 #36

P: n/a
The Natural Philosopher wrote:
Bart Van der Donck wrote:
>(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system-path (usr/my/path/to/
pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL-path instead of system-path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.
>(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
>It is my experience that (1) has huge memory benefits compared to
(2).

The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though... Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes... and memory is cheap. Cheaper than CPU anyway.
All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.
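Spelled out as runnable PHP (mysqli; the hotels table, system path and URL path are hypothetical), approach (1) is roughly:

```php
<?php
// Approach (1): only ids live in the database; images live on disk.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$result = $db->query('SELECT id FROM hotels');

while ($row = $result->fetch_assoc()) {
    // Build the system path and do the "-e" existence check
    $sysPath = '/usr/my/path/to/pics/' . $row['id'] . '.jpg';
    if (file_exists($sysPath)) {
        // Emit the URL path; Apache serves the bytes, PHP never reads them
        echo '<img src="/pics/' . $row['id'] . '.jpg">', "\n";
    }
}
?>
```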

--
Bart
Jul 21 '08 #37

P: n/a
"Bart Van der Donck" <ba**@nijlen.com> wrote in message
news:59**********************************@c65g2000hsa.googlegroups.com...
The Natural Philosopher wrote:
Bart Van der Donck wrote:
>(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system-path (usr/my/path/to/
pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL-path instead of system-path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.
>(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
>It is my experience that (1) has huge memory benefits compared to
(2).

The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you its
probably held in PHP and SQL type memory areas as well as disk cache
memory. Its probably NOT held i e.g.apache memory though..apache or
whatever will read the stdout of the CGI script that spits it, and juts
pass the bytes...and memory is cheap. Cheaper than CPU anyway.
All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.

--
Bart
But Bart,
View source
shows the true path to your image, not good.
Jul 21 '08 #38

P: n/a
AlmostBob wrote:
"Bart Van der Donck" <ba**@nijlen.com> wrote in message
news:59**********************************@c65g2000hsa.googlegroups.com...
The Natural Philosopher wrote:
>Bart Van der Donck wrote:
>>(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system-path (usr/my/path/to/
pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL-path instead of system-path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.
(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
>>It is my experience that (1) has huge memory benefits compared to
(2).
The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though... Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes... and memory is cheap. Cheaper than CPU anyway.

All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.
There is no better: it depends on the requirements.

Your way there is no chance to protect the image directory from random
downloads for example.

In my case the user may be a user with far greater access than the
general public, and have access to internal data - like plans, drawings
and specifications.

I don't want script kiddies stealing vital info: Putting them in a
database is one giant leap in that sense.

execution speed and efficiency is only one of many many issues.

In my case the above, plus a general requirement to try and get all
important corporate data in the data base, under one backup regime, were
more significant. I especially did NOT want user accessible image files
that might get deleted by accident. I could protect the database area by
making it only accessible by root or the mysql daemon: direct access to
download areas had to be at least readable, and if uploaded, writeable, by
the permissions the web server and php ran at.
In practice at moderate loads the download speeds are far more dominant
than CPU or RAM limitations. And indeed the ability to make a special
download script that re-sizes the images on the fly, turned out to be a
better way to go than storing thumbnails of varying sizes. One trades
disk space for processing overhead.

As a practicing engineer all my working life, it still amazes me that
people will always come up with what amounts to a religious statement
about any particular implementation, that it is universally 'better'.

If that were the case, it would be universally adopted instantly.

Jerry has (for once) made an extremely valid point about directory sizes
as well. Databases are far better at finding things quickly in large
amounts of data: far better than a crude directory search. Once the
overhead in scanning the directory exceeds the extra download
efficiency, you are overall on a loser with flat files.

AND if you run into CPU or RAM limitations, it's a lot easier to - say -
move your database to a honking new machine, or upgrade the one you have,
than to completely re-write all your applications that used to use a file
to use the database.

I am NOT claiming that a database is the 'right' answer in all cases,
just pointing out that it may be a decision you want to make carefully,
as it is somewhat hard to change later on, and in most cases the extra
overhead on using it is more than compensated by the benefits,
particularly in access control.

Which was the primary concern of the OP.
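For files that do stay on disk, the usual compromise is to deny direct HTTP access to the directory and serve the files only through a gatekeeper script, which can check the login and then read them locally. A minimal Apache 2.2-style sketch (the folder name is taken from the original question):

```apacheconf
# /folder1/.htaccess - refuse every direct request such as
# www.mywebsite.com/folder1/xyz.doc
Order allow,deny
Deny from all

# A PHP script elsewhere can still do, after its own login check:
#   readfile('/path/to/folder1/xyz.doc');
```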


Jul 21 '08 #39

P: n/a
Bart Van der Donck wrote:
Jerry Stuckle wrote:
>[...]
But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.

Not the real communism ?[*] I partly agree for MS Access [**], but I
have reasons to believe that my MySQL databases are set up properly.
This is not a thing I do myself, but sysadmins in one of the giant
datacenters who stick to one config for the entire park.
Not necessarily. Sysadmins cannot correctly set up a system in the
dark. They need communications from the developers on what data is
being stored, how it is being handled, etc.

Unfortunately, most sysadmins know very little about how to tune a
database (not just MySQL) and the result is poor response.
>BTW - benchmarks tell exactly one thing - how a database runs UNDER
THOSE CONDITIONS. Change the conditions and benchmarks aren't valid any
more.

With that said, under live conditions, I've seen virtually no slowdown
when accessing blob data in a database. And in some cases it actually
runs faster.

I think the question is how BLOBs are handled. My situation is a
browser-based application that consists of many read actions (public
+intranet) and few update/delete actions (admin). Now suppose:

(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system-path (usr/my/path/to/
pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL-path instead of system-path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.
Wrong - binary data is still handled.
(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
- Output with <img>.
Not very intensive at all. And you don't store it in some folder.
(3) Update & delete actions without BLOB:
- Update/delete instructions stay out of DB, affects file system only.
Yep.
(4) Update & delete actions with BLOB:
- Update/delete instructions stay out of file system, affects DB only
Yep.
It is my experience that (1) has huge memory benefits compared to
(2).
Memory is nothing nowadays. Sure, you need more memory for the database
to effectively handle large blobs. But a few more megabytes is nothing.

The difference between (3) and (4) is not so clear; especially because
MySQL probably optimizes this process. I think in practice you would
see that (3) is faster for environment A, and (4) for environment B;
but never with real considerable differences.

And (1) and (2) are much more important since they count for 99.x% of
the queries in my case.
And the difference is much less than you claim.
[*] -"Communism is great." -"But look how things went in the USSR."
-"That was not the real communism."
[**] Many tendencies in MS Access are a good thermometer for general
database issues; MS Access is just the first that fails :-)

--
Bart
Databases are optimized for retrieving data - especially from large
groups of data. File systems are just low level databases which handle
small amounts of data (a few files) very well.

One of the big differences is that as your data grows, the database
efficiency remains fairly static. However, file system performance
degrades. Eventually, the file system will actually perform worse than
the database does. Try putting 100K files in one directory. Good luck.
But a database handles 100M rows with ease.

And no, MS Access is not a real database, and is not a good thermometer
for anything other than how bad it really is. Real databases work in an
entirely different way and perform much differently.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jul 21 '08 #40

P: n/a
Bart Van der Donck wrote:
The Natural Philosopher wrote:
>Bart Van der Donck wrote:
>>(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system-path (usr/my/path/to/
pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL-path instead of system-path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.
(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
>>It is my experience that (1) has huge memory benefits compared to
(2).
The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though... Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes... and memory is cheap. Cheaper than CPU anyway.

All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.

--
Bart
It's easier for YOU. And you THINK your way is better. But you've
never really tried with lots of images, have you? In fact, I suspect
you've never really checked it at all with a real database which has
been designed and configured to do this type of operation.

So all you really have to go on is your opinion.

OTOH, some of us have been doing it for years (over 20, in my case,
starting with DB2 on mainframes), and have both designed databases and
configured RDBMS's to handle these operations efficiently. We've seen
the difference in performance, and it isn't what you claim.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jul 21 '08 #41

P: n/a
Jerry Stuckle wrote:
Bart Van der Donck wrote:
>  SELECT id FROM table;
  print "<img src=url/to/$id.jpg>";

It's easier for YOU. And you THINK your way is better. But you've
never really tried with lots of images, have you?
Yes I have, and the tests with BLOBs were disastrous for my case
(although I must admit this study was done already 9 years ago).

Perhaps you're right that my requirements were a bit particular; I'm
facing a read load of a few MB/sec and a modest update/delete load
only peaking at nightly cronjobs. Images are spread on the machine
over 57 directories, the largest directory is holding 22,241 images at
this moment. Maybe it's BSD or the running shell that is optimal (?);
one thing I know -and tested well enough- is that my MySQL cannot
handle this kind of BLOB "abuse" under such conditions.
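Spreading files over many directories like that is often automated with a hash prefix, so no single directory keeps growing. A hypothetical sketch:

```php
<?php
// Map an image id to a bucketed path, e.g. pics/82/12345.jpg, using
// the first two hex characters of the id's md5 as the bucket (256 buckets).
function bucketed_path($id) {
    $bucket = substr(md5((string) $id), 0, 2);
    return "pics/$bucket/$id.jpg";
}

echo bucketed_path(12345), "\n";   // prints pics/82/12345.jpg
?>
```

The same function is used for both writing and reading, so the bucket never has to be stored anywhere.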

I can understand it might be desirable that the URL to the image must
be unknown, like Natural Philosopher said, or other requirements which
make this or that approach more preferable. In my case the binaries
are about hotel photos having their telephone number as the name of
the JPGs. This level of protection is acceptable here; performance
criteria are more crucial.
In fact, I suspect you've never really checked it at all with
a real database which has been designed and configured to do
this type of operation.
So all you really have to go on is your opinion.
It's unwise to draw a conclusion from something you only suspect.

But you're right, it's my opinion, but based on experience and
preceded by quite some study and benchmarks. I think that, for my
case, it was the best possible design under the given requirements.

--
Bart
Jul 21 '08 #42

P: n/a
Jones wrote:
On Mon, 21 Jul 2008 06:46:33 -0400, Jerry Stuckle <js*******@attglobal.net>
wrote:
>Not necessarily. Sysadmins cannot correctly set up a system in the
dark. They need communications from the developers on what data is
being stored, how it is being handled, etc.

Once upon a time the term "system analyst" actually meant something.
And then Alan Sugar started selling desktop PC's to everyone and now
everyone thinks they're a "software engineer" just because they can hack
a few lines of PHP or type ./configure.

The "developers" should have worked it all out before the project even started.
That's the REAL problem - here presumably and elsewhere for certain.
No, there are still sysadmins, who are responsible for system tuning.
It isn't just the needs of the database developers which needs to be
taken into consideration - there are others, also.

Of course, you're right - nowadays there are too many "system
administrators" who only hold that title because they failed Programming
101.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 25 '08 #43

P: n/a
Bart Van der Donck wrote:
Jerry Stuckle wrote:
>Bart Van der Donck wrote:
>> SELECT id FROM table;
print "<img src=url/to/$id.jpg>";
It's easier for YOU. And you THINK your way is better. But you've
never really tried with lots of images, have you?

Yes I have, and the tests with BLOBs were disastrous for my case
(although I must admit this study was done already 9 years ago).
How many is a lot? I've done it with over 50M images (several terabytes
- but that was a mainframe) in a database with no performance
degradation. But the database and RDBMS were designed to do it, also.

And this was under live conditions, averaging 10K queries/second.
Perhaps you're right that my requirements were a bit particular; I'm
facing a read load of a few MB/sec and a modest update/delete load
only peaking at nightly cronjobs. Images are spread on the machine
over 57 directories, the largest directory is holding 22,241 images at
this moment. Maybe it's BSD or the running shell that is optimal (?);
one thing I know -and tested well enough- is that my MySQL cannot
handle this kind of BLOB "abuse" under such conditions.
Do it all in one directory. That's what the database effectively does.
And it means you don't need to sort images into different directories,
create new directories when the number of images gets too large...
I can understand it might be desirable that the URL to the image must
be unknown, like Natural Philosopher said, or other requirements which
make this or that approach more preferable. In my case the binaries
are about hotel photos having their telephone number as the name of
the JPG's. This level of protection is acceptable here; performance
critera are more crucial.
>In fact, I suspect you've never really checked it at all with
a real database which has been designed and configured to do
this type of operation.
So all you really have to go on is your opinion.

It's unwise to draw a conclusion from something you only suspect.

But you're right, it's my opinion, but based on experience and
proceeded by quite some study and benchmarks. I think that, for my
case, it was the best possible design under the given requirements.

--
Bart
Yep, but your "study" and "benchmarks" were not necessarily accurate.
So neither are your conclusions.

Tune the RDBMS and design the database correctly, and there is virtually
no overhead. After all, all a file system is is a dumb dbms.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 25 '08 #44

P: n/a
Message-ID: <g6**********@registered.motzarella.org> from Jerry Stuckle
contained the following:
After all, all a file system is is a dumb dbms.
Don't you mean, a file system is a database?

--
Geoff Berrow 0110001001101100010000000110
001101101011011001000110111101100111001011
100110001101101111001011100111010101101011
http://slipperyhill.co.uk
Jul 26 '08 #45

P: n/a
Geoff Berrow wrote:
Message-ID: <g6**********@registered.motzarella.org> from Jerry Stuckle
contained the following:
>After all, all a file system is is a dumb dbms.

Don't you mean, a file system is a database?
No, the files are a database. A file system is a dump database
management system.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jul 26 '08 #46

P: n/a
Jerry Stuckle wrote:
Geoff Berrow wrote:
>Message-ID: <g6**********@registered.motzarella.orgfrom Jerry Stuckle
contained the following:
>>After all, all a file system is is a dumb dbms.

Don't you mean, a file system is a database?

No, the files are a database. A file system is a dump database
management system.
Whoops - mistype. That should be "A file system is a dumB database
management system". But come to think of it, it is kind of a dump, also :-)

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Jul 26 '08 #47
