
1 - 2 million files in one folder?

Hi all,
I really need your advice.
I have to store over a million files, 10 - 15 KB each, in one folder.
The files are created by my PHP script; sometimes the old files are
deleted and new ones are written.
So, basically, on every connection my script reads/deletes/writes files
from/to that folder.
Right now I have only around 300,000 files in that folder, and it feels
like it's getting slower for that script to work. It does work at the
moment, but I am not sure what will happen when there are over a million
files there...
Is there any limit on the number of files that can be stored in a folder?
Would it be better for me to use MySQL? I am not sure how MySQL will
cope with millions of writes/reads.
What would you recommend?
Thank you very much!
P.S. I am running Linux, Fedora Core 3.

Jul 27 '06 #1
On Thu, 27 Jul 2006 02:49:10 -0700, b007uk wrote:
[...]
In a word... *you're crazy*!!! Look at the way that files are stored under
Linux, with the different file systems. With ext2/3, god knows how many
levels of indirection you'll be going through to even manage to index the
directory.

You need to do a lot of reading, a lot of customization, and a load of
benchmarking to get this to work. And, tbh, I'd find another solution.
There must be a way to subdivide this data to get an acceptable number of
files (thousands or fewer!!!) in each directory.

Steve

Jul 27 '06 #2
b0****@gmail.com wrote:
[...]
Hi,

Don't. :P
If you know that folder will contain millions of files, the underlying OS
will need more and more time to find the right file.
Some OSes are smarter than others; I do not know the details.

But better safe than sorry, so if possible, use a database. These things
are set up to easily handle massive table lookups by means of (smart)
indexing.

A simple approach:
(Postgres notation, not MySQL, which I avoid)

Create a table that holds your content:
create table files(
fileid serial primary key,
filename text,
content text
);

Now you can get the content of each file based on its id very fast, because
fileid is the primary key, and thus indexed.
If you want to use the filename, index that one too.

So (very fast)
SELECT content from files WHERE (fileid=238756);
because fileid is indexed.

And if you indexed filename too, this will also be very fast:
SELECT content FROM files WHERE (filename='myfile_3_4_2006.txt');

Alternatively: maybe you can translate your whole approach to a database and
produce the results when needed instead of making millions of files.
This is hard to say since I do not know the underlying problem, but in
general you can solve this with a well-designed database.
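A minimal PHP sketch of the table-based approach above, under a few stated assumptions: PDO with an in-memory SQLite database stands in for the PostgreSQL or MySQL server so the example runs on its own, the `UNIQUE` index on `filename` assumes names are unique as they would be in one folder, and `saveFile()`/`loadFile()` are hypothetical helper names.

```php
<?php
// Sketch only: SQLite in memory (via PDO) stands in for PostgreSQL/MySQL.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// INTEGER PRIMARY KEY is SQLite's auto-numbered id (similar to "serial").
$db->exec('CREATE TABLE files (
    fileid   INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    content  TEXT
)');

// Index filename so lookups by name avoid a full table scan.
// Assumption: names are unique, as they would be in a single folder.
$db->exec('CREATE UNIQUE INDEX files_filename_idx ON files (filename)');

// Hypothetical helper: store one "file" as a row.
function saveFile(PDO $db, string $name, string $content): void
{
    // Placeholders mean quotes inside $content need no manual escaping.
    $stmt = $db->prepare('INSERT INTO files (filename, content) VALUES (?, ?)');
    $stmt->execute([$name, $content]);
}

// Hypothetical helper: fetch a row's content by name, or null if absent.
function loadFile(PDO $db, string $name): ?string
{
    $stmt = $db->prepare('SELECT content FROM files WHERE filename = ?');
    $stmt->execute([$name]);
    $content = $stmt->fetchColumn();
    return $content === false ? null : $content;
}

saveFile($db, 'myfile_3_4_2006.txt', "<p class='note'>It's in a row now</p>");
echo loadFile($db, 'myfile_3_4_2006.txt'), "\n";
```

In a web script the value returned by `loadFile()` can simply be echoed to the browser, so no file ever has to exist on disk.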

Hope that helps,

Regards,
Erwin Moller
Jul 27 '06 #3
Message-ID: <44**********************@news.xs4all.nl> from Erwin Moller
contained the following:
> A simple approach:
> (Postgres notation, not MySQL, which I avoid)

Last time I checked, MySQL was being used for tinyurl.com

http://tinyurl.com/

--
Geoff Berrow (put thecat out to email)
It's only Usenet, no one dies.
My opinions, not the committee's, mine.
Simple RFDs http://www.ckdog.co.uk/rfdmaker/
Jul 27 '06 #4
Geoff Berrow wrote:
> Message-ID: <44**********************@news.xs4all.nl> from Erwin Moller
> contained the following:
>> A simple approach:
>> (Postgres notation, not MySQL, which I avoid)
>
> Last time I checked, MySQL was being used for tinyurl.com
>
> http://tinyurl.com/

So what?
My choice of database is not based on tinyurl.com using something or not.
;-)

The reason I prefer PostgreSQL over MySQL has more to do with support for
Foreign Keys (which failed silently in MySQL), transactions, etc.
When I made my pick of preferred database a few years ago, MySQL was no
comparison to Postgres.

Yes, I know: since MySQL began offering InnoDB in combination with imysql,
they have solved these serious shortcomings.

Also: I am not saying MySQL sucks or anything; it is just that MySQL matured
only a short while ago, and you'll have to tweak it before you can use FKs,
transactions and the like. But yeah, it can be done with InnoDB.

I think the main reason for MySQL's popularity lies in the fact that they
offered it on M$ systems, while Postgres was only running on *nix. That too
is in the past, by the way: PostgreSQL has been available on M$ for years now.

Regards,
Erwin Moller
Jul 27 '06 #5
You are right :(
That's probably why I can't even enter that folder now; it takes ages :(
I'll try to change it to work with MySQL...
To tell you the truth, I am a bit afraid of MySQL; I was always storing
data in folders/files, but I think I'll manage...
And no, I can't predict the file names, so I can't organize them; the files
are generated from the user input.
Is it possible to store complete HTML files with all their tags in MySQL?
Thank you very much!

Jul 27 '06 #6
Message-ID: <44**********************@news.xs4all.nl> from Erwin Moller
contained the following:
>>> A simple approach:
>>> (Postgres notation, not MySQL, which I avoid)
>>
>> Last time I checked, MySQL was being used for tinyurl.com
>>
>> http://tinyurl.com/
>
> So what?
> My choice of database is not based on tinyurl.com using something or not.
> ;-)

I know that, but it may have been seen as indicating that you thought
MySQL could not handle large numbers.

> The reason I prefer PostgreSQL over MySQL has more to do with support for
> Foreign Keys (which failed silently in MySQL), transactions, etc.
> When I made my pick of preferred database a few years ago, MySQL was no
> comparison to Postgres.

Not quite sure what you mean by foreign keys failing silently, but
perhaps this is off topic for this group.
Jul 27 '06 #7
b0****@gmail.com wrote:
[...]
> Is it possible to store complete HTML files with all their tags in MySQL?
Rather, store the HTML code in a text field. To display it, just get the
contents of that field with PHP, echo it and exit. No need to make it a
file at all!

--
Markus
Jul 27 '06 #8
Geoff Berrow wrote:
> Message-ID: <44**********************@news.xs4all.nl> from Erwin Moller
> contained the following:
>>>> A simple approach:
>>>> (Postgres notation, not MySQL, which I avoid)
>>>
>>> Last time I checked, MySQL was being used for tinyurl.com
>>>
>>> http://tinyurl.com/
>>
>> So what?
>> My choice of database is not based on tinyurl.com using something or not.
>> ;-)

Hi Geoff,

> I know that, but it may have been seen as indicating that you thought
> MySQL could not handle large numbers.

No, that was not what I meant.
I just meant: if you start using a database, Postgres has (had) some
advantages over MySQL.

>> The reason I prefer PostgreSQL over MySQL has more to do with support for
>> Foreign Keys (which failed silently in MySQL), transactions, etc.
>> When I made my pick of preferred database a few years ago, MySQL was no
>> comparison to Postgres.
>
> Not quite sure what you mean by foreign keys failing silently, but
> perhaps this is off topic for this group.
Yes, a little off topic, but that happens all the time in here. ;-)

What I mean by 'failing silently' is this:
create table tbluser(
userid serial primary key,
username text
);

create table tblarticle(
articleid serial primary key,
writtenby integer references tbluser(userid),
title text,
content text
);

and then:
insert into tbluser (username) values ('Geoff');

Suppose the userid for that insert (serial/autonumber) is 1.
Now with MySQL, if I insert an illegal value for writtenby in tblarticle,
like this:
insert into tblarticle (writtenby,title,content) VALUES
(33, 'my title', 'bla');

it just fails to check the constraint, and boldly inserts 33 for writtenby,
which should actually give an error (Foreign Key constraint violation, or
something like that).

I would rather have had MySQL say: "What does 'references' mean in your
table definition? I do not know that word."
instead of pretending it understands, but never enforcing the constraint.

I had the same kind of trouble with transactions in MySQL; that is why I
said it matured just a short while ago (with InnoDB and imysql).

For simple data storage, this presents no problem, but once your database
gets more complex, you really want to be able to rely on FK constraints.
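The "failing silently" effect can be reproduced without a MySQL server. This sketch is an illustration only and swaps in SQLite (through PDO, in memory): SQLite also parses `REFERENCES` but, with its default build settings, does not enforce it until `PRAGMA foreign_keys = ON` is issued on the connection. MySQL's exact rules differ (its manual describes which storage engines enforce foreign keys); `makeDb()` and `insertOrphan()` are hypothetical helper names.

```php
<?php
// Illustration only: SQLite stands in for MySQL to show a REFERENCES
// clause that is parsed but not enforced unless checking is switched on.
function makeDb(bool $enforceFk): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    if ($enforceFk) {
        // Foreign-key checking is per connection and off by default in SQLite.
        $db->exec('PRAGMA foreign_keys = ON');
    }
    $db->exec('CREATE TABLE tbluser (userid INTEGER PRIMARY KEY, username TEXT)');
    $db->exec('CREATE TABLE tblarticle (
        articleid INTEGER PRIMARY KEY,
        writtenby INTEGER REFERENCES tbluser(userid),
        title     TEXT,
        content   TEXT
    )');
    $db->exec("INSERT INTO tbluser (username) VALUES ('Geoff')"); // userid 1
    return $db;
}

// Returns true if the orphan row (writtenby = 33, no such user) was accepted.
function insertOrphan(PDO $db): bool
{
    try {
        $db->exec("INSERT INTO tblarticle (writtenby, title, content)
                   VALUES (33, 'my title', 'bla')");
        return true;
    } catch (PDOException $e) {
        return false; // foreign key constraint violation
    }
}

var_dump(insertOrphan(makeDb(false))); // accepted silently
var_dump(insertOrphan(makeDb(true)));  // rejected
```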

Anyway, this is off topic indeed. :-)

Regards,
Erwin Moller
Jul 27 '06 #9
b0****@gmail.com wrote:
> You are right :(
> That's probably why I can't even enter that folder now; it takes ages :(
> I'll try to change it to work with MySQL...
> To tell you the truth, I am a bit afraid of MySQL; I was always storing
> data in folders/files, but I think I'll manage...
> And no, I can't predict the file names, so I can't organize them; the files
> are generated from the user input.
In that case, make sure you put an index on the filename in your table.
It will speed up the lookups a lot, but it will slow down
inserts/updates.

So if your system is busy looking things up most of the time: use an index.
In case you are almost always inserting, leave it out.

That is just a very general rule of thumb.
If you really want to know what is going on, use a profiler.
But forget about the profiler for now, and start learning SQL. :-)
(If you want to use MySQL, nothing wrong with that. I was just taking a
jab at MySQL. It can probably easily do what you want. So just go MySQL.)
> Is it possible to store complete HTML files with all their tags in MySQL?
Yes, no problem.
For a column of type TEXT or VARCHAR, the HTML is just a bunch of
characters.
Pay attention to escaping, however.
SQL uses ' as the string delimiter, so if you use ' in your HTML, be
sure you escape it. MySQL has all kinds of escaping functions, as does PHP
(addslashes()), so you'll find one that suits your needs.
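A small sketch of the escaping problem, assuming a `files` table with `filename` and `content` columns (hypothetical names) and a hypothetical helper `buildInsert()`. It shows what PHP's `addslashes()` does to HTML containing `'`. Backslash escaping suits MySQL's default mode, but `addslashes()` knows nothing about the connection's character set, so the connection's own function (for example `mysqli_real_escape_string()`) or a prepared statement is generally the safer choice.

```php
<?php
// Sketch: building an INSERT by hand for HTML that contains single quotes.
// Table, column and function names are hypothetical.
function buildInsert(string $filename, string $html): string
{
    // addslashes() backslash-escapes ', ", \ and NUL bytes, so the ' inside
    // the HTML no longer ends the SQL string literal early.
    return "INSERT INTO files (filename, content) VALUES ('"
        . addslashes($filename) . "', '" . addslashes($html) . "')";
}

echo buildInsert('page.html', "<p class='intro'>It's here</p>"), "\n";
// INSERT INTO files (filename, content) VALUES ('page.html', '<p class=\'intro\'>It\'s here</p>')
```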

Regards,
Erwin Moller
Jul 27 '06 #10
Thanks a lot, I'll do that!
I guess it's time to learn MySQL @)

Jul 27 '06 #11
Erwin Moller wrote:
[...]
> No, that was not what I meant.
> I just meant: if you start using a database, Postgres has (had) some
> advantages over MySQL.
Sure. And MySQL has advantages over Postgres. Both are good
databases, with their own advantages and disadvantages.
[...]
> Now with MySQL, if I insert an illegal value for writtenby in tblarticle,
> like this:
> insert into tblarticle (writtenby,title,content) VALUES
> (33, 'my title', 'bla');
>
> it just fails to check the constraint, and boldly inserts 33 for writtenby,
> which should actually give an error (Foreign Key constraint violation, or
> something like that).
Not a failure. Documented operation when not using InnoDB.

> I would rather have had MySQL say: "What does 'references' mean in your
> table definition? I do not know that word."
> instead of pretending it understands, but never enforcing the constraint.

But REFERENCES is part of the SQL standard, and they are trying to
adhere to the standard.

> I had the same kind of trouble with transactions in MySQL; that is why I
> said it matured just a short while ago (with InnoDB and imysql).

Again, documented operation.

> For simple data storage, this presents no problem, but once your database
> gets more complex, you really want to be able to rely on FK constraints.

True. But the bottom line is - know your tools!

> Anyway, this is off topic indeed. :-)
>
> Regards,
> Erwin Moller

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Jul 27 '06 #12
b0****@gmail.com wrote:
[...]
> Is there any limit on the number of files that can be stored in a folder?
No (it depends on the filesystem, but in general, no).

However, with many filesystems the search time will get really bad when
you have so many files in one folder.

Instead, you can make a little hash structure; it's easy to do and will
give you a significant performance boost.

Let's say your files are all named with a sequence of 6 random letters
(like "rjudfx" and "qopmnu" and "zsijpa").

Make yourself 26 directories inside of your one large directory: 'a',
'b', 'c', 'd', 'e', etc.

Then store the files in the directory named after the first letter. File
"rjudfx" would go inside 'r', and so on.

You can make some quick, easy functions to add the directory prefix onto
the names when you are reading and writing them.

// Prefix the name with its first letter: "rjudfx" -> "r/rjudfx"
function hashname($filename)
{
return $filename[0] . "/{$filename}";
}

Then, instead of doing fopen($filename, 'r'), just do
fopen(hashname($filename), 'r').

This way the search space in each directory is cut to roughly 1/26 of what
it was before (assuming the first letters are evenly spread), and
accessing the files will be much faster.
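A slightly fuller sketch of the same idea, assuming a hypothetical base directory and hypothetical helper names (`writeHashed()`, `readHashed()`, `deleteHashed()`). The one-letter bucket directory is created the first time it is needed. It indexes with `$filename[0]` because the `$filename{0}` offset syntax was deprecated in PHP 7.4 and removed in PHP 8.

```php
<?php
// Map "rjudfx" to "r/rjudfx": the first letter names the bucket directory.
function hashname(string $filename): string
{
    return $filename[0] . '/' . $filename;
}

function writeHashed(string $baseDir, string $filename, string $data): void
{
    $path = $baseDir . '/' . hashname($filename);
    $dir  = dirname($path);
    // Create the bucket (and base dir) lazily; tolerate a concurrent mkdir.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create $dir");
    }
    if (file_put_contents($path, $data) === false) {
        throw new RuntimeException("Cannot write $path");
    }
}

// Returns the file's contents, or null if it does not exist.
function readHashed(string $baseDir, string $filename): ?string
{
    $path = $baseDir . '/' . hashname($filename);
    if (!is_file($path)) {
        return null;
    }
    $data = file_get_contents($path);
    return $data === false ? null : $data;
}

function deleteHashed(string $baseDir, string $filename): bool
{
    $path = $baseDir . '/' . hashname($filename);
    return is_file($path) && unlink($path);
}

// Usage: a scratch base directory under the system temp dir (hypothetical).
$base = sys_get_temp_dir() . '/hash_demo_' . getmypid();
writeHashed($base, 'rjudfx', 'hello');
echo readHashed($base, 'rjudfx'), "\n"; // stored at $base/r/rjudfx
```

Because bucket directories are created on demand, only the letters that actually occur ever get a directory.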

miguel
--
Photos from 40 countries on 5 continents: http://travel.u.nu
Latest photos: Malaysia; Thailand; Singapore; Spain; Morocco
Airports of the world: http://airport.u.nu
Jul 27 '06 #13
Thank you!
Maybe I won't need to use MySQL after all!
File names are words or phrases that may have digits, separated by '-',
like this: this-is-one-file.txt this-1-is-another.txt and-more.txt
I'll try to change '-' to '/' and save it like that:
../t/this/is/one/file.txt
It should work.
Thank you for the idea again!

Jul 27 '06 #14
b0****@gmail.com wrote:
> Thank you!
> Maybe I won't need to use MySQL after all!
> File names are words or phrases that may have digits, separated by '-',
> like this: this-is-one-file.txt this-1-is-another.txt and-more.txt
> I'll try to change '-' to '/' and save it like that:
> ./t/this/is/one/file.txt
> It should work.
> Thank you for the idea again!
If you do this you will have to make a lot of directories all the time.

If the names are pretty unpredictable like that, how about just taking
the md5() of the name and using the first character of that? That way
you get 16 buckets to spread them out over.
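A minimal sketch of the `md5()` suggestion, assuming the original name is kept as the file name inside its bucket; `md5Bucket()` and `md5Path()` are hypothetical names. Because `md5()` returns 32 lowercase hex characters, the first one selects one of 16 directories (`0`-`9`, `a`-`f`), and the directory count stays fixed however many '-'-separated words a name has.

```php
<?php
// Bucket = first hex character of md5(name): one of 16 directories.
function md5Bucket(string $filename): string
{
    return substr(md5($filename), 0, 1);
}

// Keep the original name inside its bucket directory.
function md5Path(string $filename): string
{
    return md5Bucket($filename) . '/' . $filename;
}

foreach (['this-is-one-file.txt', 'this-1-is-another.txt', 'and-more.txt'] as $name) {
    echo md5Path($name), "\n";
}
```

Finding a file later only needs the same `md5()` call on its name, so no separate lookup table is required.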

miguel
Jul 27 '06 #15
Miguel Cruz wrote:
> No (it depends on the filesystem, but in general, no).
>
> However, with many filesystems the search time will get really bad when
> you have so many files in one folder.
>
> Instead, you can make a little hash structure; it's easy to do and will
> give you a significant performance boost.

I wonder if it wouldn't be easier to create a ReiserFS volume and mount
the directory on it. I haven't used it myself, but I've read that you can
gain an order of magnitude improvement in performance over ext2 when
you have lots of small files.

Jul 27 '06 #16
Miguel Cruz wrote:
> [...]
> If the names are pretty unpredictable like that, how about just taking
> the md5() of the name and using the first character of that? That way
> you get 16 buckets to spread them out over.
>
> miguel

Or take the first 2 or 3 hex digits and get a hash space of 256 or 4096
subdirectories, and the filesystem will still be humming fast with 1,000,000
files.
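A sketch of the wider hash space, assuming the bucket is the first `$prefixLen` hex characters of `md5()` of the name; `bucketFor()` and `bucketCount()` are hypothetical names. Two characters give 16^2 = 256 directories and three give 16^3 = 4096. The loop is a rough check of how evenly synthetic names spread.

```php
<?php
// Bucket = first $prefixLen hex characters of md5(name).
function bucketFor(string $filename, int $prefixLen = 2): string
{
    return substr(md5($filename), 0, $prefixLen);
}

// Number of possible buckets: each hex character has 16 values.
function bucketCount(int $prefixLen): int
{
    return 16 ** $prefixLen;
}

// Rough check: spread 100,000 synthetic names over 256 buckets.
$counts = [];
for ($i = 0; $i < 100000; $i++) {
    $b = bucketFor("file-$i.txt", 2);
    $counts[$b] = ($counts[$b] ?? 0) + 1;
}
printf("buckets used: %d, min %d, max %d per bucket\n",
    count($counts), min($counts), max($counts));
```

With 1,000,000 files, 256 buckets average about 3,906 files each and 4096 buckets about 244 each, which keeps every directory in the low thousands or below.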

roman
Jul 29 '06 #17
Miguel Cruz wrote:
[...]
The solution offered, breaking things down into subdirectories, is a good
one. I use it.

But have you ever heard the phrase "throw more hardware at it"? Sometimes
brute force is the cheapest way to go.

1) Buy a higher-end machine.

2) You can extend the current machine by spreading files across multiple
storage servers or multiple partitions.

Jul 30 '06 #18
