
Storing file information in memory

I'm writing a command line utility to move some files. I'm dealing with
thousands of files and I was wondering if anyone had any suggestions.

This is what I have currently:

$arrayVirtualFile =
    array('filename'    => 'filename',
          'basename'    => 'filename.ext',
          'extension'   => 'ext',
          'size'        => 0,
          'dirname'     => '',
          'uxtimestamp' => '');

I then loop through a directory and for each file I populate the $arrayVirtualFile
and add it to $arrayOfVirtualFiles.
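The loop looks roughly like this (a sketch only; the directory path is made
up, and scandir()/pathinfo() are just one way to walk the files):

<?php
$dir = '/path/to/files'; // illustrative
$arrayOfVirtualFiles = array();

foreach (scandir($dir) as $entry) {
    $full = $dir . '/' . $entry;
    if (!is_file($full)) {
        continue; // skip '.', '..' and subdirectories
    }
    $info = pathinfo($full);
    $arrayOfVirtualFiles[] = array(
        'filename'    => $info['filename'],
        'basename'    => $info['basename'],
        'extension'   => isset($info['extension']) ? $info['extension'] : '',
        'size'        => filesize($full),
        'dirname'     => $info['dirname'],
        'uxtimestamp' => filemtime($full),
    );
}
?>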
A directory of ~2500 files takes up about ~1.7 MB of memory when I run
the script.
Anyone have any suggestions as to how to take up less space?

Thanks!!

Nov 15 '07 #1

"deciacco" <a@awrote in message
news:c6******** *************** *******@ghytred .com...
I'm writing a command line utility to move some files. I'm dealing with
thousands of files and I was wondering if anyone had any suggestions.

This is what I have currently:

$arrayVirtualFi le =
array( 'filename'=>'fi lename',
'basename'=>'fi lename.ext',
'extension'=>'e xt',
'size'=>0,
'dirname'=>'',
'uxtimestamp'=> '');

I then loop through a directory and for each file I populate the
$arrayVirtualFi le
and add it to $arrayOfVirtual Files.
A directory of ~2500 files takes up about ~1.7 MB of memory when I run
the script.
Anyone have any suggestions as to how to take up less space?
well, that all depends what you're doing with that information. plus, your
array structure is a moot point. why not just store the file names in an
array? when you need all that info, just use the pathinfo() function. with
just that, you have the file name, basename, extension, path...all
you need now is to call fstat() to get the size and the modification time.
that should knock down your memory consumption monumentally. plus, using
pathinfo and fstat will give you a bunch more information than your current
structure.

so, store minimally what you need. then use functions to get the info when
you need it. but again, you should really define what you're doing this all
for...as in, once you have that info, what are you doing?
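a quick sketch of what i mean (the paths here are made up):

<?php
// keep only the paths in memory; everything else is derived on demand
$paths = array();
foreach (glob('/path/to/files/*') as $path) {
    if (is_file($path)) {
        $paths[] = $path;
    }
}

// later, pull the details only when a comparison actually needs them
foreach ($paths as $path) {
    $info  = pathinfo($path);  // dirname, basename, extension, filename
    $size  = filesize($path);  // no need to carry this around in an array
    $mtime = filemtime($path); // unix timestamp
    // ...compare $size / $mtime here...
}
?>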
Nov 15 '07 #2
thanks for the reply steve...

basically, i want to collect the file information into memory so that I can
then do analysis, like compare file times and sizes. it's much faster to do
this in memory than to do it from disk. should have mentioned this earlier
as you said...

"Steve" <no****@example .comwrote in message
news:W9******** *******@newsfe0 2.lga...
>
"deciacco" <a@awrote in message
news:c6******** *************** *******@ghytred .com...
>I'm writing a command line utility to move some files. I'm dealing with
thousands of files and I was wondering if anyone had any suggestions.

This is what I have currently:

$arrayVirtualF ile =
array( 'filename'=>'fi lename',
'basename'=>'f ilename.ext',
'extension'=>' ext',
'size'=>0,
'dirname'=>' ',
'uxtimestamp'= >'');

I then loop through a directory and for each file I populate the
$arrayVirtualF ile
and add it to $arrayOfVirtual Files.
A directory of ~2500 files takes up about ~1.7 MB of memory when I run
the script.
Anyone have any suggestions as to how to take up less space?

well, that all depends what you're doing with that information. plus, your
array structure is a must point. why not just store the file names in an
array. when you need all that info, just use the pathinfo() function. with
just that, so far you have the file name, basename, extension, path...all
you need now is to call fstat() to get the size and the touch time. that
should knock down your memory consumption monumentally. plus, using
pathinfo and fstat will give you a bunch more information that your
current structure.

so, store minimally what you need. then use functions to get the info when
you need it. but again, you should really define what you're doing this
all for...as in, once you have that info, what are you doing?

Nov 15 '07 #3
deciacco wrote:
> basically, i want to collect the file information into memory so that I can
> then do analysis, like compare file times and sizes. it's much faster to do
> this in memory than to do it from disk.
Why do you care how much memory it takes?

1.7MB is not very much.
Nov 16 '07 #4

"The Natural Philosopher" <a@b.cwrote in message
news:11******** *******@proxy00 .news.clara.net ...
deciacco wrote:
>thanks for the reply steve...

basically, i want to collect the file information into memory so that I
can then do analysis, like compare file times and sizes. it's much faster
to do this in memory than to do it from disk. should have mentioned this
earlier as you said...

Why do you care how much memory it takes?

1.7MB is not very much.
why do you care if he cares?

solve the problem!
Nov 16 '07 #5
These days memory is not an issue, but that does not mean we shouldn't write
good, efficient code that utilizes memory well.

While 1.7MB is not much, that is what is generated when I look at ~2500
files. I have approximately 175,000 files to look at and my script uses up
about 130MB. I was simply wondering if someone out there with more
experience had a better way of doing this that would utilize less memory.

"The Natural Philosopher" <a@b.cwrote in message
news:11******** *******@proxy00 .news.clara.net ...
deciacco wrote:
>thanks for the reply steve...

basically, i want to collect the file information into memory so that I
can then do analysis, like compare file times and sizes. it's much faster
to do this in memory than to do it from disk. should have mentioned this
earlier as you said...

Why do you care how much memory it takes?

1.7MB is not very much.

Nov 16 '07 #6
deciacco wrote:
> These days memory is not an issue, but that does not mean we shouldn't
> write good, efficient code that utilizes memory well.

There is also something known as "premature optimization".

> While 1.7MB is not much, that is what is generated when I look at
> ~2500 files. I have approximately 175,000 files to look at and my
> script uses up about 130MB. I was simply wondering if someone out
> there with more experience had a better way of doing this that would
> utilize less memory.
(Top posting fixed)

How are you figuring your 1.7MB? If you're just looking at how much
memory is being used by the process, for instance, there will be a lot
of other things in there, also - like your code.

1.7MB for 2500 files comes out to just under 700 bytes per entry, which
seems rather large to me. But it also depends on just how much
you're storing in the array (i.e. how long your path names are).

I also wonder why you feel a need to store so much info in memory, but
I'm sure you have a good reason.

P.S. Please don't top post. Thanks.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Nov 16 '07 #7
"Jerry Stuckle" <js*******@attg lobal.netwrote in message
news:Oa******** *************** *******@comcast .com...
deciacco wrote:
>"The Natural Philosopher" <a@b.cwrote in message
news:11******* ********@proxy0 0.news.clara.ne t...
>>deciacco wrote:
thanks for the reply steve...
basically, i want to collect the file information into memory so
that I can then do analysis, like compare file times and sizes.
it's much faster to do this in memory than to do it from disk.
should have mentioned this earlier as you said...
Why do you care how much memory it takes?
1.7MB is not very much.
These days memory is not an issue, but that does not mean we shouldn't
write good, efficient code that utilizes memory well.
There is also something known as "premature optimization".
>While 1.7MB is not much, that is what is generated when I look at
~2500 files. I have approximately 175000 files to look at and my
script uses up about 130MB. I was simply wondering if someone out
there with more experience, had a better way of doing this that would
utilize less memory.
(Top posting fixed)
How are you figuring your 1.7Mb? If you're just looking at how much
memory is being used by the process, for instance, there will be a lot of
other things in there, also - like your code.
1.7Mb for 2500 files comes out to just under 700 bytes per entry, which
seems rather a bit large to me. But it also depends on just how much
you're storing in the array (i.e. how long are your path names).
I also wonder why you feel a need to store so much info in memory, but I'm
sure you have a good reason.
P.S. Please don't top post. Thanks.
Jerry...

I use Outlook Express and it does top-posting by default. Didn't realize
top-posting was bad.

To answer your questions:

"Premature Optimization"
I first noticed this problem in my first program. It was running much slower
and taking up 5 times as much memory. I realized I needed to rethink my
code.

"Figuring Memory Use"
To get the amount of memory used, I take a reading with memory_get_usage()
at the start of the code in question and then take another reading at the
end of the snippet. I then take the difference and that should give me a
good idea of the amount of memory my code is utilizing.
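For example (a minimal sketch of that measurement; the empty array in the
middle stands in for the real population loop):

<?php
$before = memory_get_usage();

$arrayOfVirtualFiles = array();
// ...populate the array of arrays here, as in the first post...

$after = memory_get_usage();
printf("approx. memory used by the structure: %.1f KB\n", ($after - $before) / 1024);
?>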

"Feel the Need"
The first post shows you an array of the type of data I store. This array
gets created for each file and added as an item to another array. In other
words, an array of arrays. As I mentioned in a fallow-up posting, the reason
I'm doing this is because I want to do some analysis of file information,
like comparing file times and sizes from two seperate directories. This is
much faster in memory than on disk.


Nov 16 '07 #8

"deciacco" <a@awrote in message
news:Xr******** *************** *******@giganew s.com...
"Jerry Stuckle" <js*******@attg lobal.netwrote in message
news:Oa******** *************** *******@comcast .com...
>deciacco wrote:
>>"The Natural Philosopher" <a@b.cwrote in message
news:11****** *********@proxy 00.news.clara.n et...
deciacco wrote:
thanks for the reply steve...
basically , i want to collect the file information into memory so
that I can then do analysis, like compare file times and sizes.
it's much faster to do this in memory than to do it from disk.
should have mentioned this earlier as you said...
Why do you care how much memory it takes?
1.7MB is not very much.
These days memory is not an issue, but that does not mean we shouldn't
write good, efficient code that utilizes memory well.
There is also something known as "premature optimization".
>>While 1.7MB is not much, that is what is generated when I look at
~2500 files. I have approximately 175000 files to look at and my
script uses up about 130MB. I was simply wondering if someone out
there with more experience, had a better way of doing this that would
utilize less memory.
(Top posting fixed)
How are you figuring your 1.7Mb? If you're just looking at how much
memory is being used by the process, for instance, there will be a lot of
other things in there, also - like your code.
1.7Mb for 2500 files comes out to just under 700 bytes per entry, which
seems rather a bit large to me. But it also depends on just how much
you're storing in the array (i.e. how long are your path names).
I also wonder why you feel a need to store so much info in memory, but
I'm sure you have a good reason.
P.S. Please don't top post. Thanks.

Jerry...

I use Outlook Express and it does top-posting by default. Didn't realize
top-posting was bad.
i use oe too. just hit ctrl+end immediately after hitting 'reply group'. a
usenet thread isn't like an email conversation where both parties already
know what was said in the previous correspondence. top posting in usenet
forces *everyone* to start reading a post from the bottom up. this is
particularly painful when in-line responses are made...you have to not only
read from the bottom up, but find the start of a response, read down to see
the in-line response(s), then scroll back up past the start of that post
again.

tons of other reasons. we just ask that you know and try to follow as best
you can what usenet considers uniform/standard netiquette.
> To answer your questions:
<snip>
> "Feel the Need"
> The first post shows you an array of the type of data I store. This array
> gets created for each file and added as an item to another array. In other
> words, an array of arrays. As I mentioned in a follow-up posting, the
> reason I'm doing this is because I want to do some analysis of file
> information, like comparing file times and sizes from two separate
> directories. This is much faster in memory than on disk.
ok, for the comparisons...consider speed and memory consumption. if you were
to get a list of file names, your memory consumption would be at its bare
minimum (almost). when doing the comparison, you can vastly improve your
performance *and* maintainability by iterating through the files, getting
the file info, putting that info into a db, and then running queries against
the table. the db will beat your php comparison algorithms any day of the
week. plus, sql is formalized...so everyone will understand how you are
making your comparisons.

the only way to get lower memory consumption would be, during the process
of listing files, to NOT store the file info at all but immediately put all
the information into the db at that point. that will be the theoretical best
performance and memory utilization combination there can be.
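a sketch of that insert-as-you-go idea (sqlite via PDO is just one example of
a db; the paths, table and column names are made up):

<?php
// assumes the pdo_sqlite driver is available
$db = new PDO('sqlite:fileinfo.db');
$db->exec('CREATE TABLE IF NOT EXISTS files
           (dir TEXT, name TEXT, size INTEGER, mtime INTEGER)');
$ins = $db->prepare('INSERT INTO files (dir, name, size, mtime) VALUES (?, ?, ?, ?)');

foreach (array('/path/a', '/path/b') as $dir) {
    foreach (glob($dir . '/*') as $path) {
        if (is_file($path)) {
            // insert immediately; only the current row is ever in memory
            $ins->execute(array($dir, basename($path), filesize($path), filemtime($path)));
        }
    }
}

// let sql do the comparison, e.g. files in both dirs whose sizes differ
$stmt = $db->prepare('SELECT a.name FROM files a JOIN files b
                      ON a.name = b.name AND a.dir = ? AND b.dir = ?
                      WHERE a.size <> b.size');
$stmt->execute(array('/path/a', '/path/b'));
print_r($stmt->fetchAll(PDO::FETCH_COLUMN));
?>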

btw, i posted this function in another group and someone asked today what
the hell it does. since it directly relates to what you're doing AND uses
pathinfo and fstat, which i mentioned to you briefly in this thread before,
i thought i'd post this example to help:

==============

<?php
// list files in $path, optionally filtered by extension(s); returns an
// array of names (or of full paths when $combine is true)
function listFiles($path = '.', $extension = array(), $combine = false)
{
    $wd = getcwd();
    $path .= substr($path, -1) != '/' ? '/' : '';
    if (!chdir($path)) { return array(); }
    if (!$extension) { $extension = array('*'); }
    if (!is_array($extension)) { $extension = array($extension); }
    $extensions = '*.{' . implode(',', $extension) . '}';
    $files = glob($extensions, GLOB_BRACE);
    chdir($wd);
    if (!$files) { return array(); }
    $list = array();
    $path = $combine ? $path : '';
    foreach ($files as $file)
    {
        $list[] = $path . $file;
    }
    return $list;
}

$files = listFiles('c:/inetpub/wwwroot/images', 'jpg', true);
foreach ($files as $file)
{
    // pathinfo() gives dirname, basename, extension; fstat() adds size,
    // mtime, etc. the loop drops the duplicate numeric keys (0-12) that
    // fstat() returns alongside the named ones.
    $fileInfo = pathinfo($file);
    $handle = fopen($file, 'r');
    $fileInfo = array_merge($fileInfo, fstat($handle));
    fclose($handle);
    for ($i = 0; $i < 13; $i++) { unset($fileInfo[$i]); }
    echo '<pre>' . print_r($fileInfo, true) . '</pre>';
}
?>
Nov 16 '07 #9
deciacco wrote:
> I use Outlook Express and it does top-posting by default. Didn't realize
> top-posting was bad.
No problem. Recommendation - get Thunderbird. Much superior, and free :-)
> To answer your questions:
>
> "Premature Optimization"
> I first noticed this problem in my first program. It was running much slower
> and taking up 5 times as much memory. I realized I needed to rethink my
> code.
OK, so you've identified a problem. Good.
"Figuring Memory Use"
To get the amount of memory used, I take a reading with memory_get_usag e()
at the start of the code in question and then take another reading at the
end of the snippet. I then take the difference and that should give me a
good idea of the amount of memory my code is utilizing.
At last - someone who knows how to figure memory usage correctly! :-)

But I'm still confused why it would take almost 700 bytes per entry on
average. The array overhead shouldn't be *that* bad.

"Feel the Need"
The first post shows you an array of the type of data I store. This array
gets created for each file and added as an item to another array. In other
words, an array of arrays. As I mentioned in a fallow-up posting, the reason
I'm doing this is because I want to do some analysis of file information,
like comparing file times and sizes from two seperate directories. This is
much faster in memory than on disk.

Yes, it would be faster to do the comparisons in memory. However, you
also need to consider the amount of time it takes to create your arrays.
It isn't minor compared to some other operations.

When you're searching for files on the disk, as you get the file info,
the first one will take a while because the system has to (probably)
fetch the info from disk. But this caches several file entries, so the
next few will be relatively quick, until the system has to hit the disk
again (a big enough cache and that might never happen).

However, at the same time, if you just read one file from each directory
(assuming you're comparing the same file names) and compare them, then
go to the next file, the cache will still probably be valid, unless your
system is heavily loaded with high CPU and disk utilization. So in that
case your current algorithm probably will be slower than reading one at
a time and comparing.

Of course, if you're doing multiple compares, i.e. 'a' from the first
directory with 'x', 'y' and 'z' from the second directory, this wouldn't
be the case.
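A rough sketch of that one-file-at-a-time comparison (assuming the same
names exist in both directories; the paths are placeholders):

<?php
$dirA = '/path/a';
$dirB = '/path/b';

foreach (scandir($dirA) as $name) {
    $a = $dirA . '/' . $name;
    $b = $dirB . '/' . $name;
    if (!is_file($a) || !is_file($b)) {
        continue; // skip '.', '..' and names missing from the second dir
    }
    // both stat calls land while the directory entries are likely still cached
    if (filesize($a) !== filesize($b) || filemtime($a) !== filemtime($b)) {
        echo "$name differs\n";
    }
}
?>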
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================

Nov 16 '07 #10
