Bytes | Software Development & Data Engineering Community
Reducing memory consumption

I'm using PHP to run a CLI application. It's a script run by cron that
parses some HTML files (with DOM XML), and I ended up using PHP to integrate with
the rest of the code that already runs the website.

The problem is: it's eating more memory than a black hole. It eats the
current limit of 256MB set in php.ini, in an application that would hardly
consume 4MB if written in C. I don't care if this application takes much longer
to run than it would in C, but eating that much memory is not acceptable.

So, my question is, how do I find out what is eating that much memory?
I'm suspicious of memory leaks, or very stupid garbage collection. Any help?

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
This was the most unkindest cut of all.
-- William Shakespeare, "Julius Caesar"
Apr 24 '07 #1
Bruno Barberi Gnecco wrote:
I'm using PHP to run a CLI application. It's a script run by cron that
parses some HTML files (with DOM XML), and I ended up using PHP to
integrate with
the rest of the code that already runs the website.

The problem is: it's eating more memory than a black hole. It eats the
current limit of 256MB set in php.ini, in an application that would hardly
consume 4MB if written in C. I don't care if this application takes much
longer
to run than it would in C, but eating that much memory is not acceptable.

So, my question is, how do I find out what is eating that much memory?
I'm suspicious of memory leaks, or very stupid garbage collection. Any
help?
Without knowing what your application does, it's impossible to tell.

But I know I've handled some very large files (i.e. log files, XML,
etc.) in 8MB of memory without any problems.

I've even parsed a (rather poorly written) html page that's 10Mb and
still not run out of memory at 8MB.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 24 '07 #2
Jerry Stuckle wrote:
[original question snipped]

Without knowing what your application does, it's impossible to tell.

But I know I've handled some very large files (i.e. log files, XML,
etc.) in 8MB of memory without any problems.

I've even parsed a (rather poorly written) html page that's 10Mb and
still not run out of memory at 8MB.
Exactly, that's why I'm puzzled by this. What the application
does is very simple: it opens an IMAP connection and, for each email,
parses the HTML body to extract some information, then saves that
information into a database. The HTML files are less than 1MB, and the
number of messages read is small (< 20). Since the information is
parsed in pieces, the memory used should peak at 10kB or 20kB.

The parsing is done using DOM (not DOM XML, as I wrote before,
my mistake) and XPath queries. The parsing is done in a separate method,
so I was expecting that any memory allocated for parsing a message
would be freed before the next one is parsed. I'm using PHP 5.
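A runnable sketch of the loop Bruno describes, assuming the DOM extension is available; the sample strings and the `id` class are made up, standing in for the IMAP bodies and the real fields:

```php
<?php
// Parse each HTML body with DOM + XPath, extract a field, and release
// the objects before the next iteration. The sample messages stand in
// for the IMAP bodies; collecting into $ids stands in for the DB insert.
$messages = array(
    '<html><body><span class="id">42</span></body></html>',
    '<html><body><span class="id">43</span></body></html>',
);

$ids = array();
foreach ($messages as $html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);          // real-world HTML tends to warn a lot
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//span[@class="id"]');
    $ids[] = $nodes->item(0)->nodeValue;
    unset($nodes, $xpath, $doc);     // drop per-message DOM state
}
print implode("\n", $ids) . "\n";
```

If the per-message objects really are released each iteration, memory should stay flat no matter how many messages are processed.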

What did you use to parse your page? DOM? DOM XML? Something
else?

Any tips? Thanks!

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
It takes a smart husband to have the last word and not use it.
Apr 24 '07 #3
ljb wrote:
br***************@users.sourceforge.net wrote:
[original question snipped]


Different sort of problem, but I struggled with a long-running script that
leaked a bit of memory on each loop, and after a few days/weeks was using
too much memory. What I had to do was "instrument" the PHP script,
inserting calls to report current memory usage at frequent intervals. There
are two ways to do this. memory_get_usage() might be available (depends
on how PHP was built), but I think it only reports memory used by PHP
allocators. (Didn't help me, because the leak turned out to be in a loaded
extension.) The other way (on Linux, for example) is to look at
/proc/meminfo for total memory usage. Do this often enough in your script,
and you should be able to narrow down where the memory is being lost.
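The instrumentation described above can be sketched like this; the checkpoint labels and the 1MB string are illustrative, simulating a leak-prone step:

```php
<?php
// Log PHP-allocator usage at labelled checkpoints, as described above,
// to narrow down where memory is being lost.
function checkpoint($label)
{
    $bytes = memory_get_usage();
    printf("%-16s %10d bytes\n", $label, $bytes);
    return $bytes;
}

checkpoint('start');
$buf = str_repeat('x', 1024 * 1024); // simulate a large allocation
$peak = checkpoint('after allocate');
unset($buf);                         // release it again
$final = checkpoint('after unset');
```

Sprinkled between the real steps of a script (fetch, parse, insert), the printed deltas show which step the memory disappears into.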
Thanks. I did this, and the memory is apparently being lost by
the INSERT query. prepare/execute leaks *a lot* of memory, while a simple
query() still leaks, but much less.

Thinking it might be related to this bug:
http://bugs.php.net/bug.php?id=39885 (memory leak when doing loads of
INSERT's and there are duplicate key errors), I added a SELECT to avoid
the errors. This had an extraordinary effect: memory consumption started
to decrease, and soon I was using negative memory. When the program ended,
memory_get_usage() was returning '-9415900'.

Since this universe doesn't allow negative memory usage, I
wonder WTF is going on. I'm using MDB2 as DB frontend, BTW, which
may be the culprit here. What frees more memory than it allocated is
a call to query('SELECT ...'). Removing this call leads back to the
endless growing memory problem.

Any ideas on how to find out which of these is causing it: MySQL,
PHP, MDB2?

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
Cropp's Law:
The amount of work done varies inversely
with the time spent in the office.
Apr 24 '07 #4

"Bruno Barberi Gnecco" <br***************@users.sourceforge.net> wrote in
message news:f0********@news3.newsguy.com...
memory_get_usage() was returning '-9415900'.
This effect is due to the original "black hole".
Your code has passed through a worm-hole into a parallel process which
assigns memory inversely :)
Apr 24 '07 #5
Vince Morgan wrote:
"Bruno Barberi Gnecco" <br***************@users.sourceforge.net> wrote in
message news:f0********@news3.newsguy.com...

>>memory_get_usage() was returning '-9415900'.


This effect is due to the original "black hole".
Your code has passed through a worm-hole into a parallel process which
assigns memory inversely :)

Of course! You just gave me the breakthrough I was looking for
to get my Nobel :) Or perhaps I can create my own 'infinite storage'
service, only the data will be stored in a write-only location ;)

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
Apr 24 '07 #6
Bruno Barberi Gnecco wrote:
[earlier discussion snipped]

What did you use to parse your page? DOM? DOM XML? Something
else?

Any tips? Thanks!
No, I wasn't using DOM on this one - just stripping out the tags.

However, the DOM does a lot of things behind the scenes. For instance,
when you call DOMDocument::getElementsByTagName(), DOM will allocate an
entire nodelist. And this nodelist will contain everything under each
node in the list.

So if you do something like:

$doc = new DOMDocument;
$doc->load("inputfile.xml");

You'll get the entire document into the DOMDocument. Now, if you:

$l1 = $doc->getElementsByTagName('level1');

You'll get a nodelist with all the level 1 tags. But each entry in the
nodelist will contain all of the elements under it - level 2, level 3,
and so on.

So if you have a layout such as:

<level1>
<level2 />
<level2>
<level3 />
</level2>
</level1>

Your DOMDocument will contain all the items - but so will the nodelist.
Effectively you've about doubled the amount of memory being required.

If you now get the level2's, you'll have two entries - one which is just
a level2, but the second one will have level2 and level3.

So you can see memory usage can increase a lot, especially if you have a
lot of lower levels.

And BTW - depending on the amount of whitespace in your XML file, even
the DOMDocument object may take more or less memory than the file itself.

The problem here is that DOMNodeList doesn't have a method to remove an
entry from the list. I don't know what

unset($nodelist->item($i));

would do - but I don't think I'd try it. (In fact, calling unset() on a
method's return value is a fatal error in PHP.) I suspect the DOMNodeList
would have problems with it.

The only thing I can recommend is to unset the nodelists themselves as
soon as possible. That should free up the memory used by them.
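A minimal sketch of that advice, using the sample layout above: consume the node list, then unset() it as soon as possible so its memory can be reclaimed.

```php
<?php
// Build a small document, use a node list briefly, then release it.
$doc = new DOMDocument();
$doc->loadXML('<level1><level2/><level2><level3/></level2></level1>');

$l1 = $doc->getElementsByTagName('level2');
$count = $l1->length; // use the list...
unset($l1);           // ...then release it right away
print $count . "\n";  // prints 2
```

The same pattern applies to the node lists returned by DOMXPath::query(): keep their lifetime as short as the code allows.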

Of course, there's another possibility here too - that there's a
memory leak in it. I haven't seen one - but then I can't say I've
done anything as big as you are, and I haven't looked for problems. And
a search of the PHP bugs database doesn't show anything being reported.

Apr 24 '07 #7
Jerry Stuckle wrote:
[quoted discussion snipped]
As I mentioned in the other post, I found out that it isn't
DOM eating all the memory, but the SQL queries. Apparently I ran into
two bugs:

1) prepare/execute has a memory leak. This could be happening in MDB2
or in PHP itself, perhaps in the mysqli extension. It happens
consistently and eventually exhausts memory.

2) there is a problem with mysqli queries that seems to confuse the
allocated-memory accounting, but it's not a serious bug (i.e., it
doesn't crash). I successfully completed a long run of my script,
which added some 27k entries to the database. Despite the memory
count going negative, it didn't crash, and apparently there was no
corruption or unexpected results (not that I could see so far).

In this successful #2 run, what I did was get the mysqli
connection from MDB2 (with getConnection()) and run mysqli_query()
directly (and OMG, how slow MDB2 is!). So this problem isn't in
MDB2: it's either in PHP itself or in the mysqli extension. My
*guess* is that PHP's memory system is miscounting something
when it allocates memory. I watched top(1) while the script
ran, and it didn't consume a lot of memory (10-16 MB), which is
a little more than I'd expect, but I was including MDB2 and
other stuff. If I hadn't exhausted the memory first, I'd never
have noticed that the memory count was negative.
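A sketch of the workaround just described - pulling the raw mysqli handle out of MDB2 and querying it directly, freeing each result set. The DSN and table name are hypothetical, and this assumes MDB2 is installed and a MySQL server is reachable:

```php
<?php
// Hypothetical DSN and table; assumes MDB2 + a reachable MySQL server.
require_once 'MDB2.php';

$mdb2 = MDB2::connect('mysqli://user:pass@localhost/testdb');
$mysqli = $mdb2->getConnection(); // raw mysqli handle, as in the post

// Run the INSERT directly through mysqli instead of prepare/execute.
$sql = sprintf("INSERT IGNORE INTO entries (id, body) VALUES (%d, '%s')",
               42, $mysqli->real_escape_string('parsed text'));
$result = $mysqli->query($sql);
if ($result instanceof mysqli_result) {
    $result->free();              // free result memory explicitly
}
```

INSERT IGNORE sidesteps the duplicate-key errors implicated in the linked bug report; freeing every mysqli_result keeps SELECT results from piling up across iterations.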

I'm still at a loss as to whom I should report this bug
to. Any suggestions?

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
It's always darkest just before it gets pitch black.
Apr 24 '07 #8
Bruno Barberi Gnecco wrote:
[quoted discussion snipped]

I'm still at a loss as to whom I should report this bug
to. Any suggestions?
Yes, I read your other posts after I responded.

PHP bugs are managed at http://www.php.net. Pear bugs are at
http://pear.php.net

I'm not sure which one it would be, either. But you'll need to create
the problem with a *small* test case so they can duplicate it.
Otherwise they don't stand much of a chance of finding the bug.
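A skeleton of that kind of small test case: run the suspect operation in a loop and print the memory growth. The function below is a harmless stand-in; in the real report it would be the MDB2 prepare/execute pair.

```php
<?php
// Run the suspect operation repeatedly and report allocator growth.
// suspect() is a stand-in for the real prepare/execute pair.
function suspect()
{
    $s = str_repeat('x', 1000); // temporary work that should be freed
    return strlen($s);
}

$before = memory_get_usage();
for ($i = 0; $i < 1000; $i++) {
    suspect();
}
$after = memory_get_usage();
printf("growth after 1000 iterations: %d bytes\n", $after - $before);
```

If the growth scales with the iteration count once the real query code is dropped in, the script plus a schema dump is exactly the reproducer a bug report needs.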

Apr 24 '07 #9
"Bruno Barberi Gnecco" <br***************@users.sourceforge.net> wrote in
message news:f0*********@news2.newsguy.com...
[earlier exchange snipped]
Of course! You just gave me the breakthrough I was looking for
to get my Nobel :) Or perhaps I can create my own 'infinite storage'
service, only the data will be stored in a write-only location ;)
The infinite storage service is true genius, Bruno; I can't see how I
didn't think of it myself. Perhaps it may be possible to make this service
read/write with some considerable effort. However, you would have to manage
the downloads carefully - allowing too many would quickly exhaust the
available storage.
If you should manage to achieve this, I would expect a share of the
financial proceeds, of course. However, you are welcome to the glory. I
always get a large volcanic zit in the center of my forehead an hour or so
before receiving prestigious international awards ;)
Apr 25 '07 #10

