
Reducing memory consumption

I'm using PHP to run a CLI application. It's a script run by cron that parses some HTML files (with DOM XML), and I ended up using PHP to integrate with the rest of the code that already runs the website.

The problem is: it's eating more memory than a black hole. It eats the current limit of 256MB set in php.ini, in an application that would hardly consume 4MB if written in C. I don't care if this application takes much longer to run than it would in C, but eating that much memory is not acceptable.

So, my question is, how do I find out what is eating that much memory?
I'm suspicious of memory leaks, or very stupid garbage collection. Any help?

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
This was the most unkindest cut of all.
-- William Shakespeare, "Julius Caesar"
Apr 24 '07 #1


Bruno Barberi Gnecco wrote:
> I'm using PHP to run a CLI application. It's a script run by cron that parses some HTML files (with DOM XML), and I ended up using PHP to integrate with the rest of the code that already runs the website.
>
> The problem is: it's eating more memory than a black hole. It eats the current limit of 256MB set in php.ini, in an application that would hardly consume 4MB if written in C. I don't care if this application takes much longer to run than it would in C, but eating that much memory is not acceptable.
>
> So, my question is, how do I find out what is eating that much memory? I'm suspicious of memory leaks, or very stupid garbage collection. Any help?
Without knowing what your application does, it's impossible to tell.

But I know I've handled some very large files (e.g. log files, XML, etc.) in 8MB of memory without any problems.

I've even parsed a (rather poorly written) HTML page that's 10MB and still not run out of memory at 8MB.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 24 '07 #2

Jerry Stuckle wrote:
> Bruno Barberi Gnecco wrote:
>> [snip original question]
>
> Without knowing what your application does, it's impossible to tell.
>
> But I know I've handled some very large files (e.g. log files, XML, etc.) in 8MB of memory without any problems.
>
> I've even parsed a (rather poorly written) HTML page that's 10MB and still not run out of memory at 8MB.
Exactly, that's why I'm puzzled by this. What the application does is very simple: it opens an IMAP connection and, for each email, it parses the HTML body to extract some information out of it, and saves this information into a database. The HTML files are less than 1MB, and the number of messages read is small (< 20). Since the information is parsed in pieces, the memory used by it should peak at 10 or 20 KB.

The parsing is done using DOM (not DOM XML, as I wrote before, my mistake) and XPath queries. The parsing is done in a separate method, so I was expecting that any memory allocated for parsing a message would be freed before the next one is parsed. I'm using PHP 5.
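
A minimal sketch of that per-message pattern (the XPath expression, the "data" class, and the saveToDatabase() helper are placeholders, not my actual code):

function parseMessage($html)
{
    // Parse one message body with DOM + XPath, keeping all allocations
    // local to this function so they can be freed when it returns.
    $doc = new DOMDocument();
    @$doc->loadHTML($html);        // suppress warnings from sloppy HTML
    $xpath = new DOMXPath($doc);

    $rows = array();
    foreach ($xpath->query('//td[@class="data"]') as $node) {
        $rows[] = trim($node->textContent);   // keep plain strings, not nodes
    }

    unset($xpath, $doc);           // drop DOM references before returning
    return $rows;
}

// $messages holds the HTML bodies fetched over IMAP
foreach ($messages as $body) {
    saveToDatabase(parseMessage($body));      // hypothetical DB helper
}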

What did you use to parse your page? DOM? DOM XML? Something else?

Any tips? Thanks!

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
It takes a smart husband to have the last word and not use it.
Apr 24 '07 #3

ljb wrote:
> br***************@users.sourceforge.net wrote:
>> [snip original question]
>
> Different sort of problem, but I struggled with a long-running script that leaked a bit of memory on each loop, and after a few days/weeks was using too much memory. What I had to do was "instrument" the PHP script, inserting calls to report current memory usage at frequent intervals. There are two ways to do this. memory_get_usage() might be available (depends on how PHP was built), but I think it only reports memory used by the PHP allocators. (Didn't help me, because the leak turned out to be in a loaded extension.) The other way (on Linux, for example) is to look at /proc/meminfo for total memory usage. Do this often enough in your script, and you should be able to narrow down where the memory is being lost.
Thanks. I did this, and the memory is apparently being lost by the INSERT query. prepare/execute leaks *a lot* of memory, while a simple query() still leaks, but much less.
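
The instrumentation itself was nothing fancy, roughly along these lines (the checkpoint labels are arbitrary):

function mem_checkpoint($label)
{
    // Report PHP's internal allocation count at a labeled point.
    printf("[mem] %-15s %10d bytes\n", $label, memory_get_usage());
}

mem_checkpoint('start');
// ... fetch messages over IMAP ...
mem_checkpoint('after fetch');
// ... parse and INSERT ...
mem_checkpoint('after insert');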

Thinking it might be related to this bug: http://bugs.php.net/bug.php?id=39885 (memory leak when doing loads of INSERTs and there are duplicate key errors), I added a SELECT to avoid the errors. This had an extraordinary effect: memory consumption started to decrease, and soon I was using negative memory. When the program ended, memory_get_usage() was returning '-9415900'.

Since this universe doesn't allow negative memory usage, I wonder WTF is going on. I'm using MDB2 as the DB frontend, BTW, which may be the culprit here. What frees more memory than it allocated is a call to query('SELECT ...'). Removing this call leads back to the endlessly growing memory problem.

Any ideas on how to find out which is causing this: MySQL, PHP, or MDB2?

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
Cropp's Law:
The amount of work done varies inversely
with the time spent in the office.
Apr 24 '07 #4

"Bruno Barberi Gnecco" <br***************@users.sourceforge.net> wrote in message news:f0********@news3.newsguy.com...
> memory_get_usage() was returning '-9415900'.

This effect is due to the original "black hole". Your code has passed through a wormhole into a parallel process which assigns memory inversely :)
Apr 24 '07 #5

Vince Morgan wrote:
> "Bruno Barberi Gnecco" <br***************@users.sourceforge.net> wrote in message news:f0********@news3.newsguy.com...
>> memory_get_usage() was returning '-9415900'.
>
> This effect is due to the original "black hole". Your code has passed through a wormhole into a parallel process which assigns memory inversely :)

Of course! You just gave me the breakthrough I was looking for to get my Nobel :) Or perhaps I can create my own 'infinite storage' service, only the data will be stored in a write-only location ;)

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
Apr 24 '07 #6

Bruno Barberi Gnecco wrote:
> Jerry Stuckle wrote:
>> [snip]
>
> Exactly, that's why I'm puzzled by this. What the application does is very simple: it opens an IMAP connection and, for each email, it parses the HTML body to extract some information out of it, and saves this information into a database. The HTML files are less than 1MB, and the number of messages read is small (< 20). Since the information is parsed in pieces, the memory used by it should peak at 10 or 20 KB.
>
> The parsing is done using DOM (not DOM XML, as I wrote before, my mistake) and XPath queries. The parsing is done in a separate method, so I was expecting that any memory allocated for parsing a message would be freed before the next one is parsed. I'm using PHP 5.
>
> What did you use to parse your page? DOM? DOM XML? Something else?
>
> Any tips? Thanks!
No, I wasn't using DOM on this one - just stripping out the tags.

However, the DOM does a lot of things behind the scenes. For instance,
when you call DOMDocument::getElementsByTagName(), DOM will allocate an
entire nodelist. And this nodelist will contain everything under each
node in the list.

So if you do something like:

$doc = new DOMDocument;
$doc->load("inputfile.xml");

You'll get the entire document into the DOMDocument. Now, if you:

$l1 = $doc->getElementsByTagName('level1');

You'll get a nodelist with all the level 1 tags. But each entry in the
nodelist will contain all of the elements under it - level 2, level 3,
and so on.

So if you have a layout such as:

<level1>
<level2 />
<level2>
<level3 />
</level2>
</level1>

Your DOMDocument will contain all the items - but so will the nodelist.
Effectively you've about doubled the amount of memory being required.

If you now get the level2's, you'll have two entries - one which is just
a level2, but the second one will have level2 and level3.

So you can see memory usage can increase a lot, especially if you have a
lot of lower levels.

And BTW - depending on the amount of whitespace in your XML file, even
the DOMDocument object may take more or less memory than the file itself.

The problem here is that DOMNodeList doesn't have a method to remove an entry from the list. I don't know what

unset($nodelist->item($i));

would do - but I don't think I'd try it. I suspect the DOMNodeList would have problems with it.

The only thing I can recommend is to unset the nodelists themselves as
soon as possible. That should free up the memory used by them.
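
A minimal sketch of that, copying out the values you need and then releasing the nodelist (and the document) right away:

$doc = new DOMDocument();
$doc->load("inputfile.xml");

$values = array();
$l1 = $doc->getElementsByTagName('level1');
foreach ($l1 as $node) {
    $values[] = $node->textContent;   // keep plain strings, not DOM nodes
}

unset($l1);    // release the nodelist as soon as you're done with it
unset($doc);   // and the document itself, once fully processed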

Of course, there's another possibility here, also - that there's a memory leak in it. I haven't seen one - but then I can't say I've done anything as big as you're doing, and I haven't looked for problems. And a search of the PHP bugs database doesn't show anything being reported.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 24 '07 #7

Jerry Stuckle wrote:
> [snip DOM nodelist discussion]
>
> Of course, there's another possibility here, also - that there's a memory leak in it. I haven't seen one - but then I can't say I've done anything as big as you're doing, and I haven't looked for problems. And a search of the PHP bugs database doesn't show anything being reported.
As I mentioned in the other post, I found out that it isn't DOM eating all the memory, but the SQL queries. Apparently I ran into two bugs:

1) prepare/execute has a memory leak. This could be happening in MDB2 or in PHP itself, perhaps in the mysqli extension. It happens consistently and eventually exhausts memory.

2) there is a problem in mysqli queries that seems to confuse the allocated-memory accounting, but it's not a serious bug (i.e., it doesn't crash). I successfully completed a long run of my script, which added some 27k entries to the database. Despite the memory count going negative, it didn't crash, and apparently there was no corruption or unexpected results (none that I could see so far).

In this successful #2 run, what I did was get the mysqli connection from MDB2 (with getConnection()) and run mysqli_query() directly (and OMG, how slow MDB2 is!). So this problem isn't in MDB2: it's either in PHP itself or in the mysqli extension. My *guess* is that PHP's memory system is counting something wrong when it allocates memory. I watched top(1) while the script ran, and it didn't consume a lot of memory (10-16 MB), which is a little more than I'd expect, but I was including MDB2 and other stuff. If I hadn't exhausted the memory first, I'd never have noticed that the memory count was negative.
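
For reference, the workaround was roughly this (the DSN, table, and column names are placeholders):

require_once 'MDB2.php';

$mdb2 = MDB2::connect('mysqli://user:pass@localhost/mydb');  // placeholder DSN
$mysqli = $mdb2->getConnection();     // the underlying mysqli object

$parsed = 'example value';            // stand-in for data parsed from a message
$value = mysqli_real_escape_string($mysqli, $parsed);
mysqli_query($mysqli, "INSERT INTO entries (v) VALUES ('$value')");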

I'm still at a loss as to whom I should report this bug to. Any suggestions?

--
Bruno Barberi Gnecco <brunobg_at_users.sourceforge.net>
It's always darkest just before it gets pitch black.
Apr 24 '07 #8

Bruno Barberi Gnecco wrote:
> [snip]
>
> As I mentioned in the other post, I found out that it isn't DOM eating all the memory, but the SQL queries. Apparently I ran into two bugs:
>
> [snip details of the two bugs and the mysqli workaround]
>
> I'm still at a loss as to whom I should report this bug to. Any suggestions?
Yes, I read your other posts after I responded.

PHP bugs are managed at http://www.php.net. PEAR bugs are at http://pear.php.net.

I'm not sure which one it would be, either. But you'll need to reproduce the problem with a *small* test case so they can duplicate it. Otherwise they don't stand much of a chance of finding the bug.
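
For instance, something along these lines (placeholder DSN and table) that just loops prepare/execute and prints memory_get_usage() should show the growth:

require_once 'MDB2.php';

$mdb2 = MDB2::connect('mysqli://user:pass@localhost/testdb');

for ($i = 0; $i < 10000; $i++) {
    $stmt = $mdb2->prepare('INSERT INTO t (n) VALUES (?)');
    $stmt->execute(array($i));
    $stmt->free();

    if ($i % 1000 == 0) {
        echo $i, ': ', memory_get_usage(), " bytes\n";
    }
}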

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attglobal.net
==================
Apr 24 '07 #9

"Bruno Barberi Gnecco" <br***************@users.sourceforge.net> wrote in message news:f0*********@news2.newsguy.com...
> Vince Morgan wrote:
>> This effect is due to the original "black hole". Your code has passed through a wormhole into a parallel process which assigns memory inversely :)
>
> Of course! You just gave me the breakthrough I was looking for to get my Nobel :) Or perhaps I can create my own 'infinite storage' service, only the data will be stored in a write-only location ;)

The infinite storage service is true genius, Bruno; I can't see how I didn't see it myself. Perhaps it may be possible to make this service read/write with some considerable effort. However, you would have to manage the downloads carefully. Allowing too many would quickly exhaust the available storage.

If you should manage to achieve this, I would expect a share of the financial proceeds, of course. However, you are welcome to the glory. I always get a large volcanic zit in the center of my forehead an hour or so before receiving prestigious international awards ;)
Apr 25 '07 #10
