[PDB-1484] Memory leak when 'replace facts' failing Created: 2015/05/05  Updated: 2015/06/11  Resolved: 2015/06/11

Status: Closed
Project: PuppetDB
Component/s: None
Affects Version/s: PDB 2.3.3
Fix Version/s: None

Type: Bug Priority: Normal
Reporter: Simon Oxwell Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

CentOS 7, Postgres 9.2.7




Just as an aside to PDB-1448, we're also seeing PuppetDB run out of heap.

Looking at the memory dumps, we see a very large number of Postgres Prepared Statements, taking up huge amounts of memory.

For example, from today we had:
4,881 instances of "org.postgresql.jdbc4.Jdbc4PreparedStatement" with retained size of 500,866,312 bytes

and one from a month ago:
4,614 instances of "org.postgresql.jdbc4.Jdbc4PreparedStatement" with retained size of 617,294,101 bytes.

Our heap size is 750M, and we have 109 Puppet agents.
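
As a rough sanity check on the figures above, the following Python sketch works out the average retained size per prepared statement and the share of the heap each dump accounts for. It assumes "750M" means a 750 MiB JVM heap; the `summarize` helper is hypothetical and only exists for this illustration.

```python
# Figures copied from the two heap dumps quoted above.
dumps = {
    "today": (4_881, 500_866_312),
    "a month ago": (4_614, 617_294_101),
}

# Assumption: "750M" means a JVM heap of 750 MiB (e.g. -Xmx750m).
HEAP_BYTES = 750 * 1024 * 1024


def summarize(instances, retained_bytes, heap_bytes=HEAP_BYTES):
    """Return (average bytes per statement, fraction of heap retained)."""
    return retained_bytes / instances, retained_bytes / heap_bytes


for label, (instances, retained) in dumps.items():
    avg, share = summarize(instances, retained)
    print(f"{label}: ~{avg / 1024:.0f} KiB per statement, {share:.0%} of heap")
```

Under that assumption, the statements average on the order of 100 KiB to 130 KiB each and together retain well over half of the heap in both dumps.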

My guess is that when the 'replace facts' fails with the foreign key constraint violation, it doesn't clean up the prepared statement objects.

For now, we're restarting PuppetDB whenever it becomes unresponsive, about once a week (we usually don't wait for a heap dump to happen), while we wait for the fix for PDB-1448 to be released.


Comment by Ken Barber [ 2015/05/06 ]

Simon Oxwell, we can control the prepared-statement cache by adding a configuration option. I believe it's currently hard-coded to 1000 items, but since we have quite a few dynamic queries this can quickly get large.

I wouldn't have expected 500 MB, though; I've seen 50 MB before. Do you have an hprof dump that we can look at for this?

Comment by Simon Oxwell [ 2015/05/07 ]

Yes, two, but they're on the large side (~900M).

Comment by Ken Barber [ 2015/05/07 ]

Simon Oxwell actually, we're good - we did add a configuration item, I had just forgotten: http://docs.puppetlabs.com/puppetdb/2.3/configure.html#statements-cache-size

Try tuning that down and see if that improves things memory-wise.
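
A minimal sketch of what that tuning might look like, assuming the setting lives in the [database] section of the PuppetDB configuration as the linked page describes. The value 200 is purely illustrative and is not a recommendation made in this thread:

```ini
[database]
# Illustrative value only; this thread mentions 1000 as the existing cache size.
statements-cache-size = 200
```

A smaller cache trades some query performance for lower memory use, so the right value depends on the workload.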

Comment by Simon Oxwell [ 2015/05/10 ]

Hi Kenneth,

I'm pretty sure this is only happening because of the failures we're seeing as a result of PDB-1448, which are producing a lot of failed SQL.

I've updated to 2.3.4 and my SQL failures have gone, so I'm not expecting the problem to repeat. Should know in a couple of days.

Generated at Sat Aug 08 17:34:48 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.