PuppetDB / PDB-1721

Possible Memory leak with PuppetDB?

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: PDB 2.3.4
    • Fix Version/s: PDB 3.0.2
    • Component/s: None
    • Labels:
      None
    • Environment:

      CentOS7, Postgres 9.2.7

    • Story Points:
      5
    • Sprint:
      PuppetDB 2015-07-29, PuppetDB 2015-08-12

      Description

      We're still seeing PuppetDB regularly run out of memory every 3-4 weeks.

      I did report something similar back in PDB-1484, but that was when we had troubles with 'replace facts' failing (PDB-1448).

      We're getting this in our logs as the first sign of trouble:

      2015-07-02 08:05:05,553 ERROR [c.p.p.command] [08271ea0-ad7c-42c7-a0e5-3810fa668118] [replace catalog] Fatal error on attempt 1
      java.lang.OutOfMemoryError: GC overhead limit exceeded
      	at clojure.lang.PersistentArrayMap.create(PersistentArrayMap.java:57) ~[puppetdb.jar:na]
      	at clojure.lang.PersistentArrayMap.assoc(PersistentArrayMap.java:208) ~[puppetdb.jar:na]
      	at clojure.lang.APersistentMap.cons(APersistentMap.java:36) ~[puppetdb.jar:na]
      	at clojure.lang.RT.conj(RT.java:562) ~[puppetdb.jar:na]
      	at clojure.core$conj.invoke(core.clj:83) ~[puppetdb.jar:na]
      	at schema.utils$result_builder$conjer__1981.invoke(utils.clj:139) ~[na:na]
      	at schema.core$map_walker$fn__2760$fn__2763$fn__2764.invoke(core.clj:731) ~[na:na]
      	at clojure.core.protocols$fn__6086.invoke(protocols.clj:143) ~[puppetdb.jar:na]
      	at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19) ~[puppetdb.jar:na]
      	at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31) ~[puppetdb.jar:na]
      	at clojure.core.protocols$fn__6078.invoke(protocols.clj:54) ~[puppetdb.jar:na]
      	at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13) ~[puppetdb.jar:na]
      	at clojure.core$reduce.invoke(core.clj:6289) ~[puppetdb.jar:na]
      	at schema.core$map_walker$fn__2760$fn__2763.invoke(core.clj:735) ~[na:na]
      	at schema.core$map_walker$fn__2760.invoke(core.clj:726) ~[na:na]
      	at schema.core.MapEntry$fn__2721.invoke(core.clj:675) ~[na:na]
      	at schema.core$map_walker$fn__2760$fn__2763$fn__2764.invoke(core.clj:731) ~[na:na]
      	at clojure.core.protocols$fn__6086.invoke(protocols.clj:143) ~[puppetdb.jar:na]
      	at clojure.core.protocols$fn__6057$G__6052__6066.invoke(protocols.clj:19) ~[puppetdb.jar:na]
      	at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31) ~[puppetdb.jar:na]
      	at clojure.core.protocols$fn__6078.invoke(protocols.clj:54) ~[puppetdb.jar:na]
      	at clojure.core.protocols$fn__6031$G__6026__6044.invoke(protocols.clj:13) ~[puppetdb.jar:na]
      	at clojure.core$reduce.invoke(core.clj:6289) ~[puppetdb.jar:na]
      	at schema.core$map_walker$fn__2760$fn__2763.invoke(core.clj:735) ~[na:na]
      	at schema.core$map_walker$fn__2760.invoke(core.clj:726) ~[na:na]
      	at schema.core.Either$fn__2541.invoke(core.clj:469) ~[na:na]
      	at schema.core.MapEntry$fn__2715.invoke(core.clj:665) ~[na:na]
      	at schema.core$map_walker$fn__2760$fn__2763.invoke(core.clj:739) ~[na:na]
      	at schema.core$map_walker$fn__2760.invoke(core.clj:726) ~[na:na]
      	at clojure.core$comp$fn__4192.invoke(core.clj:2403) ~[puppetdb.jar:na]
      	at schema.core$check.invoke(core.clj:155) ~[na:na]
      	at schema.core$validate.invoke(core.clj:160) ~[na:na]
      

      and Java dumps an hprof file. We've had a look at that and seen:

      5,169 instances of "org.postgresql.jdbc4.Jdbc4PreparedStatement", loaded by "sun.misc.Launcher$AppClassLoader @ 0xd0288580" occupy 584,559,496 (76.68%) bytes. 
       
      3,174 instances of "org.postgresql.core.v3.SimpleQuery", loaded by "sun.misc.Launcher$AppClassLoader @ 0xd0288580" occupy 82,194,760 (10.78%) bytes.
      
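      For reference, a rough sketch of how a similar per-class breakdown could be pulled with stock JDK 7/8 tooling (the commands and paths below are illustrative, not exactly what we ran):

      # Assumes stock JDK 7/8 tools on the PuppetDB host; names/paths are illustrative.
      # Live per-class histogram of the running PuppetDB JVM:
      jmap -histo:live $(pgrep -f puppetdb) | grep -i Jdbc4PreparedStatement

      # Or browse the .hprof written on OutOfMemoryError (jhat serves it on http://localhost:7000/):
      jhat -J-mx1024m java_pid<pid>.hprof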

      Our heap size is 750MB, and we have 108 Puppet agents, so our heap should be plenty (the rule of thumb suggests a 256MB heap). The dashboard says we have 88,142 resources, and our agents check in every 15 minutes.
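      (Back-of-envelope from the figures above: 584,559,496 bytes across 5,169 prepared statements works out to a bit over 110 KB per statement, and that one class alone accounts for roughly three quarters of the 750MB heap.)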

      Last time, Kenneth suggested we tune statement-cache-size. We have not done that, as the default of 1000 seemed fine - but we now seem to have five times that many prepared statements in the heap.

      Are we seeing a possible leak around the prepared statements?
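      If tuning statement-cache-size is the way to go, my understanding is it would look roughly like the following (assuming the setting lives in the [database] section of database.ini; the path below is the stock one and ours may differ):

      # /etc/puppetdb/conf.d/database.ini (assumed stock location)
      [database]
      # existing subprotocol / subname / username settings stay as they are
      # shrink the prepared statement cache from the default of 1000,
      # to see whether the Jdbc4PreparedStatement count follows it
      statement-cache-size = 200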

    People

    • Assignee: Unassigned
    • Reporter: soxwellfb (Simon Oxwell)
    • Votes: 0
    • Watchers: 4
