Puppet Server / SERVER-858

Puppet Server opens a large number of file descriptors during JRuby startup, causing spurious "NoClassDefFoundError" crashes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: SERVER 1.1.1
    • Fix Version/s: SERVER 1.2.0, SERVER 2.6.0
    • Component/s: Puppet Server
    • Labels:
      None
    • Environment:

      Scientific Linux 6.6 x86_64 (RHEL 6.6 clone)
      OpenJDK 8
      Puppet 3.8.2

    • Template:
    • Team:
      Systems Engineering
    • Sub-team:
    • Story Points:
      5
    • Sprint:
      Server Jade 2015-09-02, Server Jade 2015-09-16, Server Emerald 2016-07-27, Server Emerald 2016-08-10, Server Emerald 2016-08-24, Server Emerald 2016-09-07, SE 2016-09-21, SE 2016-10-05
    • Release Notes:
      Bug Fix
    • Release Notes Summary:
      For cases where a large JVM heap and a large number of JRuby instances are used with Puppet Server, a fix was implemented which should prevent Puppet Server from failing to start up properly with error messages like the following in the puppetserver.log file:

      java.lang.IllegalStateException: There was a problem adding a JRubyPuppet instance to the pool.
      Caused by: org.jruby.embed.EvalFailedException: (LoadError) load error: jopenssl/load -- java.lang.NoClassDefFoundError: org/jruby/ext/openssl/NetscapeSPKI

      Description

      The Problem

      We tried today to migrate our Puppet Masters from Apache/Passenger to Puppet Server 1.1.1.
      However, Puppet Server dies with error messages as soon as we increase the number of JRuby instances to more than 24 and the JVM heap size to more than 16 GB.

      During startup, Puppet Server spawns the JRuby instances one after another, and after about 8 instances an exception is logged:

      2015-08-25 10:25:05,676 INFO  [puppet-server] Puppet Puppet settings initialized; run mode: master
      2015-08-25 10:25:06,254 INFO  [p.s.j.jruby-puppet-agents] Finished creating JRubyPuppet instance 7 of 32
      2015-08-25 10:25:08,567 ERROR [p.t.internal] shutdown-on-error triggered because of exception!
      java.lang.IllegalStateException: There was a problem adding a JRubyPuppet instance to the pool.
      Caused by: org.jruby.embed.EvalFailedException: (LoadError) load error: jopenssl/load -- java.lang.NoClassDefFoundError: org/jruby/ext/openssl/NetscapeSPKI
              at org.jruby.embed.internal.EmbedEvalUnitImpl.run(EmbedEvalUnitImpl.java:132) ~[puppet-server-release.jar:na]
              at org.jruby.embed.ScriptingContainer.runUnit(ScriptingContainer.java:1341) ~[puppet-server-release.jar:na]
      

      I have attached the complete puppetserver.log to this issue.
      The log file is from the initial setup with max-active-instances set to 32 and a JVM heap size of 48 GB.
      We had a working setup with a 16 GB heap and 16 instances. Sometimes 24 instances worked as well, but not always.
      With 24 instances, we expected it to fail at 17 instances, but the error happened at instance 20.
      However, 16 instances will be too small to handle all the Puppet agents.
      Increasing the timeout in /etc/sysconfig/puppetserver did not help either.

      We use rather beefy hardware for our three Puppet Masters (2x Dell R715, 1x R815); with Apache/Passenger this scaled nicely.

      The OS on the Puppet Masters is Scientific Linux 6.6 (RHEL 6.6 clone) and OpenJDK 8 is used.
      We tried the Oracle JRE as well, but this did not change anything.
      HTTPS is terminated at our F5 load balancer, which forwards the traffic unencrypted to Puppet Server.

      Follow up from this mailing list post.

      A Workaround

      Enabling the G1 garbage collector (G1GC) on the puppetserver JVM seems to be a stable workaround for the issue.

      The argument is -XX:+UseG1GC
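
As a rough sketch of how the flag might be applied: on RHEL-style packages the JVM options are usually read from the JAVA_ARGS variable in /etc/sysconfig/puppetserver, the same file mentioned above for the timeout. The variable name and the heap values below are assumptions for illustration (16 GB matches the setup reported as working); adjust them to the actual file contents and restart the service afterwards.

```shell
# Hypothetical excerpt of /etc/sysconfig/puppetserver.
# Heap sizes are illustrative; the only change relevant to the
# workaround is appending -XX:+UseG1GC to the existing JVM arguments.
JAVA_ARGS="-Xms16g -Xmx16g -XX:+UseG1GC"
```

Any other options already present in the file's JAVA_ARGS line should be kept alongside the added flag.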

      http://www.oracle.com/technetwork/articles/java/vmoptions-jsp-140102.html

      https://tickets.puppetlabs.com/browse/SERVER-858?focusedCommentId=225898&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-225898

        Attachments

        1. 32core-1-cpu.png (23 kB)
        2. 32core-1-load.png (16 kB)
        3. 32core-2-cpu.png (24 kB)
        4. 32core-2-load.png (17 kB)
        5. 64core-1-cpu.png (24 kB)
        6. 64core-1-load.png (19 kB)
        7. puppetserver_2.1.1.log (29 kB)
        8. puppetserver_gc.log (42 kB)
        9. puppetserver.log (10 kB)
        10. puppetserver.log (14 kB)
        11. puppetserver8u25.log (18 kB)

              People

              Assignee: Unassigned
              Reporter: Stefan Dietrich (stdietrich)
              QA Contact: Erik Dasher
              Votes: 0
              Watchers: 16
