  1. Puppet Server
  2. SERVER-389

Bump default-borrow-timeout to something significantly higher

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: SERVER 1.0.8, SERVER 2.0.0
    • Component/s: None
    • Labels:
      None

      Description

      The default-borrow-timeout for JRuby instances is currently set at 60 seconds. See:

      https://github.com/puppetlabs/puppet-server/blob/cd659b95ac557502f3bab2eea878fa7e6a131b6f/src/clj/puppetlabs/services/jruby/jruby_puppet_service.clj#L13

      This timeout could realistically be hit in production. For example, every allocated JRuby instance could be busy with an expensive catalog compilation at the same time, and each compilation could take longer than 60 seconds to complete. We should consider a significantly larger default, ideally one with a comfortable buffer above the longest time a realistic, expensive catalog compilation would take. We don't yet know what that value is and may need to solicit outside feedback to arrive at a good number.

      The comments on this ticket include quite a bit of discussion about what the timeout for borrowing from the JRuby pool should be. We ultimately settled on 20 minutes. The number is somewhat arbitrary, since we don't have much real-world data on catalog compilation and report processing times to suggest a specific value. An agent that has successfully made a socket connection to a master effectively uses an infinite timeout on the read it performs while waiting for the response to a request. With that in mind, we also discussed reverting to the "infinite" timeout used in currently released puppetserver code. The majority, though, was concerned that an unbounded wait could lead to thread "hangs" on the server, which would show up as "hangs" in the response to a client. 20 minutes was a compromise: it seems "long enough" to cover the upper bound on worst-case catalog compilation and report processing, while not leaving the server vulnerable to these "hangs".
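
      To illustrate the trade-off discussed above, here is a minimal sketch of a bounded borrow from a fixed-size pool. It is not Puppet Server's implementation (which is written in Clojure); `InstancePool`, `borrow`, `give_back` and the instance name `"jruby-1"` are hypothetical names chosen for illustration, and the 50 ms timeout is only there so the example finishes quickly. Passing `None` as the timeout models the "infinite" wait.

```python
import queue

# 20 minutes expressed in milliseconds, the value settled on in this ticket.
TWENTY_MINUTES_MS = 20 * 60 * 1000


class InstancePool:
    """Hypothetical stand-in for a pool of JRuby instances."""

    def __init__(self, instances, borrow_timeout_s):
        self._free = queue.Queue()
        for inst in instances:
            self._free.put(inst)
        # None means "wait forever"; a number bounds the wait in seconds.
        self.borrow_timeout_s = borrow_timeout_s

    def borrow(self):
        """Return a free instance, or None if none frees up in time."""
        try:
            return self._free.get(timeout=self.borrow_timeout_s)
        except queue.Empty:
            # Bounded wait expired: the caller can fail the request
            # instead of holding a server thread indefinitely.
            return None

    def give_back(self, inst):
        self._free.put(inst)


# One instance in the pool, short timeout for demonstration only.
pool = InstancePool(["jruby-1"], borrow_timeout_s=0.05)
first = pool.borrow()    # succeeds immediately
second = pool.borrow()   # pool is exhausted, so this times out
pool.give_back(first)
third = pool.borrow()    # succeeds again once the instance is returned
```

      With a bounded timeout, a request that cannot obtain an instance fails after a known delay; with `None`, the same call would block until some other thread returned an instance.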

                People

                • Assignee:
                  erik Erik Dasher
                  Reporter:
                  jeremy.barlow Jeremy Barlow
                  QA Contact:
                  Erik Dasher
                • Votes:
                  0
                  Watchers:
                  5
