Details
-
New Feature
-
Status: Resolved
-
Major
-
Resolution: Done
-
SERVER 5.3.1
-
None
-
Normal
-
2 - 5-25% of Customers
-
3 - Serious
-
4 - $$$$$
-
-
New Feature
-
-
Needs Assessment
Description
As an administrator of Puppet Infrastructure, I need to determine why a Puppet Server request using a JRuby instance has stalled before I restart the service. This can happen if the request has triggered methods from core Puppet code or a Ruby plugin that will never return.
Reproduction Case
- Install and start Puppet Server on a CentOS 7 VM:
rpm -Uvh http://yum.puppetlabs.com/puppet5/puppet5-release-el-7.noarch.rpm |
yum install -y puppetserver |
|
/opt/puppetlabs/puppet/bin/puppet config set --section main server "$(hostname -f)" |
systemctl start puppetserver
|
- Configure site.pp to sleep for an absurd amount of time in response to a catalog request:
cat <<"EOM" > /etc/puppetlabs/code/environments/production/manifests/site.pp |
node default { |
inline_template('<% Kernel.sleep(1800) %>') |
# I feel much better after a 30 minute catnap! |
}
|
EOM
|
- Kick off a puppet run: /opt/puppetlabs/bin/puppet agent -t
- Send the node's hostname to your coworker on Friday afternoon and tell them to figure out what is going on without looking at site.pp (because this could be a problem with 1 out of 100s of installed modules). Leave for a two week vacation.
Current Options
Below are listed two current options for checking the state of stalled JRuby requests. Both can help identify that a request is indeed stalled, but neither provides visibility into what Ruby code is responsible.
Checking the status of the compiler pool shows that a catalog request is hung, but gives no info as to where the request is stalled:
# curl -k https://localhost:8140/status/v1/services/jruby-metrics?level=debug|python -m json.tool
|
{
|
"active_alerts": [],
|
"detail_level": "debug",
|
"service_name": "jruby-metrics",
|
"service_status_version": 1,
|
"service_version": "5.3.1",
|
"state": "running",
|
"status": {
|
"experimental": {
|
"jruby-pool-lock-status": {
|
"current-state": ":not-in-use",
|
"last-change-time": "2018-04-20T18:24:02.904Z"
|
},
|
"metrics": {
|
"average-borrow-time": 82,
|
"average-free-jrubies": 0.000271409884577374,
|
"average-lock-held-time": 0,
|
"average-lock-wait-time": 0,
|
"average-requested-jrubies": 0.0,
|
"average-wait-time": 0,
|
"borrow-count": 5,
|
"borrow-retry-count": 0,
|
"borrow-timeout-count": 0,
|
"borrowed-instances": [
|
{
|
"duration-millis": 470045,
|
"reason": {
|
"request": {
|
"request-method": "post",
|
"route-id": "puppet-v3-catalog-/*/",
|
"uri": "/puppet/v3/catalog/tvtyjvubmmaixqi.delivery.puppetlabs.net"
|
}
|
},
|
"time": 1524248690580
|
}
|
],
|
"num-free-jrubies": 0,
|
"num-jrubies": 1,
|
"num-pool-locks": 0,
|
"queue-limit-hit-count": 0,
|
"queue-limit-hit-rate": 0.0,
|
"requested-count": 5,
|
"requested-instances": [],
|
"return-count": 4
|
}
|
}
|
}
|
}
|
Pulling a thread dump using jstack after installing java-1.8.0-openjdk-devel shows a JRuby thread waiting (on RubyKernel.sleep, but thats a lucky coincidence), but the stack trace is at the Java and JRuby AST level which gives no indications as to what Ruby code caused it:
# sudo -u puppet jstack $(cat /var/run/puppetlabs/puppetserver/puppetserver.pid)
|
...
|
"qtp281635025-65" #65 prio=5 os_prio=0 tid=0x00007f30f00c4000 nid=0x9f3 in Object.wait() [0x00007f30edaa4000]
|
java.lang.Thread.State: TIMED_WAITING (on object monitor)
|
at java.lang.Object.wait(Native Method)
|
- waiting on <0x00000000f568d888> (a org.jruby.RubyThread)
|
at org.jruby.RubyThread.sleep(RubyThread.java:1002)
|
- locked <0x00000000f568d888> (a org.jruby.RubyThread)
|
- locked <0x00000000f568d888> (a org.jruby.RubyThread)
|
at org.jruby.RubyKernel.sleep(RubyKernel.java:803)
|
at org.jruby.RubyKernel$INVOKER$s$0$1$sleep.call(RubyKernel$INVOKER$s$0$1$sleep.gen)
|
at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:667)
|
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
|
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
|
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
|
at org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
|
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
|
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
|
at org.jruby.ast.RootNode.interpret(RootNode.java:129)
|
at org.jruby.evaluator.ASTInterpreter.INTERPRET_EVAL(ASTInterpreter.java:95)
|
|
...skipping a lot of JRuby AST calls for brevity...
|
|
puppetlabs.services.request_handler.request_handler_core$jruby_request_handler$fn__31686.invoke(request_handler_core.clj:273)
|
at puppetlabs.puppetserver.jruby_request$wrap_with_jruby_instance$fn__31597.invoke(jruby_request.clj:48)
|
at puppetlabs.puppetserver.jruby_request$wrap_with_error_handling$fn__31593.invoke(jruby_request.clj:33)
|
at puppetlabs.services.request_handler.request_handler_service$reify__31694$service_fnk__5266__auto___positional$reify__31709.handle_request(request_handler_service.clj:47)
|
at puppetlabs.services.protocols.request_handler$eval27407$fn__27408$G__27399__27411.invoke(request_handler.clj:3)
|
at puppetlabs.services.protocols.request_handler$eval27407$fn__27408$G__27398__27415.invoke(request_handler.clj:3)
|
at clojure.core$partial$fn__4759.invoke(core.clj:2515)
|
at puppetlabs.trapperkeeper.authorization.ring_middleware$eval39879$wrap_authorization_check__39888$fn__39891$fn__39892.invoke(ring_middleware.clj:284)
|
at puppetlabs.ring_middleware.core$eval21857$wrap_bad_request__21866$fn__21869$fn__21874.invoke(core.clj:168)
|
at puppetlabs.ring_middleware.core$eval21953$wrap_uncaught_errors__21962$fn__21965$fn__21970.invoke(core.clj:211)
|
at puppetlabs.ring_middleware.core$eval21584$wrap_request_logging__21589$fn__21590$fn__21592.invoke(core.clj:45)
|
at puppetlabs.i18n.core$locale_negotiator$fn__124.invoke(core.clj:357)
|
at puppetlabs.ring_middleware.core$eval21613$wrap_response_logging__21618$fn__21619$fn__21620.invoke(core.clj:51)
|
at puppetlabs.puppetserver.ringutils$wrap_with_puppet_version_header$fn__17331.invoke(ringutils.clj:83)
|
at puppetlabs.services.master.master_core$eval38154$v3_ruby_routes__38159$fn__38160$fn__38177.invoke(master_core.clj:738)
|
at bidi.ring$eval22897$fn__22898.invoke(ring.cljc:25)
|
at bidi.ring$eval22876$fn__22877$G__22867__22886.invoke(ring.cljc:16)
|
at puppetlabs.comidi$make_handler$fn__25007.invoke(comidi.clj:245)
|
at puppetlabs.metrics.http$eval37332$wrap_with_request_metrics__37337$fn__37341$fn__37343$fn__37344$fn__37345.invoke(http.clj:152)
|
at puppetlabs.metrics.http.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
|
at com.codahale.metrics.Timer.time(Timer.java:101)
|
at puppetlabs.metrics.http$eval37332$wrap_with_request_metrics__37337$fn__37341$fn__37343$fn__37344.invoke(http.clj:152)
|
at puppetlabs.metrics.http.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
|
at com.codahale.metrics.Timer.time(Timer.java:101)
|
at puppetlabs.metrics.http$eval37332$wrap_with_request_metrics__37337$fn__37341$fn__37343.invoke(http.clj:148)
|
at puppetlabs.comidi$eval25068$wrap_with_route_metadata__25073$fn__25074$fn__25076.invoke(comidi.clj:332)
|
at puppetlabs.trapperkeeper.services.webserver.jetty9_core$ring_handler$fn__35219.invoke(jetty9_core.clj:433)
|
at puppetlabs.trapperkeeper.services.webserver.jetty9_core.proxy$org.eclipse.jetty.server.handler.AbstractHandler$ff19274a.handle(Unknown Source)
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
|
at java.lang.reflect.Method.invoke(Method.java:498)
|
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
|
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
|
at puppetlabs.trapperkeeper.services.webserver.normalized_uri_helpers$eval34795$normalize_uri_handler__34800$fn__34801$fn__34802.invoke(normalized_uri_helpers.clj:74)
|
at puppetlabs.trapperkeeper.services.webserver.normalized_uri_helpers.proxy$org.eclipse.jetty.server.handler.HandlerWrapper$ff19274a.handle(Unknown Source)
|
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
|
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:190)
|
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
|
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:170)
|
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
|
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
|
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
|
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
|
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:521)
|
at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:56)
|
at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
|
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
|
at org.eclipse.jetty.server.Server.handle(Server.java:564)
|
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
|
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
|
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
|
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
|
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:278)
|
at org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:148)
|
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
|
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
|
at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
|
at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
|
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
|
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
|
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:672)
|
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:590)
|
at java.lang.Thread.run(Thread.java:748)
|
...
|
Attachments
Issue Links
- is blocked by
-
SERVER-2611 Update JRuby to 9.2.8.0
-
- Resolved
-