[PDB-686] java.util.concurrent.TimeoutException: Idle timeout expired on Ubuntu 12.04 and Ruby 1.9.3p0 since PDB 2.0.0 Created: 2014/05/21  Updated: 2018/09/18  Resolved: 2014/05/29

Status: Closed
Project: PuppetDB
Component/s: None
Affects Version/s: PDB 2.0.0
Fix Version/s: PDB 2.1.0

Type: Bug Priority: Normal
Reporter: Jürgen Weber Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to PDB-716 HTTP API gets stuck Closed
Template:
Story Points: 5
Sprint: 20140507 to 20140521, 20140521 to 20140604

 Description   

on puppet agent run:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace catalog' command for vagrant-tools01.intfd.local to PuppetDB at vagrant-tools01.intfd.local:8081: [500 java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms] <html><head><meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/><title>Error 500 </title></head><body><h2>HTTP ERROR: 500</h2><p>Problem accessing /v3/commands. Reason:<pre>    java.util.concurrent.TimeoutException: Idle timeout expired: 30000/30000 ms</pre></p><hr /><i><small>Powered by Jetty://</small></i></body></html>

Forum discussion here: https://groups.google.com/forum/#!msg/puppet-users/ktJWWqtmen0/eaLyAG0o2VoJ



 Comments   
Comment by Ken Barber [ 2014/05/21 ]

Jürgen Weber can we get some more details about this bug, I'm seeing two different outcomes when we've seen this error message in the past, one is that the whole HTTPS and HTTP responses lock up completely, so when it gets into this state, even a simple curl 'http://localhost:8080/' hangs for ever.

In other circumstances, we've seen the service continue - but these error messages become intermittent. There is an example of this here: https://groups.google.com/forum/#!searchin/puppet-users/idle$20timeout/puppet-users/ktJWWqtmen0/mIVdzOFWA2wJ

Comment by Jürgen Weber [ 2014/05/21 ]

Yeah, sure.

It is not the former for me, I can still curl and telnet to the port after the run fails. I can also run puppet agent again and it will just fail in the same way again. At first I thought it may have been some resource exhaustion in my testing VM environment (using vagrant) so I adding more CPU's (8) and RAM (4GB) and ensured the VM does not swap which did not help. Now the CPU/RAM remain relatively idle but it still fails. Before the upgrade a smaller system worked fine. Some system details it is a clean install using vagrant:

Ubuntu 12.04 LTS
Linux vagrant-tools01 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
ii facter 2.0.1-1puppetlabs1 Ruby module for collecting simple facts about a host operating system
ii hiera 1.3.2-1puppetlabs1 A simple pluggable Hierarchical Database.
ii puppet 3.6.0-1puppetlabs1 Centralized configuration management - agent startup and compatibility scripts
ii puppet-common 3.6.0-1puppetlabs1 Centralized configuration management
ii puppetdb 2.0.0-1puppetlabs1 PuppetDB Centralized Storage.
ii puppetdb-terminus 2.0.0-1puppetlabs1 Connect Puppet to PuppetDB by setting up a terminus for PuppetDB.
ii puppetlabs-release 1.0-7 "Package to install Puppet Labs gpg key and apt repo"
ii puppetmaster 3.6.0-1puppetlabs1 Centralized configuration management - master startup and compatibility scripts
ii puppetmaster-common 3.6.0-1puppetlabs1 Puppet master common scripts
ii ruby-rgen 0.6.5-1puppetlabs1 A framework supporting Model Driven Software Development (MDSD)

ii ca-certificates-java 20110912ubuntu6 Common CA certificates (JKS keystore)
ii java-common 0.43ubuntu2 Base of all Java packages
ii tzdata-java 2014c-0ubuntu0.12.04 time zone and daylight-saving time data for use by java runtimes

I have also tried in /etc/default/puppetdb
JAVA_ARGS="-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/puppetdb/puppetdb-oom.hprof -Xms1024m"

Running puppetdb on the console with verbose logging did not seem to tell me nothing exciting either. If you need anything else please let me know.

Comment by Ken Barber [ 2014/05/21 ]

So far we've been able to work around this by downgrading the puppetdb-terminus package to 1.6.3. Can you confirm this works for you at least?

Comment by Jürgen Weber [ 2014/05/21 ]

I downgraded but I get the same error:

root@vagrant-tools01:~# dpkg -l | grep puppet
ii facter 2.0.1-1puppetlabs1 Ruby module for collecting simple facts about a host operating system
ii hiera 1.3.2-1puppetlabs1 A simple pluggable Hierarchical Database.
ii puppet 3.6.0-1puppetlabs1 Centralized configuration management - agent startup and compatibility scripts
ii puppet-common 3.6.0-1puppetlabs1 Centralized configuration management
ii puppetdb 2.0.0-1puppetlabs1 PuppetDB Centralized Storage.
ii puppetdb-terminus 1.6.3-1puppetlabs1 Connect Puppet to PuppetDB by setting up a terminus for PuppetDB.
ii puppetlabs-release 1.0-7 "Package to install Puppet Labs gpg key and apt repo"
ii puppetmaster 3.6.0-1puppetlabs1 Centralized configuration management - master startup and compatibility scripts
ii puppetmaster-common 3.6.0-1puppetlabs1 Puppet master common scripts
ii ruby-rgen 0.6.5-1puppetlabs1 A framework supporting Model Driven Software Development (MDSD)
and then:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit 'replace catalog' command for vagrant-tools01.intfd.local to PuppetDB at vagrant-tools01.intfd.local:8081: [500 java.util.concurrent.TimeoutException: Idle timeout expired: 30005/30000 ms] <html><head><meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/><title>Error 500 </title></head><body><h2>HTTP ERROR: 500</h2><p>Problem accessing /v3/commands. Reason:<pre> java.util.concurrent.TimeoutException: Idle timeout expired: 30005/30000 ms</pre></p><hr /><i><small>Powered by Jetty://</small></i></body></html>

Comment by Phil Fenstermacher [ 2014/05/21 ]

Did you restart your puppetmaster between downgrading puppetdb-terminus and re-running puppet? That made a difference for me when I downgraded.

Comment by Jürgen Weber [ 2014/05/22 ]

no I did not and yes, now after the restart it works great.

Comment by Ken Barber [ 2014/05/22 ]

I've been able to replicate this on Ubuntu 12.04 using Ruby 1.9.3p0. This is an old release of Ruby 1.9.3, so I'm guessing its a bug in that ancient release.

Switching to Ruby 1.8.7 actually fixes the issue, so does installing the latest Ruby 1.9.3 from the brightbox repos.

Also, this problem doesn't seem to affect Ubuntu 14.04 LTS either FWIW.

Comment by Ken Barber [ 2014/05/22 ]

Looks like this happens when the payload being submitted with Net::HTTP is bigger than 16384 bytes. Below that number its fine, above that number we see a timeout. There are other reports of this on the internet.

I can confirm this is a bug with Ruby 1.9.3-p0 (even in RVM on a Mac) and its fixed in the next release of Ruby 1.9.3-p125 but I've been unable to confirm a particular bug fix. It seems to be fixed with later 1.9.3 rubies, Ruby 2.0.0 and Ruby 2.1.0. We haven't yet seen the symptoms for this bug in Ruby 1.8.7.

I think users for now can either downgrade to Ruby 1.8.7, install a more recent ruby from brightbox PPA repos: https://launchpad.net/~brightbox/+archive/ruby-ng or upgrade to Ubuntu 14.04 which has a more recent Ruby release.

Comment by Jürgen Weber [ 2014/05/23 ]

Hi, yes. I added the brightbox PPA and it works. Thanks

Comment by Ryan Senior [ 2014/05/29 ]

The real issue here was Ruby 1.9.3p0 that Ubuntu 12.04 ships. The solutions for users of Ubuntu 12.04 experiencing this issue is to downgrade to Ruby 1.8.7, or upgrade to a more recent version of Ruby 1.9.3. A newer version of Ruby 1.9.3 is not available in the default Ubuntu repos and needs to come from a third party provider.

Generated at Wed Oct 16 12:06:15 PDT 2019 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.