[PUP-2692] Puppet master passenger processes keep growing Created: 2014/05/28  Updated: 2019/04/04  Resolved: 2014/06/10

Status: Closed
Project: Puppet
Component/s: None
Affects Version/s: PUP 3.6.1
Fix Version/s: PUP 3.6.2

Type: Bug Priority: Normal
Reporter: Andrew Parker Assignee: Unassigned
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

passenger, ruby 1.9.3 software collections library


Attachments: JPEG File Memory usage on master.jpg     Text File functions.patch     PNG File graph-cpu-week.png     PNG File graph-day.png     PNG File graph-day.png     PNG File graph-week.png     PNG File graph.php.png     PNG File graph.php.png     PNG File graph.php.png     PNG File graph.php.png    
Template:
Story Points: 2
Sprint: Week 2014-5-28 to 2014-6-4

 Description   

Joshua Hoblitt reported on IRC in #puppet-dev that after upgrading from 3.6.0 to 3.6.1 his master processes started growing to 1.5GB.



 Comments   
Comment by Andrew Parker [ 2014/05/28 ]

The memory usage on the master graph shows the upgrade from 3.6.0 to 3.6.1 on Tuesday at about 16:00. Immediately after that the passenger processes quickly grew and started swapping.

ps showed:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
31578 puppet    20   0 1758m 1.6g 2564 R 84.2 13.4  66:36.38 ruby               
15792 puppet    20   0 1728m 1.5g 2476 S  5.2 13.3  40:15.84 ruby               
10220 puppet    20   0 1716m 1.5g 2484 R 67.7 13.1  44:31.14 ruby               
25291 puppet    20   0 1442m 1.3g 2568 R 98.2 10.9  34:07.42 ruby               
27753 puppet    20   0  973m 822m 2476 R 79.1  6.9  25:25.86 ruby               
18022 puppet    20   0  704m 584m 2484 R 97.7  4.9  13:00.75 ruby               
12813 puppet    20   0  548m 429m 3840 S  0.5  3.6   8:43.32 ruby               
30465 foreman   20   0  614m 261m 2044 S  0.0  2.2  15:29.27 ruby               
17145 foreman   20   0  614m 204m 3048 S  0.0  1.7  10:53.72 ruby               
12128 foreman   20   0  614m 201m 2064 S  0.0  1.7   5:04.61 ruby               
13193 foreman   20   0  626m 190m 1932 S  8.8  1.6   7:27.82 ruby               
11999 foreman   20   0  614m 180m 2044 S  0.0  1.5  10:05.63 ruby 

Comment by Chuck Schweizer [ 2014/05/28 ]

I have seen this on RHEL 6.5 with puppet 3.6.1, ruby 1.8.7 and passenger 4.0.18. I had to set PassengerMaxRequests to 5000 in my environment to keep this in check.
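For reference, a minimal sketch of that workaround as it would appear in the Apache vhost serving the puppet master (the file path and comments are assumptions; only the PassengerMaxRequests value comes from this comment):

    # e.g. /etc/httpd/conf.d/puppetmaster.conf (path is an assumption)
    # Recycle each passenger worker after 5000 requests so leaked memory
    # is reclaimed when the process exits.
    PassengerMaxRequests 5000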

Comment by Joshua Hoblitt [ 2014/05/28 ]

It appears that the puppet processes are leaking memory until the Linux OOM killer kicks in (my dmesg is full of OOM messages). Here's what top looked like about 20 minutes after the output Andy pasted above.

top - 11:38:39 up 8 days, 20:19,  2 users,  load average: 42.33, 24.24, 12.80
Tasks: 473 total,   1 running, 469 sleeping,   0 stopped,   3 zombie
Cpu(s):  1.6%us, 13.2%sy,  0.8%ni,  0.0%id, 84.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12137736k total, 11958116k used,   179620k free,     3044k buffers
Swap:  2097144k total,   959332k used,  1137812k free,    31372k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
31578 puppet    20   0 2196m 1.9g 1900 D  0.0 16.1  79:20.31 ruby               
10220 puppet    20   0 1974m 1.8g 1892 D  4.6 15.3  57:01.99 ruby               
15792 puppet    20   0 1947m 1.8g 1920 D  0.0 15.2  53:44.97 ruby               
25291 puppet    20   0 1795m 1.6g 1896 D  0.0 13.8  44:05.28 ruby               
27753 puppet    20   0 1114m 961m 1920 D  0.0  8.1  34:19.15 ruby               
18022 puppet    20   0 1080m 956m 2092 D  0.0  8.1  20:05.17 ruby               
12813 puppet    20   0  871m 747m 2108 D  0.0  6.3  20:30.50 ruby               
30465 foreman   20   0  614m 254m 1708 S  0.0  2.2  15:41.30 ruby               
17145 foreman   20   0  614m 203m 1912 S  0.0  1.7  11:05.40 ruby               
13193 foreman   20   0  626m 190m 1692 S  0.0  1.6   7:46.49 ruby               
11999 foreman   20   0  614m 177m 1912 S  0.0  1.5  10:56.76 ruby               
12128 foreman   20   0  614m 156m 1820 S  0.0  1.3   5:50.51 ruby               
20897 activemq  20   0 4198m  85m  900 S  0.0  0.7   0:32.14 java               
 6801 mysql     20   0 1461m  44m 2540 S  0.0  0.4   1144:30 mysqld             
23666 apache    20   0 2177m  42m 2748 S  0.0  0.4   0:26.37 httpd.worker       
15151 apache    20   0 2175m  39m 2768 S  0.0  0.3   0:29.48 httpd.worker       
20720 apache    20   0 2171m  36m 2744 S  0.0  0.3   0:02.60 httpd.worker 

I'm working on getting the heap_dump gem to work at all, and then getting it working under the SCL ruby193 that foreman provides.

Comment by Joshua Hoblitt [ 2014/05/28 ]

Chuck Schweizer

I'm on SL6.5 and using the foreman ruby193 SCL packages:

ruby193-ruby-1.9.3.327-29.el6.x86_64
ruby193-rubygem-passenger-4.0.5-3.el6.x86_64
puppet-server-3.6.1-1.el6.noarch

Comment by Joshua Hoblitt [ 2014/05/28 ]

Well, it seems I didn't even have a good understanding of which software versions I was running. The passenger instance for puppet is running under the system EL ruby 1.8.7; only the passenger instance for foreman is using the ruby193 SCL. I tried hacking both the apache passenger.conf and the init script to use the ruby193 SCL and it didn't work, likely due to missing deps, so I backed it off and will test this setup in a sandbox.
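For context, pointing Passenger at the SCL interpreter would look roughly like this (the SCL ruby path and config file location are assumptions, not a record of what was actually tried here):

    # /etc/httpd/conf.d/passenger.conf (location is an assumption)
    PassengerRuby /opt/rh/ruby193/root/usr/bin/ruby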

Comment by Joshua Hoblitt [ 2014/05/29 ]

Per discussion on #puppet-dev, I've downgraded from 3.6.1 to 3.6.0 to see if this stops the memory growth. Downgrading also required disabling env dirs, so two variables have changed. Assuming this has indeed stopped the memory growth (it will take several hours to know), I believe the next step is to go back to 3.6.1 without enabling env dirs to see if the memory issues return.

yum downgrade puppet-3.6.0 puppet-server-3.6.0
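Disabling env dirs here amounts to commenting out the directory-environment setting in puppet.conf; a sketch (the exact value is an assumption based on the settings quoted later in this ticket):

    # puppet.conf, [main] section: directory environments disabled for the 3.6.0 test
    #environmentpath = $confdir/environments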

Comment by Joshua Hoblitt [ 2014/05/29 ]

graph of memory usage after downgrading from 3.6.1 -> 3.6.0

Comment by Joshua Hoblitt [ 2014/05/29 ]

graph of memory usage puppet 3.6.0 -> 3.6.1 -> 3.6.0

Comment by Joshua Hoblitt [ 2014/05/29 ]

About 2 hours after the last restart of the puppet master on 3.6.0, the memory usage has grown much more slowly.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
16889 puppet    20   0  624m 503m 4100 S 56.3  4.2  32:52.69 ruby               
16734 puppet    20   0  618m 496m 4100 S  0.0  4.2  31:25.72 ruby               
17930 puppet    20   0  613m 491m 4100 S  0.3  4.1  22:21.42 ruby               
17921 puppet    20   0  596m 476m 4100 S  0.3  4.0  29:30.85 ruby               
19965 puppet    20   0  576m 457m 4036 S  0.0  3.9  20:47.40 ruby               
13824 puppet    20   0  569m 448m 4100 S  0.3  3.8  14:12.90 ruby               
20897 activemq  20   0 4381m 295m 3612 S  0.3  2.5   3:11.37 java               
18123 puppet    20   0  403m 281m 4100 S  0.3  2.4  14:20.57 ruby               
13762 puppet    20   0  400m 279m 4100 S  0.0  2.4  18:51.45 ruby               
16530 foreman   20   0  615m 227m 4648 S  0.0  1.9   1:19.78 ruby               
18009 foreman   20   0  615m 223m 4648 S  1.3  1.9   2:21.38 ruby               
18081 foreman   20   0  628m 213m 4648 S  0.0  1.8   2:17.52 ruby               
12089 foreman   20   0  614m 208m 4644 S  0.0  1.8   1:23.78 ruby       

Comment by Joshua Hoblitt [ 2014/05/29 ]

plot of puppet master CPU usage over the last week

Comment by Joshua Hoblitt [ 2014/05/29 ]

graph of 3.6.1 w/ env dirs -> 3.6.0 w/o env dirs -> 3.6.1 w/o env dirs

Comment by Joshua Hoblitt [ 2014/05/29 ]

graph showing 3.6.1 + env dir + environment_timeout = unlimited appearing to be memory stable.

Comment by Joshua Hoblitt [ 2014/05/29 ]

It's looking pretty good for 3.6.1 +

    environmentpath   = $confdir/environments 
    environment_timeout = unlimited

Per prior discussion on #puppet-dev, I'm going to change the timeout to 180s for the rest of the night.

Comment by Joshua Hoblitt [ 2014/05/30 ]

graph showing the transition to

environment_timeout = 180

Comment by Joshua Hoblitt [ 2014/05/30 ]

Setting

environment_timeout = 180

appears to have a similar effect on memory growth as leaving it at the default value of 5s. The graph looks a bit different in that an additional 2GB of memory was added to the VM and all processes belonging to the puppet user were put into a new cgroup with a memory limit of 8GB. The sharp peaks in the graph are the result of the OOM killer activating in the cgroup, e.g.

Memory cgroup out of memory: Kill process 13301 (ruby) score 172 or sacrifice child
Killed process 13301, UID 52, (ruby) total-vm:1569376kB, anon-rss:1441948kB, file-rss:3932kB
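On EL6 that kind of cap can be set up with the libcgroup tools; a hedged sketch (the group name and exact commands are assumptions, only the 8GB limit and the puppet user come from this comment):

    # create a memory cgroup and cap it at 8GB (group name is an assumption)
    cgcreate -g memory:/puppet
    cgset -r memory.limit_in_bytes=8G puppet
    # move every process owned by the puppet user into the group
    for pid in $(pgrep -u puppet); do cgclassify -g memory:/puppet "$pid"; done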

Comment by Andrew Parker [ 2014/05/30 ]

I've attached a patch (functions.patch) that can be applied to a puppet 3.6.1 installation and should get rid of most of the memory leak. I've applied it locally and run a puppet agent against a master repeatedly and it showed significant improvement in memory growth.

The setup was:

  • Directory environments enabled
  • environment_timeout as the default value
  • puppetlabs-apache installed in the base modulepath
  • production directory environment with no environment specific modules
  • production directory environment manifest is named site.pp and contains only include apache

Before the patch, a WEBrick master grew very quickly from 190MB to 400MB in a matter of just a few agent runs. After the patch it took about 12 hours to reach the same size, which is what makes me think there is still a leak there, albeit a much slower one.
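Roughly, that setup corresponds to the following layout (paths assume the stock 3.x confdir of /etc/puppet; the site.pp content is as described above):

    /etc/puppet/modules/apache                              # puppetlabs-apache in the base modulepath
    /etc/puppet/environments/production/manifests/site.pp   # contains only: include apache

    # puppet.conf, [main] section
    environmentpath = $confdir/environments
    # environment_timeout left at its default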

Comment by Joshua Hoblitt [ 2014/05/30 ]

I've applied the patch and commented out environment_timeout. We should have a good idea in about an hour.
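A hedged sketch of what applying the patch to an installed master can look like (the install path, patch location, and -p strip level are all assumptions; they depend on the RPM layout and on the paths inside functions.patch):

    cd /usr/lib/ruby/site_ruby/1.8
    patch -p2 --dry-run < /root/functions.patch && patch -p2 < /root/functions.patch

    # puppet.conf: environment_timeout commented out again, reverting to the default
    #environment_timeout = 180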

Comment by Joshua Hoblitt [ 2014/05/30 ]

graph of memory usage. The last "hump" is 3.6.1 + the functions.patch

Comment by Joshua Hoblitt [ 2014/05/30 ]

functions.patch, after about 2 hours of run time, appears to have fixed the memory growth.

+1!!!

Comment by Joshua Hoblitt [ 2014/06/02 ]

Memory usage for the last week. 3.6.0 -> (upgrade fun) -> 3.6.1 w/ env dirs + functions.patch.

Comment by Joshua Hoblitt [ 2014/06/02 ]

It appears that, with the default 5s environment_timeout, and assuming 3.6.1's overall CPU usage is comparable to 3.6.0's, CPU usage has increased by ~20%.

Comment by Henrik Lindberg [ 2014/06/02 ]

merged: b6a2561

Comment by Kurt Wall [ 2014/06/03 ]

I ran a pathological loop that called puppet agent --test 100 times while monitoring master passenger memory usage with GNOME System Monitor. RSS size grew from 66.8 MiB (at the beginning) to 83.2 MiB (around the 40th iteration), then seemed to plateau at the larger value. Killing the puppet master and restarting freed the memory, of course.
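The loop was roughly of this shape (a sketch; the exact invocation is not recorded in the comment):

    # drive 100 catalog requests against the master while watching RSS
    for i in $(seq 1 100); do
        puppet agent --test
    done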

Comment by Kurt Wall [ 2014/06/03 ]

Resolved per previous comment.

Comment by Melissa Stone [ 2014/06/10 ]

Released in Hiera 1.3.4

Comment by Melissa Stone [ 2014/06/10 ]

Apologies, bulk edit got away from me

Comment by Melissa Stone [ 2014/06/10 ]

Released in Puppet 3.6.2
