[PUP-5693] benchmark:function_loading aborts with stack overflow Created: 2016/01/11  Updated: 2016/03/17  Resolved: 2016/01/31

Status: Closed
Project: Puppet
Component/s: None
Affects Version/s: None
Fix Version/s: PUP 4.4.0

Type: Bug Priority: Normal
Reporter: Kylo Ginsberg Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Template:
Story Points: 1
Sprint: Language 2016-01-27, Language 2016-02-10
Release Notes: Not Needed

 Description   

E.g.

be rake benchmark:function_loading
mkdir -p /var/folders/6v/63w2j07x5vv4fdcx449fphxw0000gp/T/function_loading20160111-22619-d335fu
                 user     system      total        real
Run 1      rake aborted!
stack level too deep



 Comments   
Comment by Kylo Ginsberg [ 2016/01/11 ]

I bisected this down to https://github.com/puppetlabs/puppet/commit/3ce8b6ce93e3bf5fe99587a750dfa580bc1b823e, but didn't get farther than that.

Henrik Lindberg or Thomas Hallgren, I put this in Language Triage. It'd be nice to resurrect these benchmarks.

Comment by Henrik Lindberg [ 2016/01/12 ]

Ha! It seems to be caused by:

["/Users/henrik/git/puppet/lib/puppet/pops/loader/ruby_function_instantiator.rb:19"]

Which is interesting as we have another ticket where a user is struggling with a case where the function_instantiator seems to go into a loop.

Comment by Kylo Ginsberg [ 2016/01/12 ]

I wondered if there was a connection between these two tickets .... perhaps if we resolve this (which is nicely isolated) we can see if it helps with that other overflow ticket.

Comment by Henrik Lindberg [ 2016/01/12 ]

I thought so too - investigating what the cause is - then maybe a patch will help in the other ticket.

Comment by Henrik Lindberg [ 2016/01/13 ]

We may want to backport the fixes to be able to compare across versions.

Comment by Kylo Ginsberg [ 2016/01/13 ]

Yeah, I'm thinking we target stable for this.

Comment by Henrik Lindberg [ 2016/01/14 ]

I found the cause of the problem. There were also additional problems as the 4.x version is more strict.

Basically, the reason an OutOfStack occurs when a function i loaded is simply that it invokes he Ruby parser/interpreter and it adds a number of levels to the stack. therefore, when the stack is filled up for other reasons, calling a function for the first time will trigger loading of it, and then the stack blows.

What has happened over time is that we have refactored code; there is now more delegation (more smaller methods delegating to more specialized version), and wrapping (e.g. doing the work in a lambda yielded to by a wrapper that does profiling). We also tend to use recursion more than in the past. For those reasons there will more frequent out of stack problems for legitimate reasons.

Specifically the function_loading benchark loads and calls functions recursively to a depth of 1000. Reducing this to 800 made the benchmark work without an out of stack. There were also other problems as the benchmark generated module dependencies the wrong way. They are now needed to be correct as modules can only call functions in modules it depends on (bugs in earlier versions did not prevent that).

The good news - it was easy to fix this specifically for the benchmark by reducing the recursion. The bad news is that it was not a bug that when fixed would make the out of stack problem others have reported go away.

We should also for the future consider reducing the recursion/call-levels we use. We should also consider bumping the amount of stack we use in general when running puppet as we are too close to the edge. For more recent Ruby versions (> 2.x) it should be possible to set this with an environment variable.

We should also try to design a solution for recursion in puppet; i.e. when users write recursive puppet functions. We could then prevent the ruby oout of stack from happening in favor of our own error that would be able to tell the user where the recursion is too deep (instead of the Booom - goodbye).

The fix to the benchmark will be merged to stable.

Comment by Kylo Ginsberg [ 2016/01/14 ]

I created a PUP 4.3.3 for this to indicate it can land on stable but shouldn't hold the 4.3.2 release.

Comment by Henrik Lindberg [ 2016/01/14 ]

I was just about to push this up to stable as a maint fix - so, I should wait with that?

Comment by Henrik Lindberg [ 2016/01/18 ]

Added PR against master. Should be on stable as well (either cherry pick, or ask me to rebase it onto stable - cannot be merged quite yet to stable since it is blocked for release work.

Comment by Henrik Lindberg [ 2016/01/18 ]

When running this benchmark it seems to indicate that we have a memory leak in function loading or calling. This cam be observed by simply running the benchmark - each round takes longer than the previous. I can accept that the first round is different from the subsequent rounds, and also random deviation in the numbers, but this is consistently slowing down.

One suspicion was that the GC did not keep up. However, manually inserting gc calls between runs did not reduce the amount of objects allocated. A bit of poking revealed that the stats reported by ObjectSpace go up dramatically with > 100.000 object slots per round.

These numbers differ between ruby versions; 1.9.3 being the worse, but they all behave in a similar way.

Not sure if we should use this ticket to also look at the memory leak, or create a new ticket.

Comment by Thomas Hallgren [ 2016/01/19 ]

Merged to master at b3f7d95

Comment by Henrik Lindberg [ 2016/01/19 ]

PUP-5724 was created to handle the backport to stable PUP 4.3.3 when stable is free for merges. This ticket retargeted for PUP 4.4.0

Comment by Kylo Ginsberg [ 2016/01/31 ]

I verified it now passes without a stack overflow.

Generated at Sat Aug 08 08:33:51 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.