Bug #2824

Failing to account for bug in ruby GC

Added by Markus Roberts 8 months ago. Updated 8 months ago.

Status: Closed
Start: 11/17/2009
Priority: High
Due date:
Assigned to: Markus Roberts
% Done: 0%
Category: plumbing
Target version: 0.25.2
Affected version: 0.25.1
Branch: http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/2824
Keywords: memory leak
Votes: 0

Description

There is a known (albeit poorly understood) bug in MRI’s GC/stack frame handling that can cause extreme memory leakage.

Ruby manages a number of special “global” variables that are actually scope-limited, transitory, thread-local state accessors: for example, $_ (the last line read with gets) and $~, $1, $2, … (the details of the last regular expression match).

These are normally stored in the stack frame of the routine that triggered them (so if you do “Bob’s your uncle” =~ /(.*)’s/ in a routine you will subsequently see $1 == “Bob” until the next match is performed, but the routine that called you will not see a change in $1 on return).
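The scoping described above can be seen in a small sketch. The method names inner_match and outer_match are made up for illustration, and the strings use plain ASCII apostrophes:

```ruby
# Sketch: $1 belongs to the frame of the method that performed the match.
def inner_match
  "Bob's your uncle" =~ /(.*)'s/
  $1                      # "Bob" inside this method
end

def outer_match
  "Alice's aunt" =~ /(.*)'s/
  inner = inner_match     # the callee performs its own match
  [inner, $1]             # the caller's $1 is not changed by the callee
end

result = outer_match
puts result.inspect       # prints ["Bob", "Alice"]
```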

A problem arises when a routine that does not have its own stack frame (because of a compiler optimization) performs an operation that creates one or more of these variables. In this case MRI dynamically creates a stack frame but (due to the bug) neither cleans it up on exit nor arranges for the GC to collect it later. This is worse than a normal memory leak (one caused by careless object reference management, for example) in that the orphaned chunks apparently cannot be moved. Thus the heap grows increasingly fragmented, causing even more memory to be wasted; hundreds of megabytes can be consumed in a few minutes, only to be released when the program exits.

The simplest characterization of the sort of routines that trigger the optimization is “routines with no local variable assignments”. Characterizing the operations that will trigger the bug is a bit harder (see the attached files for a few examples that explore the boundary).
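As a rough illustration of that characterization, the sketch below shows the two shapes of routine being contrasted. It is not the content of any attached file: both method names are hypothetical, and whether a given form actually leaks depends on the MRI version (going by the description, only some 1.8.x releases would be affected).

```ruby
# Hypothetical routine with no local variable assignments. Per the
# characterization above, this is the shape that can hit the optimization,
# and the regex match then creates $~ and $1.
def match_without_locals
  "Bob's your uncle" =~ /(.*)'s/
end

# Hypothetical variant with a local variable assignment. Per the same
# characterization, this shape falls outside the optimized case.
# The return value is identical.
def match_with_local
  position = ("Bob's your uncle" =~ /(.*)'s/)
  position
end

puts match_without_locals == match_with_local
```

The two methods are semantically equivalent; the only difference is the presence of a local assignment.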

We have around ten routines that meet the definition, several of which can be shown to exhibit the problem.

The following files show cases that do (and do not) cause explosive memory consumption. They are all as simple as possible to demonstrate the point; their behavior under MRI 1.8.6 can be inferred from their names.

match_boom.rb (53 Bytes) Markus Roberts, 11/17/2009 06:20 am

match_no_boom.rb (53 Bytes) Markus Roberts, 11/17/2009 06:20 am

optional_args_no_boom.rb (69 Bytes) Markus Roberts, 11/17/2009 06:20 am

regex_match_boom.rb (83 Bytes) Markus Roberts, 11/17/2009 06:20 am

args_no_boom.rb (66 Bytes) Markus Roberts, 11/17/2009 06:20 am

fork_no_boom.rb (83 Bytes) Markus Roberts, 11/17/2009 06:20 am

gets_boom.rb (187 Bytes) Markus Roberts, 11/17/2009 06:20 am

gets_no_boom.rb (187 Bytes) Markus Roberts, 11/17/2009 06:20 am

regex_subscript_boom.rb (51 Bytes) Markus Roberts, 11/17/2009 06:20 am

lambda_no_boom.rb (111 Bytes) Markus Roberts, 11/17/2009 06:20 am

regex_subscript_boom.rb (51 Bytes) Markus Roberts, 11/17/2009 06:25 am

simple_boom.rb (60 Bytes) Markus Roberts, 11/17/2009 06:25 am

simple_no_boom.rb (70 Bytes) Markus Roberts, 11/17/2009 06:25 am

tricky_boom.rb (84 Bytes) Markus Roberts, 11/17/2009 06:25 am

tricky_no_boom.rb (120 Bytes) Markus Roberts, 11/17/2009 06:25 am

History

Updated by Markus Roberts 8 months ago

Updated by Markus Roberts 8 months ago

  • Branch set to http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/2824

Possible patch up at http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/2824

Updated by Brice Figureau 8 months ago

Markus Roberts wrote:

There is a known (albeit poorly understood) bug in MRI’s GC/stack frame handling that can cause extreme memory leakage.

What version of MRI are affected? Do they plan to fix it? Do you have any pointers to the MRI bug report/ticket/whatever?

Updated by Markus Roberts 8 months ago

  • Status changed from Accepted to Ready for Testing

The bug is known and apparently affects Ruby 1.8.2 through 1.8.7, though the biggest impact is in 1.8.5 and 1.8.6 (my understanding is that the bug was partially fixed in 1.8.7 and finally killed in 1.9).

English bug reports are scattered, anecdotal, and often only partially characterize the problem. See, for example:

http://rubyforge.org/tracker/?group_id=426&atid=1698&func=detail&aid=19088

http://groups.google.com/group/god-rb/browse_thread/thread/1cca2b7c4a581c2/f0f040d41d7c49ea

http://stackoverflow.com/questions/181406/ruby-memory-management

Much more detailed information is available to people who read either Japanese or C. My summary above is based on Babel Fish translations, digging through the C source of 1.8.4 through 1.8.7, and experimentation with small Ruby programs like the ones attached to test my understanding.

I’m marking this “Ready for testing”; the change from 0.25.x has no semantic impact, so the only thing to test for is memory usage.
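One way to check memory usage is to sample the process's resident set size before and after a workload. The following is a minimal sketch, not part of the patch: the helper names rss_kb and rss_growth_kb are hypothetical, and it assumes either a Linux-style /proc/self/status or a ps command that accepts -o rss=.

```ruby
# Returns the resident set size in kilobytes, or nil if it cannot be read.
def rss_kb
  status = "/proc/self/status"
  if File.readable?(status)
    line = File.foreach(status).find { |l| l.start_with?("VmRSS:") }
    return line[/\d+/].to_i if line
  end
  # Fallback (assumption): a ps that supports "-o rss=" for a single pid.
  out = `ps -o rss= -p #{Process.pid} 2>/dev/null`.strip
  out.empty? ? nil : out.to_i
rescue SystemCallError
  nil
end

# Runs the block the given number of times and reports RSS growth in kB,
# or nil if RSS could not be sampled.
def rss_growth_kb(iterations)
  before = rss_kb
  iterations.times { yield }
  after = rss_kb
  before && after ? after - before : nil
end

growth = rss_growth_kb(100_000) { "Bob's your uncle" =~ /(.*)'s/ }
puts "RSS growth: #{growth.inspect} kB"
```

On an affected interpreter one would expect the number to keep climbing as the iteration count rises; on a fixed one it should stay roughly flat.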

Updated by Brice Figureau 8 months ago

Markus Roberts wrote:

The bug is known and apparently affects Ruby 1.8.2 through 1.8.7, though the biggest impact is in 1.8.5 and 1.8.6 (my understanding is that the bug was partially fixed in 1.8.7 and finally killed in 1.9).

English bug reports are scattered, anecdotal, and often only partially characterize the problem. See, for example:

http://rubyforge.org/tracker/?group_id=426&atid=1698&func=detail&aid=19088

http://groups.google.com/group/god-rb/browse_thread/thread/1cca2b7c4a581c2/f0f040d41d7c49ea

http://stackoverflow.com/questions/181406/ruby-memory-management

Much more detailed information is available to people who read either Japanese or C. My summary above is based on Babel Fish translations, digging through the C source of 1.8.4 through 1.8.7, and experimentation with small Ruby programs like the ones attached to test my understanding.

That’s the issue with MRI: most of the developers are Japanese, which doesn’t help :-( Fortunately, we speak C fluently :-)

I’m marking this “Ready for testing”; the change from 0.25.x has no semantic impact, so the only thing to test for is memory usage.

Yes, I read the patch and it’s mostly cosmetic changes with no impact. I’ll put it on one of my nodes ASAP to see.

Updated by James Turnbull 8 months ago

  • Status changed from Ready for Testing to Closed

Pushed in commit bd5dc649ad55fc4724cafad99852b825adfde182 in branch 0.25.x
