Bug #3558

Performance improvement in lexer.rb

Added by Ken Barber about 2 years ago. Updated about 2 years ago.

Status:Closed Start date:04/14/2010
Priority:Normal Due date:
Assignee:Markus Roberts % Done:

0%

Category:parser
Target version:0.25.5
Affected Puppet version:0.25.5 Branch:MarkusQ:ticket/0.25.x/3558
Keywords:
Votes: 0

Description

Dev thread here: http://groups.google.com/group/puppet-dev/browse_thread/thread/4b4e356721b67ea

Basically we noticed that under our run conditions:

Test manifests: This is with 55 *.pp files … totalling 63526 Puppet dsl lines across all files. System: Intel® Core™2 Quad CPU Q9400 @ 2.66GHz, 3G ram, Ubuntu 10.x (64 bit)

We were getting long wait times in parsing our manifests. We found that by changing a single line in the lexer.rb from using += to << we got a massive improvement in speed:

this is probably pretty damned inefficient…

it’d be nice not to have to load the whole file first…

def file=(file) @file = file @line = 1 File.open(file) { |of|

 str = "" 
  • of.each { |line| str += line }
  • of.each { |line| str << line } @scanner = StringScanner.new(str) }

For example – without the change the parsing was taking us 3 minutes to complete on my desktop:

kbarber@purple:~/tmp/puppet/lak/jaro$ time ~/tmp/puppet/lak/real-puppet/bin/puppet —noop ~/tmp/puppet/lak/jaro/manifests/site.pp —modulepath=/home/ken/tmp/puppet/lak/jaro/modules/ Could not find default node or by name with ‘purple.rim.net, purple.rim, purple’ on node purple.rim.net

real 2m59.575s user 2m54.580s sys 0m4.950s kbarber@purple:~/tmp/puppet/lak/jaro$

After the change from += to << – it was averaging about ~18s:

kbarber@purple:~/tmp/puppet/lak/jaro$ time ~/tmp/puppet/lak/real-puppet/bin/puppet —noop ~/tmp/puppet/lak/jaro/manifests/site.pp —modulepath=/home/ken/tmp/puppet/lak/jaro/modules/ Could not find default node or by name with ‘purple.rim.net, purple.rim, purple’ on node purple.rim.net

real 0m18.183s user 0m16.250s sys 0m1.900s kbarber@purple:~/tmp/puppet/lak/jaro$

More samples:

18.218 18.025 18.154

As suggested in thread, we also tested MR’s patch instead:

—– a/lib/puppet/parser/lexer.rb +++ b/lib/puppet/parser/lexer.rb @@ -267,11 +267,7 @@ class Puppet::Parser::Lexer

 def file=(file)
     @file = file
     @line = 1
  •  File.open(file) { |of|
    
  •      str = ""
    
  •      of.each { |line| str += line }
    
  •      @scanner = StringScanner.new(str)
    
  •  }
    
  • @scanner = StringScanner.new(File.read(file))
    

    end

    def find_string_token

Which seemed to average ~17-18s (slightly faster then the previous suggestion):

kbarber@purple:~/tmp/puppet/lak/jaro$ time ~/tmp/puppet/lak/real-puppet/bin/puppet —noop ~/tmp/puppet/lak/jaro/manifests/site.pp —modulepath=/home/ken/tmp/puppet/lak/jaro/modules/ Could not find default node or by name with ‘purple.rim.net, purple.rim, purple’ on node purple.rim.net

More samples:

real 0m17.936s user 0m16.020s sys 0m1.930s kbarber@purple:~/tmp/puppet/lak/jaro$

17.936 18.011 17.945

In summary – the results were a ~90% improvement which is pretty damn good.

Can you guys review and apply one of these patches? It would be great to see this in 0.25.6 and 2.6.0 …

Thanks.

History

Updated by James Turnbull about 2 years ago

  • Status changed from Unreviewed to Needs Decision
  • Assignee set to Luke Kanies

Updated by Luke Kanies about 2 years ago

  • Status changed from Needs Decision to Ready For Checkin
  • Assignee changed from Luke Kanies to Markus Roberts

IMO this is ready and should be included.

Updated by Markus Roberts about 2 years ago

  • Branch set to MarkusQ:ticket/0.25.x/3558

The File.read version of the fix (see discussion on list) is up on my github.

Updated by James Turnbull about 2 years ago

  • Status changed from Ready For Checkin to Closed
  • Target version set to 0.25.5

Pushed in commit:6739bab16e3126ccba13f025a4b47d38f15c1f67 in branch 0.25.x

Also available in: Atom PDF