Bug #5318

puppet master behind passenger does not re-parse manifests correctly when running with default environment.

Added by Dan Bode over 1 year ago. Updated 4 months ago.

Status:Closed Start date:11/16/2010
Priority:Urgent Due date:
Assignee:Daniel Pittman % Done:

0%

Category:compiler
Target version:2.6.9
Affected Puppet version:2.6.0 Branch:
Keywords:
Votes: 7

Description

When running puppet behind passenger

  • apache - 2.2.3
  • puppet 2.6.3
  • rack 1.1.0
  • passenger - 2.2.11
  • centos 5.4
  • ruby 1.8.5

site.pp

node default {
  file { '/tmp/two/three':
    content => 'blah',
    mode => '661',
  }
}

when I change the mode of the file resource, I have to run the client twice for it to detect the change.

after changing the manifest, do the dishes, run pupppet:

puppetd -t
info: Loading facts in certname
info: Loading facts in certname
info: Caching catalog for puppetclient
info: Applying configuration version '1289885403'
notice: Finished catalog run in 0.20 seconds
[root@puppetmaster templates]# puppetd -t
info: Loading facts in certname
info: Loading facts in certname
info: Caching catalog for puppetclient
info: Applying configuration version '1289886035'
notice: /Stage[main]//Node[default]/File[/tmp/two/three]/mode: mode changed '664' to '661'
notice: Finished catalog run in 0.20 seconds

I could not reproduce with edp.

If I turn off apache and turn the puppetmaster service on (webbrick), this issue goes away.


Related issues

related to Puppet - Bug #4397: A toplevel class is shadowing a class inside a namespace ... Closed 07/29/2010
related to Puppet - Bug #4344: Templates fails sometimes on 2.6 Closed 07/23/2010
related to Puppet - Bug #4257: include ::foo is not working anymore Closed 07/16/2010
related to Puppet - Bug #8433: Seemingly random failures after 2.7.1 Accepted 07/15/2011
related to Puppet - Bug #8033: Puppet 2.7 => cannot find classes Duplicate 06/21/2011
related to Puppet - Refactor #7703: Handling of Known Resource Types data in the environment ... Accepted 05/27/2011
duplicated by Puppet - Bug #12397: Puppet doesn't work initially after touch'ing any .pp sou... Accepted 02/02/2012
duplicated by Puppet - Bug #8750: puppetmaster needs restart to pick up on changes in types... Accepted 08/03/2011
blocks Puppet Acceptance - Bug #5541: Tests run agent multiple times...because you have to. Closed 12/13/2010

History

Updated by John Warburton over 1 year ago

FYI – on my environment, I do not see this. All home compiled. Most interesting would be the later version of passenger:

  • Solaris 10 Update 8
  • puppet (2.6.3rc3)
  • ruby (1.8.7-p249)
  • apache (2.2.15)
  • fastthread (1.0.7)
  • passenger (2.2.14)
  • rack (1.1.0)
  • rake (0.8.7)

Updated by Dan Bode over 1 year ago

I also tried upgrading passenger to 2.2.14, and also tried upgrading ruby to 1.8.6, could reproduce in both cases

Updated by Kent Shultz over 1 year ago

I am having the same behavior with a stack similar to Dan’s:

  • RHEL 5.5
  • Puppet 2.6.1
  • Ruby 1.8.5
  • Passenger 2.2.15
  • Rack 1.1.0
  • Apache 2.2.3

Could it be the older version of Apache?

Updated by Nigel Kersten over 1 year ago

  • Status changed from Unreviewed to Accepted
  • Affected Puppet version deleted (2.6.3rc3)

Updated by Dominic Maraglia over 1 year ago

Reproduced with edp version 0.1 Centos x86_64 CentOS release 5.5 (Final) 2.6.18-194.el5 x86_64 GNU/Linux

Updated by Dominic Maraglia over 1 year ago

I’ve tested and re-tested this and, despite recent discussions to the contrary; the problem appears repeatable on edp, including when taking “filetimeout = 15” into account. The problem is not 100% repeatable each test run. Will research further.

Updated by Nigel Kersten over 1 year ago

  • Priority changed from Normal to High

Updated by Nigel Kersten about 1 year ago

Reproduced on:

  • Ubuntu LTS 10.04 Lucid
  • Ruby 1.8.7.249-2
  • Puppet 2.6.7
  • Facter 1.5.9
  • Apache 2.2.14-5ubuntu8.4
  • Passenger 2.2.7debian-1
  • Rails 2.2.3-2
  • Rack 1.1.0-3

Updated by Nigel Kersten about 1 year ago

I’ve isolated the problem further.

If you have a custom environment defined, this problem does not occur when puppeting against that environment.

If you do any of the following, the problem is reproducible:

  • Do not specify an environment
  • Specify the environment as “production” (the default) without [production] being defined in puppet.conf on the master
  • Specify the environment as “production” (the default) with [production] defined in puppet.conf on the master

then the problem is reproducible.

Updated by Justin Honold about 1 year ago

Ditto here.

Ubuntu LTS 10.04 Lucid
Ruby 1.8.7.249
Puppet 2.6.7
Facter 1.5.8
Apache 2.2.14-5ubuntu8.4
Passenger 2.2.7debian-1

Updated by Nigel Kersten about 1 year ago

  • Target version set to 2.7.x

Updated by Nigel Kersten about 1 year ago

  • Priority changed from High to Urgent
  • Target version changed from 2.7.x to 2.6.x

Although the above target version says 2.6.x, this is to be targeted at both 2.6.x and 2.7.x

Updated by Nigel Kersten about 1 year ago

  • Subject changed from puppet master behind passenger does not re-parse manifests correctly to puppet master behind passenger does not re-parse manifests correctly when running with default environment.

Updated by Daniel Pittman 12 months ago

The root cause of this problem is actually that the puppet master uses a current cache of the environment, which includes the data parsed from the manifests, but fails to update and manage that cache effectively.

Specifically, we flush that cache after the compile of a catalog is complete, but before we serialize it and send it to the client. This was done to satisfy the precondition of the compile function, that it would refresh the available type data for the compile. Which, indeed, is a weak way to ensure that this precondition was satisfied.

I suspect that it was also constructed this way because it would result (theoretically) in the environment being parsed at the start of compile, but would then stay consistent until the next compile run; it would try and avoid an update to a file in the middle of a compile run resulting in an incoherent set of definitions. (Note: it would absolutely not close the race, just reduce it in scope by only doing this once.)

When we changed to putting more data in the catalog, to reduce the need for type and provider code on the client during catalog manipulation, we actually started using the known_resource_types from the environment while we serialized the catalog to send to the client. This happened after the cache flush, and resulted in the cache being renewed with a new environment (and a second parse of the files) during serialization.

This, in turn, led to the precondition no longer being satisfied (although no one noticed because it was never documented or asserted as a precondition of compile). We would now parse changes from manifests only when we serialized the catalog, not when we started to compile – and the previous set of changes would leak over. Which, in turn, resulted in our taking two passes to notice the change; our sequence was that we would use the previous cache to compile, flush it, reparse in the serialization pass, then next time around would have the right input. Without looking at exact and correlated agent/master traces this looked to be doing the right things in the right places.

Updated by Daniel Pittman 12 months ago

Further details: after thinking for a while, and checking the documentation, it has become clear why this problem is only evident under Passenger, and not under Webrick and similar servers. It circles around the thread cache, and the way these servers handle requests.

Specifically, Passenger doesn’t do threads, so every HTTP request lands in the same Ruby (green) thread. This means that the cache problem would manifest: because we used the same thread for multiple requests, the bad cache flush was exposed, and we did the wrong thing.

Webrick, and the other application servers we tested, are threaded. Specifically, they create a new thread for the request no matter what. This disguised the caching bug around the threads, both when running on small scale in production, and during our testing where we almost exclusively use the Webrick or Morgrel servers.

So, this was a constant bug, but was disguised by the runtime environment in most circumstances, leading to it appearing to be Passenger related.

Updated by Daniel Pittman 12 months ago

  • Assignee set to Daniel Pittman

A minimal fix for this is now pushed to the 2.7rc branch, as well as the 2.6.x branch, for the next two releases.

This ensures the precondition assumed by the compiler is actually true, rather than trying to fix the underlying problems with the design of the thread cache, but it will be effective at the potential cost of a second copy of the known resource type data lingering on the server if a second thread updates the data concurrent to a compile run in the first thread updating the local cache. Overall cost: tiny, in the real world, compared to having things actually work.

Updated by Nigel Kersten 12 months ago

Daniel, I’d like us to close off this bug and set up another one for the more complex solution. Is that possible?

Updated by Daniel Pittman 12 months ago

  • Status changed from Accepted to Merged - Pending Release
  • Affected Puppet version set to 2.6.0

This entered 2.6.x at https://github.com/puppetlabs/puppet/commit/75e2764d15de6cf1dee923019f579f436d5b1587

This entered 2.7rc at https://github.com/puppetlabs/puppet/commit/68c106e3ef192d64eb5a1e8daa1e070774909728

These changes will merge upwards during the normal merge cycle, and should resolve the underlying bug on all platforms.

Updated by James Turnbull 11 months ago

  • Status changed from Merged - Pending Release to Closed
  • Target version changed from 2.6.x to 2.6.9

Updated by Daniel Pittman 4 months ago

For the record, this was also fixed in 2.7.3.

Also available in: Atom PDF