Bug #1473
Puppetd stops with error after puppetmasterd is unavailable
| Status: | Closed | Start date: | 07/31/2008 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | executables | | |
| Target version: | 0.24.6 | | |
| Affected Puppet version: | 0.24.5 | Branch: | |
| Keywords: | timeout puppetd | | |
| Votes: | 0 | | |
Description
We still have a lot of problems with puppetmasterd crashing because it runs out of memory, but we’re noticing that puppetd on the clients crashes too in those cases. It first runs the stored config and then crashes with the following trace (after I add --trace):
/usr/lib/ruby/1.8/timeout.rb:54:in `rbuf_fill': execution expired (Timeout::Error)
from /usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
from /usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
from /usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new'
from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
... 42 levels...
from /usr/lib/ruby/1.8/puppet/network/client/master.rb:254:in `run'
from /usr/lib/ruby/1.8/sync.rb:229:in `synchronize'
from /usr/lib/ruby/1.8/puppet/network/client/master.rb:236:in `run'
from /usr/sbin/puppetd:417
We can simulate this behaviour by sending a “kill -STOP” to the puppetmasterd, starting puppetd on the client, and waiting several minutes until it times out. The “kill -STOP” simulates a crash: it keeps the port open but makes the puppetmasterd stop responding.
It’s easy to work around this, of course, by having a cronjob that regularly checks if the puppetd is still running. But it would be better if this was fixed in the code, since it’s probably not the last time that a connection from puppetd to puppetmasterd times out.
Tested in Debian Etch with 0.24.5-1 packages from testing.
(Note for those who try this: after sending the puppetmasterd the STOP signal, you can resume it by sending it a CONT signal.)
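For anyone who wants to see the same class of failure without stopping a real puppetmasterd, here is a minimal Ruby sketch. It is an illustration only, not Puppet code: `probe_unresponsive_server` is a hypothetical helper, and the local listener that never calls `accept` is an assumed stand-in for a stopped master (the port stays open, but nothing ever answers).

```ruby
require "net/http"
require "socket"
require "timeout"

# Hypothetical helper (not part of Puppet): send a GET to a server that keeps
# its port open but never answers, and report how the request ended.
def probe_unresponsive_server(read_timeout)
  # Listen on an ephemeral local port and never call accept. The kernel still
  # completes the TCP handshake, so the client sees an open port, which is
  # roughly what a SIGSTOPped puppetmasterd looks like from the outside.
  server = TCPServer.new("127.0.0.1", 0)
  port = server.addr[1]

  # Passing nil as the proxy address keeps any http_proxy setting out of the way.
  http = Net::HTTP.new("127.0.0.1", port, nil)
  http.open_timeout = read_timeout
  http.read_timeout = read_timeout
  # Newer Rubies retry idempotent requests once; turn that off where supported.
  http.max_retries = 0 if http.respond_to?(:max_retries=)

  begin
    http.get("/")
    :responded
  rescue Timeout::Error
    # Net::HTTP read timeouts are Timeout::Error or a subclass of it.
    :timed_out
  ensure
    server.close
  end
end

result = probe_unresponsive_server(0.2)
puts "request ended with: #{result}"
```

The short timeout only keeps the experiment quick; against a real stopped master the client waits for its configured timeout, which is why the original reproduction takes several minutes.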
History
Updated by Luke Kanies over 3 years ago
- Category changed from unknown to executables
- Status changed from Unreviewed to Accepted
- Priority changed from Normal to High
- Keywords changed from timeout to timeout puppetd
- 3 changed from Unknown to Easy
Updated by James Turnbull over 3 years ago
- Target version set to 0.24.6
Updated by Andrew Shafer over 3 years ago
Looks like a timeout while applying the catalog. We could just wrap things in a rescue; will discuss tomorrow.
Updated by Andrew Shafer over 3 years ago
I’m having a hard time reproducing this. I think I know what is happening but I’m not 100% certain.
Can someone who can reproduce this run the client with --debug --trace and get the logs?
Also, the example stack trace that was given looks like it came from a run with --test, which you would expect to stop after the failure.
Updated by Andrew Shafer over 3 years ago
I have a full stack trace.
I can see where and what to rescue to keep puppet from falling over.
That won’t fix the timeout problem though, which could/should be a separate issue.
Patch submitted to the dev list; still need a testing strategy.
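As a rough illustration of the "rescue it so the daemon keeps running" idea, here is a minimal Ruby sketch. It is not the submitted patch: `run_guarded` and its `log` array are hypothetical names, and the block stands in for whatever a single puppetd run does. The class is named explicitly because in Ruby 1.8 `Timeout::Error` descended from `Interrupt`, so a bare `rescue` would not catch it.

```ruby
require "timeout"

# Hypothetical wrapper, not the code from the actual patch: perform one run
# and survive a timeout instead of letting it propagate and kill the daemon.
def run_guarded(log = [])
  yield
  true
rescue Timeout::Error => e
  # Record the failure and report it to the caller; the process stays alive
  # so the next scheduled run can try again.
  log << "run failed: #{e.class}: #{e.message}"
  false
end

# Usage: a failed run is logged, and a later run still goes ahead.
log = []
first  = run_guarded(log) { raise Timeout::Error, "execution expired" }
second = run_guarded(log) { :catalog_applied }
puts log
```

As noted above, this only stops the client from dying; the master still fails to answer, so the timeout itself remains a separate problem.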
Updated by Andrew Shafer over 3 years ago
- Status changed from Accepted to Ready For Checkin
- Assignee changed from Andrew Shafer to James Turnbull
Patch to list, and rebased on github
ticket/0.24.x/1473
Updated by James Turnbull over 3 years ago
- Status changed from Ready For Checkin to Closed
Pushed in commit:fb14e91226e494210c3b6c88d8553a745e4ac3ed in branch 0.24.x