Bug #1563
[PATCH] Change Util::Execute to use pipes instead of temporary files for capturing output
| Status: | Duplicate | Start: | 09/07/2008 | ||
|---|---|---|---|---|---|
| Priority: | High | Due date: | |||
| Assignee: | - | % Done: | 90% |
||
| Category: | plumbing | ||||
| Target version: | unplanned | ||||
| Affected version: | 0.24.8 | Branch: | |||
| Keywords: | SELinux execute Tempfile | ||||
| Votes: | 0 |
Description
Patch attached to fix reported behavior.
When triggering Puppet runs which included initscript starts/stops I noticed that I would receive three SELinux AVC denials logged for the process that was being started/stopped for a file of the form /tmp/puppet.$PID.0. Many of the system daemons which ship with CentOS 5 have confined SELinux domains which don’t permit access to much of the system – including these Puppet temp files.
Trying to figure out where to create the file (and with which context) for every service would be impractical (impossible? some services may not have any context that would be usable for write permissions) so I decided to just rewrite it to use Unix pipes.
WorksForMe in my testing.
I’m marking this as high because, depending on what commands are being run and their SELinux policies, this could cause command output to silently disappear (other then the denials in the logs). This could be very frustrating for someone who is trying to use that output.
Related issues
| related to Puppet - Bug #3033: Using temporary files in /tmp to comunicate with child pr... | Investigating | 01/12/2010 | ||
| related to Puppet - Bug #374: Exec as a user doesn't always work | Closed | |||
| related to Puppet - Bug #3013: util.rb:execute broken on Ruby <1.8.3 | Closed | 01/07/2010 | ||
| related to Puppet - Bug #662: hacky patch for avoiding child hangs | Closed | |||
| duplicates Puppet - Bug #2731: problem communicating with processes in SELinux | Closed | 10/16/2009 |
History
Updated by James Turnbull almost 2 years ago
- Category set to plumbing
- Status changed from Unreviewed to Needs design decision
- Assignee set to Luke Kanies
- Target version set to 0.24.6
I don’t have an issue with this… Luke?
Updated by Luke Kanies almost 2 years ago
- Status changed from Needs design decision to Needs more information
We moved to temporary files about 18 months ago because we had so many problems trying to directly capture output. I believe a lot of the problems were related to running as other users, but I can’t actually remember. I’m nearly positive it involved significant behavioral differences between different versions of Ruby, or maybe different operating systems.
I’d like to see the old changes and ticket looked up and compared to this code before it gets accepted, and I’d also like to see it tested in many environments. This is a surprisingly thorny issue, and I’m hesitant to accept the patch unless it’s gone through a lot more than basic usage testing.
Updated by Sean Millichamp almost 2 years ago
Thanks for the reply Luke (and congrats!)
Is this the bug you were referring to: http://reductivelabs.com/redmine/issues/show/374 ?
I did some searching and that seemed to be the closest match of the ones I found.
Updated by Sean Millichamp almost 2 years ago
I did some digging through the git history and found reference to this: http://reductivelabs.com/redmine/issues/show/662 – that looks like the more likely bug.
Updated by Luke Kanies almost 2 years ago
Is that bug enough to convince you that this patch won’t always suffice, then? Seems like this approach has already been tried.
Updated by Sean Millichamp almost 2 years ago
It is enough to convince me that the patch won’t always suffice. However, since I don’t see a reliable way of working with SELinux-confined processes using file-based IO I kept digging. I contacted wyvern (from bug 662) and he told me that the original problem he was having was on Debian Etch with the osirisd package so I downloaded an Etch installed and created myself a test bed system. I WAS able to reliably reproduce the problem, both with the original Puppet code and my patch.
If you add a “require ‘io/nonblock’” and a “poutput[0].nonblock = true” before the first poutput[0].read call then I can not reproduce the Puppet-freezes-forever problem on the Debian system and it satisfies the no-tempfile design for SELinux.
There is only one sticking point so far. For the “package { "osirisd”: ensure => latest }“ case listed in 662 the non-blocking I/O method causes the package installation to not complete successfully. If I add a sleep(10) in between two successive calls to read then the osirisd package is successfully installed. Also, if I put a sleep(10) before the first call to read and don’t make the second call then the package is also successfully installed.
I have tried using select() to check and see if there is data waiting to be read, but it seems to always return that there is.
I’m not sure how to fix that case elegantly yet. There may not be a way other then to sleep for a few seconds by default (and optionally have a parameter to specify the sleep delay?). The waitpid2 calls seems to be returning before the command is really done. I haven’t figured that one out yet, but I’ll poke at it a bit more.
Updated by Sean Millichamp almost 2 years ago
I have spent some time with Etch, apt-get, osirisd, and Puppet and I believe I have narrowed down exactly what the problem is/was with the pipe-based method:
Puppet creates the pipe, sets stdin to /dev/null, and points stdout/stderr to the pipe.
Then Puppet fork()s and exec()s
“/usr/bin/apt-get -q -y -o DPkg::Options::=—force-confold install osirisd”, passing the pipe to apt-get’s stdout/stderr as we would hope and expect.
apt-get then goes through a series of about 47 PIDs (presumably from fork/execing subprocesses and shell commands during script execution, I strace’d it following all children), and apparently still passing our pipe to all of them on stdout/stderr.
During the package install the osirisd daemon is started (presumably as part of a post-install script). This daemon ALSO inherits our pipe on stdout/stderr, but it doesn’t daemonize properly and fails to close stdin/stdout/stderr. Then it remains running, holding the pipe open beyond the life of apt-get and all of the other apt-get generated PIDs.
The initial PID we had to exec() apt-get finishes, triggering continuation from the Process.waitpid2() in Puppet. Puppet promptly tries to read from the pipe, but the pipe is still open in osirisd on its stdout/stderr so the data in the pipe isn’t flushed, EOF is never reached, and Puppet never returns from the IO.read() call (well, never unless someone stops osirisd, releasing and closing the pipes).
I think it is safe to say that this situation will only occur when the process Puppet exec()s is, or calls, a poorly written daemon. For “normal” process runs this won’t happen because Puppet won’t return from Process.waitpid2() until the child is done with an exit code, and by the time the child has exited the pipe resources have been closed and flushed by the OS (if not previously by the child process).
This is probably more of a minor concern with generic exec{}s, but is clearly a moderate concern with package{}. Interestingly, this would be presumably most noticeable with service{} definitions, except the service{} provider calls Util::execute() with “:squelch => true”, which sets stdout/stderr to /dev/null instead of a pipe, thus nicely side-stepping the block-on-read problem.
So where does this leave us?
1) I think everyone agrees having Puppet block forever (for any reason) is a bad thing.
2) This can be fixed effectively with temporary files, but temporary files as Puppet implements them now aren’t happy with SELinux.
3) This can be fixed effectively with non-blocking reads on pipes. It works great for the cases where the programs/daemons are well-behaved but might cause odd problems for the exec’d children (like apt-get not completing the package installs) for the corner cases where badly daemonized processes are started from a Util::execute() where “:squelch = false”. In all cases non-blocking reads causes Puppet to always return. Based on some testing, the corner cases can be worked around by introducing a “sleep (N)” between the two nonblocking IO.read() calls. However, while a value of N=3 seems safe for the apt-get install osirisd case, it seems to me that the safe value of N could vary depending on what was being run. Also, introducing a sleep measured in seconds for every Util::Execute call just to compensate for some corner cases caused by buggy software seems very suboptimal and like the wrong approach.
I have posted on the SELinux mailing list to see if there is any way to use Puppet-generated temporary files in a way that satisfies SELinux for any possible policy and domain transition. If there is I can try to modify the temporary file approach to use it.
I’ll update this ticket when I know more.
Updated by Sean Millichamp almost 2 years ago
As a variation to 3 above, instead of adding an arbitrary sleep(N), another approach would be to implement non-blocking pipes and then add a parameter to exec{} and package{} so that when admins see this mis-behavior they can set something like:
package { osirisd:
ensure => latest,
misbehaved => true,
}
And then anytime misbehaved == true Puppet calls Util::execute() with :squelch => true
Updated by Adrian Bridgett almost 2 years ago
FWIW I think I tried closing /dev/stdin explicitly (as well as redirecting it from /dev/null) so I think it’s more likely to be stdout/stderr. I’ve seen a few programs have this bad behaviour (tomcat is another I’ve seen I think).
Updated by Luke Kanies almost 2 years ago
Is there a way to implement the tempfiles solution that is compatible with SELinux? Is it just that SELinux forbids services from writing temp files, or what?
Updated by Sean Millichamp almost 2 years ago
For the purposes of this discussion I will be referring to the SELinux Tresys Reference Policy as if it is the only possible SELinux policy. As it is the one most SELinux-shipping distros base off of currently, it is a good pick but certainly this could work any way you imagine depending on the policy in place.
“Out of the box” Puppet has no specific SELinux policy, resulting in Puppet running under the initrc_t or unconfined_t domains (depending on if you start it via /etc/rc.d/init.d/puppet or the command line). Both of these domains give Puppet essentially unrestricted access to the system to do as it pleases. This is good and is what you would want under the principle of least surprise – keep things that don’t have a specific policy working as they always have.
The specific problem is that, since Puppet has no policy, when it creates tmpfiles the system uses the default policy rules. For files in /tmp/puppet.#####.# that is the tmp_t label. However, any services which DO have SELinux policies (and which you would want Puppet to manage with service start/stops, etc.) do not have permission to write to files labelled with tmp_t. This is good policy since /tmp exploits have been numerous over the years, but it means that when Puppet passes those file descriptors to the programs they don’t have permissions to write to their STDOUT/STDERR – since those file descriptors are in the tmp_t domain.
After talking with the SELinux guys and detailing the problem they said that this should be possible to do without modifying any Puppet code. However, it requires writing a policy for Puppet so that when Puppet is executed it transitions to the puppet_t domain (for example) and any files written to /tmp from the puppet_t domain get labelled with puppet_tmp_t (for example). Then, in the Puppet policy, you can add a rule that says “allow all SELinux domains to read and write files labelled puppet_tmp_t”.
I am working on a policy file that does this now, except I know nothing about writing SELinux policy files so it is slow going. Right now it transitions to puppet_t (but only if the init script is used) and puppetd immediately dies due to an SELinux denial for access to /sbin, since I haven’t yet given puppet_t processes access to /sbin. :)
I expect/hope to be able to figure this out and have at least some level of a functional SELinux policy for Puppet. I’ll update here (probably in a few days) with my success/failure :)
If I attach a Puppet policy file that compiles against the SELinux reference policy to this ticket would it be something you would entertain shipping with Puppet for testing/use by a wider audience? It would be available to anyone interested and if someone didn’t choose to load it they would just get the current behavior.
Updated by James Turnbull almost 2 years ago
Happy to ship it!
Updated by Luke Kanies almost 2 years ago
It sounds a lot like these problems would just go away if Puppet subprocesses did not inherit file descriptors, right?
That problem is being worked on in another ticket as we speak, so maybe this is unnecessary. Can you try to reproduce your SElinux problems with the patch applied that closes the file descriptors?
Updated by Sean Millichamp almost 2 years ago
Are you referring to the patch posted to puppet-dev to close FDs 3 – 256? If so, then that will solve one of the three denial messages I was noticing. The one file descriptor I can tell you that always leaked was the Tempfile created in util::execute(). The child process needs to call output_file.close before execing but after getting duped on to stdout and stderr. If it is possible for other FDs to leak into the children I haven’t noticed it yet.
stdout and stderr are the real “problems”. We either need to decide that capturing output is always unnecessary and always call execute with
:squelch = trueor make sure the system resource that is passed to the subprocess for stdout and stderr will be allowed for write in all SELinux domains. Unnamed pipes seem to be always allowed since that is an extremely common way for two programs to talk, which is why I originally went that direction. Files seem doable with a custom policy. I’ve made some minimal progress on a puppet SELinux policy so far. I expect/hope to have something simple but functional this week.
Updated by Luke Kanies almost 2 years ago
- Assignee changed from Luke Kanies to Puppet Community
- Target version changed from 0.24.6 to unplanned
I don’t think we can comfortably include this in 0.24.6, so it’s going to have to wait until the solution is clearer, and it may be that SELinux policies are the right fix anyway.
Updated by James Turnbull about 1 year ago
- Assignee deleted (
Puppet Community)
Updated by Jonathan Stanton 11 months ago
- Affected version changed from 0.24.5 to 0.24.8
Has anything been planned for this? It’s still occurring in the latest packaged version 0.24.8 (EPEL packages don’t include 0.25 yet, but from reading the changelog for 0.25 I didn’t see a fix) It’s a really annoying issue that causes lots of wasted log records that confuse real issues.
It appears that some work was done last year on a solution (not just the pipe patch, but also a different approach that moves the puppet temporary files to /var/run/puppet and assigns correct selinux context/policy so there are no warnings. But the last mention of it was in June 2009.
The RH bug 460039 at https://bugzilla.redhat.com/show_bug.cgi?id=460039 discusses this issue and says that some efforts were made to move puppet to use /var/run/puppet instead of /tmp.
From Puppet’s perspective it almost looks like the only change required is to switch the puppet/ruby temporary directory to /var/run/puppet. Then the RH packages can include a selinux policy.
Updated by Luke Kanies 11 months ago
I don’t really know that we came to a final solution. We know that we had problems using pipes, and we know that we have problems with selinux, but I don’t know that we have a potential solution that avoids both of them.
Updated by Markus Roberts 8 months ago
- Status changed from Needs more information to Duplicate
This should be resolved as of 0.25.2 with the fix for #2731.