Bug #2731
problem communicating with processes in SELinux
| Status: | Closed | Start date: | 10/16/2009 | |
|---|---|---|---|---|
| Priority: | Normal | Due date: | ||
| Assignee: | % Done: | 0% |
||
| Category: | SELinux | |||
| Target version: | 0.25.2 | |||
| Affected Puppet version: | 0.24.8 | Branch: | http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/2731 | |
| Keywords: | ||||
| Votes: | 0 |
Description
On a Centos 5.3 system with NFS mounting home directories, every time puppet runs it tries to update the status of the nfs mount from “present” to “mounted” but fails because the call to mount returns -32 not 0. But as documented on the man page -32 means ‘already mounted’ and so should be treated as success. Here is the logged message from puppetd.
Oct 16 10:28:07 savage-file puppetd[11739]: (//Node[savage-file.livetimenet.com]/nfs_moose_client/Mount[/mnt/moose_home]/ensure) change from present to mounted failed: Execution of '/bin/mount -o rw,async /mnt/moose_home' returned 32:
This has been going on for awhile (certainly back several versions of puppet) and I’ve tried some ideas with restarting it, and letting puppet do the original mount instead of the OS mounting it directly.
Related issues
History
Updated by Markus Roberts over 2 years ago
- Status changed from Unreviewed to Needs More Information
Which man page are you seeing that on? AFAIK, 32 means “mount failure” not “already mounted” — it’s an error, not a synonym for success.
Updated by Jonathan Stanton over 2 years ago
I can’t find the man page where I saw the 32 return value meaning “already mounted” but I can duplicate how the mount command line program does know the difference between a failed mount because it’s already mounted and a failed mount for other reasons (permisssions, no portmap, etc). See the following commands I ran on the same machine as before when trying the same mount command that puppet does (copied from the puppet log)
root@savage-file:/var/log# /bin/mount -o rw,async /mnt/moose_home mount.nfs: /mnt/moose_home is already mounted or busy root@savage-file:/var/log# echo $? 32
So mount.nfs is actually getting an EBUSY errno (from the mount syscall) not a EPERM or EINVAL. So it prints the error that it is already mounted, but also has a return value of 32.
Now if I try the exact same mount but give the “remount” option so it tries to remount an existing mount I get success:
root@savage-file:/var/log# /bin/mount -o remount,rw,async /mnt/moose_home root@savage-file:/var/log# echo $? 0
which would indicate success to Puppet and allow it to mark the nfs partition as in mounted state.
Maybe the linux/mount provider in puppet needs to either check if the partition is already mounted (and then stop trying to mount it) or it should try with the “remount” option first and if that fails (as it is documented in the mount man page to fail with EINVAL if it is attempted on a source that is not already mounted) then it will mount it without the remount option.
Updated by Markus Roberts over 2 years ago
- Status changed from Needs More Information to Investigating
Updated by Ricky Zhou about 2 years ago
Hi, I looked through the code a bit, and it seems that puppet does try to check if the filesystem is mounted first.
I’ve been getting this with 0.25.1 as well, and I think it could be related to SELinux preventing puppet from checking if the filesystem is mounted (I have gotten AVC denials about puppet attempting to redirect output from mount into a temporary file).
Jonathan, do you have SELinux enabled, and can you check if this still happens with it disabled?
I’m not completely sure what the best solution is to this. I guess an SELinux policy fix is in order, but I’m not too familiar with why puppet needs a temporary file to get the output of a command to begin with. There is a comment in util.rb about this:
# There are problems with read blocking with badly behaved children # read.partialread doesn't seem to capture either stdout or stderr # We hack around this using a temporary file # The idea here is to avoid IO#read whenever possible.
it seems that this was added as a somewhat hacky fix for bug #662, so it’s also possible that the fix belongs in puppet.
By the way, this isn’t the first time that we’ve seen bugs involving hanging reads in ruby. I wonder if anything in bug #1963 could be relevant to finding a better solution to bug #662.
Updated by Ricky Zhou about 2 years ago
I just filed a bug against the puppet SELinux policy in Fedora about this: https://bugzilla.redhat.com/show_bug.cgi?id=546550
Apologies if I jumped the gun a bit and the fix really does belong in puppet.
Updated by Ricky Zhou about 2 years ago
Here is a potential fix for the the issue. This patch changes the execute function to use a pipe instead of a temporary file for getting command output, which should avoid the SELinux denials. I also haven’t had a chance to test this inside of puppetd yet, so I’m not yet sure that it doesn’t fail as described in bug #662. I’ll get this tested soon, but it would still be good for somebody familiar with the old issue to take a look at this:
http://ricky.fedorapeople.org/0001-Use-a-pipe-instead-of-a-temp-file-for-command-output.patch
Updated by Markus Roberts about 2 years ago
- Target version set to 0.25.2
This will probably get bounced to 25.3 but I’m putting it on 0.25.2 for review
Updated by Markus Roberts about 2 years ago
- Subject changed from mount on linux does not recognize errror code 32 meaning already mounted to problem communicating with processes in SELinux
- Category changed from mount to SELinux
Updated by Markus Roberts about 2 years ago
- Status changed from Investigating to Needs Decision
Updated by Markus Roberts about 2 years ago
My thought on the patch:
- Overall, looks plausible
- The change:
- puts detail.to_s + # Write errors to stderr. + $stderr.puts detail.to_s
while arguably “the right way to do it” is inconsistent with all other occurrences and should be rejected on that bases (though I’m going to add that point to my code smell list). * I’m not too crazy about adding all the blank lines. * This is a much further reaching change than the title suggests * I’m also concerned about the #662 question, though not as much as I was at first.
This list started off longer, and shrunk as I dug into the code.
Updated by Ricky Zhou about 2 years ago
Markus Roberts wrote:
My thought on the patch:
- Overall, looks plausible
- The change: […] while arguably “the right way to do it” is inconsistent with all other occurrences and should be rejected on that bases (though I’m going to add that point to my code smell list).
Ah, the reason I made that change was that I was getting inconsistent output from
Puppet::Util.execute(["does_not_exist"], :failonfail => false)
using the original version versus my version. For some reason, my version would return the “No such file or directory…” error while the current version would return a blank string. I’ll look more into why this happens.
- I’m not too crazy about adding all the blank lines.
I don’t know the first thing about ruby style, so if they’re not helpful, I’m happy to get rid of them :–)
- This is a much further reaching change than the title suggests
- I’m also concerned about the #662 question, though not as much as I was at first.
For what it’s worth, I’ve finished my testing on Fedora 12, and I was able to reproduce the issue and confirm that the patch fixes it for me. I wasn’t able to reproduce on CentOS 5.4 (most likely due to different SELinux policy versions?), but the patch didn’t seem to break anything either.
Like you said, this seems to be a pretty far-reaching change, so the more scrutiny and testing this gets, the better. Thanks for taking a look at this!
Updated by Ricky Zhou about 2 years ago
Ricky Zhou wrote:
Ah, the reason I made that change was that I was getting inconsistent output from […] using the original version versus my version. For some reason, my version would return the “No such file or directory…” error while the current version would return a blank string. I’ll look more into why this happens.
OK, it looks like this stems from the use of the exit! function, which closed the file before the error message was properly written into it (using the exit function instead made the output consistent with my non-stderr version). I think it’s safe to consider this a bug in the current implementation that this patch would fix as well.
Here’s a new patch with the stderr change and whitespace changes removed.
http://ricky.fedorapeople.org/0001-Use-a-pipe-instead-of-a-temp-file-for-command-output.patch
Updated by Markus Roberts about 2 years ago
- Status changed from Needs Decision to In Topic Branch Pending Review
- Branch set to http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/2731
Thanks Ricky!
Updated by Markus Roberts about 2 years ago
- Status changed from In Topic Branch Pending Review to Ready For Checkin
Updated by James Turnbull about 2 years ago
- Status changed from Ready For Checkin to Closed
Pushed in commit:6ab2453d966d1d48e12d8a8cec34b9e460597d04 in branch 0.25.x
Updated by James Turnbull about 2 years ago
- Status changed from Closed to Code Insufficient
- Assignee set to Markus Roberts
Re-opening in light of regression as indicated in #2997.
Updated by James Turnbull about 2 years ago
- Target version changed from 0.25.2 to 0.25.3
Updated by Ricky Zhou about 2 years ago
Here is a new patch that solves a hanging issue when the pipe’s kernel buffer is filled up.
http://ricky.fedorapeople.org/0001-Use-a-pipe-instead-of-a-temp-file-for-command-output.patch (this is on the 0.25.x branch instead of master this time).
Updated by Ricky Zhou about 2 years ago
The latest version of the patch (same URL) seems to fix the issue mentioned in #2997. Would it be too late at this point for this to make it into 0.25.2?
Updated by James Turnbull about 2 years ago
- Status changed from Code Insufficient to Ready For Checkin
- Target version changed from 0.25.3 to 0.25.2
Updated by James Turnbull about 2 years ago
- Status changed from Ready For Checkin to Closed
Pushed in commit:5c6f07b404e946266b33f08855116f7bb1a1800c in branch 0.25.x
Updated by Ricky Zhou about 2 years ago
- Status changed from Closed to Re-opened
My apologies, testing rc3, I noticed that I introduced one more bug in that patch. Closing stdout/stderr was a bad idea, as it could cause failures when executed programs tried to write to them. This patch reverts to the previous behavior of reopening stdout/stderr to /dev/null.
Sorry about the short notice for this, and happy new year :–)
http://ricky.fedorapeople.org/0001-Do-not-close-stdout-or-stderr-in-execute.patch
Updated by James Turnbull about 2 years ago
- Status changed from Re-opened to Closed
Pushed in commit:fd631b9945cf33a1e5af849900cf6219b050e321 in branch 0.25.x
Updated by Markus Roberts about 2 years ago
The fix for this ticket was reverted for 0.25.3 as it caused numerous problems. See #3033 for the resumed search for a solution to the problem of communicating with SELinux processes.