Bug #2731

problem communicating with processes in SELinux

Added by Jonathan Stanton over 2 years ago. Updated almost 2 years ago.

Status:Closed Start date:10/16/2009
Priority:Normal Due date:
Assignee:Markus Roberts % Done:

0%

Category:SELinux
Target version:0.25.2
Affected Puppet version:0.24.8 Branch:http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/2731
Keywords:
Votes: 0

Description

On a Centos 5.3 system with NFS mounting home directories, every time puppet runs it tries to update the status of the nfs mount from “present” to “mounted” but fails because the call to mount returns -32 not 0. But as documented on the man page -32 means ‘already mounted’ and so should be treated as success. Here is the logged message from puppetd.

Oct 16 10:28:07 savage-file puppetd[11739]: (//Node[savage-file.livetimenet.com]/nfs_moose_client/Mount[/mnt/moose_home]/ensure) change from present to mounted failed: Execution of '/bin/mount -o rw,async /mnt/moose_home' returned 32: 

This has been going on for awhile (certainly back several versions of puppet) and I’ve tried some ideas with restarting it, and letting puppet do the original mount instead of the OS mounting it directly.


Related issues

related to Puppet - Bug #2997: Puppet hangs on executable under HP-UX Closed 12/30/2009
related to Puppet - Bug #3033: Using temporary files in /tmp to comunicate with child pr... Investigating 01/12/2010
related to Puppet - Bug #3023: Tests needed to show that "exec"ed scripts don't get pare... Closed 01/08/2010
related to Puppet - Bug #3013: util.rb:execute broken on Ruby <1.8.3 Closed 01/07/2010
duplicated by Puppet - Bug #1563: [PATCH] Change Util::Execute to use pipes instead of temp... Duplicate 09/07/2008

History

Updated by Markus Roberts over 2 years ago

  • Status changed from Unreviewed to Needs More Information

Which man page are you seeing that on? AFAIK, 32 means “mount failure” not “already mounted” — it’s an error, not a synonym for success.

Updated by Jonathan Stanton over 2 years ago

I can’t find the man page where I saw the 32 return value meaning “already mounted” but I can duplicate how the mount command line program does know the difference between a failed mount because it’s already mounted and a failed mount for other reasons (permisssions, no portmap, etc). See the following commands I ran on the same machine as before when trying the same mount command that puppet does (copied from the puppet log)

root@savage-file:/var/log# /bin/mount -o rw,async /mnt/moose_home
mount.nfs: /mnt/moose_home is already mounted or busy
root@savage-file:/var/log# echo $?
32

So mount.nfs is actually getting an EBUSY errno (from the mount syscall) not a EPERM or EINVAL. So it prints the error that it is already mounted, but also has a return value of 32.

Now if I try the exact same mount but give the “remount” option so it tries to remount an existing mount I get success:

root@savage-file:/var/log# /bin/mount -o remount,rw,async /mnt/moose_home
root@savage-file:/var/log# echo $?
0

which would indicate success to Puppet and allow it to mark the nfs partition as in mounted state.

Maybe the linux/mount provider in puppet needs to either check if the partition is already mounted (and then stop trying to mount it) or it should try with the “remount” option first and if that fails (as it is documented in the mount man page to fail with EINVAL if it is attempted on a source that is not already mounted) then it will mount it without the remount option.

Updated by Markus Roberts over 2 years ago

  • Status changed from Needs More Information to Investigating

Updated by Ricky Zhou about 2 years ago

Hi, I looked through the code a bit, and it seems that puppet does try to check if the filesystem is mounted first.

I’ve been getting this with 0.25.1 as well, and I think it could be related to SELinux preventing puppet from checking if the filesystem is mounted (I have gotten AVC denials about puppet attempting to redirect output from mount into a temporary file).

Jonathan, do you have SELinux enabled, and can you check if this still happens with it disabled?

I’m not completely sure what the best solution is to this. I guess an SELinux policy fix is in order, but I’m not too familiar with why puppet needs a temporary file to get the output of a command to begin with. There is a comment in util.rb about this:

# There are problems with read blocking with badly behaved children
# read.partialread doesn't seem to capture either stdout or stderr
# We hack around this using a temporary file

# The idea here is to avoid IO#read whenever possible.

it seems that this was added as a somewhat hacky fix for bug #662, so it’s also possible that the fix belongs in puppet.

By the way, this isn’t the first time that we’ve seen bugs involving hanging reads in ruby. I wonder if anything in bug #1963 could be relevant to finding a better solution to bug #662.

Updated by Ricky Zhou about 2 years ago

I just filed a bug against the puppet SELinux policy in Fedora about this: https://bugzilla.redhat.com/show_bug.cgi?id=546550

Apologies if I jumped the gun a bit and the fix really does belong in puppet.

Updated by Ricky Zhou about 2 years ago

Here is a potential fix for the the issue. This patch changes the execute function to use a pipe instead of a temporary file for getting command output, which should avoid the SELinux denials. I also haven’t had a chance to test this inside of puppetd yet, so I’m not yet sure that it doesn’t fail as described in bug #662. I’ll get this tested soon, but it would still be good for somebody familiar with the old issue to take a look at this:

http://ricky.fedorapeople.org/0001-Use-a-pipe-instead-of-a-temp-file-for-command-output.patch

Updated by Markus Roberts about 2 years ago

  • Target version set to 0.25.2

This will probably get bounced to 25.3 but I’m putting it on 0.25.2 for review

Updated by Markus Roberts about 2 years ago

  • Subject changed from mount on linux does not recognize errror code 32 meaning already mounted to problem communicating with processes in SELinux
  • Category changed from mount to SELinux

Updated by Markus Roberts about 2 years ago

  • Status changed from Investigating to Needs Decision

Updated by Markus Roberts about 2 years ago

My thought on the patch:

  • Overall, looks plausible
  • The change:
-                puts detail.to_s
+                # Write errors to stderr.
+                $stderr.puts detail.to_s

while arguably “the right way to do it” is inconsistent with all other occurrences and should be rejected on that bases (though I’m going to add that point to my code smell list). * I’m not too crazy about adding all the blank lines. * This is a much further reaching change than the title suggests * I’m also concerned about the #662 question, though not as much as I was at first.

This list started off longer, and shrunk as I dug into the code.

Updated by Ricky Zhou about 2 years ago

Markus Roberts wrote:

My thought on the patch:

  • Overall, looks plausible
  • The change: […] while arguably “the right way to do it” is inconsistent with all other occurrences and should be rejected on that bases (though I’m going to add that point to my code smell list).

Ah, the reason I made that change was that I was getting inconsistent output from

Puppet::Util.execute(["does_not_exist"], :failonfail => false)

using the original version versus my version. For some reason, my version would return the “No such file or directory…” error while the current version would return a blank string. I’ll look more into why this happens.

  • I’m not too crazy about adding all the blank lines.

I don’t know the first thing about ruby style, so if they’re not helpful, I’m happy to get rid of them :–)

  • This is a much further reaching change than the title suggests
  • I’m also concerned about the #662 question, though not as much as I was at first.

For what it’s worth, I’ve finished my testing on Fedora 12, and I was able to reproduce the issue and confirm that the patch fixes it for me. I wasn’t able to reproduce on CentOS 5.4 (most likely due to different SELinux policy versions?), but the patch didn’t seem to break anything either.

Like you said, this seems to be a pretty far-reaching change, so the more scrutiny and testing this gets, the better. Thanks for taking a look at this!

Updated by Ricky Zhou about 2 years ago

Ricky Zhou wrote:

Ah, the reason I made that change was that I was getting inconsistent output from […] using the original version versus my version. For some reason, my version would return the “No such file or directory…” error while the current version would return a blank string. I’ll look more into why this happens.

OK, it looks like this stems from the use of the exit! function, which closed the file before the error message was properly written into it (using the exit function instead made the output consistent with my non-stderr version). I think it’s safe to consider this a bug in the current implementation that this patch would fix as well.

Here’s a new patch with the stderr change and whitespace changes removed.

http://ricky.fedorapeople.org/0001-Use-a-pipe-instead-of-a-temp-file-for-command-output.patch

Updated by Markus Roberts about 2 years ago

  • Status changed from Needs Decision to In Topic Branch Pending Review
  • Branch set to http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/2731

Thanks Ricky!

Updated by Markus Roberts about 2 years ago

  • Status changed from In Topic Branch Pending Review to Ready For Checkin

Updated by James Turnbull about 2 years ago

  • Status changed from Ready For Checkin to Closed

Pushed in commit:6ab2453d966d1d48e12d8a8cec34b9e460597d04 in branch 0.25.x

Updated by James Turnbull about 2 years ago

  • Status changed from Closed to Code Insufficient
  • Assignee set to Markus Roberts

Re-opening in light of regression as indicated in #2997.

Updated by James Turnbull about 2 years ago

  • Target version changed from 0.25.2 to 0.25.3

Updated by Ricky Zhou about 2 years ago

Here is a new patch that solves a hanging issue when the pipe’s kernel buffer is filled up.

http://ricky.fedorapeople.org/0001-Use-a-pipe-instead-of-a-temp-file-for-command-output.patch (this is on the 0.25.x branch instead of master this time).

Updated by Ricky Zhou about 2 years ago

The latest version of the patch (same URL) seems to fix the issue mentioned in #2997. Would it be too late at this point for this to make it into 0.25.2?

Updated by James Turnbull about 2 years ago

  • Status changed from Code Insufficient to Ready For Checkin
  • Target version changed from 0.25.3 to 0.25.2

Updated by James Turnbull about 2 years ago

  • Status changed from Ready For Checkin to Closed

Pushed in commit:5c6f07b404e946266b33f08855116f7bb1a1800c in branch 0.25.x

Updated by Ricky Zhou about 2 years ago

  • Status changed from Closed to Re-opened

My apologies, testing rc3, I noticed that I introduced one more bug in that patch. Closing stdout/stderr was a bad idea, as it could cause failures when executed programs tried to write to them. This patch reverts to the previous behavior of reopening stdout/stderr to /dev/null.

Sorry about the short notice for this, and happy new year :–)

http://ricky.fedorapeople.org/0001-Do-not-close-stdout-or-stderr-in-execute.patch

Updated by James Turnbull about 2 years ago

  • Status changed from Re-opened to Closed

Pushed in commit:fd631b9945cf33a1e5af849900cf6219b050e321 in branch 0.25.x

Updated by Markus Roberts about 2 years ago

The fix for this ticket was reverted for 0.25.3 as it caused numerous problems. See #3033 for the resumed search for a solution to the problem of communicating with SELinux processes.

Also available in: Atom PDF