Bug #791

Users and groups created mid-transaction are not found

Added by Marcin Owsiany over 4 years ago. Updated 9 days ago.

Status:Accepted Start date:
Priority:Normal Due date:
Assignee:- % Done:

0%

Category:user
Target version:-
Affected Puppet version:0.24.4 Branch:
Keywords:
Votes: 3

Description

I recently noticed the following:

Aug 23 18:09:35 extdns03 puppetdr987: [ID 702911 daemon.error] (//extdns03/common_foglight/foglight_agents_setup/Exec[/opt/foglight/script/setup.sh]) Failed to call refresh on Exec[/opt/foglight/script/setup.sh]: Invalid user: foglight

While I’m reasonably sure that the user did exist at that point. So after long and painful debugging I did the following:

extdns01# RUBYLIB=. irb
irb(main):001:0> require 'puppet'
=> true
irb(main):002:0> Puppet::Util.uid('boom')
=> nil

[ Now, in a different window, ran "adduser boom" ]

irb(main):003:0> Puppet::Util.uid('boom')
=> nil
irb(main):004:0>

I guess puppet should flush the uid/gid cache on any user/group operation, or possibly even every time.


Related issues

duplicated by Puppet - Bug #1678: Nameservice caching causes problems with changed groups/u... Duplicate 10/23/2008
duplicated by Puppet - Bug #3159: LDAP groups are being mis-interpretted by RAL Duplicate 02/07/2010

History

Updated by Adrian Bridgett over 4 years ago

I had this for gid caching too. It was biting me when I had this situation: I wanted to create say /etc/bind/foo owned by bind/bind but wanted the package to create the user if possible. The package would be installed first, but then within the same run puppet would try and create the file and complain that it couldn’t find the bind group.

The workaround I did which seems to fix it is to put a Etc.endgrent() call in the posix.rb/gid() call. Presumably an equivalent Etc.endpwent() in the uid() call would be in order?

However when I had some debug in get_posix_fields it showed gid() iterating over all the groups which seems suboptimal compared to just looking for that exact group.

Updated by Anonymous over 4 years ago

  • Status changed from 1 to 2

Updated by Anonymous over 4 years ago

  • Status changed from 2 to 1

Updated by Luke Kanies over 4 years ago

  • Status changed from 1 to 2

The reason that we’re iterating over all of the entries, instead of just looking for a specific entry, is that some platforms always return the current user when you call ‘getXbyY’. It looks like a system library bug, since it happens in perl, also.

I think the platform we encountered it on was Debian, but I do not remember. You should be able to track the details down by following the change history of util.rb.

Updated by Adrian Bridgett over 4 years ago

many thanks for the pointer – looks like commit:6fe01cedeb2fe00df62bf63d7f7375d18d0663ca (new to git I’m afraid)

I wish my ruby/puppet internals skills were better

Updated by David Schmitt over 4 years ago

Hmm, adding Etc.endpwent() did not help. Also it looks like lib/puppet/util/posix.rb already evolved further. This bug is still reproducible in git.

Updated by Luke Kanies about 4 years ago

  • Status changed from 2 to Closed
  • 7 set to fixed

Fixed in commit:dbe70a10dd5d3d29f6529b1a0e6b6545308c6a7d.

In my testing, the endgrent/endpwent did fix the problems. It was definitely a Ruby caching problem, not a Puppet cache.

Updated by Adrian Bridgett about 4 years ago

FWIW I think I’ve found a trickier one to resolve :( Either that or as DavidS says, it’s not a 100% fix.

Part of the install I’m doing sets up LDAP for uid/gid – if this is done during a puppet run then it seems to think that the user doesn’t exist (looking it up in the cache) and then it fails to create the user.

...
notice: //basenode_vars/basenode_node_vars/mer_basenode_node_vars/somebox.example.com
/basenode/basenode_node/ldap::ldap/ldap::core/Service[nscd]:
 Triggering 'refresh' from 13 dependencies
...
err: //basenode_vars/basenode_node_vars/mer_basenode_node_vars/somebox.example.com
/basenode/basenode_node/hids::hids/User[osirisd]/ensure: change
 from absent to present failed: Could not create user osirisd: Execution of '/us
r/sbin/useradd osirisd' returned 2304: useradd: user osirisd exists

Updated by Martha Greenberg almost 4 years ago

This is broken again in 0.24.1. This manifest:

  file { "/etc/bind/named.conf.options":
    mode    => "644",
    source  => "puppet:///dns/named.conf.options",
    owner   => "root",
    group   => "bind",
    require => Package[[bind9]],
  }

  package { "bind9": ensure => present; }

  service { "bind9":
    ensure    => running,
    enable    => true,
    pattern   => "named",
    subscribe => File[[etcbindnamedconfoptions]],
  }

produces the following:

notice: Starting catalog run
debug: Prefetching aptitude resources for package
debug: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n_
debug: Puppet::Type::Package::ProviderAptitude: Executing '/usr/bin/dpkg-query -W --showformat '${Status} ${Package} ${Version}\n_
debug: //dns::named/Service[bind9]/subscribe: subscribes to File[/etc/bind/named.conf.options]
debug: //dns::named/File[/etc/bind/named.conf.options]/require: requires Package[bind9]
debug: Puppet::Type::Package::ProviderAptitude: Executing '/usr/bin/dpkg-query -W --showformat ${Status} ${Package} ${Version}\n bind9'
debug: //dns::named/Package[bind9]: Changing ensure
debug: //dns::named/Package[bind9]: 1 change(s)
debug: Puppet::Type::Package::ProviderAptitude: Executing '/usr/bin/aptitude -y -o DPkg::Options::=--force-confold install bind9'
notice: //dns::named/Package[bind9]/ensure: ensure changed 'purged' to 'present'debug: //dns::named/File[/etc/bind/named.conf.options]/checksum: Initializing checksum hash
debug: //dns::named/File[/etc/bind/named.conf.options]: Creating checksum {md5}d6b678ac90fd6ab163d74dfe5d68c2c9
debug: Calling fileserver.describe
err: //dns::named/File[/etc/bind/named.conf.options]: Failed to retrieve current state of resource: Could not find group bind
notice: //dns::named/Service[bind9]: Dependency file[/etc/bind/named.conf.options] has 1 failures
warning: //dns::named/Service[bind9]: Skipping because of failed dependencies
debug: Finishing transaction -613724178 with 1 changes

Updated by Luke Kanies almost 4 years ago

  • Status changed from Closed to 4
  • 7 deleted (fixed)

Updated by Jean-Baptiste Quenot almost 4 years ago

Hi there, I’m facing the same problem:

Here is the first run with

puppetd —test
:

Feb 27 09:54:25 nc80 puppetdr5944: Starting catalog run
Feb 27 09:54:25 nc80 puppetdr5944: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 09:54:25 nc80 puppetdr5944: (//Node[nc80]/testuser/User[testuser]/ensure) created
Feb 27 09:54:26 nc80 puppetdr32484: Lock file /var/lib/puppet/state/puppetdlock exists; skipping catalog run
Feb 27 09:54:28 nc80 puppetdr5944: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]/ensure) change from absent to directory failed: Could not set directory on ensure: Could not find group testuser at /home/svnup/_svn/conf/puppet/manifests/site.pp:73
Feb 27 09:54:28 nc80 puppetdr5944: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/known_hosts]/ensure) created
Feb 27 09:54:28 nc80 puppetdr5944: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh/id_dsa.pub]/ensure) created
Feb 27 09:54:28 nc80 puppetdr5944: (//Node[nc80]/testuser/Ssh_keyscommit:testuser]/File[/var/testuser/.ssh/id_dsa) Dependency file[/var/testuser/.ssh] has 1 failures
Feb 27 09:54:28 nc80 puppetdr5944: (//Node[nc80]/testuser/Ssh_keyscommit:testuser]/File[/var/testuser/.ssh/id_dsa) Skipping because of failed dependencies
Feb 27 09:54:31 nc80 puppetdr5944: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 09:54:31 nc80 puppetdr5944: Finished catalog run in 6.43 seconds

The second run of

puppetd —test
works however:

Feb 27 09:55:09 nc80 puppetdr6121: Ignoring cache
Feb 27 09:55:10 nc80 puppetdr6121: Caching catalog at /var/lib/puppet/state/localconfig.yaml
Feb 27 09:55:10 nc80 puppetdr6121: Starting catalog run
Feb 27 09:55:12 nc80 puppetdr6121: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]/group) group changed 'root' to 'testuser'
Feb 27 09:55:13 nc80 puppetdr6121: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 09:55:15 nc80 puppetdr6121: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 09:55:15 nc80 puppetdr6121: (//Node[nc80]/testuser/Ssh_keyscommit:testuser]/File[/var/testuser/.ssh/id_dsa/ensure) created
Feb 27 09:55:15 nc80 puppetdr6121: Finished catalog run in 5.38 seconds

Then the first run that is triggered by the long-running

puppetd
daemon fails:

Feb 27 09:59:27 nc80 puppetdr32484: Starting catalog run
Feb 27 09:59:28 nc80 puppetdr32484: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 09:59:30 nc80 puppetdr32484: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 09:59:32 nc80 puppetdr32484: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]) Failed to retrieve current state of resource: Could not find group testuser
Feb 27 09:59:32 nc80 puppetdr32484: (//Node[nc80]/testuser/Ssh_keyscommit:testuser]/File[/var/testuser/.ssh/id_dsa) Dependency file[/var/testuser/.ssh] has 1 failures
Feb 27 09:59:32 nc80 puppetdr32484: (//Node[nc80]/testuser/Ssh_keyscommit:testuser]/File[/var/testuser/.ssh/id_dsa) Skipping because of failed dependencies
Feb 27 09:59:32 nc80 puppetdr32484: Finished catalog run in 5.12 seconds

While the second run of the long-running puppetd daemon works OK:

Feb 27 10:04:33 nc80 puppetdr32484: Starting catalog run
Feb 27 10:04:33 nc80 puppetdr32484: (//Node[nc80]/Exec[disk-space 90]/returns) executed successfully
Feb 27 10:04:37 nc80 puppetdr32484: (//Node[nc80]/svnup/Exec[svn up]/returns) executed successfully
Feb 27 10:04:38 nc80 puppetdr32484: (//Node[nc80]/testuser/Ssh_keys[testuser]/File[/var/testuser/.ssh]/checksum) checksum changed '{time}Wed Feb 27 09:54:28 +0100 2008' to '{time}Wed Feb 27 09:55:15 +0100 2008'
Feb 27 10:04:39 nc80 puppetdr32484: Finished catalog run in 5.69 seconds

Note that I didn’t had to restart the long-running puppetd daemon. The next run recognizes the new group OK. Looks like puppet does not properly refresh the groups.

This is on Ubuntu 7.10.

Updated by Redmine Admin over 3 years ago

  • Status changed from 4 to Accepted

Updated by Lawrence Ludwig over 3 years ago

  • Affected Puppet version set to 0.24.4

I am also seeing this bug in CentOS 3.9, 4.6 and 5.2 with group permissions created by the rpm package.

Updated by James Turnbull over 3 years ago

  • Target version set to 4

Luke? Any ideas on this one that you can point me at?

Updated by Luke Kanies over 3 years ago

  • Assignee changed from Luke Kanies to James Turnbull

I just took another look at this, and I think there are some stupid logic errors in there somewhere, but I can’t quite peg where.

It’s somewhere between the type/file/group.rb file and util/posix.rb. This Puppet code works fine for me:

file { "/tmp/testing": group => foo, ensure => directory }
group { foo: ensure => present }

So I can’t reproduce the problem in a small case on Debian.

If someone who can reproduce the problem spent some time mucking around in type/file/group.rb, I bet you could track down the logic problems. Ping me on IRC and I’ll help resolve it if I have the time.

Updated by Luke Kanies over 3 years ago

  • Assignee changed from James Turnbull to Luke Kanies
  • Target version changed from 4 to 0.24.6

As I said, I think this is a logic problem, so I’m going to take a crack at it today.

Updated by Luke Kanies over 3 years ago

  • Status changed from Accepted to In Topic Branch Pending Review
  • Assignee changed from Luke Kanies to James Turnbull

I think this is fixed in the tickets/0.24.x/791 branch in my repo. It works on my machine.

Updated by James Turnbull over 3 years ago

  • Status changed from In Topic Branch Pending Review to Closed

Pushed in commit:862077513570c996e9124743369c9af92485151f in branch 0.24.x

Updated by Craig Box over 1 year ago

  • Status changed from Closed to Re-opened
  • Target version changed from 0.24.6 to 4

I am seeing this problem on 0.25 (0.25.4-2ubuntu6) with LDAP users:

  • install libnss-ldapd, libpam-ldapd and nslcd
  • install ldap-auth-config & various config files
  • run auth-client-config, which sets up the PAM config files to include LDAP users (1)
  • create homedirs for LDAP user

causes error output:

err: //homedirs/Homedirectory[example]/File[/home/example]/ensure: change from absent to directory failed: Could not set directory on ensure: Could not find user example at /etc/puppet/modules/homedirs/manifests/init.pp:37 

even though at point (1), if run manually, I can ‘su’ to the user successfully.

See also my ML thread: http://groups.google.com/group/puppet-users/browse_thread/thread/8b3cda86c03f05e1/71cddda9c57d2201?lnk=gst&q=craig.box#71cddda9c57d2201

Updated by Kjetil Torgrim Homme over 1 year ago

Craig Box wrote:

I am seeing this problem on 0.25 (0.25.4-2ubuntu6) with LDAP users:

  • install libnss-ldapd, libpam-ldapd and nslcd
  • install ldap-auth-config & various config files

I believe this is a general libc problem: a process which has initialised the naming service lookup order, can’t/won’t change it. In other words, puppetd needs to re-exec to get the new NSS configuration. This might be solvable in the 2.6 series by using a “pre” stage, but AFAIK all the stages are in the same configuration transaction, so some more explicit support is needed. If a stage can be specified to run in its own transaction, it is a simple matter to send SIGHUP to the running puppetd to make it respawn after the transaction has been applied.

Updated by James Turnbull 11 months ago

  • Status changed from Re-opened to Accepted
  • Assignee deleted (James Turnbull)

Updated by James Turnbull 6 months ago

  • Target version deleted (4)

Updated by Jo Rhett 3 months ago

So we just observed that LDAP groups can be not seen as long as a month later, if puppet was used to modify nsswitch.conf and hasn’t restarted since then. I don’t know about the mid-run re-exec, but I think it should be simple for puppet to make sure it rereads nsswitch.conf (or re-execs) when it starts a run…

Also available in: Atom PDF