Bug #616

service resource takes over puppetd port - 0.22.4

Added by Brendan Beveridge almost 5 years ago. Updated over 4 years ago.

Status:Closed Start date:
Priority:High Due date:
Assignee:Luke Kanies % Done:

0%

Category:-
Target version:0.25.0
Affected Puppet version:0.25.4 Branch:
Keywords:
Votes: 0

Description

Strange one: Now and again some puppetd clients stop responding on port 8139 if i log onto the client and do the following: NOTE: x.x.x.x = client1 ip, y.y.y.y = puppetmaster server, z.z.z.z = snmp polling device (graphing server)

client1     # netstat -anp  | grep 8139
tcp        0      0 0.0.0.0:8139            0.0.0.0:*               LISTEN     21419/snmpd         
tcp        0      0 x.x.x.x:8139        y.y.y.y:60796     SYN_RECV   -                   
tcp        0      0 x.x.x.x:8139        y.y.y.y:60778     SYN_RECV   -                   
tcp        0      0 x.x.x.x:8139        z.z.z.z:49714     SYN_RECV   -                   
tcp        1      0 x.x.x.x:8139        z.z.z.z:58764     CLOSE_WAIT -                   
tcp        1      0 x.x.x.x:8139        z.z.z.z:55994     CLOSE_WAIT -                   
tcp        1      0 x.x.x.x:8139        z.z.z.z:34520     CLOSE_WAIT -                   
tcp        1      0 x.x.x.x:8139        z.z.z.z:37074     CLOSE_WAIT -                   
tcp        1      0 x.x.x.x:8139        z.z.z.z:33729     CLOSE_WAIT -                   
tcp        1      0 x.x.x.x:8139        z.z.z.z:42101     CLOSE_WAIT -  

Even stranger:

client1#netstat -anp  | grep 161
udp        0      0 0.0.0.0:161             0.0.0.0:*                          21419/snmpd

client1# /etc/init.d/snmpd restart
Restarting network management services: snmpd.

client1# netstat -anp  | grep 8139
 /etc/init.d/puppetd restart
* Stopping puppetd...[fail] 
* Starting puppetd...[ ok ]

client1#netstat -anp | grep 8139
tcp        0      0 0.0.0.0:8139            0.0.0.0:*               LISTEN     27612/ruby1.8

Config for that snmp resource:

package { 'snmp': ensure => latest }
package { 'snmpd': ensure => latest }
configfile { "/etc/snmp/snmpd.conf": source => "/snmpd.conf", mode => 600 }
exec { "/etc/init.d/snmpd restart": subscribe => FILE[[etcsnmpsnmpdconf]], refreshonly => true }

logs when this happened showed very little:

Apr 27 11:21:09 syd-spool-02 puppetdr5392: Reopening log files
Apr 27 11:21:10 syd-spool-02 puppetdr5392: Starting Puppet client version 0.22.3
Apr 27 11:21:17 syd-spool-02 puppetdr5392: Could not trigger run; already running
Apr 27 11:21:18 syd-spool-02 puppetdr5392: Starting configuration run

cloexec.patch (704 Bytes) Valentin Vidic, 06/04/2007 01:37 pm

History

Updated by Brendan Beveridge almost 5 years ago

I can confirm this still happens in 0.22.4 now

I have now tracked down how to replicate this on my systems (debian 3.1, ubuntu 6.06, ubuntu 6.10) puppetd —server $servername —waitforcert 60 —listen —debug if its starts a service in this: notice: //client.hostname.blah/template-class/client-ntp/Service[ntp-server]/ensure: ensure changed ‘stopped’ to ‘running’

if i leave it running its fine, ie puppetd is still listening on 8139 if i then stop puppetd from running in the foreground then run a netstat -anpt | grep 8139

instead of the puppet process in control of the port its now ntpd

my ntp service config is:

package { ‘ntp-server’: ensure => latest } service { ‘ntp-server’: ensure => true, status => “/bin/pidof ntpd”, enable => true } configfile { “/etc/ntp.conf”: source => “/clients/ntp/ntp.conf”, notify => service[‘ntp-server’] }

at first i thought it may have been something funky with using status => “/bin/pidof ntpd”

but i then tested this with a variety of services syslog-ng, am-utils etc and they just use: service { ‘servicename’: ensure => true, enable => true }

Updated by David Lutterkort almost 5 years ago

It sounds a lot like there’s a fork somewhere and the child process doesn’t close all the sockets it received from the parents. Haven’t looked at the code though.

Updated by Luke Kanies almost 5 years ago

  • Status changed from 1 to 2

Yep. This is where the ObjectSpace loop looking for TCPServer instances was from, but I never got it to work apparently, it caused other problems.

Updated by Valentin Vidic over 4 years ago

It seems that puppet doesn’t set close-on-exec flag for opened files so they get inherited by the children. For example, lsof lists these open files for puppetd:

ruby 2383 root 0u CHR 3,0 1713 /dev/ttyp0 ruby 2383 root 1u CHR 3,0 1713 /dev/ttyp0 ruby 2383 root 2u CHR 3,0 1713 /dev/ttyp0 ruby 2383 root 3r FIFO 0,6 7841 pipe ruby 2383 root 4w FIFO 0,6 7841 pipe ruby 2383 root 5u unix 0xc76cb6c0 7842 socket ruby 2383 root 6u REG 8,1 20762 182340 /var/log/puppet/http.log ruby 2383 root 7u REG 8,1 20762 182340 /var/log/puppet/http.log ruby 2383 root 8u REG 8,1 20762 182340 /var/log/puppet/http.log ruby 2383 root 9u IPv4 7848 TCP *:8139 (LISTEN) ruby 2383 root 10r REG 8,1 598 49553 /etc/ssh/ssh_host_dsa_key.pub ruby 2383 root 11r REG 8,1 390 49550 /etc/ssh/ssh_host_rsa_key.pub

The same files are open in the ntpd started by puppetd:

ntpd 2436 ntp 0u CHR 1,3 753 /dev/null ntpd 2436 ntp 1u CHR 1,3 753 /dev/null ntpd 2436 ntp 2u CHR 1,3 753 /dev/null ntpd 2436 ntp 3r FIFO 0,6 7841 pipe ntpd 2436 ntp 4w FIFO 0,6 7841 pipe ntpd 2436 ntp 5u unix 0xc76cb860 7895 socket ntpd 2436 ntp 6u REG 8,1 20762 182340 /var/log/puppet/http.log ntpd 2436 ntp 7u REG 8,1 20762 182340 /var/log/puppet/http.log ntpd 2436 ntp 8u REG 8,1 20762 182340 /var/log/puppet/http.log ntpd 2436 ntp 9u IPv4 7848 TCP :8139 (LISTEN) ntpd 2436 ntp 10r REG 8,1 598 49553 /etc/ssh/ssh_host_dsa_key.pub ntpd 2436 ntp 11r REG 8,1 390 49550 /etc/ssh/ssh_host_rsa_key.pub ntpd 2436 ntp 12w REG 8,1 21975 182323 /var/log/ntpstats/peerstats.20070604 ntpd 2436 ntp 13w REG 8,1 596 182384 /var/log/ntpstats/loopstats.20070604 ntpd 2436 ntp 16u IPv4 7899 UDP :ntp ntpd 2436 ntp 17u IPv6 7900 UDP *:ntp ntpd 2436 ntp 18u IPv6 7901 UDP [::1]:ntp ntpd 2436 ntp 19u IPv6 7902 UDP [fe80::216:3eff:fe27:4eea]:ntp ntpd 2436 ntp 20u IPv4 7903 UDP front.local:ntp ntpd 2436 ntp 21u IPv4 7904 UDP dhcp132.nat11.irb.lo:ntp

You need to call io.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC) on every opened io object that has a file descriptor associated.

Updated by Valentin Vidic over 4 years ago

It seems that puppet doesn’t set close-on-exec flag for opened files so they get inherited by the children. For example, lsof lists these open files for puppetd:

ruby    2383 root    0u   CHR        3,0           1713 /dev/ttyp0
ruby    2383 root    1u   CHR        3,0           1713 /dev/ttyp0
ruby    2383 root    2u   CHR        3,0           1713 /dev/ttyp0
ruby    2383 root    3r  FIFO        0,6           7841 pipe
ruby    2383 root    4w  FIFO        0,6           7841 pipe
ruby    2383 root    5u  unix 0xc76cb6c0           7842 socket
ruby    2383 root    6u   REG        8,1   20762 182340 /var/log/puppet/http.log
ruby    2383 root    7u   REG        8,1   20762 182340 /var/log/puppet/http.log
ruby    2383 root    8u   REG        8,1   20762 182340 /var/log/puppet/http.log
ruby    2383 root    9u  IPv4       7848            TCP *:8139 (LISTEN)
ruby    2383 root   10r   REG        8,1     598  49553 /etc/ssh/ssh_host_dsa_key.pub
ruby    2383 root   11r   REG        8,1     390  49550 /etc/ssh/ssh_host_rsa_key.pub

The same files are open in the ntpd started by puppetd:

ntpd    2436  ntp    0u   CHR        1,3            753 /dev/null
ntpd    2436  ntp    1u   CHR        1,3            753 /dev/null
ntpd    2436  ntp    2u   CHR        1,3            753 /dev/null
ntpd    2436  ntp    3r  FIFO        0,6           7841 pipe
ntpd    2436  ntp    4w  FIFO        0,6           7841 pipe
ntpd    2436  ntp    5u  unix 0xc76cb860           7895 socket
ntpd    2436  ntp    6u   REG        8,1   20762 182340 /var/log/puppet/http.log
ntpd    2436  ntp    7u   REG        8,1   20762 182340 /var/log/puppet/http.log
ntpd    2436  ntp    8u   REG        8,1   20762 182340 /var/log/puppet/http.log
ntpd    2436  ntp    9u  IPv4       7848            TCP *:8139 (LISTEN)
ntpd    2436  ntp   10r   REG        8,1     598  49553 /etc/ssh/ssh_host_dsa_key.pub
ntpd    2436  ntp   11r   REG        8,1     390  49550 /etc/ssh/ssh_host_rsa_key.pub
ntpd    2436  ntp   12w   REG        8,1   21975 182323 /var/log/ntpstats/peerstats.20070604
ntpd    2436  ntp   13w   REG        8,1     596 182384 /var/log/ntpstats/loopstats.20070604
ntpd    2436  ntp   16u  IPv4       7899            UDP *:ntp
ntpd    2436  ntp   17u  IPv6       7900            UDP *:ntp
ntpd    2436  ntp   18u  IPv6       7901            UDP [::1]:ntp
ntpd    2436  ntp   19u  IPv6       7902            UDP [fe80::216:3eff:fe27:4eea]:ntp
ntpd    2436  ntp   20u  IPv4       7903            UDP front.local:ntp
ntpd    2436  ntp   21u  IPv4       7904            UDP dhcp132.nat11.irb.lo:ntp

You need to call io.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC) on every opened io object that has a file descriptor associated.

Updated by Valentin Vidic over 4 years ago

I have attached a small patch that sets this flag on the socket descriptor. Similar change should be done for other open file descriptors.

It seems the socket problem is a bug in WEBrick, since it also tries to set FD_CLOEXEC but calls fcntl with the wrong parameters –

  io.fcntl(Fcntl::FD_CLOEXEC, 1)

instead of

  io.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)

Updated by Luke Kanies over 4 years ago

  • Status changed from 2 to Closed
  • 7 set to fixed

Fixed in r2553, thanks to patches from vvidic.

Also available in: Atom PDF