Bug #2635

"Could not find value for 'hostname'" after puppetd has been running >24 hours.

Added by Josh Anderson over 2 years ago. Updated about 2 years ago.

Status:Rejected Start date:09/14/2009
Priority:Normal Due date:
Assignee:Josh Anderson % Done:0%

Category:serialization
Target version:-
Affected Puppet version:0.25.0 Branch:
Keywords:
Votes: 0

Description

Something’s going wrong with fact generation after puppetd has been running for about 24 hours. I’m seeing the following error in puppetd’s log:

Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to parse template templatetest/test.erb: Could not find value for 'hostname' at /etc/puppet/modules/templatetest/manifests/init.pp:3 on node puppetm.foo.com

Here’s what I see for this node in the master’s /var/puppet/yaml/facts directory:

--- !ruby/object:Puppet::Node::Facts 
expiration: 2009-09-14 11:48:30.435619 -07:00
name: puppetm.foo.com
values: 
  clientversion: 0.25.0
  rubysitedir: /opt/ruby/lib/ruby/site_ruby/1.8
  ps: ps -ef
  domain: foo.com
  timezone: PDT
  ? !ruby/sym _timestamp
  : Mon Sep 14 11:43:30 -0700 2009

  puppetversion: 0.25.0
  environment: production
  sshrsakey: AAAAB3NzaC1yc2EAAAABIwAAAIEAqJ4c0t/M2wgrbjo...
  facterversion: 1.5.6
  sshdsakey: AAAAB3NzaC1kc3MAAACBAMAIzIP2T9LiVqKV/iyk+...
  rubyversion: 1.8.7

There’s a lot missing there, and it seems like the YAML is malformed.

This has been 100% reproducible for me. This is with Puppet 0.25.0 and Facter 1.5.6 on Ruby 1.8.7p160 (Solaris SPARC).

A final note: This started happening after I added “path = /usr/bin:/usr/sbin:/bin:/sbin” to the puppetd section of puppet.conf. That was an attempt to resolve some mysterious provider failures (also 100% reproducible for me.) The error messages for those were:

Failed to retrieve current state of resource: Provider groupadd is not functional on this platform

and:

Failed to retrieve current state of resource: Provider crontab is not functional on this platform

So now I’m seeing the hostname error instead of the provider failures, but these two problems may be related.

log.txt - Puppetd log excerpt (2.4 kB) Josh Anderson, 09/14/2009 07:45 pm

History

Updated by Markus Roberts over 2 years ago

  • Status changed from Unreviewed to Accepted
  • Assignee set to Markus Roberts
  • Target version set to 0.25.1

This may be related to #2598.

Updated by Markus Roberts over 2 years ago

  • Category set to serialization

I can reproduce this, at least in part. It looks like there are minor differences between 1.8.7 and earlier versions.

Updated by Josh Anderson over 2 years ago

Markus Roberts wrote:

I can reproduce this, at least in part. It looks like there are minor differences between 1.8.7 and earlier versions.

Excellent! Anything I can do to help troubleshoot?

Updated by Markus Roberts over 2 years ago

So one finding: it’s not related to #2598, and the yaml, though odd looking, is fine.

To see this, go into irb and type

require 'yaml'
# Stub just enough of the class for YAML to instantiate the object.
class Puppet; class Node; class Facts; attr_accessor :expiration,:name,:values; end; end; end
x = YAML.load(File.read('/var/puppet/yaml/facts/YOURNODENAME.yaml'))
x.values

And you can see that it loads fine.

Updated by Markus Roberts over 2 years ago

Josh —

Can you run facter stand-alone and post the results?

— Markus

Updated by Luke Kanies over 2 years ago

  • Status changed from Accepted to Needs More Information

Josh, I have two questions:

  • What is the path if you don’t specify it? E.g., if you do puppetd --configprint path, what do you get?

  • You didn’t actually run puppetd for 24 hours until you fixed the path, right? I mean, if it’s broken you wouldn’t let it run that long, so you probably fixed it quickly, and once it was working let it run, and only then saw the problem. If this is the case, then I expect that the provider/path and Facter issues are orthogonal.

Updated by Luke Kanies over 2 years ago

  • Assignee changed from Markus Roberts to Josh Anderson

Updated by Josh Anderson over 2 years ago

Luke Kanies wrote:

Josh, I have two questions:

  • What is the path if you don’t specify it? E.g., if you do puppetd --configprint path, what do you get?
# puppetd --configprint path
none
#
  • You didn’t actually run puppetd for 24 hours until you fixed the path, right? I mean, if it’s broken you wouldn’t let it run that long, so you probably fixed it quickly, and once it was working let it run, and only then saw the problem. If this is the case, then I expect that the provider/path and Facter issues are orthogonal.

I allowed puppetd to run long enough to encounter the provider errors several times to verify that I could reproduce it. E.g., restart, run for 24 hours, check for errors. Digging around in the code and looking at tickets for similar issues made me think that the provider issue had to do with the path that puppetd was using, so, at that point, I added the path to puppet.conf. Then I started to get the template errors, so I restarted puppetd and watched it to see if it was reproducible. It was, so I opened this ticket.

I’m not sure whether or not the two errors have anything to do with each other, but it’s slightly suspicious that they both crop up after puppetd has been running for about the same length of time.

Updated by Josh Anderson over 2 years ago

Markus Roberts wrote:

Josh —

Can you run facter stand-alone and post the results?

— Markus

Yes:

domain => foo.com
facterversion => 1.5.6
fqdn => puppetm.foo.com
hardwareisa => sparc
hardwaremodel => sun4v
hostname => puppetm
id => root
interfaces => lo0_17,e1000g0_5,e1000g201000_5
ipaddress => 10.13.23.49
ipaddress_e1000g0_5 => 10.8.23.49
ipaddress_e1000g201000_5 => 10.13.23.49
ipaddress_lo0_17 => 127.0.0.1
is_virtual => false
kernel => SunOS
kernelmajversion => Generic_127111-02
kernelrelease => 5.10
kernelversion => Generic_127111-02
netmask_e1000g0_5 => 255.0.0.0
netmask_e1000g201000_5 => 255.255.0.0
netmask_lo0_17 => 255.0.0.0
network_e1000g0_5 => 10.0.0.0
network_e1000g201000_5 => 10.13.0.0
network_lo0_17 => 127.0.0.0
operatingsystem => Solaris
operatingsystemrelease => 5.10
ps => ps -ef
puppetversion => 0.25.0
rubysitedir => /opt/ruby/lib/ruby/site_ruby/1.8
rubyversion => 1.8.7
sshdsakey => AAAAB3NzaC1kc3MAAACBAMAIzIP2T9LiVqKV/iyk...
sshrsakey => AAAAB3NzaC1yc2EAAAABIwAAAIEAqJ4c0t/M2w...
timezone => PDT
uniqueid => 84abda4c
uptime => 7 day
virtual => zone
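
One way to see exactly which facts went missing is to compare the fact names from a stand-alone facter run like the one above with the names in the cached YAML from the description. The following is a minimal Ruby sketch of that comparison. The helper name `parse_facter_output` is made up for illustration, and both data sets are abbreviated samples of the output in this ticket, not the complete lists.

```ruby
# Hypothetical helper (not part of Facter or Puppet): turn "name => value"
# lines, as printed by stand-alone facter, into a Hash of name => value.
def parse_facter_output(text)
  facts = {}
  text.each_line do |line|
    name, value = line.chomp.split(' => ', 2)
    facts[name] = value if name && value
  end
  facts
end

# Abbreviated sample of the stand-alone facter output.
facter_output = <<EOS
domain => foo.com
facterversion => 1.5.6
fqdn => puppetm.foo.com
hostname => puppetm
kernel => SunOS
operatingsystem => Solaris
puppetversion => 0.25.0
rubyversion => 1.8.7
EOS

# Abbreviated list of fact names present in the cached YAML on the master.
cached_names = %w[clientversion domain facterversion puppetversion rubyversion]

standalone = parse_facter_output(facter_output)
missing = standalone.keys - cached_names
puts "Missing from cache: #{missing.sort.join(', ')}"
```

With real data, `cached_names` would come from the keys of the `values` hash loaded from the node's YAML file on the master.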

Updated by Markus Roberts over 2 years ago

My first, possibly naive, attempt to reproduce this failed. Letting a 0.25.0 puppetmasterd/puppetd run for over 24 hours did not do anything interesting.

Updated by Markus Roberts over 2 years ago

Well over 100 hours and nothing odd. I’m starting to look for various hypothetical ancillary causes (memory leaks, protocol issues, etc.). Any suggestions would be welcome.

Updated by Josh Anderson over 2 years ago

Markus Roberts wrote:

Well over 100 hours and nothing odd. I’m starting to look for various hypothetical ancillary causes (memory leaks, protocol issues, etc.). Any suggestions would be welcome.

I’m going to run my puppet master with --debug and see if anything shows up there. In the meantime, where in the serialization code were you looking for problems?

Updated by Markus Roberts over 2 years ago

I’m just poking. The idea was that something might change over time or by chance that would cause the server’s parsing of the facts to terminate prematurely and give the appearance that facts had been omitted. It was the result of a “how could this possibly happen” brainstorming session and not the result of any direct evidence.

Updated by James Turnbull over 2 years ago

  • Target version changed from 0.25.1 to 0.25.2

Bumped to 0.25.2.

Updated by Josh Anderson over 2 years ago

Here’s what the puppetmaster logs:

Thu Sep 24 10:36:34 -0700 2009 Puppet (info): Expiring the node cache of puppetm.foo.com
Thu Sep 24 10:36:34 -0700 2009 Puppet (info): Not using expired node for puppetm.foo.com from cache; expired at Thu Sep 24 10:35:34 -0700 2009
Thu Sep 24 10:36:34 -0700 2009 Puppet (info): Caching node for puppetm.foo.com
Thu Sep 24 10:36:34 -0700 2009 Puppet (warning): Host is missing hostname and/or domain: puppetm.foo.com
Thu Sep 24 10:36:34 -0700 2009 Puppet (err): Failed to parse template templatetest/test.erb: Could not find value for 'hostname' at /etc/puppet/modules/templatetest/manifests/init.pp:3 on node puppetm.foo.com
Thu Sep 24 10:36:34 -0700 2009 Puppet (err): Failed to parse template templatetest/test.erb: Could not find value for 'hostname' at /etc/puppet/modules/templatetest/manifests/init.pp:3 on node puppetm.foo.com

I’m afraid that’s not very helpful. However, I can say for sure that this is something going wrong with puppetd, as restarting the puppetmaster doesn’t affect the problem at all.

Updated by Josh Anderson over 2 years ago

Okay, it turns out I was shooting myself in the foot on this one. I had a fact that I wrote six-plus months ago that appended an application’s bin directory to ENV['PATH']. Because the append happened again on every run, ENV['PATH'] kept growing, and once it reached a certain length, fact resolution stopped working properly. This same problem was causing nil:NilClass errors on 0.24.8.
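
The failure mode described here can be sketched in a few lines of Ruby. In a long-running process, code that appends to the PATH every time it is evaluated makes the variable grow on each run, while checking first keeps it stable. This is not the original fact's code: the directory `/opt/myapp/bin` and the helper names are made up for illustration, plain Hashes stand in for ENV, and the loop assumes a 30-minute run interval with the fact code re-evaluated on every run.

```ruby
# Hypothetical application directory, for illustration only.
app_bin = '/opt/myapp/bin'

# Problematic pattern: appends unconditionally, so PATH grows every time
# the fact code is evaluated inside a long-running daemon.
def append_always(env, dir)
  env['PATH'] = [env['PATH'], dir].join(File::PATH_SEPARATOR)
end

# Safer pattern: append only when the directory is not already listed.
def append_once(env, dir)
  entries = env['PATH'].to_s.split(File::PATH_SEPARATOR)
  return if entries.include?(dir)
  env['PATH'] = (entries + [dir]).join(File::PATH_SEPARATOR)
end

# Plain Hashes stand in for ENV so the sketch has no side effects.
start = ['/usr/bin', '/usr/sbin', '/bin', '/sbin'].join(File::PATH_SEPARATOR)
naive = { 'PATH' => start }
guarded = { 'PATH' => start }

# Simulate 48 fact evaluations (one every 30 minutes for 24 hours).
48.times do
  append_always(naive, app_bin)
  append_once(guarded, app_bin)
end

puts "naive:   #{naive['PATH'].length} characters"
puts "guarded: #{guarded['PATH'].length} characters"
```

The guarded version ends up with the directory listed once, while the naive version gains one more copy per run. That growth would be consistent with a problem that only shows up after the daemon has been running for many hours.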

Updated by James Turnbull over 2 years ago

  • Status changed from Needs More Information to Rejected

Closed as end user issue.

Updated by James Turnbull over 2 years ago

  • Target version deleted (0.25.2)
