Bug #2942

store configs error: After adding a new external node, all clients fail with unhelpful err: "invalid ip address"

Added by Rob Terhaar about 2 years ago. Updated almost 2 years ago.

Status: Closed
Start date: 12/16/2009
Priority: Normal
Due date:
Assignee: Markus Roberts
% Done: 0%
Category: -
Target version: 0.25.2
Affected Puppet version: 0.25.1
Branch:
Keywords:
Votes: 0

Description

I added our first external node to our puppetmaster today, and now I receive this message on all nodes whenever they try to refresh their catalog:

         err: Could not run Puppet configuration client: Parameter ip failed: Invalid IP address

I know that it’s related to the new external node (nagios2) because when I use the ‘kill in db’ script to remove nagios2 from store configs, everything works as usual.

The logs do not say anything useful either:

puppetd trace/debug: http://pastebin.com/f2dad53d2

puppetmaster trace/debug: http://pastebin.com/f600413a3

other - database output, known working node (31.6 kB) Rob Terhaar, 12/20/2009 07:15 pm

nagios2 - database output, trouble nagios2 node (27 kB) Rob Terhaar, 12/20/2009 07:15 pm


Related issues

related to Puppet - Bug #2949: host can be exported without ipaddress Rejected 12/17/2009
related to Puppet - Bug #2964: updated resources cannot be collected until they are expo... Closed 12/19/2009

History

Updated by Markus Roberts about 2 years ago

  • Status changed from Unreviewed to Investigating
  • Target version set to 0.25.2

Updated by Markus Roberts about 2 years ago

  • Status changed from Investigating to Needs More Information
  • Assignee set to defunt Bode

Rob —

You are correct that the logs do not provide anything enlightening (at least to me). Could you please try running it with --trace and --debug, and provide more information about your setup? For example: do you know what it is objecting to? What does the output of your external node tool look like? Are you using IPv6? Are there any variables / parameters that were being set explicitly by the puppet code that are now set in the external node tool and may be used as IP addresses? And so forth.

— Markus

Updated by Rob Terhaar about 2 years ago

Hi Markus,

A bit more info about this.

First off, I compared the fact_values for a known working node to the trouble nagios2 node. What I found was that the nagios2 fact values seem to be missing the lsbmajordistrelease row. Screenshots included below. I will run the tests with --trace as you have instructed and post the results shortly.

Known working node (id=16), fact_values table, sorted by fact_name ID: (select * from fact_values where "host_id" = 16 order by "fact_name_id" asc;) http://img.skitch.com/20091220-e97sjp77yjhjef598xynqxxqgp.jpg

Trouble nagios2 node (id=40), fact_values table, also sorted by fact_name ID: (select * from fact_values where "host_id" = 40 order by "fact_name_id" asc;) http://img.skitch.com/20091220-g675pj3mi57149ttih399x59ae.jpg

Update: I’ve also attached the raw database output files from postgres.

Updated by Rob Terhaar about 2 years ago

Hi Markus,

We’re actually not using external nodes, just regex in nodes.pp. Re: IPv6, we have a bit of puppet code that disables IPv6 on the initial server provision, and we have not experienced any problems with our IPv6 disabler working in Debian 5 thus far.

Updated by Markus Roberts about 2 years ago

  • Assignee changed from defunt Bode to Dan Bode

Updated by Rob Terhaar about 2 years ago

I ran the same test with --trace and --debug enabled for both puppetmaster and puppetd and found nothing of interest in the logs. Is there any way to find out the query that ActiveRecord sends to postgres? The fact that the table data becomes misaligned feels like a Rails bug to me (but what do I know?!)

Updated by Markus Roberts about 2 years ago

Interesting. I’m surprised that running the puppetmaster with --trace & --debug did not produce anything about the “err: Could not run Puppet configuration client: Parameter ip failed: Invalid IP address”; there are only three places that message could have originated, and each of them should have produced a stack trace on the console if run with --trace.

Are you watching the console output (e.g. --no-daemonize) or just the logs?

— Markus

Updated by Rob Terhaar about 2 years ago

Thanks Markus, you are correct, I was watching the logs. puppetmasterd --trace --no-daemonize produces a lot more output. I will update this bug shortly with the new information.

Updated by Rob Terhaar about 2 years ago

puppetd --debug --trace http://pastebin.com/f76447bee

puppetmaster --debug --trace http://pastebin.com/f6db3d10c

However, I don’t see anything specific to the database node insert after the catalog is compiled:

         notice: Compiled catalog for nagios2.domain.com in 4.24 seconds
         info: Caching catalog for nagios2.domain.com
         debug: Searched for resources in 0.00 seconds
         debug: Searched for resource params and tags in 0.00 seconds
         debug: Resource removal in 0.00 seconds
         debug: Resource merger in 0.00 seconds
         debug: Added resources(parameters) in 1.49 seconds
         debug: Added resources(tags) in 1.46 seconds
         debug: Added resources(initialization) in 0.49 seconds
         debug: Resource addition in 17.93 seconds
         debug: Performed resource comparison in 17.93 seconds
         debug: Saved catalog to database in 17.94 seconds

Updated by Markus Roberts about 2 years ago

  • Assignee changed from Dan Bode to Markus Roberts

Rob —

Can you try changing line 10 of /usr/lib/ruby/1.8/puppet/type/host.rb to:

         raise Puppet::Error, "Invalid IP address: #{value.inspect}"

and try again. No need for trace/debug this time, and I really only expect that one line of output on the client to change. That should tell us what the bad IP looks like, which may help determine where it’s coming from. Meanwhile, I’ll be exploring another angle.

— Markus
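To illustrate why adding the offending value to the message helps, here is a minimal Ruby sketch. It is not Puppet's actual host type code: `validate_ip`, `error_for`, and the IPv4-only pattern are hypothetical names and simplifications made up for this example. The point is only that `inspect` makes an empty string or `nil` visible in the error text.

```ruby
# Hypothetical validator for illustration; NOT the code in puppet/type/host.rb.
# Assumption: only dotted-quad IPv4 addresses are considered valid here.
IPV4_PATTERN = /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/

def validate_ip(value)
  match = IPV4_PATTERN.match(value.to_s)
  valid = match && match.captures.all? { |octet| octet.to_i <= 255 }
  # value.inspect shows quotes around strings, so "" and nil are distinguishable.
  raise ArgumentError, "Invalid IP address: #{value.inspect}" unless valid
  value
end

# Returns the error message for a value, or nil if the value is accepted.
def error_for(value)
  validate_ip(value)
  nil
rescue ArgumentError => e
  e.message
end

puts error_for("")   # prints: Invalid IP address: ""
```

Without `inspect`, an empty value would leave the message looking exactly like the original "Invalid IP address", with no hint that the value was blank.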

Updated by Markus Roberts about 2 years ago

Also, do you know which facts are producing the “xx.xx.210.11x” values? I’ve not seen those before (which isn’t to say they might not be normal). It might help to do:

select * from fact_values,fact_names as hn where "host_id"= 40 and "fact_name_id" = hn.id order by "fact_name_id" asc;

Updated by Rob Terhaar about 2 years ago

Hi Markus,

I’ll try that change and report back shortly. Sorry for misleading you about the xx.xx IPs; I intended to blank out proprietary IP information from the ticket. (the real IPs from facter and in the database are actually correct)

The only odd thing about the facts collected from this node is that when facts are initially inserted into fact_values, the data is misaligned.

Updated by Rob Terhaar about 2 years ago

nagios2:~# vim /usr/lib/ruby/1.8/puppet/type/host.rb (made change as requested)

         nagios2:~# puppetd --test
         info: Retrieving plugin
         err: /File[/var/lib/puppet/lib]: Failed to retrieve current state of resource: Could not retrieve information from source(s) puppet://mgmt.domain.com/plugins
         info: Caching catalog for nagios2.domain.com
         err: Could not run Puppet configuration client: Parameter ip failed: Invalid IP address: ""

(subsequent runs return the same blank value for ip address)

Results from your suggested sql query:

working host: http://img.skitch.com/20091221-khxge4nu6b36ecihfse78pkppw.jpg

nagios2 broken host: http://img.skitch.com/20091221-1g5dby5iqij4drhefddqydwbqx.jpg

Updated by Markus Roberts about 2 years ago

Rob Terhaar wrote:

Sorry for misleading you about the xx.xx IPs; I intended to blank out proprietary IP information from the ticket. (the real IPs from facter and in the database are actually correct)

Ah, that makes much more sense. I was really scratching my head over those “x”s.

Rob Terhaar wrote:

The only odd thing about the facts collected from this node is that when facts are initially inserted into fact_values, the data is misaligned.

Can you clarify what you mean by “misaligned”? Also, I note that the nagios2 node appears to have 7 fewer facts.

I’m going to look at the query results now. I suspect that the broken one is just dropping a value somewhere.

Updated by Rob Terhaar about 2 years ago

Sorry, I know it makes it a bit hard to troubleshoot this when IP and hostname information is redacted. I’ve included output from facter and ifconfig on the troubled nagios2 node:

facter: http://img.skitch.com/20091221-d8uk1xmbdjai546qc79i7rd1u3.jpg

ifconfig: http://img.skitch.com/20091221-gaaryqmkexsepesyui5dtsskih.jpg

Updated by Markus Roberts about 2 years ago

So the non-working one is missing macaddress_eth1, netmask_eth1, and, most interestingly, ipaddress_eth1, but has four additional values (macaddress_dummy, processor1, processor2, processor3). I’d suspect that ipaddress_eth1 is the problem.

It would seem very odd to me if these changes were being made in the store config process. How is ipaddress_eth1 being set in the working case?
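The fact comparison above can be reproduced mechanically rather than by reading screenshots. The following Ruby sketch assumes the fact names for each host have already been loaded into hashes; `fact_name_diff` is a hypothetical helper, the fact names are the ones mentioned in this thread, and the hashes are abbreviated with placeholder values.

```ruby
# Hypothetical helper: report fact names present on one host but not the other.
def fact_name_diff(working, broken)
  {
    missing: (working.keys - broken.keys).sort, # on the working host only
    extra:   (broken.keys - working.keys).sort  # on the broken host only
  }
end

# Abbreviated, made-up fact sets; real hosts carry many more facts.
working = {
  "ipaddress_eth0"  => "(redacted)",
  "ipaddress_eth1"  => "(redacted)",
  "macaddress_eth1" => "(redacted)",
  "netmask_eth1"    => "(redacted)"
}
broken = {
  "ipaddress_eth0"   => "(redacted)",
  "macaddress_dummy" => "(redacted)",
  "processor1"       => "(redacted)",
  "processor2"       => "(redacted)",
  "processor3"       => "(redacted)"
}

diff = fact_name_diff(working, broken)
puts "missing on broken host: #{diff[:missing].join(', ')}"
puts "extra on broken host:   #{diff[:extra].join(', ')}"
```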

Updated by Markus Roberts about 2 years ago

Here’s a thought: could it be that this box doesn’t have an eth1 but something in your manifest is assuming that there will be one and trying to use its ip address?

Updated by Rob Terhaar about 2 years ago

Yes, that’s a very good possibility! I’ll check the catalog yaml and the manifests assigned to this node and respond in a moment…

Updated by Rob Terhaar about 2 years ago

I’ve added this ugly hack to my site.pp for now:

         case $fqdn {
           'nagios2.domain.com': {
             $ipaddress_eth1 = $ipaddress_eth0
           }
         }

This is a terrible hack, but it has fixed the problem until we can rewrite the various manifests assigned to this node which reference the non-existent $ipaddress_eth1. We’re using exported resources that reference $ipaddress_eth1 as well as referencing that variable from within .erb files.
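A less node-specific version of the same idea is to fall back only when the fact is actually absent. The Ruby sketch below shows that logic on a plain hash of facts; `service_ip` is a hypothetical helper, the addresses are documentation-range placeholders, and how the facts reach the code (for example inside an .erb template) is left out as an assumption.

```ruby
# Hypothetical helper: prefer eth1, fall back to eth0, fail loudly otherwise.
def service_ip(facts)
  ip = facts["ipaddress_eth1"]
  ip = facts["ipaddress_eth0"] if ip.nil? || ip.empty?
  # Failing here names the real problem instead of passing "" downstream.
  raise KeyError, "no usable ipaddress_eth1 or ipaddress_eth0 fact" if ip.nil? || ip.empty?
  ip
end

# Made-up example hosts.
two_nics = { "ipaddress_eth0" => "192.0.2.10", "ipaddress_eth1" => "198.51.100.10" }
one_nic  = { "ipaddress_eth0" => "192.0.2.20" }

puts service_ip(two_nics)
puts service_ip(one_nic)
```

Raising when neither fact is set keeps an empty string from being stored in an exported resource and surfacing later on unrelated clients.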

Updated by Rob Terhaar about 2 years ago

so it seems that the bug can be potentially summed up with:

if you realize exported resources which contain variables based on facts that are not present on the node where the exported resources were collected from, puppetmaster’s database becomes inconsistent.

Updated by Markus Roberts about 2 years ago

Just to clarify, was the address being set before, by the explicit node, and (if so) would it make more sense to have the external node tool do “the same thing”?

In any case, this sounds like a site problem, not a puppet bug, correct?

Updated by Luke Kanies about 2 years ago

Rob Terhaar wrote:

so it seems that the bug can be potentially summed up with:

if you realize exported resources which contain variables based on facts that are not present on the node where the exported resources were collected from, puppetmaster’s database becomes inconsistent.

I think it’s simpler than that – if you attempt to export an invalid resource, then it only fails on collection rather than export.

I think this is a real bug, but the bug is the above – invalid resources should fail on export, rather than on collection. However, in many cases this isn’t actually possible – we can’t do much resource validation until it reaches the client, which in the case of exported resources can take a while and seem terribly disconnected.
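The difference between failing on export and failing on collection can be shown with a toy model. The Ruby sketch below is not how Puppet's stored configs work; `ToyExportStore` and its methods are hypothetical, and it only performs the kind of cheap check (a blank value) that is possible without full client-side validation.

```ruby
# Toy model only: NOT Puppet's stored-config implementation.
class ToyExportStore
  def initialize
    @exported = []
  end

  # Reject obviously unusable values at export time, naming the exporting node.
  def export(exporting_node, ip)
    if ip.nil? || ip.to_s.strip.empty?
      raise ArgumentError,
            "#{exporting_node}: refusing to export host with ip #{ip.inspect}"
    end
    @exported << { node: exporting_node, ip: ip }
  end

  # Every collecting node receives whatever was stored.
  def collect
    @exported.dup
  end
end

store = ToyExportStore.new
store.export("web1.domain.com", "192.0.2.10")

export_error = begin
  store.export("nagios2.domain.com", "")
  nil
rescue ArgumentError => e
  e.message
end
puts export_error
```

In this model the error names the node that produced the bad value, and the other nodes that later collect never see it.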

Updated by Markus Roberts about 2 years ago

  • Status changed from Needs More Information to Closed

In this case it wasn’t so much a question of the resources being invalid as of them depending on variables that were never set. That would pose the same problems regardless of whether they were exported or not.

Updated by Luke Kanies about 2 years ago

Markus Roberts wrote:

In this case it wasn’t so much a question of the resources being invalid as of them depending on variables that were never set. That would pose the same problems regardless of whether they were exported or not.

I think it should be opened as a new bug, though — validation should happen on export rather than collect.
