Bug #2942
store configs error: After adding a new external node, all clients fail with unhelpful err: "invalid ip address"
| Status: | Closed | Start date: | 12/16/2009 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | - | | |
| Target version: | 0.25.2 | | |
| Affected Puppet version: | 0.25.1 | Branch: | |
| Keywords: | | | |
| Votes: | 0 | | |
Description
I added our first external node to our puppetmaster today, and now I receive this message on all nodes whenever they try to refresh their catalog: err: Could not run Puppet configuration client: Parameter ip failed: Invalid IP address
I know that it’s related to the new external node (nagios2) because when I use the ‘kill in db’ script to remove nagios2 from store configs, everything works as usual.
The logs do not say anything useful either: puppetd trace/debug: http://pastebin.com/f2dad53d2
puppetmaster trace/debug: http://pastebin.com/f600413a3
History
Updated by Markus Roberts about 2 years ago
- Status changed from Unreviewed to Investigating
- Target version set to 0.25.2
Updated by Markus Roberts about 2 years ago
- Status changed from Investigating to Needs More Information
- Assignee set to defunt Bode
Rob —
You are correct that the logs do not provide anything enlightening (at least to me). Could you please try running it with --trace and --debug, and provide more information about your setup? For example: do you know what it is objecting to? What does the output of your external node tool look like? Are you using IPv6? Are there any variables or parameters that were being set explicitly by the puppet code that are now set in the external node tool and may be used as IP addresses? And so forth.
— Markus
Updated by Rob Terhaar about 2 years ago
Hi Markus,
A bit more info about this.
First off, I compared the fact_values for a known working node to those of the trouble nagios2 node. What I found was that the nagios2 fact values seem to be missing the lsbmajordistrelease row. Screenshots included below. I will run the tests with --trace as you have instructed and post the results shortly.
Known working node (id=16), fact_values table, sorted by fact_name ID: (select * from fact_values where "host_id" = 16 order by "fact_name_id" asc;) http://img.skitch.com/20091220-e97sjp77yjhjef598xynqxxqgp.jpg
Trouble nagios2 node (id=40), fact_values table, also sorted by fact_name ID: (select * from fact_values where "host_id" = 40 order by "fact_name_id" asc;) http://img.skitch.com/20091220-g675pj3mi57149ttih399x59ae.jpg
Update: I’ve also attached the raw database output files from postgres.
Updated by Rob Terhaar about 2 years ago
Hi Markus,
We’re actually not using external nodes, just regex in nodes.pp. re:IPv6, we have a bit of puppet code that disables IPv6 on the initial server provision, and we have not experienced any problems with our IPv6 disabler working in Debian5 thus far.
Updated by Markus Roberts about 2 years ago
- Assignee changed from defunt Bode to Dan Bode
Updated by Rob Terhaar about 2 years ago
I ran the same test with --trace and --debug enabled for both puppetmaster and puppetd and found nothing of interest in the logs. Is there any way to find out the query that ActiveRecord sends to postgres? The fact that the table data becomes misaligned feels like a Rails bug to me (but what do I know?!)
Updated by Markus Roberts about 2 years ago
Interesting. I’m surprised that running the puppetmaster with --trace and --debug did not produce anything about the “err: Could not run Puppet configuration client: Parameter ip failed: Invalid IP address” message; there are only three places that message could have originated, and each of them should have produced a stack trace on the console if run with --trace.
Are you watching the console output (e.g. with --no-daemonize) or just the logs?
— Markus
Updated by Rob Terhaar about 2 years ago
Thanks Markus, you are correct: I was watching the logs. puppetmasterd --trace --no-daemonize produces a lot more output. I will update this bug shortly with the new information.
Updated by Rob Terhaar about 2 years ago
puppetd --debug --trace http://pastebin.com/f76447bee
puppetmaster --debug --trace http://pastebin.com/f6db3d10c
However, I don’t see anything specific to the database node insert after the catalog is compiled:
notice: Compiled catalog for nagios2.domain.com in 4.24 seconds
info: Caching catalog for nagios2.domain.com
debug: Searched for resources in 0.00 seconds
debug: Searched for resource params and tags in 0.00 seconds
debug: Resource removal in 0.00 seconds
debug: Resource merger in 0.00 seconds
debug: Added resources(parameters) in 1.49 seconds
debug: Added resources(tags) in 1.46 seconds
debug: Added resources(initialization) in 0.49 seconds
debug: Resource addition in 17.93 seconds
debug: Performed resource comparison in 17.93 seconds
debug: Saved catalog to database in 17.94 seconds
Updated by Markus Roberts about 2 years ago
- Assignee changed from Dan Bode to Markus Roberts
Rob —
Can you try changing line 10 of /usr/lib/ruby/1.8/puppet/type/host.rb to:
raise Puppet::Error, "Invalid IP address: #{value.inspect}"
and try again. No need for trace/debug this time, and I really only expect that one line of output on the client to change. That should tell us what the bad IP looks like, which may help determine where it’s coming from. Meanwhile, I’ll be exploring another angle.
— Markus
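The reasoning behind that one-line change can be sketched in isolation. The following is a minimal, standalone illustration and not the contents of Puppet's host.rb: the error class, the `validate_ip` helper, and the simplified IPv4 pattern are all assumptions made so that the example runs without Puppet installed. It shows how adding `value.inspect` to the message makes an empty or nil value visible.

```ruby
# Hypothetical stand-in for Puppet::Error so the sketch runs without Puppet.
class InvalidParameterError < StandardError; end

# Simplified dotted-quad check; an assumption, not the pattern Puppet uses.
IPV4_PATTERN = /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/

# Hypothetical helper: returns the value if it looks like an IPv4 address,
# otherwise raises with the offending value included in the message.
def validate_ip(value)
  match = IPV4_PATTERN.match(value.to_s)
  valid = !match.nil? && match.captures.all? { |octet| octet.to_i <= 255 }
  # value.inspect renders "" or nil explicitly instead of printing nothing.
  raise InvalidParameterError, "Invalid IP address: #{value.inspect}" unless valid
  value
end

begin
  validate_ip('')
rescue InvalidParameterError => e
  puts e.message # prints: Invalid IP address: ""
end
```

Without `inspect`, an empty string would leave the message ending in a bare colon, which is hard to tell apart from a message that never carried a value at all.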
Updated by Markus Roberts about 2 years ago
Also, do you know which facts are producing the “xx.xx.210.11x” values? I’ve not seen those before (which isn’t to say they might not be normal). It might help to do:
select * from fact_values,fact_names as hn where "host_id"= 40 and "fact_name_id" = hn.id order by "fact_name_id" asc;
Updated by Rob Terhaar about 2 years ago
Hi Markus,
I’ll try that change and report back shortly. Sorry for misleading you about the xx.xx IPs; I intended to blank out proprietary IP information from the ticket. (the real IPs from facter and in the database are actually correct)
The only odd thing about the facts collected from this node is that when facts are initially inserted into fact_values, the data is misaligned.
Updated by Rob Terhaar about 2 years ago
nagios2:~# vim /usr/lib/ruby/1.8/puppet/type/host.rb
(made change as requested)
nagios2:~# puppetd --test
info: Retrieving plugin
err: /File[/var/lib/puppet/lib]: Failed to retrieve current state of resource: Could not retrieve information from source(s) puppet://mgmt.domain.com/plugins
info: Caching catalog for nagios2.domain.com
err: Could not run Puppet configuration client: Parameter ip failed: Invalid IP address: ""
(subsequent runs return the same blank value for ip address)
Results from your suggested sql query:
working host: http://img.skitch.com/20091221-khxge4nu6b36ecihfse78pkppw.jpg
nagios2 broken host: http://img.skitch.com/20091221-1g5dby5iqij4drhefddqydwbqx.jpg
Updated by Markus Roberts about 2 years ago
Sorry for misleading you about the xx.xx IPs; I intended to blank out proprietary IP information from the ticket. (the real IPs from facter and in the database are actually correct)
Ah, that makes much more sense. I was really scratching my head over those "x"s.
The only odd thing about the facts collected from this node is that when facts are initially inserted into fact_values, the data is misaligned.
Can you clarify what you mean by “misaligned”? Also, I note that the nagios2 node appears to have 7 fewer facts.
I’m going to look at the query results now. I suspect that the broken one is just dropping a value somewhere.
Updated by Rob Terhaar about 2 years ago
Sorry, I know it is a bit hard to troubleshoot this when IP and hostname information is redacted. I’ve included output from facter and ifconfig on the troubled nagios2 node:
facter: http://img.skitch.com/20091221-d8uk1xmbdjai546qc79i7rd1u3.jpg
ifconfig: http://img.skitch.com/20091221-gaaryqmkexsepesyui5dtsskih.jpg
Updated by Markus Roberts about 2 years ago
So the non-working one is missing macaddress_eth1, netmask_eth1, and, most interestingly, ipaddress_eth1, but has four additional values (macaddress_dummy, processor1, processor2, processor3). I’d suspect that ipaddress_eth1 is the problem.
It would seem very odd to me if these changes were being made in the store config process. How is ipaddress_eth1 being set in the working case?
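A comparison like the one above can also be done mechanically instead of by reading screenshots. The sketch below is an illustration only: it uses the fact names mentioned in this ticket, while the facts common to both hosts (`ipaddress_eth0`, `processor0`) are placeholders, and `fact_diff` is a hypothetical helper rather than anything Puppet or Facter provides.

```ruby
# Fact names only; values are omitted because the ticket redacts them.
# The facts common to both hosts are illustrative placeholders.
working_facts = %w[ipaddress_eth0 ipaddress_eth1 macaddress_eth1 netmask_eth1 processor0]
broken_facts  = %w[ipaddress_eth0 macaddress_dummy processor0 processor1 processor2 processor3]

# Hypothetical helper: which fact names does the suspect host lack,
# and which does it have that the reference host does not?
def fact_diff(reference, suspect)
  {
    missing: (reference - suspect).sort,
    extra: (suspect - reference).sort
  }
end

diff = fact_diff(working_facts, broken_facts)
puts "missing: #{diff[:missing].join(', ')}"
puts "extra:   #{diff[:extra].join(', ')}"
```

In practice the two name lists would come from the fact_values and fact_names query shown earlier, one result set per host_id.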
Updated by Markus Roberts about 2 years ago
Here’s a thought: could it be that this box doesn’t have an eth1 but something in your manifest is assuming that there will be one and trying to use its ip address?
Updated by Rob Terhaar about 2 years ago
yes that’s a very good possibility! I’ll check the catalog yaml and the manifests assigned to this node and respond in a moment…
Updated by Rob Terhaar about 2 years ago
I’ve added this ugly hack to my site.pp for now:
case $fqdn {
  'nagios2.domain.com': {
    $ipaddress_eth1 = $ipaddress_eth0
  }
}
This is a terrible hack, but it has fixed the problem until we can rewrite the various manifests assigned to this node which reference the non-existent $ipaddress_eth1. We’re using exported resources that reference $ipaddress_eth1 as well as referencing that variable from within .erb files.
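The idea behind the workaround, falling back to another interface when the expected fact is absent, can be sketched outside of Puppet. This is an assumption-laden illustration: `first_available_ip` is a hypothetical helper, the facts hash is hand-written, and 192.0.2.10 is a documentation-range placeholder rather than a real address from this site.

```ruby
# Hypothetical helper: returns the first non-empty address among the
# candidate fact names, or nil when none is set, so the caller can fail
# early with a clear message instead of passing along an empty string.
def first_available_ip(facts, candidates = %w[ipaddress_eth1 ipaddress_eth0])
  candidates.each do |name|
    value = facts[name]
    return value unless value.nil? || value.strip.empty?
  end
  nil
end

# A node without eth1, as in this ticket; the address is a placeholder.
facts = { 'fqdn' => 'nagios2.domain.com', 'ipaddress_eth0' => '192.0.2.10' }
ip = first_available_ip(facts)
puts ip.nil? ? 'no usable address fact found' : ip
```

Returning nil rather than an empty string keeps the "fact was never set" case distinguishable from a real value, which is the distinction that was lost in the original error.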
Updated by Rob Terhaar about 2 years ago
So it seems that the bug can potentially be summed up as:
If you realize exported resources that contain variables based on facts that are not present on the node the exported resources were collected from, the puppetmaster’s database becomes inconsistent.
Updated by Markus Roberts about 2 years ago
Just to clarify, was the address being set before, by the explicit node, and (if so) would it make more sense to have the external node tool do “the same thing”?
In any case, this sounds like a site problem, not a puppet bug, correct?
Updated by Luke Kanies about 2 years ago
Rob Terhaar wrote:
so it seems that the bug can be potentially summed up with:
if you realized exported resources which contain variables based on facts that are not present on the node where the exported resources were collected from, puppetmaster’s database becomes inconsistent.
I think it’s simpler than that – if you attempt to export an invalid resource, then it only fails on collection rather than export.
I think this is a real bug, but the bug is the above – invalid resources should fail on export, rather than on collection. However, in many cases this isn’t actually possible – we can’t do much resource validation until it reaches the client, which in the case of exported resources can take a while and seem terribly disconnected.
Updated by Markus Roberts about 2 years ago
- Status changed from Needs More Information to Closed
In this case it wasn’t so much a question of the resources being invalid as of them depending on variables that were never set. That would pose the same problems regardless of whether they were exported or not.
Updated by Luke Kanies about 2 years ago
Markus Roberts wrote:
In this case it wasn’t so much a question of the resources being invalid as of them depending on variables that were never set. That would pose the same problems regardless of whether they were exported or not.
I think it should be opened as a new bug, though — validation should happen on export rather than collect.
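The difference between failing on export and failing on collection can be illustrated with a toy model. Everything below is hypothetical: `ExportStore` is an in-memory stand-in for the stored-configs database, not Puppet's implementation, and the node names and addresses are placeholders. The point is only that a check at export time produces an error that names the exporting node, instead of surfacing later on every collecting client.

```ruby
class ExportValidationError < StandardError; end

# Toy in-memory stand-in for the stored-configs database (hypothetical).
class ExportStore
  def initialize
    @resources = []
  end

  # Validate when the resource is exported, so the error names the exporter.
  def export(node, type, title, params)
    if type == 'host' && params['ip'].to_s.strip.empty?
      raise ExportValidationError,
            "#{node}: Host[#{title}] has invalid ip #{params['ip'].inspect}"
    end
    @resources << { node: node, type: type, title: title, params: params }
  end

  # Collection only ever sees resources that passed the export-time check.
  def collect(type)
    @resources.select { |r| r[:type] == type }
  end
end

store = ExportStore.new
store.export('web1.domain.com', 'host', 'web1', 'ip' => '192.0.2.20')

begin
  # An unset $ipaddress_eth1 is modeled here as an empty string.
  store.export('nagios2.domain.com', 'host', 'nagios2', 'ip' => '')
rescue ExportValidationError => e
  puts e.message
end

puts store.collect('host').length
```

As noted in the discussion above, much of Puppet's resource validation can only happen once a resource reaches the client, so a check this simple may not be possible for every resource type.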