Bug #1291

Facter is using the NIS domain name instead of DNS domain name in some cases

Added by Redmine Admin about 4 years ago. Updated 3 months ago.

Status:Closed Start:
Priority:Normal Due date:
Assigned to:Luke Kanies % Done:

0%

Category:library
Target version:-
Keywords: Branch:
Votes: 0

Description

It looks like the code around:

  • http://reductivelabs.com/cgi-bin/facter.cgi/browser/trunk/lib/facter.rb#L692

is looking for the DNS domain name, however that line invokes the “domainname” command, which at least the Debian GNU/Linux and AIX man pages make clear returns the NIS domain name (the Solaris man page is convoluted but seems to say that as well).

In any case, at my site we have a historical NIS domainname that is totally different from our DNS domainname, so this makes things confusing. There are also cases where this would just break things (e.g. we have box.foo.com and box.bar.com both in the NIS domain baz.com, so the first puppetized machine would get a SSL cert for box.baz.com, and the second one would have problems)

On Debian there is the dnsdomainname command, however the only thing I’ve found that works consistantly on all of my site’s myriad operating system/version combinations is to do a double lookup (e.g. get the host’s IP address, and then perform a lookup of that IP address); that way at least you’ll get whatever nsswitch.conf or equivalent thinks your IP address resolves to, which is usualy what you’ll be using when doing any debugging, and which is usually also resolvable by the other machines on the network.

facter-1.5.2-sortfix.patch - Patch to use sort_by in Fact.add (594 Bytes) Robin Hill, 10/08/2008 07:57 am

facter-1.5.2-sortfix2.patch - Sort facts using stable sort algorithm (1010 Bytes) Robin Hill, 10/16/2008 09:16 am

Associated revisions

Revision dca615c98b864d75e2ac5899d98d04a2bd979eba
Added by Ohad Levy 5 months ago

fixes #2573, #2085, #1291 – fixes domain and fqdn facts resolution

This patch removes the relationship between the domain fact and LDAP/NIS domains. domain fact relates to DNS domain – this will avoid the confusion caused by the LDAP/NIS domain (which might be different to the DNS domain name). Additionally, if hostname is already in long form, it won’t try to build the fqdn fact from hostname and domain.

History

Updated by Luke Kanies about 4 years ago

  • Status changed from 1 to 2

I’ll look at this a bit more closely, but the problem with domains is that there isn’t a consistent answer anywhere — sites do it differently all the time. I avoided considering DNS authoritative because my experience with cfengine taught me that most people do not maintain their DNS well enough for this to work.

Updated by Redmine Admin about 4 years ago

If “Domain” is a puppet-specific abstraction that does not have tie-in with a DNS domain, then IMHO you should really use some other word, as the word “Domain” without any explanation is pretty much taken by DNS (or Windows if you happen to be a Windows admin) – perhaps have a “PuppetDomain” fact which is required on each machine and set at install time explictly by the admin (like the Kerberos realm name for a machine being specified in /etc/krb5.conf), and also have optional “DnsDomain”, “NisDomain”, and “KerberosRealm” facts which are only populated if said systems are in use.

Also a bit confused about your statement that the DNS domain isn’t authoritative for the value “Domain” – looking at the code, the only thing that is looked at besides the output of “domainname” is the /etc/resolv.conf file, which as far as I’m aware is DNS-specific. Having a DNS domainname would also seem to be pretty much a requirement for any site using TCP/IP, which would seem to be a requirement for using Puppet.

Updated by Luke Kanies about 4 years ago

  • Status changed from 2 to Closed
  • 7 set to fixed

Fixed in r137. I just added a call to ‘dnsdomainname’ before ‘domainname’, so if that returns an answer then you will get that first.

Updated by Lynn - almost 4 years ago

…removed spam…

Updated by Robin Hill almost 2 years ago

I’m still having issues with this in 1.5.2.

Looking into it, the problem would seem to be more generic. In the ‘add’ method for the Fact class, you’re doing a sort to order by the number of confines. This is currently using Ruby’s built-in sort function, which is implemented as a qsort and is therefore non-deterministic when multiple entries have the same sort key. Since neither the “domainname”, nor the “dnsdomainname” facts have any confines, they’re being ordered pretty randomly (it actually seems to be biased towards the results from domainname, but occasionally does return the dnsdommainname result).

Changing this to use the (deterministic) sort_by function fixes the issue, maintaining the original sort order in cases where the sort keys are identical. The attach patched implements this (sorting by the negative of the length to ensure the largest number of enclosures is first).

Updated by Brenton Leanhardt almost 2 years ago

Robin wrote:

I’m still having issues with this in 1.5.2.

Looking into it, the problem would seem to be more generic. In the ‘add’ method for the Fact class, you’re doing a sort to order by the number of confines. This is currently using Ruby’s built-in sort function, which is implemented as a qsort and is therefore non-deterministic when multiple entries have the same sort key. Since neither the “domainname”, nor the “dnsdomainname” facts have any confines, they’re being ordered pretty randomly (it actually seems to be biased towards the results from domainname, but occasionally does return the dnsdommainname result).

Changing this to use the (deterministic) sort_by function fixes the issue, maintaining the original sort order in cases where the sort keys are identical. The attach patched implements this (sorting by the negative of the length to ensure the largest number of enclosures is first).

For what it’s worth, we hit this bug on RHEL 5 when upgrading to facter 1.5.2. This patch is working well for us.

Updated by devink - almost 2 years ago

I ran into this problem as well since /usr/bin/domainname returns a legacy NIS domain. One of the interesting side effects of this problem is that puppet will start looking in the wrong place for certificates when upgrading clients from factor 1.3 to 1.5. Puppet would also presumably (in our environment) try to create certificates which don’t use the DNS name of host … which just seems wrong.

Updated by David Lutterkort almost 2 years ago

Is there any guarantee that the sort algorithm used by sort_by is stable (i.e., that items with the same sort key appear in the sorted collection in the same order they had initially) ? That’s really at the heart of the problem: the assumption that resolutions with the same number of confines are tried in the order in which they are added to a fact; that’s not true with an unstable sort.

If sort_by is not guaranteed to be stable, this may only appear to fix the problem, and the docs are pretty useless on the subject.

Updated by Luke Kanies almost 2 years ago

  • Subject changed from Facter is using the NIS domain name instead of DNS domain name in some cases to luke

Updated by Luke Kanies almost 2 years ago

  • Subject changed from luke to Facter is using the NIS domain name instead of DNS domain name in some cases
  • Status changed from Re-opened to Accepted

Sorry, weird behaviour with 1password. :/

Updated by Luke Kanies almost 2 years ago

  • Status changed from Accepted to Code Insufficient

This patch can’t actually be working — ‘sort_by’ doesn’t modify the array, so you’re just not sorting the list at all.

You’d need to reassign the array, something like @resolves = @resolves.sort_by { |a| -a.length }.

Also, this causes a few test failures:

1)
NoMethodError in 'Facter::Util::Fact when adding resolution mechanisms should create a new resolution instance'
undefined method `-@' for nil:NilClass
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `add'
/Users/luke/git/facter/spec/../vendor/gems/mocha-0.5.6/lib/mocha/class_method.rb:33:in `sort_by'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `each'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `sort_by'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `add'
/Users/luke/git/facter/spec/unit/util/fact.rb:48:
integration/facter.rb:5:

2)
NoMethodError in 'Facter::Util::Fact when adding resolution mechanisms should instance_eval the passed block on the new resolution'
undefined method `-@' for nil:NilClass
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `add'
/Users/luke/git/facter/spec/../vendor/gems/mocha-0.5.6/lib/mocha/class_method.rb:33:in `sort_by'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `each'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `sort_by'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `add'
/Users/luke/git/facter/spec/unit/util/fact.rb:56:
integration/facter.rb:5:

3)
'Facter::Util::Fact when adding resolution mechanisms should re-sort the resolutions by length, so the most restricted resolutions are first' FAILED
expected: [#, #, #],
     got: [#, #, #] (using ==)
/Users/luke/git/facter/spec/unit/util/fact.rb:68:
integration/facter.rb:5:

4)
Mocha::ExpectationError in 'Facter::Util::Fact when returning a value should return nil if the value is the empty string'
#.length() - expected calls: 0, actual calls: 1
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `add'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `each'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `sort_by'
/Users/luke/git/facter/spec/../lib/facter/util/fact.rb:46:in `add'
/Users/luke/git/facter/spec/unit/util/fact.rb:120:
integration/facter.rb:5:

Updated by Robin Hill almost 2 years ago

luke wrote:

This patch can’t actually be working — ‘sort_by’ doesn’t modify the array, so you’re just not sorting the list at all.

You’d need to reassign the array, something like @resolves = @resolves.sort_by { |a| -a.length }.

You’re right – it was obviously working purely because it wasn’t modifying the array at all. And, looking into the documentation further, it would seem that sort_by isn’t stable either (despite the claims I’d originally found).

Anyway, I located a stable sort routine for ruby (http://codesnippets.joyent.com/posts/show/1698), which the attached patch uses. I see a fix for the domain resolution has already been checked in, but a stable sort may well be required for other facts.

Updated by Luke Kanies almost 2 years ago

I think the “right” solution is just to do a secondary sort, on who-cares-what criteria, so that we get consistent results when the confine count isn’t sufficient. Maybe just make an arbitrary “resolution number” which would increment as the resolutions were created, and sort by that.

Even with a stable sort, you wouldn’t be guaranteed stable results for resolutions stored in separate files, because the file list is retrieved using Dir.entries, which (I believe) returns files in modification order or something similar. Any attempt at bringing stability would need to resolve that problem.

Updated by Benedikt Böhm 11 months ago

Luke Kanies wrote:

the problem with domains is that there isn’t a consistent answer anywhere — sites do it differently all the time. I avoided considering DNS authoritative because my experience with cfengine taught me that most people do not maintain their DNS well enough for this to work.

i don’t think the inability of some people to maintain proper dns resolution should result in ridiculous hacks all over the place. instead people should be forced to fix their dns by proper implementation of dns resolution in facter.

please see http://github.com/hollow/facter/commit/92f6b049724a620f3833dde10b8126a70ce52a08

Updated by Mathias Gug 11 months ago

Relying on DNS to get the hostname and dns domain name is not the best option as some networks are not configured/maintained correctly.

Moreover you’d run into a bootstrapping problem: you’d have to get your dns domain up and running before being able to use puppet.

You’d also run into issues if there wasn’t any network connectivity. Or if you were on a different network (ex: laptop on customers networks – their dns domain name would change). Or if you have multiple IP addresses (which one should be authoritative?).

I’d suggest to use the hostname and dnsdomainame utilities as a first try:

@hostname@ returns the hostname

@dnsdomainname@ returns the dns domain name

@hostname -f@ returns the fqdn

The results of these commands are similar to what other components in the system will get to figure out the same information (they’re using the libc functions). If the hostname/dns domain name is not set correctly other components (ex: gnome or an MTA) would fail badly as well.

Updated by Benedikt Böhm 11 months ago

Mathias Gug wrote:

Relying on DNS to get the hostname and dns domain name is not the best option as some networks are not configured/maintained correctly.

as i said previously, this should not be facters problem. the inability of people to configure their dns/nss resolution is not an excuse for hacks all over the place!

Moreover you’d run into a bootstrapping problem: you’d have to get your dns domain up and running before being able to use puppet.

yes indeed, dns/nss resolution is the first thing you should setup before doing anything. it is not that hard to maintain one line in /etc/hosts, is it?

You’d also run into issues if there wasn’t any network connectivity. Or if you were on a different network (ex: laptop on customers networks – their dns domain name would change). Or if you have multiple IP addresses (which one should be authoritative?).

I’d suggest to use the hostname and dnsdomainame utilities as a first try:

that’s what my patches do. instead of calling binaries it uses the native socket api though. see http://github.com/hollow/facter/blob/master/lib/facter/fqdn.rb for my version. it uses gethostbyname first (this does the same as hostname/dnsdomainname, i.e. it resolves the domain via /etc/hosts), if that does not work it uses dns and the search domain in /etc/resolv.conf to get the fqdn.

Updated by Paul Nasrat 5 months ago

  • Status changed from Code Insufficient to Ready for Checkin

Updated by Paul Nasrat 5 months ago

  • Status changed from Ready for Checkin to Closed

Commited Ohad’s patch.

Also available in: Atom PDF