Bug #2617

Problem with certs upgrading puppetmaster to 0.25.0

Added by David Escala over 2 years ago. Updated almost 2 years ago.

Status:Closed Start date:09/09/2009
Priority:High Due date:
Assignee:Luke Kanies % Done:

100%

Category:SSL
Target version:2.6.0
Affected Puppet version:0.25.0 Branch:tickets/0.25.x/2617
Keywords:hotfix in 0.25.1
Votes: 0

Description

Upgraded master to 0.25.0 and 0.24.x clients works fine.

Sorry I am not sure how to reproduce this and the fuzzy subject of the bug. I am opening this issue anyway …

This only shows up when the IP address of the client has no reverse DNS, and happens with webrick server as well as mongrels.

Master log messages:

err: Could not resolve 192.168.13.47: no name for 192.168.13.47
info: Expiring the node cache of test.escala.local
warning: Host is missing hostname and/or domain: 192.168.13.47
warning: Host is missing hostname and/or domain: 192.168.13.47
err: Could not find default node or by name with '192.168.13.47, 192.168.13, 192.168, 192' on node 192.168.13.47
err: Could not find default node or by name with '192.168.13.47, 192.168.13, 192.168, 192' on node 192.168.13.47

Client log messages:

warning: peer certificate won't be verified in this SSL session
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find node '192.168.13.47'; cannot compile

Debugging shows that request in Puppet::Network::HTTP::MongrelREST or Puppet::Network::HTTP::WEBrickREST does not have the client cert info, forcing the reverse DNS lookup (it calls resolve_node in client_information).

A workaround is to remove vardir on the server, i.e. this bug does not affect a fresh 0.25.0 puppetmaster install.


Related issues

related to Puppet - Bug #1507: CA cert name configuration Closed 08/12/2008
related to Puppet - Bug #899: CRL signature failure when using apache/mongrel Closed
related to Puppet - Bug #2765: puppetrun --no-fqdn configuration option is effectively a... Closed 10/31/2009
duplicated by Puppet - Bug #2619: Fresh 0.25.0 client cannot 'authenticate' to 0.25.0 puppe... Closed 09/09/2009

History

Updated by Markus Roberts over 2 years ago

  • Category set to SSL
  • Status changed from Unreviewed to Accepted
  • Priority changed from Normal to Low
  • Target version set to 0.25.1

Low because there’s a workaround, but accepted because it shouldn’t happen in the first place, or should at least be handled more gracefully.

Updated by David Escala over 2 years ago

A more specific workaround is to remove ssldir. BTW: when you remove ssldir you loss your CA and signed certs, so it is not suitable for a production environment.

I’ve managed to have a working ssldir and one with this issue. These are the differences:

$ diff -rq  ssl-with-issue/ ssl-ok/
Files ssl-with-issue/ca/ca_crl.pem and ssl-ok/ca/ca_crl.pem differ
Files ssl-with-issue/ca/ca_crt.pem and ssl-ok/ca/ca_crt.pem differ
Files ssl-with-issue/ca/ca_key.pem and ssl-ok/ca/ca_key.pem differ
Only in ssl-ok/ca: ca_pub.pem
Files ssl-with-issue/ca/inventory.txt and ssl-ok/ca/inventory.txt differ
Files ssl-with-issue/ca/private/ca.pass and ssl-ok/ca/private/ca.pass differ
Files ssl-with-issue/ca/serial and ssl-ok/ca/serial differ
Only in ssl-with-issue/ca/signed:

... all our signed certs ...

Files ssl-with-issue/certs/ca.pem and ssl-ok/certs/ca.pem differ
Files ssl-with-issue/certs/one.ingent.net.pem and ssl-ok/certs/one.ingent.net.pem differ
Files ssl-with-issue/private_keys/one.ingent.net.pem and ssl-ok/private_keys/one.ingent.net.pem differ
Files ssl-with-issue/public_keys/one.ingent.net.pem and ssl-ok/public_keys/one.ingent.net.pem differ

So the key may be this ca_pub.pem that only gets created by 0.25.0 for a new CA … maybe we should migrate old ones?

Updated by Luke Kanies over 2 years ago

  • Priority changed from Low to High

The workaround is too disruptive.

Updated by Brice Figureau over 2 years ago

David Escala wrote:

A more specific workaround is to remove ssldir. BTW: when you remove ssldir you loss your CA and signed certs, so it is not suitable for a production environment.

I’ve managed to have a working ssldir and one with this issue. These are the differences:

[…]

So the key may be this ca_pub.pem that only gets created by 0.25.0 for a new CA … maybe we should migrate old ones?

I think you might have nailed down the issue.

I was able to somewhat reproduce #2619, where I found that the client is missing the ca.pem file, thus it never even tries to SSL authenticate to the server which does its job (ie denying access). What happens when a 0.25 new client connects for the first time to a newly upgraded 0.25 master, it should normally store the ca cert and generate a csr that should be sent to the server. It seems to me the ca cert is never created, and then later on (see Puppet::Network::HttpPool.cert_setup) it never adds the SSL info to the http client code.

Updated by Brice Figureau over 2 years ago

Brice Figureau wrote:

David Escala wrote:

A more specific workaround is to remove ssldir. BTW: when you remove ssldir you loss your CA and signed certs, so it is not suitable for a production environment.

I’ve managed to have a working ssldir and one with this issue. These are the differences:

[…]

So the key may be this ca_pub.pem that only gets created by 0.25.0 for a new CA … maybe we should migrate old ones?

I think you might have nailed down the issue.

I was able to somewhat reproduce #2619, where I found that the client is missing the ca.pem file, thus it never even tries to SSL authenticate to the server which does its job (ie denying access). What happens when a 0.25 new client connects for the first time to a newly upgraded 0.25 master, it should normally store the ca cert and generate a csr that should be sent to the server. It seems to me the ca cert is never created, and then later on (see Puppet::Network::HttpPool.cert_setup) it never adds the SSL info to the http client code.

OK, I found the issue (see #2619 for how to reproduce it).

In 0.24.x, the $ssldir/ca/ca_crt.pem file has the following Subject DN: CN=$fqdn In 0.25 the same file has the subject DN: CN=ca

All of the 0.25 SSL::Host code assume that the ca certificate has the subject DN: CN=ca. So basically everytime we deal with a certificate we know it is a ca either because its CN is ca or by its filename (which matches the CN).

When we register a new 0.25.x client to an upgraded master, it first ‘finds’ the CA through rest to the master and locally cache it as an SslFile. But since this ca cert is not named exactly ‘ca’, it is not recognized as a ca later when the client wants to connect back to the master, hence the issue.

Now, I really don’t know how to fix this…

Updated by Markus Roberts over 2 years ago

  • Status changed from Accepted to Needs Decision
  • Assignee set to Luke Kanies

It appears to me that we either need to change the assumption or document it (and how to deal with it).

Luke?

Updated by Luke Kanies over 2 years ago

  • Status changed from Needs Decision to Accepted
  • Assignee changed from Luke Kanies to Markus Roberts

This is a bug, but it’s a bit of a doozie.

We need to fix the CA cert so its subject is the fqdn again, but in doing so we have to fix the cert recognition so that we can tell if a cert is a CA cert. This should just be a question of actually asking the stupid cert if it’s capable of signing certs.

Updated by Brice Figureau over 2 years ago

Luke Kanies wrote:

This is a bug, but it’s a bit of a doozie.

We need to fix the CA cert so its subject is the fqdn again, but in doing so we have to fix the cert recognition so that we can tell if a cert is a CA cert. This should just be a question of actually asking the stupid cert if it’s capable of signing certs.

Yes good idea. I focused on the name of the not on the cert itself :–) Testing the cert has the critical extension basicConstraint containing CA:true should be enough:

def ca?(cert)
    cert.extensions.each do |ext|
        return true if ext.oid == "basicConstraints" and ext.value == 'CA:TRUE' and ext.critical?
    end
    false
end

This doesn’t prove the cert is the right CA, but in our context, it should work…

Updated by Luke Kanies over 2 years ago

Brice Figureau wrote:

[…] This doesn’t prove the cert is the right CA, but in our context, it should work…

Yep, that should essentially work.

Updated by Luke Kanies over 2 years ago

  • Status changed from Accepted to In Topic Branch Pending Review
  • Assignee changed from Markus Roberts to James Turnbull
  • Branch set to tickets/master/2617

Updated by Luke Kanies over 2 years ago

  • Status changed from In Topic Branch Pending Review to Accepted
  • Assignee changed from James Turnbull to Luke Kanies

Looks like this needs a bit more thinking.

Updated by Luke Kanies over 2 years ago

As discussed in the list, I’m going to produce two patches: One that has a minimal change and is for 0.25.1, and one that has a larger change and is for 0.26. The basic fix is this:

diff --git a/lib/puppet/indirector/rest.rb b/lib/puppet/indirector/rest.rb
index e1ee89f..ec3f70f 100644
--- a/lib/puppet/indirector/rest.rb
+++ b/lib/puppet/indirector/rest.rb
@@ -66,7 +66,11 @@ class Puppet::Indirector::REST < Puppet::Indirector::Terminus
     end
 
     def find(request)
-        deserialize network(request).get(indirection2uri(request), headers)
+        return nil unless result = deserialize(network(request).get(indirection2uri(request), headers))
+        unless result.name == request.key
+            result.name = request.key
+        end
+        result
     end
 
     def search(request)

Updated by Luke Kanies over 2 years ago

  • Branch changed from tickets/master/2617 to tickets/0.25.x/2617

I’m publishing a limited fix for 0.25.1 at tickets/0.25.x/2617 and a more complete fix at tickets/master/2617.

Updated by Luke Kanies over 2 years ago

  • Status changed from Accepted to In Topic Branch Pending Review

Updated by James Turnbull over 2 years ago

  • Target version changed from 0.25.1 to 2.6.0

0.25.1 fix pushed in commit:a1d3b04296babc42b6a00956508c86c18e2b39bc in branch 0.25.x. Ticket re-assigned to 0.26.0 for larger fix.

Updated by James Turnbull over 2 years ago

  • Keywords set to hotfix in 0.25.1

Updated by Luke Kanies over 2 years ago

  • Status changed from In Topic Branch Pending Review to Closed
  • % Done changed from 0 to 100

Applied in changeset commit:a1d3b04296babc42b6a00956508c86c18e2b39bc.

Updated by James Turnbull over 2 years ago

  • Status changed from Closed to Accepted

Updated by James Turnbull about 2 years ago

  • Status changed from Accepted to Code Insufficient

Can you rebase the master branch please.

Updated by James Turnbull about 2 years ago

  • Status changed from Code Insufficient to Closed

Pushed full fix in commit:089ac3e37dd1418751bc4dfe152e09fbacbc5122 in branch master.

Also available in: Atom PDF