The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

Bug #4415

puppetd ignores resources in the catalog if the type code cannot be found

Added by Dan Bode over 3 years ago. Updated over 1 year ago.

Status:Needs DecisionStart date:07/30/2010
Priority:NormalDue date:
Assignee:eric sorenson% Done:

0%

Category:-
Target version:3.x
Affected Puppet version: Branch:
Keywords:pluginsync

We've Moved!

Ticket tracking is now hosted in JIRA: https://tickets.puppetlabs.com

This ticket may be automatically exported to the PUP project on JIRA using the button below:


Description

site.pp

node puppetclient {
  vcsrepo { "/usr/local/dev/repo":
    ensure => present,
    provider => git
  }
  notify { 'test123':
    require => Vcsrepo['/usr/local/dev/repo']
  }
}

if I dont sync vcsrepo and I remove the type source code for notify.rb, then puppet doesn’t fail and just ignores them

debug: /Stage[main]//Node[puppetclient]/Notify[test123]/require: requires Vcsrepo[/usr/local/dev/repo]
info: Applying configuration version '1280516948'
debug: Finishing transaction -607606538
debug: Storing state
debug: Stored state in 0.02 seconds
notice: Finished catalog run in 0.04 seconds
debug: Using cached certificate for ca
debug: Using cached certificate for puppetclient
debug: Using cached certificate_revocation_list for ca
debug: Value of 'preferred_serialization_format' (pson) is invalid for report, using default (b64_zlib_yaml)
debug: report supports formats: b64_zlib_yaml marshal raw yaml; using b64_zlib_yaml

4415-ruby-types-and-source-code.tar.gz (734 Bytes) Chris Price, 05/21/2012 11:29 am

History

#1 Updated by Jesse Wolfe over 3 years ago

  • Status changed from Unreviewed to Investigating
  • Assignee set to Jesse Wolfe
  • Target version changed from 4 to 2.6.1

#2 Updated by Jesse Wolfe over 3 years ago

  • Status changed from Investigating to Needs Decision
  • Target version changed from 2.6.1 to 52

If a type exists on the master but does not exist on the agent, we skip it, rather than raising an error. This behavior is present in the 0.25.x series, so it is not a regression.

#3 Updated by James Turnbull about 3 years ago

  • Assignee changed from Jesse Wolfe to Nigel Kersten
  • Affected Puppet version deleted (2.6.1rc1)

#4 Updated by James Turnbull about 3 years ago

  • Target version deleted (52)

#5 Updated by Nigel Kersten over 2 years ago

  • Assignee changed from Nigel Kersten to Jacob Helwig
  • Keywords set to pluginsync

Jacob, this is part of the cluster of issues around pluginsync that you guys are hoping to get some work done on I believe?

#6 Updated by Jacob Helwig over 2 years ago

  • Status changed from Needs Decision to Accepted
  • Assignee deleted (Jacob Helwig)
  • Target version set to 3.x

This wasn’t specifically targeted, but failing the resource with an appropriate message (something along the lines of “unknown type” and making it clear that it’s an agent-side failure) seems perfectly reasonable to me.

#7 Updated by Nigel Kersten over 2 years ago

+1

#8 Updated by Ben Hughes about 2 years ago

  • Description updated (diff)
  • Status changed from Accepted to Unreviewed

#9 Updated by Nigel Kersten about 2 years ago

  • Description updated (diff)
  • Status changed from Unreviewed to Accepted

#10 Updated by Dan Bode almost 2 years ago

  • Assignee set to Chris Price

Hi Chris,

As requested, instead of just learning to live with some of these long standing known issues that really need to be fixed, I will bring them to your attention when I run across them. This is the first in that series :)

#11 Updated by Chris Price almost 2 years ago

Thanks, Dan!

#12 Updated by Chris Price almost 2 years ago

Dan, can you help me clarify this just a bit?

  1. Can I repro this with ‘apply’? Or does it require a master/agent setup? I’m presuming the latter, because I’m guessing that the resource could never make it into the catalog unless the type existed on the master?
  2. I presume your example doesn’t have anything to do with the ‘notify’ type in particular, right? If I were to define a custom type directly in the main source code directories of puppet on the master, and then use that in a manifest, that would trigger the behavior that you are describing w/o me needing to delete any files on the agent?
  3. What kind of real-world scenarios are likely to trigger this, now that Telly has improved (and automatically enabled by default) pluginsync? Is it something that would only occur if the user had explicitly disabled pluginsync?

#13 Updated by Dan Bode almost 2 years ago

  1. You are right about Puppet apply. Failure to find resources during compilation is a failure.
  2. The example applies to all native types. I chose to show-case the notify type to show that it was not just specific to native types
  3. I dont know enough about changes to plugin sync to now how it will improve this. This error usually occurs (or doesnt occur) when a user does not pluginsync.

Setting it to the default means this would only show up if the user is able to disable pluginsync, or if they tried to apply a catalog that had been separately compiled from an agent. So I think the answer is it still matters.

#14 Updated by Dan Bode almost 2 years ago

Dan Bode wrote:

  1. You are right about Puppet apply. Failure to find resources during compilation is a failure.
  2. The example applies to all native types. I chose to show-case the notify type to show that it was not just specific to custom types
  3. I dont know enough about changes to plugin sync to now how it will improve this. This error usually occurs (or doesnt occur) when a user does not pluginsync.

Setting it to the default means this would only show up if the user is able to disable pluginsync, or if they tried to apply a catalog that had been separately compiled from an agent. So I think the answer is it still matters.

#15 Updated by Chris Price almost 2 years ago

Awesome, that helps, thanks.

#16 Updated by Chris Price almost 2 years ago

I now have a simple repro scenario for this.

Repro Steps

  • Untar the attached tarball; it contains the following:
    • A simple ruby-defined type (which is just a copy of our Notify type)
    • A manifest that contains a manifest-defined type, and then declares one instance of the manifest-defined type and one instance of the ruby-defined type.
  • In a shell, start up a master with a command like this:

    puppet master —autosign=true —no-daemonize —debug —verbose —confdir=/home/test/master/conf —vardir=/home/test/master/var —certname localhost —libdir=/home/test/scratch/4415-ruby-types-and-source-code/lib —manifest=/home/cprice/work/puppet/scratch/4415-ruby-types-and-source-code/4415-ruby-types-and-source-code.pp

(The key points above are to make sure that —libdir points to the “lib” dir from the tarball, and that —manifest points to the .pp file from the tarball.)

  • In another shell, run an agent with a command like this:

    puppet agent —no-daemonize —debug —trace —verbose —confdir=/home/test/client/conf —vardir=/home/test/client/var —server localhost —onetime

What will happen

The agent run will complete successfully, and it will print the notification message from the manifest-defined type, but not the one from the ruby-defined type.

For grins, you can run the same manifest through apply via something like this:

puppet apply ./4415-ruby-types-and-source-code.pp --confdir=/home/test/client/conf --vardir=/home/test/client/var --no-pluginsync --libdir=/home/test/scratch/4415-ruby-types-and-source-code/lib

And you’ll see both notifications print out.

The issue is that when running via the master/agent mode, the ruby-defined type is not being synced down to the agent so it doesn’t exist there, but it doesn’t cause a failure.

What should have happened

The agent should have failed when it attempted to load the ruby code for the ruby-defined type and it was not able to do so.

More notes on why to follow.

#17 Updated by Chris Price almost 2 years ago

Following up on the comments above; here’s what’s actually happening here.

  • The catalog is compiled successfully on the master because it has access to the ruby code for the ruby-defined type. For the manifest-defined type, it adds the nested notify resource directly into the catalog as well.

  • The master then sends the catalog down to the agent. At this point, if you look at the contents of the catalog, it appears that it is impossible to distinguish that there was a difference in origin for the manifest-defined type and the ruby-defined type. They are identifiable as instances of custom types, but the objects used to represent them in memory have the same properties available for both instances, and there is no way that I can see to programmatically determine that there is a difference between them.

  • The agent then attempts to process the catalog. When it gets to the resource in the catalog that represents the ruby-defined type, it ends up hitting this block of code:

    lib/puppet/resource.rb:278

# Convert our resource to a RAL resource instance.  Creates component
# instances for resource types that don't exist.
def to_ral
  if typeklass = Puppet::Type.type(self.type)
    return typeklass.new(self)
  else
    return Puppet::Type::Component.new(self)
  end
end

What this code is doing is to attempt to load a ruby class for the type name, and, if one doesn’t exist, just default to “Component”—which is the correct (for now) class to use for a manifest-defined type. Given that there is no information available in the catalog to use to distinguish between the two, I’m not sure that there is a better way that this code could be written.

So, unfortunately, it feels to me like the real problem is that we need to add some more metadata to the catalog that would allow us to make this kind of distinction; then, we could change the implementation of the method above such that it would fail (correctly) when it knew that there should be a ruby implementation available for a type and it wasn’t able to load one.

#18 Updated by Chris Price almost 2 years ago

The severity of this problem should be lessened a bit because of the fact that pluginsync is now enabled by default in Telly; it feels like the most common occurrences of this problem would have been because someone forgot to pluginsync some new code. However, it is still possible to trigger this in a few different ways, and it should be a goal for us to improve the code cited above regardless.

We need a broader ticket related to improving catalog versioning and expandability, and we should link that to this when we have one.

In the meantime, one possible hack that I can think of is to experiment with using a fake metaparameter to carry this data for the time being. It would be an ugly hack, but it might work—I’ll try to at least investigate the feasibility of that.

#19 Updated by Chris Price almost 2 years ago

  • Status changed from Accepted to Needs Decision
  • Assignee changed from Chris Price to Daniel Pittman

After a quick glance, it does look like it might be possible to solve this for now with a metaparameter; they do seem intact in the catalog across the wire. So, the idea would be:

  1. Master parses manifest, recognizes that it’s loading a ruby-defined type, and stamps a metaparameter into the catalog with a descriptive name (e.g. “:type-loaded-from-ruby => true”);
  2. Agent receives catalog, tries to load type class; if it isn’t able to load it, check for this metaparameter and fail accordingly… otherwise fall back to using Component like it does now.

This seems a bit hacky but I’m not 100% sure what our longer term plans are for extending the catalog data model… depending on the direction there, this might or might not be unacceptably hacky. I’m going to put this in front of Daniel for an opinion.

#20 Updated by Daniel Pittman almost 2 years ago

Chris Price wrote:

After a quick glance, it does look like it might be possible to solve this for now with a metaparameter; they do seem intact in the catalog across the wire.

This sort of data packing has had a long tail of trouble behind it in the past. At the very least we lose a metaparameter, and we have to deal with the question of how that is handled in the rest of the ecosystem. PuppetDB, compliance, the dashboard, and everything else would need to know that was not user accessible. We would have to introduce code for “not user settable” metaparameters. Hilarity ensues.

This seems a bit hacky but I’m not 100% sure what our longer term plans are for extending the catalog data model… depending on the direction there, this might or might not be unacceptably hacky. I’m going to put this in front of Daniel for an opinion.

Regardless, I think it is unacceptable hacky, because of the long tail problems in the rest of the ecosystem.

We know that we need to change the catalog format to fix a number of issues, and are working toward that. Until that is in place this should remain open – there isn’t a safe half-way house to use.

#21 Updated by Chris Price almost 2 years ago

I generally agree with this assessment… however, I suspect that our solution for making the catalog more generically extensible is going to be to add yet another bucket that we can dump metadata into. So we’ll have “parameters”, (the not-so-fantastically-named) “meta-parameters”, and then something like “system-parameters” or “puppet-parameters” or “internal-parameters”… this new addition being basically Just Another Hash that we can use to contain key/value pairs that allow us to more intelligently distinguish between and handle various resources.

When it comes down to it, it seems like the implementation will be so similar to the existing metaparameters that it is a bit of a shame to have to leave a fairly hideous bug like this open for a long period of time waiting for that to become available.

That said, I would probably make the same call (avoiding hackery). Just wanted to get the option documented so that the cost/benefit can be weighed against the longevity of bugs like this in the event that we don’t get the catalog extensibility issue resolved soon.

#22 Updated by Ryan Coleman over 1 year ago

  • Assignee changed from Daniel Pittman to eric sorenson

Eric,

This impacts usability for my value stream. It’s mitigated with pluginsync on by default for Telly but is any chance I can get this on your radar for Telly & 2.7?

Also available in: Atom PDF