Bug #2010

puppetmaster fails to store config, trying to hash a param which is nil

Added by Peter Meier almost 3 years ago. Updated almost 2 years ago.

Status:Closed Start date:02/21/2009
Priority:Normal Due date:
Assignee:James Turnbull % Done:

0%

Category:-
Target version:0.24.8
Affected Puppet version:0.24.7 Branch:
Keywords:
Votes: 0

Description

running the master on the latest head of 0.24.x gives the following error for every client:

notice: Searched for resources in 0.25 seconds
/usr/lib/ruby/site_ruby/1.8/puppet/rails/host.rb:153:in `find_resources_parameters_tags'
/usr/lib/ruby/site_ruby/1.8/puppet/rails/host.rb:125:in `setresources'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/core_ext/benchmark.rb:8:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/rails/host.rb:124:in `setresources'
/usr/lib/ruby/site_ruby/1.8/puppet/rails/host.rb:52:in `store'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/core_ext/benchmark.rb:8:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/rails/host.rb:51:in `store'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/transactions.rb:129:in `transaction'
/usr/lib/ruby/site_ruby/1.8/puppet/rails/host.rb:32:in `store'
/usr/lib/ruby/site_ruby/1.8/puppet/parser/compiler.rb:448:in `store_to_active_record'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `transaction'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/transactions.rb:129:in `transaction'
/usr/lib/ruby/site_ruby/1.8/puppet/parser/compiler.rb:447:in `store_to_active_record'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:180:in `benchmark'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/core_ext/benchmark.rb:8:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:179:in `benchmark'
/usr/lib/ruby/site_ruby/1.8/puppet/parser/compiler.rb:446:in `store_to_active_record'
/usr/lib/ruby/site_ruby/1.8/puppet/parser/compiler.rb:439:in `store'
/usr/lib/ruby/site_ruby/1.8/puppet/parser/compiler.rb:99:in `compile'
/usr/lib/ruby/site_ruby/1.8/puppet/parser/interpreter.rb:29:in `compile'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/catalog/compiler.rb:68:in `compile'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:180:in `benchmark'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/core_ext/benchmark.rb:8:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:179:in `benchmark'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/catalog/compiler.rb:66:in `compile'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/catalog/compiler.rb:21:in `find'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/indirection.rb:210:in `find'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector.rb:49:in `find'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/master.rb:65:in `getconfig'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `to_proc'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:85:in `setup_processor'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:366:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:378:in `handle'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/webrick_servlet.rb:68:in `service'
/usr/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'
/usr/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'
/usr/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
/usr/lib/ruby/1.8/webrick/server.rb:162:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
/usr/lib/ruby/1.8/webrick/server.rb:95:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:92:in `each'
/usr/lib/ruby/1.8/webrick/server.rb:92:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:293:in `start'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:144:in `newthread'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:143:in `initialize'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:143:in `new'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:143:in `newthread'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:291:in `start'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:290:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:290:in `start'
/usr/sbin/puppetmasterd:285
err: Could not store configs: undefined method `add_param_to_hash' for nil:NilClass
notice: Compiled catalog for puppet.foobar.com in 20.64 seconds

History

Updated by Andrew Shafer almost 3 years ago

resources[param[‘resource_id’]].add_param_to_hash(param) is on line 177 in the head of 0.24.x for me.

Can you do a git log and let me know what the latest commit hash is?

Obviously resources[param[‘resource_id’]] is returning nil, but why?

Updated by James Turnbull almost 3 years ago

  • Status changed from Unreviewed to Accepted
  • Assignee set to Andrew Shafer

Updated by Peter Meier almost 3 years ago

Hi

resources[param[‘resource_id’]].add_param_to_hash(param) is on line 177 in the head of 0.24.x for me.

Can you do a git log and let me know what the latest commit hash is?

It is for me as well on line 177 and I was also suprised that actually the stacktrace doesn’t give the exact location where the error happens and only shows the line number of the former method from where this have been called.

you still like to have the git log? I can give you it in a few hours.

Obviously resources[param[‘resource_id’]] is returning nil, but why?

I dunno either. Only some of the hosts had that problem and I tried to debug it furrther by printing out the resource_id. Unfortunately the stored data didn’t show anything interesting. But it seemed to get away if I deleted the record. However as this error fucked up the exported resources I had to drop the stored resources of these hosts to get my environment working properly again. So we somehow aren’t able to test it again, as the errors doesn’t occur anymore. :–/ but I still have a mysql dump with the data that provoked the problem, I could provide it to you.

Just an idea: I first didn’t had dbmigrate=true in my config after updating the master and the clients from 0.24.5 to 0.24.7 (and later to 0.24.x HEAD). However the server and clients were running a while with the environment problem (have been on the list by others). Maybe it’s connected to that? Only a some clients had the problem. And these who haven’t been updated to the latest version and still running 0.24.5 didn’t have the problem. But also not every client who have been updated had the problem. Quite strange.

PS: sorry to react that late. Due to the redmine problem I didn’t get any mails hence I didn’t see your question.

Updated by Peter Meier almost 3 years ago

I dunno either. Only some of the hosts had that problem and I tried to debug it furrther by printing out the resource_id. Unfortunately the stored data didn’t show anything interesting. But it seemed to get away if I deleted the record. However as this error fucked up the exported resources I had to drop the stored resources of these hosts to get my environment working properly again. So we somehow aren’t able to test it again, as the errors doesn’t occur anymore. :–/ but I still have a mysql dump with the data that provoked the problem, I could provide it to you.

actually this doesn’t seem to be fully correct I get the problem again at least on one host.

Updated by Peter Meier almost 3 years ago

Hi

Only a some clients had the problem. And these who haven’t been updated to the latest version and still running 0.24.5 didn’t have the problem. But also not every client who have been updated had the problem. Quite strange.

and it now even appears on clients which haven’t yet been updated.

so I would say it’s a more general issue.

Updated by Brice Figureau almost 3 years ago

immerda wrote:'

Hi

Only a some clients had the problem. And these who haven’t been updated to the latest version and still running 0.24.5 didn’t have the problem. But also not every client who have been updated had the problem. Quite strange.

and it now even appears on clients which haven’t yet been updated.

so I would say it’s a more general issue.

The issue is strange, it can be:

  • either a parameter that doesn’t have a resource_id (why?) or
  • a parameter which references a resource we don’t know for this run

The request we use for finding parameters is the following one:

SELECT v.id, v.value,  v.line, v.resource_id, v.param_name_id, n.name FROM param_values as v INNER JOIN resources r ON v.resource_id=r.id INNER JOIN param_names as n ON v.param_name_id=n.id WHERE r.host_id=

(replace HOSTID with the database id of the failing host).

Since there is an inner join with the resource id we shouldn’t be able to return a parameter which doesn’t have a resource id (except if your database is somewhat corrupted and you have some NULLs resources id). Another possibility is that there is a discrepency between the resources returned directly from rails and what is returned from the direct SQL request (ie there is caching in the middle here). To see if caching is an issue, if you restart your master, you shouldn’t see the failure for the previously failing host, until some changes happen.

A temporary solution to prevent the crash can be to replace both failing functions in host.rb with the following code:

    def find_resources_parameters(resources)
        params = Puppet::Rails::ParamValue.find_all_params_from_host(self)

        # assign each loaded parameters/tags to the resource it belongs to
        params.each do |param|
            resources[param['resource_id']].add_param_to_hash(param) if resources.include?(param['resource_id'])
        end
    end

    def find_resources_tags(resources)
        tags = Puppet::Rails::ResourceTag.find_all_tags_from_host(self)

        tags.each do |tag|
            resources[tag['resource_id']].add_tag_to_hash(tag) if resources.include?(tag['resource_id'])
        end
    end

That won’t explain the issue, but at least you won’t have a failing storeconfig.

Updated by Brice Figureau almost 3 years ago

immerda wrote:'

Hi

Only a some clients had the problem. And these who haven’t been updated to the latest version and still running 0.24.5 didn’t have the problem. But also not every client who have been updated had the problem. Quite strange.

and it now even appears on clients which haven’t yet been updated.

so I would say it’s a more general issue.

Thanks to all the debug information you provided, we found a potential issue. The current storedconfig database for a given host has many resources for the same type and title, for instance (edited to protect the innocents):

mysql> select * from resources where title = 'title' and host_id=5;
+---------+-----------------------------------+------------------+---------+----------------+----------+------+---------------------+------------+
| id      | title                             | restype          | host_id | source_file_id | exported | line | updated_at          | created_at |
+---------+-----------------------------------+------------------+---------+----------------+----------+------+---------------------+------------+
| 3599582 | title                             | Type::Subtype    |       5 |             11 |     NULL |  138 | 2008-08-31 15:25:52 | NULL       | 
| 3603115 | title                             | Type::Subtype    |       5 |             11 |     NULL |  138 | 2008-08-31 16:03:35 | NULL       | 
| 3652537 | title                             | Type::Subtype    |       5 |             11 |     NULL |  138 | 2008-08-31 22:48:22 | NULL       | 
| 3714202 | title                             | Type::Subtype    |       5 |             11 |     NULL |  138 | 2008-09-01 06:18:21 | NULL       | 
| 3719014 | title                             | Type::Subtype    |       5 |             11 |     NULL |  138 | 2008-09-01 07:06:36 | NULL       | 
| 3783948 | title                             | Type::Subtype    |       5 |             11 |     NULL |  138 | 2008-09-01 21:42:25 | NULL       | 
+---------+-----------------------------------+------------------+---------+----------------+----------+------+---------------------+------------+

Unfortunately the code (and the old code was doing the same too), is doing:

        resources.find(:all, :include => :source_file).inject({}) do | hash, resource |
            hash[resource.ref] = resource
            hash
        end

which will collapse the resources of same ref into one hash entry. Later when we get the parameters, we get all the current resources, but unfortunately we compare with the the hash above, and since we lost some information, then we crash. The immediate fix is the one I pasted in my previous comment. I’ll also change my code to not use the “collapsed” resources for finding parameters, but use the whole returned set of resources, this will definitely fix the issue.

Now, the real problem to me is why the database allows the same resource to be written several time? It shouldn’t have been possible, so I’m enclined to think the storedconfig got logically corrupted after several change (code or data).

Updated by Peter Meier almost 3 years ago

clearing the failed hosts seems to definitely fix something.

however it still seems that the problem might reoccur from time to time on different hosts.

I’ll test your suggested fix when I’ll encounter the problem again. However I think I already tried it on my own in the same way and I got a not so nice sideeffect. What I tried:

    def find_resources_parameters(resources)
        params = Puppet::Rails::ParamValue.find_all_params_from_host(self)

        # assign each loaded parameters/tags to the resource it belongs to
        params.each do |param|
            resources[param['resource_id']].add_param_to_hash(param) if resources[param['resource_id']]
        end
    end

However what I got then, was this the second peak round: https://durito.cronopios.org/mysql_queries-day_storeconfigs_speedup_v2.png, which is why I reverted it. I just saw it as a quick and dirty fix and I had no idea what the actual problem is, nor what I’m actually doing.

thanks for helping out.

Updated by Brice Figureau almost 3 years ago

immerda wrote:'

clearing the failed hosts seems to definitely fix something.

however it still seems that the problem might reoccur from time to time on different hosts.

I’m not so sure. There’s no reason to have more than one resource of the same ref. Puppet can’t generate this, but since the schema doesn’t enforce it, on some error condition and if your database engine doesn’t support transaction it might be possible to have several resource of the same ref for the same host (on multiple run). My patch mentioned on the puppet-dev list should clear the extraneous resources.

I’ll test your suggested fix when I’ll encounter the problem again. However I think I already tried it on my own in the same way and I got a not so nice sideeffect. What I tried:

[…]

However what I got then, was this the second peak round: https://durito.cronopios.org/mysql_queries-day_storeconfigs_speedup_v2.png, which is why I reverted it. I just saw it as a quick and dirty fix and I had no idea what the actual problem is, nor what I’m actually doing.

The only way to achieve the peaks (notice those are mostly Inserts) is that you might have reversed the logic and never calls add_param_to_hash. In this case the system thinks there is no parameters so it first deletes every parameters and on the next run, there will be tons of inserts.

Please test with the following fix if you can (it worked fine on my dev machine): http://github.com/masterzen/puppet/tree/tickets/0.24.x/2010

Updated by Peter Meier almost 3 years ago

However what I got then, was this the second peak round: https://durito.cronopios.org/mysql_queries-day_storeconfigs_speedup_v2.png, which is why I reverted it. I just saw it as a quick and dirty fix and I had no idea what the actual problem is, nor what I’m actually doing.

The only way to achieve the peaks (notice those are mostly Inserts) is that you might have reversed the logic and never calls add_param_to_hash. In this case the system thinks there is no parameters so it first deletes every parameters and on the next run, there will be tons of inserts.

bleh, actually this could be possible and as I got now more into it I think your explanation makes sense

Please test with the following fix if you can (it worked fine on my dev machine): http://github.com/masterzen/puppet/tree/tickets/0.24.x/2010

I’m now running on that one.

Updated by Peter Meier almost 3 years ago

Please test with the following fix if you can (it worked fine on my dev machine): http://github.com/masterzen/puppet/tree/tickets/0.24.x/2010

I’m now running on that one.

I’m now running for a while on that patch and I can confirm that it works fine. Before the error had reappeared after a while. Not so since applying the patch.

Updated by micah - almost 3 years ago

i’ve just applied this patch and will report back how it goes here.

Updated by Luke Kanies almost 3 years ago

  • Status changed from Accepted to In Topic Branch Pending Review
  • Assignee changed from Andrew Shafer to Brice Figureau

Seems like this is just about ready; no problems reported so far.

Updated by micah - almost 3 years ago

My testing has resulted in these errors going away, so it seems like good to me!

Updated by Brice Figureau almost 3 years ago

  • Status changed from In Topic Branch Pending Review to Ready For Checkin
  • Assignee changed from Brice Figureau to James Turnbull

James, so far people report only success, can you please merge to 0.24.x so that it is queued for 0.24.8: http://github.com/masterzen/puppet/tree/tickets/0.24.x/2010

Thanks, If you need a rebase for master, let me know.

Updated by James Turnbull almost 3 years ago

  • Status changed from Ready For Checkin to Closed

Pushed in commit:c55ac3f2c2335de0beacd2cb3396b550c8f1402f in branch 0.24.x

Also available in: Atom PDF