The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

Bug #4694

Storeconfig issue with AR/MySQL

Added by Andrew Forgue about 4 years ago. Updated over 1 year ago.

Status: Duplicate
Start date: 09/02/2010
Priority: Normal
Due date: -
Assignee: -
% Done: 0%
Category: stored configuration
Target version: -
Affected Puppet version: 0.25.5
Branch: -
Keywords: stored configs, customer

We've Moved!

Ticket tracking is now hosted in JIRA: https://tickets.puppetlabs.com

This issue is currently not available for export. If you are experiencing the issue described below, please file a new ticket in JIRA. Once a new ticket has been created, please add a link to it that points back to this Redmine ticket.


Description

We’re having an issue with storeconfigs generating random MySQL disconnect errors. This is not related to #2880 or #2816, though I suspect that the underlying cause may be the same.

First:

 err: Mysql::Error: MySQL server has gone away: INSERT INTO `fact_values` (...)

Second:

err: Puppet::Parser::Compiler failed with error ActiveRecord::StatementInvalid: Mysql::Error: Lost connection to MySQL server during query: SELECT * FROM `param_values` WHERE (`param_values`.resource_id = 1136908 AND (param_name_id = 133)) on node xxx.

Third:

err: Mysql::Error: Lost connection to MySQL server during query: SELECT t.id, t.resource_id, p.name FROM resource_tags as t INNER JOIN resources r ON t.resource_id=r.id INNER JOIN puppet_tags as p ON t.puppet_tag_id=p.id WHERE r.host_id=531

So they’re coming from all over the place. I’ve verified that in the code we’re using, reconnect is set to true, but it’s not reconnecting. Here are the versions:

  • Puppet 0.25.5
  • Rails 2.3.8
  • Mongrel 1.1.5
  • OS X Leopard (server)
  • MySQL gem 2.8.1
  • MySQL client library 5.0.82
  • MySQL server 5.1.48
  • Ruby 1.8.7-p22
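For reference, the reconnect flag mentioned above is passed to ActiveRecord when the connection is established. A minimal sketch of such a setup (not Puppet’s actual code; host name, database name, and credentials are illustrative):

```ruby
require 'rubygems'
require 'active_record'

# Illustrative connection setup: :reconnect => true asks the mysql
# adapter to re-establish dropped connections transparently.
ActiveRecord::Base.establish_connection(
  :adapter   => 'mysql',
  :host      => 'db.example.com',
  :database  => 'puppet',
  :username  => 'puppet',
  :password  => 'secret',
  :reconnect => true
)
```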

Here’s a summary of a single incident — puppet server (puppet) to SQL server conversation:

Puppet -> SQL: SYN
Puppet <- SQL: SYN+ACK
Puppet -> SQL: ACK

So the TCP connection is open. Then:

  1. MySQL Login successful
  2. SET NAMES
  3. SELECT * FROM HOSTS where hosts.name='hostname'
  4. SELECT Facts for host
  5. executes a bunch of queries for facts
  6. Runs SELECT param_names.* FROM param_names INNER JOIN param_values ON param_names.id = param_values.param_name_id WHERE ((param_values.resource_id = 943831))
  7. Server returns results, and then the client issues a QUIT (!)
  8. Socket is shutdown, client sends FIN+ACK, server responds with FIN+ACK
  9. Server sends RST to client twice.

I’ve been working on this for a day or two and I have no idea what could be causing it; I don’t see any pattern.


Related issues

Related to Puppet - Bug #2880: Fix for mysql 'server has gone away' under passenger Duplicate 11/30/2009
Related to Puppet - Bug #2816: Storeconfigs - MySQL server has gone away Closed 11/13/2009
Related to Puppet - Bug #4632: async_storedconfigs not exporting resources after days of... Needs More Information 08/26/2010
Related to Puppet - Bug #4517: async_storeconfigs + activemq queueing support causes pup... Needs More Information 08/11/2010
Duplicated by Puppet - Bug #5067: Mysql::Error: MySQL server has gone away Duplicate 10/21/2010
Duplicates Puppet - Bug #3581: tagmail + err: PGError: server closed the connection unex... Closed 04/20/2010

History

#1 Updated by Hector Rivas about 4 years ago

I have the same problem.

In my case my installation is:

  • 7 nodes: 1 debian lenny, 1 suse 11.1 x86_64, 1 suse 11.0 s390x (ZLinux), 2 AIX 6.1 ML3
  • Puppet 2.6.1rc3
  • facter 1.5.7
  • mongrel 1.1.5-5 (apache+mongrel) (debian package)
  • libmysql-ruby1.8 2.7.4-1 (debian package)
  • mysql-server 5.0.51a-24+lenny4

  • Ruby:

    • puppetmaster: ruby 1.8.7.72-3lenny1 (debian package)
    • AIX nodes: ruby-1.8.7-p302-aix61
    • Suse (zLinux & x86): ruby-1.8.7.p72-5.22.1

But I only see the error on the AIX and Suse zLinux nodes, not on x86 Linux (Debian or Suse) (???)

Some errors that I am getting:

  • More often: "SERVER: Mysql::Error: MySQL server has gone away: SELECT * FROM hosts WHERE (hosts.name = 'hostname') LIMIT 1"
  • SERVER: Mysql::Error: MySQL server has gone away: INSERT INTO fact_values (created_at, updated_at, value, fact_name_id, host_id) VALUES ('2010-09-03 12:36:35', '2010-09-03 12:36:35', '0', 1, 4)
  • SERVER: Mysql::Error: MySQL server has gone away: INSERT INTO param_values (created_at, line, resource_id, updated_at, value, param_name_id) VALUES ('2010-09-03 12:34:08', 75, 1197, '2010-09-03 12:34:08', '4', 47)
  • SERVER: Mysql::Error: MySQL server has gone away: DELETE FROM param_values WHERE (id IN ('7401','7402'))
  • SERVER: Mysql::Error: MySQL server has gone away: SELECT t.id, t.resource_id, p.name FROM resource_tags t INNER JOIN resources r ON t.resource_id=r.id INNER JOIN puppet_tags p ON t.puppet_tag_id=p.id WHERE r.host_id=4

#2 Updated by Andrew Forgue about 4 years ago

It seems it’s related to Mongrel. I installed Passenger 2.2.15 with the rest of the configs unchanged, and that error doesn’t seem to happen anymore. So is it a problem with Mongrel itself, or with the way Puppet uses it?

#3 Updated by Markus Roberts about 4 years ago

  • Status changed from Unreviewed to Accepted

We aren’t sure. There is a nest of issues here, and three of us have been taking turns beating our heads against them for the last two weeks. At its most abstract, the problem can be expressed as “various libraries have slightly conflicting ideas about who is responsible for thread/process safety and related concurrency issues”, but that isn’t enough to fix it.

#4 Updated by Hector Rivas about 4 years ago

  • Target version changed from 49 to 69

I installed the Debian Passenger package:

# dpkg -l librack-ruby*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name                       Version                    Description
+++-==========================-==========================-====================================================================
ii  librack-ruby               1.1.0-3~bpo50+1            A modular Ruby webserver interface
ii  librack-ruby1.8            1.1.0-3~bpo50+1            A modular Ruby webserver interface (Ruby 1.8)

and I still have the same error.

#5 Updated by Hector Rivas about 4 years ago

I should add more detail. I installed and set up the Debian Passenger package for the puppetmaster with these versions:

ii  librack-ruby               1.1.0-3~bpo50+1            A modular Ruby webserver interface
ii  librack-ruby1.8            1.1.0-3~bpo50+1            A modular Ruby webserver interface (Ruby 1.8)

and activerecord 2.1

ii  libactiverecord-ruby1.8    2.1.0-3                    Tie database tables to classes (Ruby 1.8)
ii  libactivesupport-ruby1.8   2.1.0-3                    utility classes and extensions (Ruby 1.8)

And it always failed with this error (with Mongrel it failed about 20% of the time):

Running puppet once, not daemonized:

info: Retrieving plugin
info: Loading facts in sshkeys
info: Loading facts in sshkeys
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Mysql::Error: MySQL server has gone away: SELECT * FROM `hosts`     WHERE (`hosts`.`name` = 'myhost.mycompany.com')  LIMIT 1
err: Cached catalog for myhost.mycompany.com failed: Could not parse YAML data for catalog myhost.mycompany.com: allocator undefined for Proc
notice: Using cached catalog
err: Could not retrieve catalog; skipping run

Then I upgraded to activerecord 2.3 (from debian backports):

ii  libactiverecord-ruby1.8                                 2.3.5-1~bpo50+1            ORM database interface for ruby
ii  libactivesupport-ruby1.8                                2.3.5-1~bpo50+1            utility classes and extensions (Ruby 1.8)

And now I don’t get the error, but I still have to leave it running for some hours to verify that it is really solved.

#6 Updated by Andrew Gaffney almost 4 years ago

Updating rails, activerecord, and the rest of the active* gems to 2.3.5 did decrease the frequency of this error, but it has not eliminated it completely.

#7 Updated by Adam Crews over 3 years ago

I’m seeing what I think is the same issue using the postgresql connector.

err: Could not retrieve catalog from remote server: Error 400 on SERVER: PGError: no connection to the server
: SELECT "hosts".* FROM "hosts" WHERE ("hosts"."name" = 'acrews-pup2.xxxx') LIMIT 1

My puppet master is RH6, using activerecord 3.0.3, puppet 2.6.4, rack 1.1.0, and unicorn 3.0.1.

I believe that the ActiveRecord::Base.verify_active_connections! call is either not being made or not functioning properly, but I have not been able to narrow it down further.
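For context, `verify_active_connections!` was typically invoked once per request, for example from a Rack-style middleware. A minimal self-contained sketch of that pattern — using a stub in place of ActiveRecord, so all names other than `verify_active_connections!` are made up:

```ruby
# Stub standing in for ActiveRecord::Base (a real setup would call
# ActiveRecord::Base.verify_active_connections! instead).
class StubBase
  class << self
    attr_accessor :verified

    # Real ActiveRecord pings each cached connection here and
    # reconnects any that have gone away.
    def verify_active_connections!
      self.verified = true
    end
  end
end

# Rack-style middleware: re-verify connections before every request,
# so a connection dropped between requests gets replaced instead of
# raising "MySQL server has gone away" mid-query.
class VerifyConnections
  def initialize(app, base = StubBase)
    @app  = app
    @base = base
  end

  def call(env)
    @base.verify_active_connections!
    @app.call(env)
  end
end

app = VerifyConnections.new(lambda { |env| [200, {}, ['ok']] })
StubBase.verified = false
status, _headers, body = app.call({})
```

If the middleware (or an equivalent hook) never runs, a connection dropped between requests stays dead until a query trips over it — consistent with the symptom described above.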

#8 Updated by Nigel Kersten over 3 years ago

  • Target version deleted (69)

#9 Updated by Stefan Schulte almost 3 years ago

I’m also seeing this error (agent and server running puppet 2.6.1) and have tried the following:

  • rebuild mysql database with innodb
  • update rails and activerecord to 3.0.10 (patched master with #4763)
  • update passenger to 3.0.9
  • use the mysql2 gem (patched master with #9997)

I’m only seeing this error on Solaris SPARC nodes. I updated ruby on a few SPARC nodes from 1.8.5 to 1.8.7-p352 with no luck. Then I stumbled across #3581, and I am also using tagmail for reports.

So, a question for everyone else here: do you use tagmail?

#10 Updated by Stefan Schulte almost 3 years ago

Here is some output from rails.log

Failure 01:

D, [2011-10-14T15:23:09.474878 #20802] DEBUG -- :   Puppet::Rails::FactName Load (0.3ms)  SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'clientversion' LIMIT 1
D, [2011-10-14T15:23:09.478551 #20802] DEBUG -- :   Puppet::Rails::FactName Load (0.3ms)  SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'is_virtual' LIMIT 1
D, [2011-10-14T15:23:09.504577 #20802] DEBUG -- :   Puppet::Rails::FactName Load (23.0ms)  SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'virtual' LIMIT 1
D, [2011-10-14T15:23:09.504760 #20802] DEBUG -- : Mysql2::Error: Lost connection to MySQL server during query: SELECT  `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'virtual' LIMIT 1
D, [2011-10-14T15:24:02.412434 #20802] DEBUG -- :   Puppet::Rails::Host Load (1.0ms)  SELECT `hosts`.* FROM `hosts` WHERE `hosts`.`name` = 'XXX' LIMIT 1

Failure 02:

D, [2011-10-14T16:16:08.253846 #20856] DEBUG -- :   Puppet::Rails::FactName Load (0.2ms)  SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'ipaddress_nxge3_4' LIMIT 1
D, [2011-10-14T16:16:08.280055 #20856] DEBUG -- :   Puppet::Rails::FactName Load (24.3ms)  SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'kernelrelease' LIMIT 1
D, [2011-10-14T16:16:08.280223 #20856] DEBUG -- : Mysql2::Error: Lost connection to MySQL server during query: SELECT  `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'kernelrelease' LIMIT 1
D, [2011-10-14T16:16:08.359105 #20806] DEBUG -- :   Puppet::Rails::ParamValue Load (0.8ms)  SELECT `param_values`.* FROM `param_values` WHERE (`param_values`.resource_id = 157) AND (param_name_id = 4)
D, [2011-10-14T16:16:08.360513 #20806] DEBUG -- :   Puppet::Rails::ParamValue Load (0.3ms)  SELECT `param_values`.* FROM `param_values` WHERE (`param_values`.resource_id = 157) AND (param_name_id = 3)

When I search for the error on Google I mostly find timeout-related problems. But since the error happens during fact retrieval (as far as I can tell) and after a few successful queries, that does not seem to be the case here.

#11 Updated by Stefan Schulte almost 3 years ago

My problem was indeed related to tagmail; see #3581 for a patch.

#12 Updated by James Turnbull almost 3 years ago

  • Status changed from Accepted to Needs Decision
  • Assignee set to Jason McKerr

Jason – this needs commercial triage.

#13 Updated by Anonymous over 2 years ago

  • Status changed from Needs Decision to Duplicate
  • Assignee changed from Jason McKerr to Anonymous

This is absolutely a duplicate of the earlier ticket – specifically, our handling of fork in Puppet means that we will hit this problem anywhere that we fork. tagmail is just the most common place for people to run into that. :)
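The fork hazard described above can be sketched as follows (stub connection class, illustrative names, not Puppet’s actual code): after fork, the child shares the parent’s database socket, so a clean close in the child would send QUIT over that shared socket and kill the parent’s session; the fix is for the child to discard the inherited handle without closing it and open its own connection.

```ruby
# Stub connection handle; in a real client, closing it would send
# QUIT over the (shared) socket, killing the parent's session too.
class Conn
  @@next_id = 0
  attr_reader :id

  def initialize
    @@next_id += 1
    @id   = @@next_id
    @open = true
  end

  def open?
    @open
  end
end

class DB
  # Lazily created, process-wide connection.
  def self.connection
    @conn ||= Conn.new
  end

  # Called in the child right after fork: forget the inherited
  # handle WITHOUT closing it, then reconnect lazily on next use.
  def self.forget_connection!
    @conn = nil
  end
end

parent_conn = DB.connection

pid = fork do
  DB.forget_connection!          # no QUIT on the shared socket
  child_conn = DB.connection     # fresh connection for the child
  exit!(child_conn.id == parent_conn.id ? 1 : 0)
end
Process.wait(pid)
child_got_own_conn = $?.exitstatus == 0
```

Skipping the `forget_connection!` step is the failure mode described here: the child inherits and eventually tears down the very connection the parent is still using.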

#14 Updated by James Turnbull over 2 years ago

Daniel – which ticket is it a duplicate of?

#15 Updated by Anonymous over 2 years ago

James Turnbull wrote:

Daniel – which ticket is it a duplicate of?

#3581, since it is caused by the fork inside Puppet causing the connection to be shut down improperly. I will link the two properly now.

#16 Updated by Walter Heck over 2 years ago

Not sure which bug to add this to, but the real issue here is that MySQL connection/error handling is done improperly. When this error is encountered, the connection should be restored and the query retried. Related info here: http://openquery.com/blog/error-handling-for-mysql-applications.

The places where you see this are merely symptoms, like the fork you guys are talking about.
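The retry approach suggested above can be sketched like this (illustrative names and a stub client, not an existing Puppet or ActiveRecord API): on a lost connection, reconnect once and retry the statement instead of failing the whole run.

```ruby
# Error raised by the stub when the connection has been dropped.
class ConnectionLost < StandardError; end

# Stub client whose connection can be dropped and re-established.
class FlakyClient
  def initialize
    @connected = true
  end

  def drop!
    @connected = false
  end

  def reconnect
    @connected = true
  end

  def query(sql)
    raise ConnectionLost, 'MySQL server has gone away' unless @connected
    "result of #{sql}"
  end
end

# Run the query; on ConnectionLost, reconnect and retry up to
# `retries` times before giving up.
def with_reconnect(client, sql, retries = 1)
  client.query(sql)
rescue ConnectionLost
  raise if retries <= 0
  client.reconnect
  retries -= 1
  retry
end

client = FlakyClient.new
client.drop!   # simulate "server has gone away"
result = with_reconnect(client, 'SELECT 1')
```

In production one would also cap retries and back off between attempts; the single retry here just illustrates the shape of the fix.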

#17 Updated by Charlie Sharpsteen over 1 year ago

  • Keywords changed from stored configs to stored configs customer

#19 Updated by Anonymous over 1 year ago

  • Assignee deleted (Anonymous)
