Bug #4694
Storeconfig issue with AR/MySQL
| Status: | Duplicate | Start date: | 09/02/2010 | |
|---|---|---|---|---|
| Priority: | Normal | Due date: | ||
| Assignee: | % Done: | 0% |
||
| Category: | stored configuration | |||
| Target version: | - | |||
| Affected Puppet version: | 0.25.5 | Branch: | ||
| Keywords: | stored configs | |||
| Votes: | 8 |
Description
We’re having an issue with storeconfigs generating random MySQL disconnect errors. This is not related to #2880 or #2816. Though I suspect that the underlying cause may be the same.
First:
err: Mysql::Error: MySQL server has gone away: INSERT INTO `fact_values` (...)
Second:
err: Puppet::Parser::Compiler failed with error ActiveRecord::StatementInvalid: Mysql::Error: Lost connection to MySQL server during query: SELECT * FROM `param_values` WHERE (`param_values`.resource_id = 1136908 AND (param_name_id = 133)) on node xxx.
Third:
err: Mysql::Error: Lost connection to MySQL server during query: SELECT t.id, t.resource_id, p.name FROM resource_tags as t INNER JOIN resources r ON t.resource_id=r.id INNER JOIN puppet_tags as p ON t.puppet_tag_id=p.id WHERE r.host_id=531
So they’re coming from all over the place. I’ve verified that in the code we’re using, reconnect is set to true but it’s not reconnecting. Here’s the versions:
- Puppet 0.25.5
- Rails 2.3.8
- Mongrel 1.1.5
- OS X Leopard (server)
- MySQL gem 2.8.1
- MySQL client library 5.0.82
- MySQL server 5.1.48
- Ruby 1.8.7-p22
Here’s a summary of a single incident — puppet server (puppet) to SQL server conversation:
Puppet -> SQL: SYN
Puppet <- SQL: SYN+ACK
Puppet -> SQL: ACK
So the TCP connection is open,
- MySQL Login successful
- SET NAMES
- SELECT * FROM HOSTS where hosts.name=‘hostname’
- SELECT Facts for host
- executes a bunch of queries for facts
- Runs SELECT
param_names.* FROMparam_namesINNER JOINparam_valuesONparam_names.id =param_values.param_name_id WHERE ((param_values.resource_id = 943831)) - Server returns results and then the client issues a quit (!)?
- Socket is shutdown, client sends FIN+ACK, server responds with FIN+ACK
- Server sends RST to client twice.
I’ve been working on this for a day or 2 and I have no idea what could be causing this. I don’t see any pattern.
Related issues
History
Updated by Hector Rivas over 1 year ago
The same problem for me.
In my case my installation is:
- 7 nodes: 1 debian lenny, 1 suse 11.1 x86_64, 1 suse 11.0 s390x (ZLinux), 2 AIX 6.1 ML3
- Puppet 2.6.1rc3
- facter 1.5.7
- mongrel 1.1.5-5 (apache+mongrel) (debian package)
- libmysql-ruby1.8 2.7.4-1 (debian package)
mysql-server 5.0.51a-24+lenny4
Ruby:
- puppetmaster: ruby 1.8.7.72-3lenny1 (debian package)
- AIX nodes: ruby-1.8.7-p302-aix61
- Suse (zLinux & x86): ruby-1.8.7.p72-5.22.1
But I only have the error on the AIX and Suse ZLinux nodes, not on Linux x86 (debian or suse) (???)
Some errors that I am getting:
* More often: “SERVER: Mysql::Error: MySQL server has gone away: SELECT * FROM hosts WHERE (hosts.name = ‘hostname’) LIMIT 1”
* SERVER: Mysql::Error: MySQL server has gone away: INSERT INTO fact_values (created_at, updated_at, value, fact_name_id, host_id) VALUES(‘2010-09-03 12:36:35’, ‘2010-09-03 12:36:35’, ‘0’, 1, 4)
* SERVER: Mysql::Error: MySQL server has gone away: INSERT INTO param_values (created_at, line, resource_id, updated_at, value, param_name_id) VALUES(‘2010-09-03 12:34:08’, 75, 1197, ‘2010-09-03 12:34:08’, ‘4’, 47)
* SERVER: Mysql::Error: MySQL server has gone away: DELETE FROM param_values WHERE (id IN (‘7401’,‘7402’))
* SERVER: Mysql::Error: MySQL server has gone away: SELECT t.id, t.resource_id, p.name FROM resource_tags t INNER JOIN resources r ON t.resource_id=r.id INNER JOIN puppet_tags p ON t.puppet_tag_id=p.id WHERE r.host_id=4
Updated by Andrew Forgue over 1 year ago
It seems it’s related to Mongrel. I installed passenger 2.2.15 with the rest of the same configs and that error doesn’t seem to happen anymore. So is it a problem with mongrel itself or the way puppet uses it?
Updated by Markus Roberts over 1 year ago
- Status changed from Unreviewed to Accepted
We aren’t sure. There are a nest of issues here, and three of us have been taking turns beating our heads against them for the last two weeks. At the most abstract, the problem can be expressed as “various libraries have slightly conflicting ideas about who is responsible for thread/process safety and related concurrency issues” but that isn’t enough to fix it.
Updated by Hector Rivas over 1 year ago
- Target version changed from 49 to 69
I installed debian passenger
# dpkg -l librack-ruby*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==========================-==========================-====================================================================
ii librack-ruby 1.1.0-3~bpo50+1 A modular Ruby webserver interface
ii librack-ruby1.8 1.1.0-3~bpo50+1 A modular Ruby webserver interface (Ruby 1.8)
and I still have the same error.
Updated by Hector Rivas over 1 year ago
I have to say more. I installed and setup debian passenger for puppetmaster with versions:
ii librack-ruby 1.1.0-3~bpo50+1 A modular Ruby webserver interface
ii librack-ruby1.8 1.1.0-3~bpo50+1 A modular Ruby webserver interface (Ruby 1.8)
and activerecord 2.1
ii libactiverecord-ruby1.8 2.1.0-3 Tie database tables to classes (Ruby 1.8)
ii libactivesupport-ruby1.8 2.1.0-3 utility classes and extensions (Ruby 1.8)
And it always failed with this error (With mongrel failed the 20% of the times):
Executing puppet one time and no daemonized: info: Retrieving plugin
info: Loading facts in sshkeys
info: Loading facts in sshkeys
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Mysql::Error: MySQL server has gone away: SELECT * FROM `hosts` WHERE (`hosts`.`name` = 'myhost.mycompany.com') LIMIT 1
err: Cached catalog for myhost.mycompany.com failed: Could not parse YAML data for catalog myhost.mycompany.com: allocator undefined for Proc
notice: Using cached catalog
err: Could not retrieve catalog; skipping run
Then I upgraded to activerecord 2.3 (from debian backports):
ii libactiverecord-ruby1.8 2.3.5-1~bpo50+1 ORM database interface for ruby
ii libactivesupport-ruby1.8 2.3.5-1~bpo50+1 utility classes and extensions (Ruby 1.8)
And now I don’t get the error. But I still have to leave it running for some hours to verify that the error is solved.
Updated by Andrew Gaffney about 1 year ago
Updating rails, activerecord, and the rest of the active* gems to 2.3.5 did decrease the frequency of this error, but it has not eliminated it completely.
Updated by Adam Crews about 1 year ago
I’m seeing what I think is the same issue using the postgresql connector.
err: Could not retrieve catalog from remote server: Error 400 on SERVER: PGError: no connection to the server
: SELECT "hosts".* FROM "hosts" WHERE ("hosts"."name" = 'acrews-pup2.xxxx') LIMIT 1
My puppet master is RH6 using activerecord 3.0.3, puppet 2.6.4, rack 1.1.0 and unicorn 3.0.1
I believe that the ActiveRecord::Base.verify_active_connections! call is not being made or not functioning properly, but I also have not been able to narrow it down more.
Updated by Nigel Kersten about 1 year ago
- Target version deleted (
69)
Updated by Stefan Schulte 4 months ago
I’m also seeing this error (agent and server running puppet 2.6.1) and tried the following
- rebuild mysql database with innodb
- update rails and activerecord to 3.0.10 (patched master with #4763)
- update passenger to 3.0.9
- use the mysql2 gem (patched master with #9997)
I’m only seeing this error with solaris sparc nodes. I updated ruby on a few sparc nodes from 1.8.5 to 1.8.7-p352 with no luck. Now I stumbled over #3581 and I am also using tagmail for reports.
So question to all the others here: Do you use tagmail?
Updated by Stefan Schulte 4 months ago
Here is some output from rails.log
Failure 01:
D, [2011-10-14T15:23:09.474878 #20802] DEBUG -- : Puppet::Rails::FactName Load (0.3ms) SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'clientversion' LIMIT 1 D, [2011-10-14T15:23:09.478551 #20802] DEBUG -- : Puppet::Rails::FactName Load (0.3ms) SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'is_virtual' LIMIT 1 D, [2011-10-14T15:23:09.504577 #20802] DEBUG -- : Puppet::Rails::FactName Load (23.0ms) SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'virtual' LIMIT 1 D, [2011-10-14T15:23:09.504760 #20802] DEBUG -- : Mysql2::Error: Lost connection to MySQL server during query: SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'virtual' LIMIT 1 D, [2011-10-14T15:24:02.412434 #20802] DEBUG -- : Puppet::Rails::Host Load (1.0ms) SELECT `hosts`.* FROM `hosts` WHERE `hosts`.`name` = 'XXX' LIMIT 1
Failure 02:
D, [2011-10-14T16:16:08.253846 #20856] DEBUG -- : Puppet::Rails::FactName Load (0.2ms) SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'ipaddress_nxge3_4' LIMIT 1 D, [2011-10-14T16:16:08.280055 #20856] DEBUG -- : Puppet::Rails::FactName Load (24.3ms) SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'kernelrelease' LIMIT 1 D, [2011-10-14T16:16:08.280223 #20856] DEBUG -- : Mysql2::Error: Lost connection to MySQL server during query: SELECT `fact_names`.* FROM `fact_names` WHERE `fact_names`.`name` = 'kernelrelease' LIMIT 1 D, [2011-10-14T16:16:08.359105 #20806] DEBUG -- : Puppet::Rails::ParamValue Load (0.8ms) SELECT `param_values`.* FROM `param_values` WHERE (`param_values`.resource_id = 157) AND (param_name_id = 4) D, [2011-10-14T16:16:08.360513 #20806] DEBUG -- : Puppet::Rails::ParamValue Load (0.3ms) SELECT `param_values`.* FROM `param_values` WHERE (`param_values`.resource_id = 157) AND (param_name_id = 3)
When I search for the error on google I mostly find timeout related problems. But since the error happens during a facts retrieval (as far as I can tell) and after a few successfull queries that does not seem to be the case here.
Updated by Stefan Schulte 4 months ago
My problem was indeed related to tagmail. See #3581 for a patch
Updated by James Turnbull 3 months ago
- Status changed from Accepted to Needs Decision
- Assignee set to Jason McKerr
Jason – this needs commercial triage.
Updated by Daniel Pittman 29 days ago
- Status changed from Needs Decision to Duplicate
- Assignee changed from Jason McKerr to Daniel Pittman
This is absolutely a duplicate of the earlier ticket – specifically, our handling of fork in Puppet means that we will hit this problem anywhere that we fork. tagmail is just the most common place for people to run into that. :)
Updated by James Turnbull 26 days ago
Daniel – which ticket is it a duplicate of?
Updated by Daniel Pittman 22 days ago
James Turnbull wrote:
Daniel – which ticket is it a duplicate of?