Bug #2616

Locking error in tagmail

Added by Martin Englund over 2 years ago. Updated about 2 years ago.

Status:Closed Start date:09/09/2009
Priority:High Due date:
Assignee:Steven Jenkins % Done:

0%

Category:reports
Target version:0.25.1
Affected Puppet version:0.25.0 Branch:
Keywords:
Votes: 0

Description

After upgrading a puppetmaster to 0.25.0 I got the following error:

/usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:168:in `send': Could not send report emails via sendmail: Thread(#) not locked. (Puppet::Error)
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `fork'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `send'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:121:in `process'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:66:in `process'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `each'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `process'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:33:in `report'
        from /usr/ops/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `to_proc'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:85:in `setup_processor'
        from /usr/ops/lib/ruby/1.8/xmlrpc/server.rb:336:in `call'
        from /usr/ops/lib/ruby/1.8/xmlrpc/server.rb:336:in `dispatch'
        from /usr/ops/lib/ruby/1.8/xmlrpc/server.rb:323:in `each'
        from /usr/ops/lib/ruby/1.8/xmlrpc/server.rb:323:in `dispatch'
        from /usr/ops/lib/ruby/1.8/xmlrpc/server.rb:366:in `call_method'
        from /usr/ops/lib/ruby/1.8/xmlrpc/server.rb:378:in `handle'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/webrick_servlet.rb:68:in `service'
        from /usr/ops/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'
        from /usr/ops/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:162:in `start'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:95:in `start'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:92:in `each'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:92:in `start'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:23:in `start'
        from /usr/ops/lib/ruby/1.8/webrick/server.rb:82:in `start'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `listen'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `initialize'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `new'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `listen'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:37:in `synchronize'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:37:in `listen'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/server.rb:131:in `listen'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/network/server.rb:146:in `start'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/daemon.rb:128:in `start'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/application/puppetmasterd.rb:122:in `main'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:226:in `send'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:226:in `run_command'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:217:in `run'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:306:in `exit_on_fail'
        from /usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:217:in `run'
        from /usr/ops/sbin/puppetmasterd:66

0001-FIXES-2616-move-synchronize-to-wrap-fork.patch (5.3 kB) Steven Jenkins, 09/17/2009 08:01 pm

fork-threads.rb (760 Bytes) Steven Jenkins, 09/17/2009 08:01 pm


Related issues

related to Puppet - Bug #1703: Thread error in tagmail Closed 10/28/2008

History

Updated by Markus Roberts over 2 years ago

  • Status changed from Unreviewed to Accepted
  • Target version set to 0.25.1

Updated by Markus Roberts over 2 years ago

  • Category set to reports
  • Status changed from Accepted to Needs More Information
  • Assignee set to Martin Englund

Martin —

Can you try it again with —debug, —trace and —no-deamonize and post the stack trace that this should produce right before the tagmail error you posted above? It should provide information about the cause of the error.

Thanks,

— Markus

Updated by Martin Englund over 2 years ago

Here’s the stack trace you asked for:

/usr/ops/lib/ruby/1.8/sync.rb:57:in `Fail'
/usr/ops/lib/ruby/1.8/sync.rb:63:in `Fail'
/usr/ops/lib/ruby/1.8/sync.rb:183:in `sync_unlock'
/usr/ops/lib/ruby/1.8/sync.rb:231:in `synchronize'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/util.rb:362:in `threadlock'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/util/log.rb:440:in `newmessage'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/util/log.rb:439:in `each'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/util/log.rb:439:in `newmessage'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/util/log.rb:508:in `initialize'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/util/log.rb:133:in `new'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/util/log.rb:133:in `create'
/usr/ops/lib/ruby/site_ruby/1.8/puppet.rb:51:in `info'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:152:in `send'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:151:in `each'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:151:in `send'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `fork'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `send'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:121:in `process'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:66:in `process'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `each'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:59:in `process'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/handler/report.rb:33:in `report'
/usr/ops/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `to_proc'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protec
t_service'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:85:in `setup_
processor'
/usr/ops/lib/ruby/1.8/xmlrpc/server.rb:336:in `call'
/usr/ops/lib/ruby/1.8/xmlrpc/server.rb:336:in `dispatch'
/usr/ops/lib/ruby/1.8/xmlrpc/server.rb:323:in `each'
/usr/ops/lib/ruby/1.8/xmlrpc/server.rb:323:in `dispatch'
/usr/ops/lib/ruby/1.8/xmlrpc/server.rb:366:in `call_method'
/usr/ops/lib/ruby/1.8/xmlrpc/server.rb:378:in `handle'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `proces
s'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/webrick_servlet.rb:68:in `
service'
/usr/ops/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'
/usr/ops/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'
/usr/ops/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
/usr/ops/lib/ruby/1.8/webrick/server.rb:162:in `start'
/usr/ops/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
/usr/ops/lib/ruby/1.8/webrick/server.rb:95:in `start'
/usr/ops/lib/ruby/1.8/webrick/server.rb:92:in `each'
/usr/ops/lib/ruby/1.8/webrick/server.rb:92:in `start'
/usr/ops/lib/ruby/1.8/webrick/server.rb:23:in `start'
/usr/ops/lib/ruby/1.8/webrick/server.rb:82:in `start'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `listen'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `initialize
'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `new'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:40:in `listen'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:37:in `synchroniz
e'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/http/webrick.rb:37:in `listen'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/server.rb:131:in `listen'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/network/server.rb:146:in `start'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/daemon.rb:128:in `start'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/application/puppetmasterd.rb:122:in `main
'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:226:in `send'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:226:in `run_command'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:217:in `run'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:306:in `exit_on_fail'
/usr/ops/lib/ruby/site_ruby/1.8/puppet/application.rb:217:in `run'
/usr/ops/sbin/puppetmasterd:66

Updated by Markus Roberts over 2 years ago

  • Status changed from Needs More Information to Accepted
  • Assignee changed from Martin Englund to Rein Henrichs

Updated by Luke Kanies over 2 years ago

  • Priority changed from Normal to High

Updated by Steven Jenkins over 2 years ago

  • Assignee changed from Rein Henrichs to Steven Jenkins

Updated by Steven Jenkins over 2 years ago

Some findings:

As Luke points out, this appears to be a reoccurrence of 1703: Could not send report emails via sendmail: Thread(#<Thread:0xc75c00 run>) not locked. (Puppet::Error)

  from /usr/ops/lib/ruby/site_ruby/1.8/puppet/reports/tagmail.rb:126:in `fork'

Note that the failing call is inside the fork, but it’s an error message particular to the Ruby Sync library (specifically the sync_unlock call, which itself is called by a few methods).

Because this occurs in both 0.25.0 and 0.24.5, the sync.syncronize wrapper is likely not a cause here, but rather a bystander.

There are two issues that stand out:

1- since the error appears to be occurring at the fork(), the problem may be related to forking from a thread where another thread is manipulating synchronization objects in question. Unfortunately, we don’t have those objects (we do see the Thread.inspect result, as sync_unlock’s Fail method for UnknownLocker does a th.inspect). The two choices for the objects are the @sync instance variable and various file objects (e.g., file descriptors) used internally by Ruby.

2- Since the fork in both cases appears to be the problem, the sync.synchronize call (if needed at all) should protect the fork call. I’ve attached a patch which does that simple refactoring (and pushed to my github repo git://github.com/stevenjenkins/puppet as branch tickets/0.25.0/2616). That patch should be tested.

Unfortunately, there are no tests for tagmail.rb. If someone has suggestions on testing it (or could provide a small but complete config that I could re-run) that would be very helpful.

I’ve also attached a little test program (fork-threads.rb) to demonstrate/model the synchronization issues. Note that I have commented out the synchronize calls as in the current Puppet and have put in ones like in the patch. Switching those around demonstrates that the current Puppet is not actually synchronizing the calls to IO.popen but that the patch serializes things properly. It’s also quite slow.

Now, some questions I have:

  • has anyone seen this on a platform other than Solaris x86?
  • how frequently does this occur?
  • could more information be provided about the tagmail configuration?
  • I’ve not exhaustively looked through the @sync instance variable and its occurrences. If someone is familiar with that code, please let me know, as I’ll be doing that next, and it would be great if someone could give a hand there. On the other hand, there aren’t that many occurrences.

Updated by Markus Roberts over 2 years ago

The core problem is that synchronization objects are not fork-safe.

Fork make a (copy-on-write) copy of the full address space at the time of the fork. This includes any synchronization objects. If there are other threads using them they may be in an inconsistent state at the time of the fork and thus the copy in the fork will be invalid. This effect can be demonstrated with a simple program such as:

require 'sync'

# info relies on a synchronization variable
#
$sync = Sync.new
def info(text)
  sleep rand*0.001
  $sync.synchronize { p text if text =~ /00$/}
end

# Start a bunch of threads calling info and then fork some processes that also call it
#
threads = (1..10).collect { |i| Thread.new(i) { |j| (0..1000).each { |k| info "#{i}:#{k}"} }}
1000.times { Process.detach fork { info "In fork" } }
threads.each {|th| th.join}

Updated by Steven Jenkins over 2 years ago

  • Status changed from Accepted to Ready For Checkin

Updated by Steven Jenkins over 2 years ago

And pointing to the commit: ref: 84f9ca592521172ed51324d27d69ddadd549633a on github: http://github.com/stevenjenkins/puppet/tree/tickets/0.25.0/2616

Updated by James Turnbull over 2 years ago

  • Status changed from Ready For Checkin to Closed

Pushed in commit:“5c2ba479d17b6a88fa018947de0b00d16d53122” in branch 0.25.x

Also available in: Atom PDF