Steve ConoverSteve Conover
Inspect running ruby processes using xray and kill -3
edit Posted by Steve Conover on Friday March 20, 2009 at 06:00PM

We made a code change and deployed to demo, and all the sudden some of our ruby processes were eating a ton of CPU against our full dataset.

In the java world you can send a SIGQUIT to any running java process and get a thread dump. Go ahead, run a java process and kill -3 it.

You can get this in the ruby world by using xray:

sudo gem install xray

Drop this into your code:

require "xray/thread_dump_signal_handler"

Now:

kill -3 <ruby pid>

Look in the log file where you're sending stdout. You'll see something like:

=============== XRay - Done ===============
/usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `call'
    _ /usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `run_machine'
    _ /usr/lib64/ruby/gems/1.8/gems/eventmachine-0.12.0/lib/eventmachine.rb:224:in `run'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/backends/base.rb:57:in `start'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/server.rb:150:in `start'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/controllers/controller.rb:80:in `start'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:173:in `send'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:173:in `run_command'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/lib/thin/runner.rb:139:in `run!'
    _ /usr/lib64/ruby/gems/1.8/gems/thin-1.0.0/bin/thin:6
    _ /usr/bin/thin:19:in `load'
    _ /usr/bin/thin:19

(that's thin patiently waiting to service the next request)

A one-line code drop-in results in a powerful new inspection tool. Pretty neat.

For bonus points:

ps ax | grep "thin server" | grep -v grep | awk '{print $1}' | xargs kill -3

For more bonus points, stick this in a capistrano task and grab the thread dumps from your logs, and you'll have a cluster-wide snapshotting tool.

We kill -3'd our CPU-eating thins and discovered a directory scan problem introduced by a recent code change - totally obvious from the thread dump. Now we're nailing it down with a failing perf unit test and fixing the problem.

Comments

  1. chris chris on March 21, 2009 at 07:40AM

    Even more bonus points: pkill -3 "thin server"

    Also have a look at pgrep