Matthew Kocher
Yaml, Psych and Ruby 1.9.2-p180 - Here there be dragons
Posted by Matthew Kocher on Saturday May 14, 2011 at 10:59AM

As mentioned in yesterday's standup blog, my pair and I ran into some problems with YAML parsing over the last few days. Now that I think I understand them, I want to document them for posterity.

Psych is a new YAML parser that is presumably better than what came before it, but it can't merge hash keys correctly and doesn't work with delayed_job. The hash-key merging problem is serious: our standard database.yml defines a common section, and each environment references it with a merge key and adds its own database name and settings. When Psych is loaded, we get a blank database name, which makes ActiveRecord pretty much useless.
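A quick way to tell whether the parser your app actually loaded handles merge keys is to parse a small database.yml-style document and check that each environment gets both the shared settings and its own. This is a minimal sketch, not something from the original post: the helper name `merge_keys_ok?` and the sample settings are made up, and it assumes a current Ruby (squiggly heredoc, and `YAML.safe_load` with `aliases: true`, which needs Psych 3.1 or later), so on 1.9.2 you would have to adapt it to plain `YAML.load`.

```ruby
require 'yaml'

# A database.yml-style document: a shared section anchored as &defaults,
# merged into each environment with the YAML merge key (<<).
SAMPLE = <<~YAML
  defaults: &defaults
    adapter: mysql2
    host: db.example.com

  development:
    <<: *defaults
    database: app_development

  production:
    <<: *defaults
    database: app_production
    host: prod-db.example.com
YAML

# Hypothetical check: true if the loaded YAML engine applied the merge key
# correctly, i.e. shared keys are inherited and locally set keys win.
def merge_keys_ok?(text = SAMPLE)
  config = YAML.safe_load(text, aliases: true)
  dev  = config.fetch('development')
  prod = config.fetch('production')

  dev['adapter'] == 'mysql2' &&
    dev['database'] == 'app_development' &&
    prod['database'] == 'app_production' &&
    prod['host'] == 'prod-db.example.com'
end

puts(merge_keys_ok? ? 'merge keys OK' : 'merge keys BROKEN')
```

Running a check like this at boot, from the same code path that loads database.yml, would surface a blank database name before ActiveRecord tries to connect.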

Ruby 1.9.2 builds Psych as part of Ruby if libyaml is installed on the machine when Ruby is compiled. Some gems require psych whenever it's available, which poisons any later YAML parsing that doesn't expect Psych's pedantic (and currently broken) behavior.

You can always check the value of the YAML constant: depending on what's loaded, it can be YAML, Syck or Psych. You can switch the yamler with YAML::ENGINE.yamler = 'syck', but you need to make sure this happens at every code entry point.
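Checking the engine at boot might look like the sketch below. The method names are hypothetical, and because `YAML::ENGINE` only exists on older Rubies such as 1.9.x, the code tests for it with `defined?` instead of assuming it is there.

```ruby
require 'yaml'

# Name of the module the YAML constant currently points at. Per the post,
# on 1.9.2 this depends on what has been loaded; on current Rubies it is "Psych".
def yaml_backend
  YAML.name
end

# The yamler selected through YAML::ENGINE on Rubies that still define it
# (1.9.x); nil everywhere else.
def configured_yamler
  defined?(YAML::ENGINE) ? YAML::ENGINE.yamler : nil
end

# True once any code in this process (for example, a gem) has loaded Psych.
def psych_loaded?
  !!defined?(::Psych)
end

puts "YAML is #{yaml_backend}; yamler: #{configured_yamler.inspect}; Psych loaded: #{psych_loaded?}"
```

Logging this line from every entry point (server boot, rake, script/runner) makes it easier to spot the one that ended up on a different engine.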

For now, I'm no longer installing libyaml and libyaml-devel on servers; they were there because I had been following RVM's information on Ruby prerequisites. I'll also write a Chef recipe to assert that libyaml is not installed.
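The post doesn't show the Chef recipe. As a stand-in, here is a plain Ruby check (not Chef code) that a recipe or deploy script could run: it asks a separate process, running the same Ruby interpreter, whether psych can be loaded. On 1.9.2 that should only succeed if libyaml was present when Ruby was built. The helper name is hypothetical, and the `/dev/null` redirect assumes a Unix-like host.

```ruby
require 'rbconfig'

# Full path of the Ruby interpreter running this script.
RUBY_BIN = File.join(RbConfig::CONFIG['bindir'],
                     RbConfig::CONFIG['ruby_install_name'] + RbConfig::CONFIG['EXEEXT'])

# Hypothetical helper: asks a child process running the same Ruby whether
# `require "psych"` succeeds. Probing out of process means this process
# never loads psych itself.
def psych_available?
  system(RUBY_BIN, '-e', 'require "psych"',
         :out => '/dev/null', :err => '/dev/null') == true
end

if psych_available?
  warn 'psych is loadable here; libyaml was probably present when Ruby was built'
else
  puts 'psych is not available in this Ruby'
end
```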

Comments

  1. user on May 14, 2011 at 12:54PM

    It probably also causes this nasty issue.

    https://github.com/rubygems/rubygems/pull/57

  2. Matthew Kocher on May 14, 2011 at 01:11PM

    Interesting that the discussion there mentions a manual uninstall - "manually remove psych from your ruby installation by removing both psych.rb and psych.{so,bundle}"

    That's a nice trick if you need it, though not the kind of thing you should go doing on production.

  3. Chad Woolley on May 14, 2011 at 02:58PM

    I was wondering why Psych was "better" than YAML, and found the answer:

    "The reason is that YAML uses the unmaintained Syck library, whereas Psych uses the modern LibYAML"

  4. Robin on May 14, 2011 at 04:46PM

    I spent hours this week tracking down the source of a failing cucumber feature that Worked On My Machine ™ but not on our build server - root cause: Syck vs Psych YAML parsing.

    I believe the reason a lot more people are (or will be) running into these issues is this commit to bundler, which makes it require psych as of 1.0.10 - https://github.com/rubygems/rubygems/pull/57

  5. Nate Clark on May 14, 2011 at 07:21PM

    We ran into this problem a couple weeks ago, too.

    To be fair, it's not really Psych's fault. Psych is supposedly more strictly YAML 1.1 compliant. The problem with database.yml and the common &defaults section is that Psych strictly interprets the defaults node as an actual node. So now your database.yml is specifying a development, test and defaults environment ... not what you wanted. There is some initialization code in ActiveRecord that looks up all the database names in your database.yml, and because the 'defaults' environment doesn't have a database name, it chokes. Previously, something was ignoring a node called "defaults" in database.yml.

    We solved this problem the dumb and simple way: just remove the fancy &defaults thing from your database.yml. Use Soloist to write out your database.yml verbosely, or just suck it up and copy & paste.

  6. Matthew Kocher on May 14, 2011 at 11:58PM

    Not merging subkeys is a bug that's been fixed in ruby head and is supposed to be backported to 1.9.2.

    Psych's strictness is a different issue, which is understandable, though I appreciate Nokogiri having an optional pedantic mode. There will be plenty of intermittent breakage as psych becomes more prevalent - the jasmine gem needs a release to fix the jasmine.yml file, for instance.

  7. Alex Chaffee on May 19, 2011 at 11:56AM

    I wish I'd read this a few days ago. I wasted a bunch of time yesterday on this bug: http://rubyforge.org/tracker/?func=detail&atid=575&aid=29181&group_id=126

    Bottom line: when you install Ruby with rvm, make sure to provide a --with-libyaml-dir option, or bad things will happen, especially if you're building a gem.

    e.g.

    rvm package install readline
    sudo port install libyaml
    rvm install 1.9.2 -C "--with-libyaml-dir=/opt/local --with-readline-dir=$HOME/.rvm/usr"
    
  8. Alex Chaffee on May 19, 2011 at 11:57AM

    Wait, now I'm confused again. You're recommending syck, but Ryan's recommending psych... ugh.

  9. Chris Boone on June 10, 2011 at 09:23AM

    Alex, regarding your suggested method of installation: Is there a good reason to use MacPorts to install libyaml, instead of using RVM’s package?

    I ran the following, and it seemed to work fine:

    rvm package install readline
    rvm package install libyaml
    rvm install 1.9.2-p180 -C --with-libyaml-dir=$rvm_usr_path,--with-readline-dir=$rvm_usr_path
    
  10. Dan Healy on June 23, 2011 at 11:58PM

    We encountered the same issue Nate Clark describes, but it showed up with a slightly different error, so I wanted to put some keywords here in case anyone else goes through what we just went through.

    Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) (Mysql::Error)

    We were getting the above error when trying to start our Ruby 1.9.2, Rails 2.3.12 production server after pushing a large number of changes. This didn't make a lot of sense to us, because our production server should have been connecting to a database host over TCP. Somehow, we never hit this issue on staging or in the Rails console in any environment.

    It turns out this psych / syck issue was causing the problem. Our database.yml defined a &default section and included it in production with <<: *default. We verified that production was receiving all of the "default" configuration but none of its own "production" configuration. Our "default" section also didn't have a "socket" or "host" setting, so our production environment fell back to the default socket, /var/run/mysqld/mysqld.sock, and failed because MySQL doesn't run on the production host.

    Realizing this, we then removed "default" and copied its config to all other configurations as Nate mentions, and it worked.

    The root cause is that something we changed, whether it be bundler or another gem, moved us from syck to psych and it broke all of our yaml files that use this merging method.

  11. Kurt Stephens on July 07, 2011 at 12:06PM

    If you are using syck under 1.9.2, you might wanna apply this REE patch:

    http://code.google.com/p/rubyenterpriseedition/issues/detail?id=66
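Nate Clark's comment (5) points out that the anchored defaults section comes back as an ordinary top-level entry, so anything that walks every entry in database.yml sees an "environment" with no database name. The sketch below is a hypothetical illustration, not how ActiveRecord or Soloist work: it lists such entries, then writes out a verbose database.yml with the merges expanded and the defaults entry dropped, in the spirit of the copy-and-paste workaround Nate and Dan Healy describe. Helper names and sample settings are made up, and it assumes a current Ruby with `YAML.safe_load(..., aliases: true)`.

```ruby
require 'yaml'

ANCHORED = <<~YAML
  defaults: &defaults
    adapter: mysql2
    host: db.example.com

  development:
    <<: *defaults
    database: app_development

  test:
    <<: *defaults
    database: app_test
YAML

# Top-level entries without a 'database' key, such as the anchored
# defaults section, which the parser returns like any other entry.
def entries_without_database(config)
  config.reject { |_name, settings| settings.is_a?(Hash) && settings.key?('database') }.keys
end

# Expands merge keys, drops entries without a database, and dumps plain YAML
# with no anchors or merge keys. Each environment is deep-copied so that no
# object is shared between environments (Psych writes repeated objects such
# as hashes as anchors and aliases).
def flatten_database_yml(text)
  config = YAML.safe_load(text, aliases: true)
  skip = entries_without_database(config)
  flat = {}
  config.each do |name, settings|
    next if skip.include?(name)
    flat[name] = Marshal.load(Marshal.dump(settings))
  end
  YAML.dump(flat)
end

config = YAML.safe_load(ANCHORED, aliases: true)
p entries_without_database(config)   # => ["defaults"]
puts flatten_database_yml(ANCHORED)
```

The generated file is longer, but every environment is spelled out, so Syck and Psych should read it the same way.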