Pivotal Labs

FasterCSV, Ruby 1.8, and Character Encodings

Evan Farrar
Tuesday, December 7, 2010

We had a bit of a head scratcher this week at the New York City office while working on Red Rover, a social directory for engaging students with their colleges and employees with their employers. We were adding CSV upload to the application when FasterCSV mysteriously failed to parse one of the files. We narrowed the failure down to a particular row containing oddly encoded international characters (though not every row containing them was a problem):

Fuentes,Jesús,"Cribbage, Chess, and Bridge Club",Treasurer

But another row containing the same character in the same encoding would import fine:

Johnson,Lúisa,Dodgeball Club,President

It turned out that this was due to a problem with how Ruby 1.8 finds character boundaries. If a miscalculated character boundary happens to fall where a quote mark begins in your CSV file, FasterCSV will hurl:

1.8.7> 'Jesús,"'.split(//)
=> ["J","e","s","\349s,\""]
1.9   > 'Jesús,"'.split(//)
=> ["J","e","s","ú","s",",","\""]
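For readers on a newer Ruby, here is a minimal sketch of the underlying byte-versus-character distinction. It assumes Ruby 2.0 or later (for `String#b`) and a UTF-8 source file, neither of which applies to the 1.8.7 setup described here, and it does not reproduce the exact 1.8.7 miscalculation shown above. It only illustrates that "ú" is two bytes in UTF-8, so a split that ignores the encoding sees more pieces than a split that respects it.

```ruby
# encoding: utf-8

name = 'Jesús,"'

# Treated as UTF-8, "ú" is a single character made of two bytes.
as_chars = name.split(//)

# String#b returns a binary (ASCII-8BIT) copy, so the same split
# now cuts between every byte, including inside "ú".
as_bytes = name.b.split(//)

puts as_chars.length  # 7
puts as_bytes.length  # 8
```

If a parser decides where characters end using the wrong one of these two views, a delimiter or quote mark next to a multibyte character can end up glued to a neighboring byte.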

This is not a problem in Ruby 1.9 with FasterCSV, or in the old-fashioned CSV class included with Ruby's standard library in 1.8.6. Hopefully this helps others who have this error staring them in the face despite having a CSV that is perfectly valid in every regard:

FasterCSV::MalformedCSVError: FasterCSV::MalformedCSVError
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1623:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `each'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `loop'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1526:in `each'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `to_a'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `read'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1229:in `parse'
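One way to rule out a badly encoded upload before blaming the parser is to check the encoding first and only then parse. The sketch below assumes Ruby 1.9 or later, where FasterCSV became the standard library's `CSV` class, and assumes the upload is meant to be UTF-8. The helper name `parse_roster` is hypothetical and is not part of Red Rover or FasterCSV.

```ruby
require "csv"

# Hypothetical helper: parses CSV text that is expected to be UTF-8.
# Raises ArgumentError with a specific message if the bytes are not
# valid UTF-8 or if the CSV itself is malformed.
def parse_roster(data)
  text = data.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless text.valid_encoding?

  CSV.parse(text)
rescue CSV::MalformedCSVError => e
  raise ArgumentError, "CSV is malformed: #{e.message}"
end

rows = parse_roster(%(Fuentes,Jesús,"Cribbage, Chess, and Bridge Club",Treasurer\n))
puts rows.first[2]  # Cribbage, Chess, and Bridge Club
```

Separating the two failure modes this way makes it easier to tell whether the file or the parser is at fault.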

One comment

  1. Joseph Palermo says:

    Is your $KCODE set to “U” in 1.8.7?

    Here are my results from 1.8.7 REE

    > $KCODE = "NONE"
    > 'Jesús,"'.split(//)
    => ["J", "e", "s", "\303", "\272", "s", ",", "\""]

    > $KCODE = 'U'
    > 'Jesús,"'.split(//)
    => ["J", "e", "s", "\303\272", "s", ",", "\""]

    No $KCODE value produces the results you were seeing for me though.

    I wonder if your input actually contains an invalid character encoding, and 1.9 is able to correct it but 1.8.7 is not.

    December 7, 2010 at 11:03 pm
