We had a bit of a head-scratcher this week at the New York City office while working on Red Rover, a social directory for engaging students with their colleges and employees with their employers. We were adding CSV uploads to the application when an uploaded file mysteriously failed to parse. We narrowed the failure down to a particular row with oddly encoded international characters (though not every row containing them was a problem):
Fuentes,Jesús,"Cribbage, Chess, and Bridge Club",Treasurer
Yet another row containing the same character, in the same encoding, imported fine:
Johnson,Lúisa,Dodgeball Club,President
It turned out that this was due to a problem with how Ruby 1.8 finds character boundaries. If a miscalculated character boundary happens to fall where a quote mark begins in your CSV file, FasterCSV will hurl:
1.8.7> 'Jesús,"'.split(//)
=> ["J","e","s","349s,\""]
1.9 > 'Jesús,"'.split(//)
=> ["J","e","s","ú","s",",","\""]
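To see the two views side by side on a current Ruby (1.9 or later), here is a minimal sketch that splits the same string once as raw bytes and once as characters. Using the binary copy returned by String#b as a stand-in for 1.8's byte-oriented handling is an assumption made for illustration; it shows that "ú" occupies two UTF-8 bytes (0xC3 0xBA), so anything scanning byte by byte sees a boundary in the middle of it.

```ruby
# encoding: utf-8
# Illustration only: the binary copy approximates how Ruby 1.8 saw the
# string when it was not treating it as UTF-8.
s = 'Jesús,"'

# String#b returns an ASCII-8BIT (binary) copy, so each byte is its own "character".
byte_pieces = s.b.split(//)

# On a UTF-8 string, split(//) yields whole characters.
char_pieces = s.split(//)

p byte_pieces.length  # => 8 ("ú" counts as two bytes)
p char_pieces.length  # => 7
p byte_pieces[3, 2]   # => ["\xC3", "\xBA"], the two halves of "ú"
```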
This is not a problem in Ruby 1.9, where FasterCSV became the standard library's CSV, or with the old-fashioned CSV class included in Ruby 1.8.6's standard library. Hopefully this helps others who have this error staring them in the face despite having a CSV that is perfectly valid in every regard:
FasterCSV::MalformedCSVError: FasterCSV::MalformedCSVError
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1623:in `shift'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `each'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `shift'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `loop'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `shift'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1526:in `each'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `to_a'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `read'
from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1229:in `parse'
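For comparison, here is a minimal sketch that feeds both rows from above to the CSV library bundled with modern Ruby (the 1.9+ descendant of FasterCSV) and rescues CSV::MalformedCSVError. The parse_or_nil helper is hypothetical, not part of any library, and the second call uses a made-up row with a stray quote inside an unquoted field to show that genuinely malformed input is still rejected.

```ruby
require 'csv'

# The two rows from the post; the quoted field contains commas.
data = <<~ROWS
  Fuentes,Jesús,"Cribbage, Chess, and Bridge Club",Treasurer
  Johnson,Lúisa,Dodgeball Club,President
ROWS

# Hypothetical helper: parse the text, or return nil if the CSV is malformed.
def parse_or_nil(text)
  CSV.parse(text)
rescue CSV::MalformedCSVError => e
  warn "CSV rejected: #{e.message}"
  nil
end

rows = parse_or_nil(data)
rows.each { |row| p row }

# Made-up malformed input: a quote inside an unquoted field is illegal quoting.
bad = parse_or_nil(%(Fuentes,Jes"s,Chess Club,Treasurer\n))
```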
Is your $KCODE set to "U" in 1.8.7?
Here are my results from Ruby Enterprise Edition (REE) 1.8.7:
> $KCODE = "NONE"
> 'Jesús,"'.split(//)
=> ["J", "e", "s", "\303", "\272", "s", ",", "\""]
> $KCODE = 'U'
> 'Jesús,"'.split(//)
=> ["J", "e", "s", "\303\272", "s", ",", "\""]
For me, though, no $KCODE value reproduces the results you were seeing.
I wonder whether your input actually contains an invalid character encoding that 1.9 is able to correct but 1.8.7 is not.
December 7, 2010 at 11:03 pm
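The invalid-encoding hypothesis above can be checked directly on Ruby 1.9 or later. The sketch below is one hedged way to do it: it tests whether the uploaded bytes are valid UTF-8 with String#valid_encoding? and, purely as an assumption about where bad files come from, falls back to reading them as Latin-1 (ISO-8859-1). The normalize_upload name and the Latin-1 fallback are both hypothetical, and the two byte strings are illustrative samples, not data from the original upload.

```ruby
# encoding: utf-8
# Hypothetical helper: make sure an uploaded CSV is valid UTF-8 before parsing.
# Assumption: anything that is not valid UTF-8 was exported as Latin-1.
def normalize_upload(raw)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  # Reinterpret the original bytes as Latin-1 and transcode them to UTF-8.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

utf8_bytes   = "Jes\xC3\xBAs".b  # "Jesús" encoded as UTF-8
latin1_bytes = "Jes\xFAs".b      # "Jesús" encoded as Latin-1

# Both calls return the valid UTF-8 string "Jesús".
fixed_utf8   = normalize_upload(utf8_bytes)
fixed_latin1 = normalize_upload(latin1_bytes)
```

Scrubbing the bad bytes with String#scrub (Ruby 2.1+) is an alternative when guessing the source encoding is too risky, at the cost of losing the original characters.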