FasterCSV, Ruby 1.8, and Character Encodings

We had a bit of a head-scratcher this week at the New York City office while working on Red Rover, a social directory for engaging students with their colleges and employees with their employers. We were adding CSV upload to the application when parsing mysteriously failed. We narrowed the failure down to one row containing strangely encoded international characters, although not every row with such characters caused a problem:

Fuentes,Jesús,"Cribbage, Chess, and Bridge Club",Treasurer

But another row with the same character in the same encoding would import fine:

Johnson,Lúisa,Dodgeball Club,President

It turned out that this was due to a problem with how Ruby 1.8 finds character boundaries. If a miscalculated character boundary happens to fall where a quote mark begins in your CSV file, FasterCSV will hurl:

1.8.7> 'Jesús,"'.split(//)
=> ["J","e","s","\349s,\""]
1.9   > 'Jesús,"'.split(//)
=> ["J","e","s","ú","s",",","\""]
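To see why a character boundary can go missing, it helps to look at the raw bytes. The sketch below runs on a modern Ruby, not 1.8, and it assumes the uploaded file was ISO-8859-1, which the post does not confirm. In that encoding "ú" is the single byte 0xFA, whereas in UTF-8 it is the two bytes 0xC3 0xBA. The original UTF-8 definition treated bytes in the 0xF8 to 0xFB range as the start of a five-byte character, so a splitter following that rule may swallow the bytes after 0xFA, quote mark included.

```ruby
# Assumption: the upload was ISO-8859-1, where "ú" is the single byte 0xFA.
latin1_bytes = "Jes\xFAs,\"".b        # raw bytes, no text encoding claimed
utf8_text    = "Jes\u00FAs,\""        # the same text as a UTF-8 string

# In UTF-8, "ú" takes two bytes: 0xC3 0xBA ("\303\272" in octal).
utf8_bytes = utf8_text.bytes.map { |b| format("%02X", b) }

# Labeling the Latin-1 bytes as UTF-8 yields an invalid string, because
# 0xFA is not followed by the continuation bytes a lead byte would need.
mislabeled = latin1_bytes.dup.force_encoding(Encoding::UTF_8)

puts utf8_bytes.join(" ")
puts latin1_bytes.bytesize
puts mislabeled.valid_encoding?
```

If the bytes are the culprit, the row that imports fine would differ only in what happens to follow the accented character: a quote mark caught inside the swallowed span breaks the CSV, while plain field text does not.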

This is not a problem with FasterCSV on Ruby 1.9, nor with the old-fashioned CSV class included in Ruby's standard library in 1.8.6. Hopefully this helps anyone who has this error staring them in the face despite having a perfectly valid CSV:

FasterCSV::MalformedCSVError: FasterCSV::MalformedCSVError
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1623:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `each'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `loop'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1526:in `each'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `to_a'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `read'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1229:in `parse'
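One way to sidestep the problem is to normalize the upload to valid UTF-8 before the parser ever sees it. What follows is a minimal sketch for Ruby 1.9 or later, where the standard `csv` library is the successor to FasterCSV. It is not what we shipped for Red Rover. The `to_utf8` helper is hypothetical, and the ISO-8859-1 fallback is an assumption about where the bad bytes came from.

```ruby
require "csv"

# Hypothetical helper, not part of CSV or FasterCSV: return UTF-8 text,
# assuming that anything which is not valid UTF-8 is really ISO-8859-1.
def to_utf8(raw, fallback = Encoding::ISO_8859_1)
  text = raw.dup.force_encoding(Encoding::UTF_8)
  return text if text.valid_encoding?

  raw.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

# The troublesome row, with "ú" stored as the single Latin-1 byte 0xFA.
raw = "Fuentes,Jes\xFAs,\"Cribbage, Chess, and Bridge Club\",Treasurer\n".b

rows = CSV.parse(to_utf8(raw))
p rows.first
```

The fallback encoding is a guess by design: byte sequences that are invalid UTF-8 can be valid in several legacy encodings, so an application that knows where its uploads come from should pick the fallback accordingly.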

Comments
  1. Joseph Palermo says:

    Is your $KCODE set to “U” in 1.8.7?

    Here are my results from 1.8.7 REE

    > $KCODE = "NONE"
    > 'Jesús,"'.split(//)
    => ["J", "e", "s", "\303", "\272", "s", ",", "\""]

    > $KCODE = 'U'
    > 'Jesús,"'.split(//)
    => ["J", "e", "s", "\303\272", "s", ",", "\""]

    No $KCODE value produces the results you were seeing for me though.

    I wonder if your input actually has an invalid character encoding, and 1.9 is able to correct it but 1.8.7 is not.
