Standup 08/12/2010: (Text encoding issues)

Ask for Help

“We have a file claiming to be ISO-8859-1 encoded, but it contains characters that don’t seem to be encoded that way. Every editor we try to open it in shows the bad characters. We have tried to use the iconv utility to convert it to UTF-8, but it keeps crapping out whenever it hits an unknown character for that encoding. Any ideas how to fix these files?”

There were multiple suggestions:

  • Try running iconv to re-encode the file to its current encoding (i.e. latin1 -> latin1) and see if that works.
  • Manually fix the characters that iconv complains about. (This was rejected, as there were thousands of files that needed fixing.)
  • Try pasting the file into a WYSIWYG editor that may fix the bad characters, then copying the text out again (using mechanize or similar to automate).
  • Try opening it in MS Word or similar and saving it back out.
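
One further option, not raised at standup, is to transcode in Ruby and substitute a marker for any byte that has no mapping, rather than aborting on it (iconv's -c flag similarly drops unconvertible characters). The sketch below assumes Ruby 1.9 or later. The helper name to_utf8_lossy, the sample bytes, and the guess that the data is really Windows-1252 are all illustrative assumptions, not details from the original problem.

```ruby
# Hypothetical helper: transcode raw bytes from a declared source encoding
# to UTF-8, replacing anything invalid or unmapped with a marker instead
# of raising an exception.
def to_utf8_lossy(raw, source_encoding, marker = "?")
  raw.dup.force_encoding(source_encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace, replace: marker)
end

# Made-up sample bytes standing in for the contents of a problem file.
# 0xE9 is "e with acute accent"; 0x81 has no mapping in Windows-1252.
sample = "caf\xE9 \x81".b

puts to_utf8_lossy(sample, "Windows-1252")
```

The replacement is lossy, so it is worth grepping the output for the marker afterwards to see how many characters were affected and where.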

Interesting Things

  • The ‘:contains’ pseudo selector behaves differently in jQuery and webrat with regard to commas. In jQuery the comma is treated as part of the string; in webrat (probably because the selector is converted to XPath) it is treated as a separator.
  • Adding --format profile to spec.opts will show you the 10 slowest specs after the suite is run.
  • Rails versions > 2.3.x use HTTP-only cookies, so you cannot use Selenium to clear them out. You can turn that off in your test environment to fix this. While slower, you may also just want your Selenium tests to open a fresh browser for every test in case there are other side effects in the browser, such as local data stores.
  • If you are using RubyMine and are experiencing a perpetual ‘Attach gems’ problem, you may be able to fix it by clearing out older gem versions with gem cleanup.
  • Riak is a distributed hash table data store. They are working on a project that will implement the Solr API but be backed by Riak instead of Lucene.
  • If you're using Rails 3/Passenger and would like a production-like environment that is not called ‘production’ (say, ‘demo’), you will need to change not only RAILS_ENV but also the RACK_ENV variable to get it to work. Someone also mentioned that there may be a Passenger config option that does this.

  1. Joseph Palermo says:

    With regard to the encoding problem: I’ve had data from a provider before with the same problem. It turned out that they had used a single char column, had run out of low byte values, and had started using high byte values too.

    This caused all programs to “think” that it was using a multi-byte encoding because they saw the high byte values. In reality there were no multi-byte characters, just high byte characters that meant something specific.

    The solution was to force an ASCII encoding when reading the file. We couldn’t “see” the high byte characters, obviously, but we could read them programmatically at that point and at least compare them.
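
The approach in the comment above, reading the data as raw bytes and inspecting the high values, might look roughly like this in Ruby 1.9 or later. This is a sketch under assumptions: "ASCII" is interpreted here as Ruby's binary (ASCII-8BIT) encoding, and the method name high_byte_counts and the sample bytes are made up for illustration. For a real file, File.binread would supply the raw string.

```ruby
# Hypothetical helper: count how often each byte above 0x7F occurs in a
# string of raw bytes, without interpreting them in any text encoding.
def high_byte_counts(raw)
  counts = Hash.new(0)
  raw.each_byte { |byte| counts[byte] += 1 if byte > 0x7F }
  counts
end

# Made-up sample; for a real file use something like File.binread(path).
sample = "caf\xE9 \x93quoted\x94 \xE9".b
counts = high_byte_counts(sample)

# Print each high byte in hex with its frequency, lowest byte first.
counts.sort.each { |byte, n| puts format("0x%02X  %d", byte, n) }
```

A tally like this shows which high byte values actually occur, which is a starting point for working out what each one was meant to represent.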
