Sanitizing Solr requests

If you’re accepting user input for Solr (which I expect most projects using it are), you’ve probably noticed that you need to sanitize the queries you pass to it. After reading a bunch of conflicting documentation and blog posts, I put together a simple little module to handle it for you. It should strip out every character that would cause Solr to throw an error on a query string. Let me know if it works for you or if I missed any corner cases!

module SolrStringSanitizer
  ILLEGAL_SOLR_CHARACTERS_REGEXP = /\+|-|!|\(|\)|\{|\}|\[|\]|\^|\\|"|~|\*|\?|:|;|&&|\|\|/

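  # Returns the string with the illegal characters removed, or nil when given nil.
  def self.sanitize(string)
    if string
      string.gsub(ILLEGAL_SOLR_CHARACTERS_REGEXP, '')
    end
  end
end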

  1. jeff says:

    I was getting an error:

    invalid regular expression; there's no previous pattern, to which '{' would define cardinality at 13: /+|-|!|(|)|{|}|[|]|^||"|~|*|?|:|;|&&|||/):

    So I changed the regex to this and it seems to work:

    ILLEGAL_SOLR_CHARACTERS_REGEXP = /[\+\-!\(\)\{\}\[\]\^|"~\*\?:;&]/

    Basically escaped most of the characters, and put them in a character class rather than having all of the ‘OR’ pipes.

  2. jeff says:

    Wow, no markdown love. pastie to the rescue:

    Feel free to delete the broken posts.

  3. John says:

    The regular expression prevents wildcard searching…

  4. Joseph Palermo says:

    All of those characters are valid text too, escaping them seems more appropriate than removing them.

  5. I’ve also written an alternative to accepting raw user input, in the form of a Lucene query generator. We mainly used the library for constructing specific searches for views, but it also makes building “advanced” search interfaces easier.

    Thanks to Mike Mangino of Elevated Rails for allowing me to release the library with an MIT license.
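
Picking up on comments 3 and 4: here is a rough sketch of what escaping, rather than stripping, might look like. It is not from the original post. The module name SolrStringEscaper, the constant names, and the keep_wildcards option are all made up for illustration, and the character list is my reading of the Lucene query syntax plus the ';' from the post's regex. It assumes Ruby 2.0 or later for the keyword argument.

```ruby
# Hypothetical helper (not part of the original post): backslash-escapes
# characters that have special meaning in the Lucene/Solr query syntax,
# so they are searched as literal text instead of being removed.
module SolrStringEscaper
  # A character class covering the special characters one at a time.
  # Escaping each '&' and '|' individually also covers '&&' and '||'.
  SPECIAL_CHARACTERS_REGEXP = /[\\+\-!(){}\[\]^"~*?:;&|]/

  # Characters left alone when the caller wants wildcard searches to work.
  WILDCARDS = %w[* ?].freeze

  # Returns an escaped copy of the string, or nil when given nil.
  def self.escape(string, keep_wildcards: false)
    return nil if string.nil?

    string.gsub(SPECIAL_CHARACTERS_REGEXP) do |char|
      if keep_wildcards && WILDCARDS.include?(char)
        char            # leave '*' and '?' active as wildcards
      else
        "\\#{char}"     # prefix with a single backslash
      end
    end
  end
end

puts SolrStringEscaper.escape('title:(foo)')               # prints title\:\(foo\)
puts SolrStringEscaper.escape('jo*n', keep_wildcards: true) # prints jo*n
```

The block form of gsub is used so the backslash in the replacement is taken literally. Whether to keep wildcards active is a judgment call for each application, since it lets users write expensive queries.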
