Zach Brock
Sanitizing Solr requests
Posted by Zach Brock on Friday July 17, 2009 at 01:29PM

If you're accepting user input for Solr (which I expect most projects using it do), you've probably noticed that you need to sanitize the queries you pass to Solr. After reading a bunch of conflicting documentation and blog posts, I put together a simple little module to handle it for you. It should strip out every character that would cause Solr's query parser to throw an error on a query string. Let me know if it works for you or if I missed any corner cases!

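module SolrStringSanitizer
  # Lucene/Solr query-syntax characters, plus the && and || operators.
  # Regexp metacharacters are backslash-escaped so the pattern compiles.
  ILLEGAL_SOLR_CHARACTERS_REGEXP = /\+|\-|!|\(|\)|\{|\}|\[|\]|\^|\\|"|~|\*|\?|:|;|&&|\|\|/

  # Returns a copy of +string+ with those characters removed (nil stays nil).
  def self.sanitize(string)
    if string
      string.gsub(ILLEGAL_SOLR_CHARACTERS_REGEXP, "")
    end
  end
end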

Comments

  1. jeff on July 18, 2009 at 11:03PM

    I was getting an error:

    invalid regular expression; there's no previous pattern, to which '{' would define cardinality at 13: /\+|\-|!|(|)|{|}|[|]|\^|\|"|~|*|\?|:|;|&&|\|\|/):

    So I changed the regex to this and it seems to work:

    ILLEGAL_SOLR_CHARACTERS_REGEXP = /[\+\-!(){}\[\]\^\|"~*\?:;&]/

    Basically, I escaped most of the characters and put them in a character class instead of using all of those 'OR' pipes.

  2. jeff on July 18, 2009 at 11:08PM

    Wow, no Markdown love. Pastie to the rescue: http://pastie.org/550997

    Feel free to delete the broken posts.

  3. John on July 20, 2009 at 03:08AM

    The regular expression prevents wildcard searching...

  4. Joseph Palermo on July 20, 2009 at 09:45AM

    All of those characters are valid text too; escaping them seems more appropriate than removing them.

  5. Jeremy Voorhis on August 03, 2009 at 10:15AM

    I've also written an alternative to accepting raw user input, in the form of a Lucene query generator. We mainly used the library for constructing specific searches for views, but it also makes building "advanced" search interfaces easier.

    http://github.com/jvoorhis/lucene_query/tree/master

    Thanks to Mike Mangino of Elevated Rails for allowing me to release the library with an MIT license.
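Two of the comments point the same way: Joseph suggests escaping special characters rather than deleting them, and John notes that stripping * and ? breaks wildcard searches. Below is a minimal sketch of both ideas in one helper. The module name SolrStringEscaper and the keep_wildcards parameter are hypothetical (not from the post or any library), and the character list is an assumption based on the Lucene query syntax documentation rather than something the post specifies. A positional default argument is used instead of a keyword argument so it also runs on older Rubies.

```ruby
# Hypothetical helper (not from the post): escapes Lucene query-parser
# special characters with a backslash instead of removing them.
module SolrStringEscaper
  # Assumed list of special characters, per the Lucene query syntax docs.
  SPECIAL_CHARACTERS = ["\\", "+", "-", "!", "(", ")", ":", "^", "[", "]",
                        '"', "{", "}", "~", "*", "?", "|", "&", "/"].freeze
  WILDCARDS = ["*", "?"].freeze

  # Returns a copy of +string+ with each special character prefixed by a
  # backslash. When keep_wildcards is true, * and ? are left untouched so
  # wildcard searches keep working. nil input returns nil.
  def self.escape(string, keep_wildcards = false)
    return nil if string.nil?
    specials = keep_wildcards ? SPECIAL_CHARACTERS - WILDCARDS : SPECIAL_CHARACTERS
    # Regexp.union quotes each string, so no manual regexp escaping is needed.
    pattern = Regexp.union(specials)
    # The block form inserts the replacement literally (no backreferences).
    string.gsub(pattern) { |char| "\\#{char}" }
  end
end
```

For example, escaping keeps a search for C++ meaningful (it becomes C\+\+), whereas stripping reduces it to C. Depending on the query parser and Solr version, a leading * or ? may still be rejected even when wildcards are kept.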