Pivotal Labs

Sanitizing Solr requests

Pivotal Labs
Friday, July 17, 2009

If you’re accepting user input for Solr (as I expect most projects using it are), you’ve probably noticed that you need to sanitize the queries you pass to it. After reading a bunch of conflicting documentation and blog posts, I put together a simple little module to handle it for you. It should strip out every character that would cause Solr to throw an error on a query string. Let me know if it works for you or if I missed any corner cases!

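module SolrStringSanitizer
  # Lucene/Solr query-syntax characters, plus the && and || operators.
  # Each alternative is backslash-escaped so it matches the literal character.
  ILLEGAL_SOLR_CHARACTERS_REGEXP = /\+|-|!|\(|\)|\{|\}|\[|\]|\^|\\|"|~|\*|\?|:|;|&&|\|\|/

  # Returns a copy of the string with every matched character removed;
  # nil is passed through unchanged.
  def self.sanitize(string)
    if string
      string.gsub(ILLEGAL_SOLR_CHARACTERS_REGEXP, "")
    end
  end
end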
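For comparison, here is a minimal sketch of the same stripping approach written as one character class plus the two-character `&&` and `||` operators, in the spirit of the character-class fix suggested in the comments. The module name `SolrStringStripper` is hypothetical, and the character list is assumed to be the same set the regular expression above targets.

```ruby
# Hypothetical module name; a sketch of the same strip-everything approach.
module SolrStringStripper
  # Try the two-character boolean operators first, then any single
  # special character (the backslash is written as \\ inside the class).
  SPECIAL = /&&|\|\||[+\-!(){}\[\]^"~*?:;\\]/

  # Remove every match; nil is returned unchanged.
  def self.sanitize(string)
    return nil if string.nil?
    string.gsub(SPECIAL, "")
  end
end

puts SolrStringStripper.sanitize('title:(ruby)') # prints "titleruby"
```

Note that a lone `&` or `|` is left alone here, just as in the alternation above; only the doubled operators are stripped.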

5 Comments

  1. jeff says:

    I was getting an error:

    invalid regular expression; there’s no previous pattern, to which ‘{‘ would define cardinality at 13: /+|-|!|(|)|{|}|[|]|^||”|~|*|?|:|;|&&|||/):

    So I changed the regex to this and it seems to work:

    ILLEGAL_SOLR_CHARACTERS_REGEXP = /[\+\-!\(\)\{\}\[\]\^\|"~\*\?:;&]/

    Basically escaped most of the characters, and put them in a character class rather than having all of the ‘OR’ pipes.

    July 18, 2009 at 11:03 pm

  2. jeff says:

    Wow, no markdown love. Pastie to the rescue: http://pastie.org/550997

    Feel free to delete the broken posts.

    July 18, 2009 at 11:08 pm

  3. John says:

    The regular expression prevents wildcard searching…

    July 20, 2009 at 3:08 am

  4. Joseph Palermo says:

    All of those characters are valid text too; escaping them seems more appropriate than removing them.

    July 20, 2009 at 9:45 am

  5. Jeremy Voorhis says:

    I’ve also written an alternative to accepting raw user input, in the form of a Lucene query generator. We mainly used the library for constructing specific searches for views, but it also makes building “advanced” search interfaces easier.

    http://github.com/jvoorhis/lucene_query/tree/master

    Thanks to Mike Mangino of Elevated Rails for allowing me to release the library with an MIT license.

    August 3, 2009 at 10:15 am
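As the comments point out, stripping throws away characters that may be legitimate text, and it also disables wildcard searches. Below is a minimal sketch of the escaping alternative: each special character is prefixed with a backslash so Solr treats it literally, roughly in the way Lucene's own `QueryParser.escape` works. The module name `SolrStringEscaper` and the `allow_wildcards` option are hypothetical, and the character list is an assumption carried over from the sanitizer above; `&` and `|` are escaped one at a time, which neutralizes `&&` and `||`.

```ruby
# Hypothetical module; a sketch of escaping rather than stripping.
module SolrStringEscaper
  # Single characters with special meaning in Lucene/Solr query syntax.
  SPECIAL_CHARS = ['\\', '+', '-', '!', '(', ')', '{', '}', '[', ']',
                   '^', '"', '~', '*', '?', ':', ';', '&', '|'].freeze
  WILDCARDS = ['*', '?'].freeze

  # Prefix each special character with a backslash. When allow_wildcards
  # is true, * and ? are left unescaped so wildcard queries keep working.
  # nil is returned unchanged.
  def self.escape(string, allow_wildcards: false)
    return nil if string.nil?
    specials = allow_wildcards ? SPECIAL_CHARS - WILDCARDS : SPECIAL_CHARS
    # Regexp.union escapes each string, so every entry matches literally.
    pattern = Regexp.union(specials)
    # Block form avoids backreference interpretation in the replacement.
    string.gsub(pattern) { |char| "\\#{char}" }
  end
end

puts SolrStringEscaper.escape('title:(ruby)') # prints title\:\(ruby\)
```

With `allow_wildcards: true`, a query such as `rub*` passes through untouched while the rest of the syntax is still escaped.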
