Removing "out of sync" error in acts_as_solr

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and many more features. It runs in a Java servlet container such as Tomcat. - Apache Solr

Solr can be used in different containers and through different wrappers. Our application runs on Ruby on Rails, so we used acts_as_solr. Although Solr is a powerful, stable and flexible third-party solution that we could rely on, we were still not using it to its full capacity: our search modules used only its bare minimum features.

So far, we've used a couple of acts_as_solr enhancements and add-ons, some of which we learned about from various online resources. We were able to use db_free_solr and explored Solr's highlighting and faceting capabilities. It's been pretty helpful, but of course nothing is ever completely seamless. We ran into a few problems syncing records from the database to Solr. If you've been using Solr, you have probably come across this error before:

Out of sync! Found N items in index, but only n were found in database!

It was breaking every page where the number of records found in the index didn't match the number found in the database, which gave the impression that our site was frequently unstable. Removing a particular document from the Solr index is as easy as:

# Delete the indexed document for Model record 110809, then commit the change
ActsAsSolr::Post.execute(Solr::Request::Delete.new(:query => %{type_s:Model AND id:"Model:110809"}))
ActsAsSolr::Post.execute(Solr::Request::Commit.new)
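The query above targets one hard-coded record. Judging from it, each indexed document carries its class name in `type_s` and an `id` of the form `Model:<record id>`. Assuming that convention, a small helper can build the same query for any model and record id. `delete_query_for` is a hypothetical name used for illustration, not part of acts_as_solr.

```ruby
# Hypothetical helper: builds the delete query used above for any model/record.
# Assumes documents are keyed as type_s:<Model> and id:"<Model>:<record id>",
# the convention visible in the hard-coded query.
def delete_query_for(model_name, record_id)
  %{type_s:#{model_name} AND id:"#{model_name}:#{record_id}"}
end

puts delete_query_for("Model", 110809)
# prints: type_s:Model AND id:"Model:110809"
```

The resulting string would then be passed as `:query` to `Solr::Request::Delete.new`, followed by a commit, just like the two lines above.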

It would have been straightforward to remove the offending item from the Solr index and be done with it, but it's a lot harder than that when you're comparing over a thousand indexed documents against their 'existing' counterparts in the database! Finding the exact records to remove was really the hardest part, something I didn't appreciate until I took it upon myself to help our kind Infra Team resolve the problem. I decided to tweak the Solr parser method that raises the "out of sync" error: it would be much more useful if it displayed the offending ids, so the team could delete them from the index directly. So I ended up with something like this (in acts_as_solr/lib/parser_methods.rb):

raise "Out of sync! Found #{ids.size} items in index, but only #{things.size} were found in database! Remove #{(ids - (things.collect{|x| x.id})).to_sentence}." unless things.size == ids.size

And voilà! I could now see the faulty ids that were causing the "out of sync" problem. I presented this not-so-brilliant solution to our Infra Team, and they came up with a better idea: my colleague thought it would be nicer to do away with the "out of sync" error altogether. Since I could already pinpoint the cause of the trouble, why not remove it for good? I came up with half of that solution. It was the quicker one to implement and didn't require much from their end either.

Once I could tell the faulty ids apart (the ones Solr returned that had no matching record in the database), I could simply drop them before the check. It's only half the solution because (hint, hint... I may do this next time I have the time) I could instead delete those documents from the Solr index itself, rather than just removing them from the list of ids Solr returned. That "full" solution could bring other complications, though, since you'd have to work out which models are involved, which fields Solr needs to look at, and so on.
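A rough sketch of what that "full" solution might look like for a single model: collect the stale ids, send one delete request covering all of them, then commit once. To keep it runnable outside Rails, the request sender is passed in as a block; in an application it would presumably wrap `ActsAsSolr::Post.execute` with `Solr::Request::Delete` and `Solr::Request::Commit`, as in the earlier snippet. The method name `purge_stale_ids` and the `[:delete, query]` / `[:commit]` request tuples are hypothetical, and the query again assumes the `type_s`/`id` naming seen above.

```ruby
# Hypothetical sketch of the "full" solution: delete index entries whose
# records no longer exist in the database, then commit once.
# Assumes documents are keyed as type_s:<Model> and id:"<Model>:<record id>".
def purge_stale_ids(model_name, index_ids, db_ids)
  stale = (index_ids - db_ids).uniq
  return stale if stale.empty?   # nothing to delete, send no requests

  # One OR-ed query instead of one request per stale record.
  id_clauses = stale.map { |id| %{id:"#{model_name}:#{id}"} }.join(" OR ")
  yield [:delete, "type_s:#{model_name} AND (#{id_clauses})"]
  yield [:commit]
  stale
end

sent = []
removed = purge_stale_ids("Model", [1, 2, 110809, 3], [1, 2, 3]) { |req| sent << req }
p removed  # => [110809]
p sent
```

Batching everything into one query keeps the number of round trips and commits low. With many stale ids, though, a single OR-ed query may hit Solr's boolean clause limit (`maxBooleanClauses`, 1024 by default), so it would probably need to be split into chunks.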

So the half solution I went with was to clean up the list of ids that Solr returns. This snippet is in acts_as_solr/lib/parser_methods.rb:

def reorder(things, ids)
  ordered_things = Array.new(things.size)

  unless things.size == ids.size
    missing_ids = ids - things.collect { |x| x.id }  # indexed ids with no database record
    ids = ids - missing_ids                          # drop them without mutating the caller's array
  end

  # Reached only if the counts still differ after filtering (e.g. an id returned twice)
  raise "Out of sync! Found #{ids.size} items in index, but only #{things.size} were found in database!" unless things.size == ids.size

  things.each do |thing|
    position = ids.index(thing.id)
    ordered_things[position] = thing
  end

  ordered_things
end
 

The `unless` block just before the "out of sync" check is what matters: it removes the ids that have no matching database record from the list Solr returned. Only if the counts still don't match after that (for example, because the index returned the same id twice) does the method fall back to raising the "out of sync" error.
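To see what the patched `reorder` does without a running Solr or database, here is a standalone version that uses a `Struct` as a stand-in for ActiveRecord objects (the `Record` struct is an assumption for illustration only). It follows the logic above, minus the Rails-specific bits, so it runs in plain Ruby.

```ruby
# Stand-in for ActiveRecord objects: reorder only looks at #id.
Record = Struct.new(:id)

# Mirrors the patched reorder: drop index ids that have no database row,
# then put the database records back into Solr's result order.
def reorder(things, ids)
  ordered_things = Array.new(things.size)

  unless things.size == ids.size
    missing_ids = ids - things.map(&:id)
    ids = ids - missing_ids
  end

  unless things.size == ids.size
    raise "Out of sync! Found #{ids.size} items in index, but only #{things.size} were found in database!"
  end

  things.each { |thing| ordered_things[ids.index(thing.id)] = thing }
  ordered_things
end

# Solr returned 4 ids in relevance order; record 110809 was deleted from the DB.
things = [Record.new(3), Record.new(1), Record.new(2)]
p reorder(things, [2, 110809, 1, 3]).map(&:id)  # => [2, 1, 3]
```

The stale id is silently skipped and the remaining records keep Solr's ordering; a duplicated id (for example `[1, 1]` against a single record) still ends in the "out of sync" error.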

It's quick, but not dirty. It works, but it won't guarantee that the problem goes away permanently. I suggest doing a complete reindex of your data. Better yet, whatever was causing it, make sure no data is ever deleted directly in the database (bypassing your models), so that Solr always stays in sync with your database.
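Why direct deletions cause this: acts_as_solr updates the index from model callbacks, so a record removed through `destroy` is also removed from Solr, while a deletion that skips callbacks (raw SQL, or ActiveRecord's `delete`/`delete_all`) leaves its document behind in the index. The toy model below is plain Ruby, not ActiveRecord or acts_as_solr; `ToyModel`, `DB` and `INDEX` are hypothetical stand-ins that only illustrate the mechanism.

```ruby
require "set"

# Hypothetical stand-ins: DB plays the database table, INDEX the Solr index.
DB = {}
INDEX = Set.new

class ToyModel
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def save
    DB[id] = self
    INDEX << id        # like an after-save hook that indexes the record
  end

  def destroy
    DB.delete(id)
    INDEX.delete(id)   # like an after-destroy hook that removes the document
  end

  # Like a raw SQL delete: no hooks run, so the index keeps a stale id.
  def self.delete_directly(id)
    DB.delete(id)
  end
end

[1, 2, 3].each { |i| ToyModel.new(i).save }
DB[1].destroy
ToyModel.delete_directly(2)

p INDEX.to_a - DB.keys  # => [2]
```

Record 1 disappears from both stores, but record 2 lingers in the index, which is exactly the kind of id that later shows up in the "out of sync" count.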

Hope this helps.