Alfresco uses Lucene to provide services for indexing and searching metadata and content. Though Lucene is a reliable subsystem for most of our customers, when trouble does occur in Alfresco, the Lucene index is often involved. A customer asked me today for advice on maintaining the health of Alfresco indexes. After getting some ideas from support, I decided to document the advice for the larger community.
The live Lucene indexes cannot be backed up without corruption, so Alfresco is configured to dump a snapshot of the indexes each night. This index backup can then be included in your Alfresco backup routine (copied off the production machine) and used during a restore of Alfresco. On restore, Alfresco will reload the indexes from the last backup and index the metadata for content that was added after that point in time. Once the metadata index is complete, users can interact with the system again. The system will complete any missing full-content indexes in the background without impacting accessibility. While this process completes, users will be able to search for and find documents based on the metadata, but not on document content.
The most important reminder the team had was to verify that index backups are being performed and that you can successfully restore from the backups. Too often we (as IT professionals) neglect to confirm that our backup plans perform as designed.
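As a starting point for the first half of that reminder, here is a minimal sketch of a freshness check you could run from a monitoring job. It is not an Alfresco tool; the function names, the 26-hour threshold, and the backup path in the usage note are all assumptions for illustration, so substitute the index backup directory configured for your own installation.

```python
import os
import time
from typing import Optional


def newest_mtime(backup_dir: str) -> Optional[float]:
    """Return the most recent modification time of any file under backup_dir.

    Returns None if the directory is missing or contains no files.
    """
    newest = None
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            mtime = os.path.getmtime(os.path.join(root, name))
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def backup_is_fresh(backup_dir: str,
                    max_age_hours: float = 26.0,
                    now: Optional[float] = None) -> bool:
    """True if at least one file in the backup is newer than max_age_hours.

    The 26-hour default is an assumed allowance for a nightly backup
    plus some slack; it is not an Alfresco setting.
    """
    newest = newest_mtime(backup_dir)
    if newest is None:
        return False
    current = time.time() if now is None else now
    return (current - newest) <= max_age_hours * 3600
```

A call such as backup_is_fresh("/opt/alfresco/alf_data/backup-lucene-indexes") (a hypothetical path) could feed an alert when it returns False. Note that a recent timestamp only shows that the backup job ran. It does not show that the backup can be restored, so a periodic test restore is still needed.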
Details on how to perform a restore from index backups are available here:
One strategy for reducing the time necessary to restore Alfresco is to dump the indexes more often than once per day. You can schedule index backups as frequently as you would like, but each backup temporarily prevents content from being indexed on the node doing the backup. It shouldn’t be a long pause, but the more often you do it, the harder it will be for that node to catch up on its indexing.
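To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes content arrives at a steady rate, and the figure of 500 new items per hour is purely hypothetical; the point is only that the amount of content to re-index after a restore grows with the time since the last index backup.

```python
def worst_case_reindex_backlog(items_added_per_hour: float,
                               backup_interval_hours: float) -> float:
    """Upper bound on items added since the last index backup.

    Assumes a constant ingest rate, which real systems rarely have.
    """
    return items_added_per_hour * backup_interval_hours


if __name__ == "__main__":
    # Hypothetical ingest rate, for illustration only.
    rate = 500
    for interval in (24, 6, 1):
        backlog = worst_case_reindex_backlog(rate, interval)
        print(f"backup every {interval}h: up to {backlog} items to re-index")
```

The backlog shrinks in proportion to the backup interval, but every additional backup adds another indexing pause on the node performing it, so the right interval depends on how quickly that node catches up.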
Here is documentation on how to change the schedule for Lucene index backups:
Our support team also has an index checking tool that can help validate the consistency of the indexes. They normally only use it when diagnosing a problem, as it can take a long time to return and can affect system performance while it is running. If you do script a scheduled run of the tool, performance impacts can be managed by running it on a cluster instance that is not servicing user requests. I believe this checker is currently an Enterprise-only feature.
I don’t believe we have detailed documentation on the tool, but it is accessible on your Alfresco server by going here:
Support also has a method for evaluating index performance which is documented here:
Be aware that the default settings for index merging are appropriate in most cases, and tweaking those settings is more likely to reduce performance than to help.
Because all of these maintenance steps will impact the performance of the Alfresco instance on which they are run, we recommend configuring Alfresco in an N+1 redundant cluster. During normal system operation, you can use the extra instance to perform maintenance of this type rather than have that server accepting a full load of users.
The next major release of Alfresco is code-named Project Swift and will likely be Alfresco Enterprise 4.0. One of our major architectural changes is to move to Solr as our search architecture. This should help with Lucene reliability, as well as give Alfresco customers access to the many search features that Solr adds to Lucene.
Feel free to add any additional tips you have for index health in the comments.
Thanks go to Andy Hunt for technical corrections on this blog post.