What's new in Lucene/Solr 4.0
Release 4.0 is one of the most important milestones for Lucene and Solr. It brings many new features and performance improvements, and a few of the most important ones are highlighted in this article.
- ColumnStrideFields (DocValues): values are stored on a per-document basis, where each document's field can hold exactly one value of a given type.
- Faceted search: this feature was already available in Solr and is now integrated into Lucene as well.
- With flexible indexing it is now possible for an application to create its own postings codec, to alter how fields, terms, docs and positions are encoded into the index.
- Added several new relevance ranking models to Lucene, such as BM25, Divergence from Randomness (DFR), information-based models and language models, alongside the classic TF/IDF vector space model.
- Added a Codec implementation that works with append-only filesystems such as Hadoop DFS.
- Added DirectSpellChecker, which retrieves correction candidates directly from the term dictionary using Levenshtein automata.
- Indexed terms are no longer UTF-16 char sequences, instead terms can be any binary value encoded as byte arrays. By default, text terms are now encoded as UTF-8 bytes.
- Substantially faster searching when a Filter is applied.
- FuzzyQuery is 100-200 times faster than in past releases.
- Added index statistics such as the number of tokens for a term or field, number of postings for a field, and number of documents with a posting for a field.
- Added RegexpQuery support to contrib/queryparser.
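Both the faster FuzzyQuery and DirectSpellChecker rank candidate terms by Levenshtein (edit) distance. Lucene 4.0 gets its speed by intersecting precompiled Levenshtein automata with the term dictionary; the toy sketch below does not do that. It only computes the same distance with the classic dynamic-programming algorithm and scans a small word list, to show what "within N edits" means. The class `EditDistance`, its methods and the sample dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: Lucene 4.0 uses Levenshtein automata, not this DP table
// and not a brute-force scan over every term.
final class EditDistance {

    // Classic dynamic-programming Levenshtein distance:
    // insertion, deletion and substitution each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from a[0..i) to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical brute-force suggester: keep terms within maxEdits of the input.
    static List<String> suggest(String input, List<String> terms, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (!term.equals(input) && levenshtein(input, term) <= maxEdits) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("lucene", "solr", "lucent", "search");
        System.out.println(suggest("lucine", dictionary, 2)); // prints [lucene, lucent]
    }
}
```

A real term dictionary holds millions of terms, which is exactly why scanning it like this is slow and why Lucene 4.0 switched to automata that visit only terms that can possibly match.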
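The change from UTF-16 char sequences to byte-array terms also changes how terms sort. The JDK-only sketch below encodes a term as UTF-8 and compares two encoded terms as unsigned bytes, which yields Unicode code point order. Plain Java `String.compareTo` compares UTF-16 units and can disagree with that order for characters outside the Basic Multilingual Plane. The class `Utf8Terms` and its helpers are hypothetical; they are not Lucene's `BytesRef` API.

```java
import java.nio.charset.StandardCharsets;

// Sketch: a text term as UTF-8 bytes, and unsigned byte-wise term ordering.
final class Utf8Terms {

    // Encode a text term as UTF-8 bytes.
    static byte[] encode(String term) {
        return term.getBytes(StandardCharsets.UTF_8);
    }

    // Compare two encoded terms byte by byte, treating bytes as unsigned (0..255).
    // For valid UTF-8 this matches Unicode code point order.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    public static void main(String[] args) {
        byte[] cafe = encode("caf\u00e9");
        System.out.println(cafe.length); // 5: the accented e needs two bytes in UTF-8

        String bmp = "\uFFFD";         // U+FFFD
        String emoji = "\uD83D\uDE00"; // U+1F600, a surrogate pair in UTF-16
        System.out.println(bmp.compareTo(emoji) > 0);                     // true: UTF-16 order
        System.out.println(compareUnsigned(encode(bmp), encode(emoji)) < 0); // true: code point order
    }
}
```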
Solr 4.0-alpha includes more NoSQL features for those using Solr as a primary data store.
- Distributed indexing designed from the ground up for near real-time (NRT) and NoSQL features such as realtime-get, optimistic locking, and durable updates.
- High availability with no single point of failure.
- Apache ZooKeeper integration for distributed coordination and for storing cluster metadata and configuration.
- Updates can be sent to any node in the cluster; they are automatically forwarded to the correct shard and replicated to multiple nodes for redundancy.
- Queries sent to any node automatically perform a full distributed search across the cluster with load balancing and fail-over.
- A transaction log ensures that even uncommitted documents are never lost.
- Real-time Get – The ability to quickly retrieve the latest version of a document, without the need to commit or open a new searcher.
- Atomic updates - the ability to add, remove, change, and increment fields of an existing document without having to send in the complete document again.
- Pivot Faceting – Multi-level or hierarchical faceting where the top constraints for one field are found for each top constraint of a different field.
- Pseudo-Join functionality – The ability to select a set of documents based on their relationship to a second set of documents.
- A brand new web admin interface, including support for SolrCloud.
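Optimistic locking and real-time get work together. A client reads the latest version of a document, and when it sends a change back, the update is rejected with a version conflict if someone else has modified the document in the meantime. In Solr this check is driven by the `_version_` field. The in-memory sketch below models only the core idea of a conditional write on a version number. `VersionedStore`, `putIfVersion` and `Conflict` are hypothetical names, not Solr's API, and Solr's special version values (for example "must exist" or "must not exist") are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store illustrating version-checked (optimistic) updates.
// This is NOT Solr's API; it only mimics the idea behind Solr's _version_ check.
final class VersionedStore {

    static final class Conflict extends RuntimeException {
        Conflict(String msg) {
            super(msg);
        }
    }

    private final Map<String, Long> versions = new HashMap<>();
    private final Map<String, String> bodies = new HashMap<>();
    private long clock = 0; // monotonically increasing source of versions

    // Unconditional write; returns the new version of the document.
    long put(String id, String body) {
        long v = ++clock;
        versions.put(id, v);
        bodies.put(id, body);
        return v;
    }

    // Conditional write: succeeds only if the caller saw the current version.
    long putIfVersion(String id, String body, long expectedVersion) {
        Long current = versions.get(id);
        if (current == null || current != expectedVersion) {
            throw new Conflict("version conflict for " + id);
        }
        return put(id, body);
    }

    // Stand-in for real-time get: always returns the latest body.
    String get(String id) {
        return bodies.get(id);
    }

    long version(String id) {
        return versions.get(id);
    }
}
```

Typical use: read the document and its version, modify it, then call `putIfVersion` with the version that was read. On `Conflict`, re-read the document and retry.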
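Atomic updates let a client send only the changed fields. Solr then rebuilds the full document from its stored fields and reindexes it, so the fields involved generally need to be stored. The toy model below applies one modifier to a copy of a stored document. The modifier names mirror Solr's `set`, `add` and `inc`, and setting a field to null removes it. The class `AtomicUpdate` and the `apply` method are hypothetical and are not how Solr implements this internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an atomic (partial) update applied to a stored document.
// Modifier names mirror Solr's "set", "add" and "inc"; everything else is hypothetical.
final class AtomicUpdate {

    @SuppressWarnings("unchecked")
    static Map<String, Object> apply(Map<String, Object> stored, String field,
                                     String modifier, Object value) {
        Map<String, Object> doc = new HashMap<>(stored); // leave the stored copy untouched
        switch (modifier) {
            case "set":
                if (value == null) {
                    doc.remove(field); // setting null removes the field
                } else {
                    doc.put(field, value); // replace the old value
                }
                break;
            case "add": {
                // Append to a multi-valued field, creating it if absent.
                List<Object> values = new ArrayList<>();
                Object existing = doc.get(field);
                if (existing instanceof List) {
                    values.addAll((List<Object>) existing);
                } else if (existing != null) {
                    values.add(existing);
                }
                values.add(value);
                doc.put(field, values);
                break;
            }
            case "inc": {
                // Increment a numeric field; a missing field counts as 0.
                long current = doc.containsKey(field) ? ((Number) doc.get(field)).longValue() : 0L;
                doc.put(field, current + ((Number) value).longValue());
                break;
            }
            default:
                throw new IllegalArgumentException("unknown modifier: " + modifier);
        }
        return doc;
    }
}
```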
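Pivot faceting returns, for each value of one field, a nested breakdown of counts over a second field. In Solr it is requested with a parameter such as `facet.pivot=cat,inStock`, where the field names here are only examples. The toy sketch below computes the same two-level structure over in-memory documents. It skips documents that lack either field and does not sort by count or cut the lists down to the top N constraints, both of which Solr does. The class `PivotFacet` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy two-level pivot facet over in-memory documents (not Solr code).
final class PivotFacet {

    // For each value of field1, count the values of field2 among the same documents.
    static Map<String, Map<String, Integer>> pivot(List<Map<String, String>> docs,
                                                   String field1, String field2) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            String v1 = doc.get(field1);
            String v2 = doc.get(field2);
            if (v1 == null || v2 == null) {
                continue; // simplification: skip documents missing either field
            }
            result.computeIfAbsent(v1, k -> new TreeMap<>()).merge(v2, 1, Integer::sum);
        }
        return result;
    }
}
```

For documents tagged `cat=book` or `cat=music`, the result maps each category to its own `inStock` counts, which is the hierarchical shape a pivot facet response has.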