What's new in Lucene / Solr 4.0


Release 4.0 is an important milestone for Lucene and Solr. It brings many new features and performance improvements. A few of the important ones are highlighted in this article.

Lucene:

  • ColumnStrideFields (DocValues): values are stored on a per-document basis, where each document's field can hold exactly one value of a given type.
  • Facet search: this feature was already available in Solr and is now integrated into Lucene.
  • With flexible indexing, an application can now create its own postings codec to alter how fields, terms, docs and positions are encoded into the index.
  • Added a variety of different relevance ranking systems to Lucene.
  • Added a Codec implementation that works with append-only filesystems (such as Hadoop DFS).
  • Added DirectSpellChecker, which retrieves correction candidates directly from the term dictionary using Levenshtein automata.
  • Indexed terms are no longer UTF-16 char sequences; instead, terms can be any binary value encoded as byte arrays. By default, text terms are now encoded as UTF-8 bytes.
  • Substantially faster performance when using a Filter during searching.
  • FuzzyQuery is 100-200 times faster than in past releases.
  • Added index statistics such as the number of tokens for a term or field, number of postings for a field, and number of documents with a posting for a field.
  • Added RegexpQuery support to contrib/queryparser.
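
To make the spell-checking and fuzzy-matching items above more concrete, here is a minimal Python sketch of what "correction candidates within a few edits" means. It is only an illustration: it scans every term with a plain dynamic-programming edit distance, whereas Lucene 4.0 walks the term dictionary with Levenshtein automata, which is where the speed-up comes from. The function names and the sample dictionary are made up for this example.

```python
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


def suggest(term, dictionary, max_edits=2):
    """Return dictionary terms within max_edits of term, closest first.

    Brute-force scan for illustration only; this is not how Lucene does it.
    """
    scored = sorted((edit_distance(term, t), t) for t in dictionary)
    return [t for d, t in scored if 0 < d <= max_edits]


if __name__ == "__main__":
    # Hypothetical term dictionary.
    print(suggest("lucine", ["lucene", "solr", "lucid", "index"]))
```

The scan costs one distance computation per term, so it slows down as the dictionary grows. Avoiding that per-term cost is the point of the automaton-based approach.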

Solr:

Solr 4.0-alpha includes more NoSQL features for those using Solr as a primary data store.

  • Distributed indexing designed from the ground up for near real-time (NRT) and NoSQL features such as realtime-get, optimistic locking, and durable updates.
  • High availability with no single points of failure.
  • Apache Zookeeper integration for distributed coordination and cluster metadata and configuration storage.
  • Updates can be sent to any node in the cluster and are automatically forwarded to the correct shard and replicated to multiple nodes for redundancy.
  • Queries sent to any node automatically perform a full distributed search across the cluster with load balancing and fail-over.
  • A transaction log ensures that even uncommitted documents are never lost.
  • Real-time Get: the ability to quickly retrieve the latest version of a document, without the need to commit or open a new searcher.
  • Atomic updates: the ability to add, remove, change, and increment fields of an existing document without having to send in the complete document again.
  • Pivot Faceting: multi-level or hierarchical faceting where the top constraints for one field are found for each top constraint of a different field.
  • Pseudo-Join functionality: the ability to select a set of documents based on their relationship to a second set of documents.
  • A brand new web admin interface, including support for SolrCloud.
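
Several of the Solr features above are driven through plain HTTP requests, so a short sketch of what those requests might look like may help. The Python below only builds request bodies and URLs and sends nothing. The base URL, the core name `collection1`, and all field names (`price`, `popularity`, `cat`, `inStock`, `manu_id_s`) are assumptions chosen for illustration. The `set`/`inc`/`add` modifiers, the `_version_` field, the `/get` handler, the `facet.pivot` parameter and the `{!join}` query parser are the pieces Solr itself provides.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL of a Solr core; adjust for a real installation.
BASE = "http://localhost:8983/solr/collection1"


def atomic_update(doc_id, version=None, **modifiers):
    """Build the JSON body for an atomic update of a single document.

    Each keyword maps a field name to a (modifier, value) pair, where the
    modifier is one of Solr's "set", "add" or "inc".
    """
    doc = {"id": doc_id}
    for field, (modifier, value) in modifiers.items():
        doc[field] = {modifier: value}
    if version is not None:
        # Optimistic locking: Solr rejects the update if the stored
        # _version_ of the document no longer matches this value.
        doc["_version_"] = version
    return json.dumps([doc], sort_keys=True)


def realtime_get_url(doc_id):
    """URL for the real-time get handler (no commit needed beforehand)."""
    return BASE + "/get?" + urlencode({"id": doc_id})


def pivot_facet_url(query, *fields):
    """URL for a pivot facet over the given fields, outermost first."""
    params = {"q": query, "rows": 0, "facet": "true",
              "facet.pivot": ",".join(fields)}
    return BASE + "/select?" + urlencode(params)


def join_url(from_field, to_field, inner_query):
    """URL for a pseudo-join: match inner_query, then follow from -> to."""
    q = "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)
    return BASE + "/select?" + urlencode({"q": q})


if __name__ == "__main__":
    print(atomic_update("book1", version=12345,
                        price=("set", 9.99), popularity=("inc", 1)))
    print(realtime_get_url("book1"))
    print(pivot_facet_url("*:*", "cat", "inStock"))
    print(join_url("manu_id_s", "id", "ipod"))
```

The atomic update body would be POSTed to the core's update handler with a JSON content type. Real-time get and atomic updates both rely on the transaction log being enabled in the Solr configuration.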
Reference:
http://lucene.apache.org/core/4_0_0-ALPHA/changes/Changes.html
http://lucene.apache.org/solr/



