Lucene / Solr as NoSQL database
Lucene and Solr are most popular and widely used search engine. It indexes the content and delivers the search result faster. It has all capabilities of NoSQL database. This article describes about its pros and cons.
NoSQL database should have following capabilities
- Does not use SQL as its query language
- May not give full ACID guarantees
- Stores semi structured data
- Ability to store and retrieve faster
- No relationship between records
- Distributed, Scalable
- Clients could be written in any programming language
When you pick a SQL / NoSQL database, it will certainly have more number of database related features than Lucene / Solr. Database offers better storage persistence.
Lucene / Solr is designed for search engine but it has some capabilities of database. You could very well use it as document store but make sure you have persisted data some where else. Lucene / Solr cannot be used as primary database / data store. Data should be stored in a persistent store which could be file system or database or any archiving store. Data will also be stored and indexed in Lucene / Solr. Application will retrieve the information from Lucene db. At the end we have to make our application faster.
Guardian is using Solr as its database
Lucene and Solr as NoSQL database
comments powered by Disqus
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.
Infinispan is an extremely scalable, highly available key/value NoSQL datastore and distributed data grid platform. The purpose of Infinispan is to expose a data structure that is highly concurrent, designed ground-up to make the most of modern multi-processor/multi-core architectures while at the same time providing distributed cache capabilities. Infinispan offers enterprise features such as efficient eviction algorithms to control memory usage as well as JTA compatibility.
Hypertable is based on Google's Bigtable Design, which is a proven scalable design that powers hundreds of Google services. Many of the current scalable NoSQL database offerings are based on a hash table design which means that the data they manage is not kept physically ordered. Hypertable keeps data physically sorted by a primary key and it is well suited for Analytics.
Carrot2 is an Open Source Search Results Clustering Engine. It could cluster the search results from various sources and generates small collection of documents. Carrot2 offers ready-to-use components for fetching search results from various sources including YahooAPI, GoogleAPI, Bing API, eTools Meta Search, Lucene, SOLR, Google Desktop and more.
HBase provides support to handle BigTable - billions of rows X millions of columns. It is a scalable, distributed, versioned, column-oriented store modeled after Google's Bigtable and runs on top of HDFS (Hadoop Distributed Filesystem). It features compression, in-memory operation per-column. Data could be replicated between the nodes. HBase is used in Facebook and Twitter.
HyperGraphDB is a general purpose, open-source data storage mechanism based on a powerful knowledge management formalism known as directed hypergraphs. While a persistent memory model designed mostly for Knowledge management, Artificial Intelligence and Semantic web projects, it can also be used as an embedded object-oriented database for Java projects of all sizes. It could also be used as graph database or as (non-SQL) relational database.
Terrastore is a modern document store which provides advanced scalability and elasticity features without sacrificing consistency. Terrastore is based on Terracotta, so it relies on an industry-proven, fast (and cool) clustering technology. Terrastore is accessed through the universally supported HTTP protocol. Terrastore is a distributed document store supporting single-cluster and multi-cluster deployments.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.