LuSql - Pluggable, pipelined, threaded extract-transform-load (ETL), by default from JDBC to Lucene


LuSql is a simple but powerful tool for building Lucene indexes from relational databases. It is a command-line Java application that constructs a Lucene index from an arbitrary SQL query against a JDBC-accessible SQL database. Users can control a number of parameters, including the SQL query to use, whether each individual field is indexed, stored and given term vectors, the analyzer, the stop word list, and other tuning parameters. In its default mode it uses threading to take advantage of multiple cores.
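The pipelined, threaded extract-transform-load structure described above can be sketched with only the JDK. This is a minimal illustration under stated assumptions, not LuSql's actual code: rows are plain `Map<String, String>` objects standing in for JDBC `ResultSet` rows, the "index" is an in-memory list standing in for a Lucene `IndexWriter`, and the names `PipelinedEtlSketch`, `run`, `toDocument` and `POISON` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelinedEtlSketch {
    // Hypothetical end-of-stream marker; a unique instance, compared by identity.
    private static final Map<String, String> POISON = new HashMap<>();

    /** Runs extract -> transform (nWorkers threads) -> load and returns the loaded documents. */
    public static List<Map<String, String>> run(List<Map<String, String>> rows, int nWorkers)
            throws InterruptedException {
        BlockingQueue<Map<String, String>> extracted = new ArrayBlockingQueue<>(64);
        BlockingQueue<Map<String, String>> transformed = new ArrayBlockingQueue<>(64);
        List<Map<String, String>> index = new ArrayList<>(); // stand-in for a Lucene IndexWriter

        // Extract stage: one producer, standing in for iterating over a JDBC ResultSet.
        Thread extractor = new Thread(() -> {
            try {
                for (Map<String, String> row : rows) extracted.put(row);
                for (int i = 0; i < nWorkers; i++) extracted.put(POISON); // one marker per worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Transform stage: several workers turn rows into documents in parallel.
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < nWorkers; w++) {
            workers.add(new Thread(() -> {
                try {
                    while (true) {
                        Map<String, String> row = extracted.take();
                        if (row == POISON) break;
                        transformed.put(toDocument(row));
                    }
                    transformed.put(POISON); // tell the loader this worker is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        // Load stage: a single consumer, as with one shared index writer.
        Thread loader = new Thread(() -> {
            try {
                int finishedWorkers = 0;
                while (finishedWorkers < nWorkers) {
                    Map<String, String> doc = transformed.take();
                    if (doc == POISON) finishedWorkers++;
                    else index.add(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        extractor.start();
        workers.forEach(Thread::start);
        loader.start();
        extractor.join();
        for (Thread t : workers) t.join();
        loader.join(); // join() makes the loader's writes to index visible here
        return index;
    }

    /** Hypothetical transform: lowercase field names and trim values. */
    static Map<String, String> toDocument(Map<String, String> row) {
        Map<String, String> doc = new HashMap<>();
        row.forEach((k, v) -> doc.put(k.toLowerCase(), v == null ? "" : v.trim()));
        return doc;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(Map.of("ID", String.valueOf(i), "TITLE", "  title " + i + " "));
        }
        List<Map<String, String>> docs = run(rows, Runtime.getRuntime().availableProcessors());
        System.out.println(docs.size() + " documents loaded"); // prints "1000 documents loaded"
    }
}
```

Bounded queues make the extract stage block when the transform workers fall behind, and one end-of-stream marker per worker lets every stage shut down cleanly. Documents reach the loader in a nondeterministic order, which does not matter for building an index.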

LuSql can handle complex queries, allows for additional per record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation. Its only dependencies are three Apache Commons libraries, the Lucene core itself, and a JDBC driver.
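A plug-in architecture for document manipulation usually reduces to a chain of filters applied to each record before it is indexed. The sketch below assumes a hypothetical `DocumentFilter` interface (it is not LuSql's real plug-in API) and uses an in-memory map as a stand-in for a per-record sub-query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {
    /** Hypothetical plug-in contract: return the (possibly modified) document, or null to drop it. */
    public interface DocumentFilter {
        Map<String, String> filter(Map<String, String> doc);
    }

    /** Applies the filters in order; a null result stops the chain and drops the document. */
    public static Map<String, String> applyAll(List<DocumentFilter> filters, Map<String, String> doc) {
        Map<String, String> current = new HashMap<>(doc); // copy so the caller's map is not mutated
        for (DocumentFilter f : filters) {
            current = f.filter(current);
            if (current == null) return null;
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for a per-record sub-query such as: SELECT name FROM authors WHERE id = ?
        Map<String, String> authorsById = Map.of("7", "Ada Lovelace");

        List<DocumentFilter> chain = new ArrayList<>();
        // Plug-in 1: drop records that have no usable title.
        chain.add(doc -> doc.getOrDefault("title", "").isBlank() ? null : doc);
        // Plug-in 2: enrich each record with a looked-up value, as a sub-query would.
        chain.add(doc -> {
            doc.put("author", authorsById.getOrDefault(doc.get("author_id"), "unknown"));
            return doc;
        });

        System.out.println(applyAll(chain, Map.of("title", "Notes", "author_id", "7")).get("author")); // Ada Lovelace
        System.out.println(applyAll(chain, Map.of("title", " ", "author_id", "7"))); // null
    }
}
```

Returning `null` to signal "skip this record" keeps each plug-in small; an alternative design would return an `Optional` or a list of documents so that one record could also be split into several.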

LuSql has been extensively tested, including on a large collection of more than 6 million documents (full-text articles and article metadata), which produced an 86 GB Lucene index.




Related Projects


Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.


Compass is a real-time search engine built on top of Lucene. It is transactional and distributed, supports Spring MVC, and integrates with Hibernate.

BookSleeve - pipelined .NET bindings for Redis

By offering pipelined, asynchronous, multiplexed and thread-safe access to Redis, BookSleeve enables efficient Redis access even for the busiest applications.

Katta - Lucene and more in the cloud.

Katta is a scalable, fault-tolerant, distributed data store for real-time access. Katta serves large, replicated indices as shards to handle high loads and very large data sets. These indices can be of different types; implementations are currently available for Lucene and Hadoop mapfiles.

Semantic Vectors - Creating and Searching Semantic Vectors using Lucene

The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis. Other methods supported by the package include Latent Semantic Analysis (LSA) and Reflective Random Indexing. LSA is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. The library is used for semantic analysis and text mining.
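The core idea of random projection can be shown in a few lines: multiplying a high-dimensional vector by a random matrix of scaled +1/-1 entries maps it into far fewer dimensions while approximately preserving its length (the Johnson-Lindenstrauss property). This is a generic, dense illustration of the technique, not the Semantic Vectors package's implementation; the class and method names are hypothetical.

```java
import java.util.Random;

public class RandomProjectionSketch {
    /** Builds a k x d matrix of random +1/-1 entries scaled by 1/sqrt(k). */
    public static double[][] randomMatrix(int k, int d, long seed) {
        Random rnd = new Random(seed);
        double scale = 1.0 / Math.sqrt(k);
        double[][] r = new double[k][d];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < d; j++) {
                r[i][j] = rnd.nextBoolean() ? scale : -scale;
            }
        }
        return r;
    }

    /** Projects a d-dimensional vector x down to k dimensions: y = R x. */
    public static double[] project(double[][] r, double[] x) {
        double[] y = new double[r.length];
        for (int i = 0; i < r.length; i++) {
            double s = 0;
            for (int j = 0; j < x.length; j++) s += r[i][j] * x[j];
            y[i] = s;
        }
        return y;
    }

    /** Euclidean norm of a vector. */
    public static double norm(double[] v) {
        double s = 0;
        for (double a : v) s += a * a;
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        int d = 2000, k = 300;
        Random rnd = new Random(42);
        double[] x = new double[d];
        for (int j = 0; j < d; j++) x[j] = rnd.nextGaussian();

        double[] y = project(randomMatrix(k, d, 7L), x);
        // The two norms should be close; exact values depend on the seeds.
        System.out.printf("original norm %.1f, projected norm %.1f%n", norm(x), norm(y));
    }
}
```

The 1/sqrt(k) scaling makes the expected squared length of the projection equal to that of the original vector; in practice, term-vector implementations typically use sparse random vectors rather than a dense matrix.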


Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Torque 3D - Game Engine

Torque 3D is a full-source, open source game engine. Torque 3D comes equipped with a full suite of tools that allow teams to produce high-quality games and simulations. Its features include a World Editing Suite, lighting, programming, terrain, an asset pipeline, and networking.


Nutch is open source web-search software. It builds on Lucene Java, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.

CUBRID - RDBMS Optimized for Web

CUBRID is a relational database management system highly optimized for web applications, especially when complex web services process large amounts of data and generate huge numbers of concurrent requests. CUBRID is developed in C/C++. It includes high availability (HA), online incremental backup, replication, load balancing, sharding, caching and other features. It supports JDBC, PHP, ODBC/.NET, Ruby and Python APIs.

zlib - A Massively Spiffy Yet Delicately Unobtrusive Compression Library

zlib is a general-purpose data compression library. All of its code is thread-safe. It has been ported to other programming languages such as Java, C#, Python and Perl.
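In Java, zlib-format compression is available without extra dependencies through `java.util.zip.Deflater` and `java.util.zip.Inflater`, which the JDK documents as using the ZLIB compression library. The round trip below is a minimal sketch; the class name `ZlibRoundTrip` and its helper methods are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibRoundTrip {
    /** Compresses bytes into the zlib format. */
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib resources
        }
    }

    /** Decompresses zlib-format bytes produced by compress(). */
    public static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated input or missing dictionary");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        String original = "LuSql ".repeat(200);
        byte[] packed = compress(original.getBytes(StandardCharsets.UTF_8));
        String restored = new String(decompress(packed), StandardCharsets.UTF_8);
        System.out.println(original.length() + " -> " + packed.length + " bytes, round trip ok: "
                + restored.equals(original));
    }
}
```

Calling `end()` in a `finally` block matters because both classes hold native zlib state that is otherwise released only when the object is garbage-collected.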