SoftwareBotany.Sunlight Word Aligned Hybrid Bit Vector Search Framework


The Software Botany Sunlight project is a search framework built on Word Aligned Hybrid bit vectors. Its sole purpose is to provide high-performance in-memory searching of data using combinations of indices that are not known in advance. It is developed in C# on .NET 4.0.

http://softwarebotanysun.codeplex.com/
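
As a rough illustration of searching with arbitrary combinations of indices, the sketch below keeps one bit vector per indexed value and answers a query by combining vectors with bitwise AND and OR. It uses plain Python integers as uncompressed bit vectors; it does not implement Word Aligned Hybrid compression and is not Sunlight's API. The `BitmapIndex` class, the `rows` helper, and the sample records are all hypothetical.

```python
from collections import defaultdict


class BitmapIndex:
    """Hypothetical index: maps each value of one field to a bit vector of row ids."""

    def __init__(self):
        # value -> Python int used as an uncompressed bit vector
        self.vectors = defaultdict(int)

    def add(self, row_id, value):
        # Set bit number row_id in the vector for this value.
        self.vectors[value] |= 1 << row_id

    def lookup(self, value):
        # Unknown values match no rows.
        return self.vectors.get(value, 0)


def rows(bits):
    """Expand a bit vector into a sorted list of row ids."""
    out = []
    row_id = 0
    while bits:
        if bits & 1:
            out.append(row_id)
        bits >>= 1
        row_id += 1
    return out


# Sample data (invented for this example).
records = [
    {"color": "red", "size": "L"},
    {"color": "blue", "size": "L"},
    {"color": "red", "size": "S"},
    {"color": "red", "size": "L"},
]

color_index, size_index = BitmapIndex(), BitmapIndex()
for row_id, rec in enumerate(records):
    color_index.add(row_id, rec["color"])
    size_index.add(row_id, rec["size"])

# The combination of indices is chosen at query time, not at build time.
red_and_large = color_index.lookup("red") & size_index.lookup("L")
red_or_small = color_index.lookup("red") | size_index.lookup("S")
```

Because each query is just a sequence of bitwise operations over prebuilt vectors, no index has to be designed for a particular combination of fields ahead of time.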

Related Projects

resin - 32-bit vector space search engine

  •    CSharp

A full-text search engine with an HTTP API and programmable read/write pipelines. To provide full-text search, words and phrases are extracted from documents and mapped to a 2-billion-dimensional vector space, where they form clusters of syntactically similar "bag-of-chars". In this language model, each character (glyph) is encoded as a 32-bit word (an int), and each word or phrase is encoded as a 32-bit-wide (but sparse) array.
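
The bag-of-chars idea can be sketched as follows: each character's code point names a dimension, and a word or phrase becomes a sparse vector of character counts. This is only an illustration of the model described above, not resin's implementation; the function names are hypothetical, and comparing vectors by cosine similarity is an assumption made for the example.

```python
import math
from collections import Counter


def bag_of_chars(text):
    """Sparse vector: dimension = character code point, value = occurrence count."""
    return Counter(ord(ch) for ch in text)


def cosine(a, b):
    """Cosine similarity of two sparse vectors (assumed comparison measure)."""
    dot = sum(count * b[dim] for dim, count in a.items() if dim in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Character order is ignored, so anagrams map to the same vector.
anagram_score = cosine(bag_of_chars("listen"), bag_of_chars("silent"))
# Strings with no characters in common are orthogonal.
disjoint_score = cosine(bag_of_chars("abc"), bag_of_chars("xyz"))
```

Only the dimensions that actually occur are stored, which is what makes a vector space with billions of dimensions practical to hold in memory.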

ElasticSearch - Distributed, RESTful search and analytics engine

  •    Java

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

Constellio - Enterprise Search engine

  •    Java

Constellio Open Source Enterprise Search is based on Apache Solr and uses the Google Search Appliance connector architecture. It lets you find, with a single click, all relevant content in your organization (Web, email, ECM, CRM, etc.).

Solr - Blazing-fast, open source enterprise search platform

  •    Java

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

ASPseek

  •    C++

ASPseek is Internet search engine software developed by SWsoft. ASPseek consists of an indexing robot, a search daemon, and a CGI search frontend. It can index as many as a few million URLs and search for words and phrases, use wildcards, and do Boolean searches. Search results can be limited to a given time period, site, or Web space (set of sites) and sorted by relevance (PageRank is used) or date.


Yioop - Open Source Search Engine Software

  •    PHP

Yioop is an open source PHP search engine capable of crawling, indexing, and providing search results for hundreds of millions of pages on relatively low-end hardware. It can index a variety of text formats (HTML, RSS, PDF, RTF, DOC) and image formats (GIF, JPEG, PNG, etc.). It can import data from ARC, WARC, MediaWiki, and Open Directory RDF. It is easily localized to many languages. It has built-in support for news feeds, discussion groups, blogs, and wikis. It also supports mixing indexes to create mashups.

loklak_search - Frontend Search for loklak server http://loklak.org

  •    TypeScript

loklak_search creates a website that uses the loklak server as a data source. The goal is a search site that offers timeline search as well as custom media, account, and geolocation search. To run the service, you can use the API at https://api.loklak.org or install your own loklak server data storage engine. loklak_server is a server application that collects messages from various social media tweet sources, including Twitter. The server contains a search index and a peer-to-peer index sharing interface. All messages are stored in an Elasticsearch index.

Lux - XML Search engine

  •    Java

Lux is an open source XML search engine that uses Lucene/Solr and the Saxon XQuery/XSLT processor. Lux provides XML-aware indexing, an XQuery 1.0 optimizer that rewrites queries to use the indexes, and a function library for interacting with Lucene via XQuery. These capabilities are tightly integrated with Solr and leverage its application framework to deliver a REST service, application server, and supporting tools.

Xapian - Search Engine Library

  •    C++

Xapian is an Open Source Search Engine Library. It is written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby. Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.

Elasticlunr.js - Lightweight full-text search engine in Javascript for browser search and offline search

  •    Javascript

Elasticlunr.js is a lightweight full-text search engine in JavaScript for browser search and offline search. It is based on Lunr.js but is more flexible. Elasticlunr.js provides query-time boosting and field search. It is a bit like Solr, but much smaller and not as bright; it also provides flexible configuration.

javaewah - A compressed alternative to the Java BitSet class

  •    Java

The bit array data structure is implemented in Java as the BitSet class. Unfortunately, this fails to scale without compression. JavaEWAH is a word-aligned compressed variant of the Java BitSet class. It uses a 64-bit run-length encoding (RLE) compression scheme. The goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, possibly at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (implemented in Java as the BitSet class). Unlike some alternatives, JavaEWAH does not rely on a patented scheme.
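
To make the idea of word-aligned run-length encoding concrete, here is a deliberately simplified sketch. It collapses runs of all-zero or all-one 64-bit words into a single token and stores every other word verbatim. It is not the actual EWAH format (which packs run lengths and literal counts into marker words), and the tuple layout and function names are hypothetical.

```python
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1


def compress(words):
    """Collapse runs of all-0 or all-1 words; keep mixed words as literals."""
    out = []
    for w in words:
        if w in (0, ALL_ONES):
            bit = 1 if w == ALL_ONES else 0
            if out and out[-1][0] == "run" and out[-1][1] == bit:
                # Extend the current run of identical fill words.
                out[-1] = ("run", bit, out[-1][2] + 1)
            else:
                out.append(("run", bit, 1))
        else:
            out.append(("literal", w))
    return out


def decompress(tokens):
    """Inverse of compress: expand runs back into full words."""
    words = []
    for token in tokens:
        if token[0] == "run":
            fill = ALL_ONES if token[1] else 0
            words.extend([fill] * token[2])
        else:
            words.append(token[1])
    return words


# A sparse bitmap: 1000 empty words, one mixed word, three full words.
bitmap = [0] * 1000 + [0b1011] + [ALL_ONES] * 3
tokens = compress(bitmap)
```

Because every token covers a whole number of machine words, logical operations can work word by word and skip entire runs, which is where the query-time savings described above come from.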

compass - Search engine built on top of Lucene

  •    Java

Compass is a real-time search engine built on top of Lucene. It is transactional and distributed, supports Spring MVC, and integrates with Hibernate.

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

TurboPFor - Fastest Integer Compression

  •    C

Benchmark: generate and test a skewed (Zipfian) distribution of 100,000,000 integers with block sizes of 128 and 256. Note: unlike general-purpose compression, "integer compression" generally uses a small fixed block size (e.g., 128 integers). When large blocks are involved, they must be decoded entirely while processing queries (inverted indexes, search engines, databases, graphs, in-memory computing, ...). (*) Codecs that are inefficient for small block sizes are tested with 64Ki integers per block.
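
The point about small fixed-size blocks can be illustrated with a generic bit-packing sketch: each block of up to 128 non-negative integers is packed with the smallest bit width that fits its largest value, so a query can decode one block without touching the rest. This is a textbook technique shown for illustration, not one of TurboPFor's codecs; all function names and the block tuple layout are hypothetical.

```python
BLOCK_SIZE = 128  # small fixed block size, as in the note above


def pack_block(values):
    """Pack non-negative ints using the minimal bit width for this block."""
    width = max(v.bit_length() for v in values) or 1
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    return width, len(values), packed


def unpack_block(block):
    """Decode a single block independently of all other blocks."""
    width, count, packed = block
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


def encode(values):
    """Split the input into fixed-size blocks and pack each one."""
    return [
        pack_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]


def decode(blocks):
    """Decode every block and concatenate the results."""
    out = []
    for block in blocks:
        out.extend(unpack_block(block))
    return out


values = list(range(300))
blocks = encode(values)
# Only the second block is decoded here; the others are left untouched.
second_block = unpack_block(blocks[1])
```

With a larger block size, the same lookup would have to decode proportionally more integers to reach the ones it needs.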

susper.com - Susper Decentralised Search Engine http://susper.com

  •    TypeScript

Susper is a decentralized search engine that uses the peer-to-peer system YaCy and Apache Solr to crawl and index search results. This is a front end for Susper running on a YaCy server. Search results are retrieved using the YaCy search API.

tntsearch - A fully featured full text search engine written in PHP

  •    PHP

We also created some demo pages that show tolerant retrieval with n-grams in action. The package has a bunch of helper functions, such as Jaro-Winkler and cosine similarity, for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese, and Ukrainian. If the built-in stemmers aren't enough, the engine lets you easily plug in any compatible Snowball stemmer. Some forks of the package even support Chinese. Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.
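
Tolerant retrieval with n-grams can be sketched as follows: break each word into overlapping character trigrams and rank vocabulary terms by how many trigrams they share with the (possibly misspelled) query. This is a generic illustration, not tntsearch's code or API; the function names, the `$` padding character, the Jaccard measure, and the 0.3 threshold are all assumptions made for the example.

```python
def ngrams(word, n=3):
    """Set of character n-grams, padded so word boundaries count too."""
    pad = "$" * (n - 1)
    padded = f"{pad}{word.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard overlap of the two n-gram sets (assumed measure)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def suggest(query, vocabulary, threshold=0.3):
    """Vocabulary terms similar enough to the query, best match first."""
    scored = [(similarity(query, word), word) for word in vocabulary]
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]


vocabulary = ["search", "engine", "index", "stemming"]
matches = suggest("serch", vocabulary)
```

A misspelling changes only the few n-grams around the error, so the correct term still shares most of its n-grams with the query and ranks first.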

EWAHBoolArray - A compressed bitmap class in C++.

  •    C++

The class EWAHBoolArray is a compressed bitset data structure. It supports several word sizes through a template parameter (16-bit, 32-bit, 64-bit). You should expect the 64-bit word size to provide better performance but higher memory usage, while a 32-bit word size might compress a bit better at the expense of some performance. The library also provides a basic BoolArray class, which can serve as a traditional bitmap.

Open Search Server

  •    Java

Open Search Server is both a modern crawler and search engine and a suite of high-powered full-text search algorithms. Built with open source technologies such as Lucene, zkoss, Tomcat, POI, and TagSoup, Open Search Server is a stable, high-performance piece of software.

MG4J - Managing Gigabytes for Java

  •    Java

MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections, written in Java. MG4J is a highly customisable, high-performance, full-fledged search engine providing state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms. Its main features include powerful indexing, multi-index interval semantics, virtual fields, clustering, and more.

Blacklight - Discovery Interface for any Apache Solr

  •    Ruby

Blacklight is an open source OPAC (online public access catalog). It is a Ruby on Rails based discovery interface (a.k.a. "next-generation catalog") especially optimized for heterogeneous collections. It can be used as a library catalog, as a front end for a digital repository, or as a single-search interface to aggregate digital content that would otherwise be siloed. Blacklight uses Solr, an enterprise-scale index, for its search engine.