Luwak - Open Source Streaming Query Engine

  •        800

Luwak is a high performance stored query engine. Simply put, it allows you to define a set of search queries and then monitor a stream of documents for any that might match these queries: a function also known as reverse search and document routing. This is similar to Elasticsearch Percolator.

https://github.com/flaxsearch/luwak

Tags
Implementation
License
Platform

   




Related Projects

ElasticSearch - Distributed, RESTful search and analytics engine

  •    Java

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.

Fluo - Make incremental updates to large data sets stored in Apache Accumulo

  •    Java

Apache Fluo (incubating) is an open source implementation of Percolator (which populates Google's search index) for Apache Accumulo. Fluo makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. When combining new data with existing data, Fluo offers reduced latency when compared to batch processing frameworks (e.g Spark, MapReduce).

Sphinix - Search server

  •    C++

Sphinix is free open-source SQL full-text search engine. How do you implement full-text search for that 10+ million row table, keep up with the load, and stay relevant? Sphinx is good at those kinds of riddles.

match - :crystal_ball: Scalable reverse image search built on Kubernetes and Elasticsearch

  •    Python

Match makes it easy to search for images that look similar to each other. Using a state-of-the-art perceptual hash, it is invariant to scaling and 90 degree rotations. Its HTTP API is quick to integrate and flexible for a number of reverse image search applications. Kubernetes and Elasticsearch allow Match to scale to billions of images with ease while giving you full control over where your data is stored. Match uses the awesome ascribe/image-match under the hood for most of the image search legwork. The number of gunicorn workers to spin up.

Vespa - Yahoo's big data serving engine

  •    Java

Vespa is an engine for low-latency computation over large data sets. It stores and indexes your data such that queries, selection and processing over the data can be performed at serving time. Vespa is serving platform for Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, Flickr.


LuMongo - Realtime Time Distributed Search

  •    Java

LuMongo is a real-time distributed search and storage system based on Lucene. LuMongo is designed from the ground up to scale both vertically and horizontally across servers. LuMongo stores Lucene indexes directly into MongoDB. Documents can be stored natively into MongoDB. When stored natively document can be queried as normal out of MongoDB and use of Map-Reduce and the Aggregation Framework is possible.

Electronic Document Discovery

  •    CSharp

AJAX,document discovery search system for legal discovery,email search amp; other types of docs i.e. MSWord,Excel, TIFF, etc. Uses SQL Server FTS - full text search, to index emails amp; attachments. Documents are stored in SQL Server Written in C# ASP.NET

Open Search Server

  •    C++

Open Search Server is both a modern crawler and search engine and a suite of high-powered full text search algorithms. Built using the best open source technologies like lucene, zkoss, tomcat, poi, tagsoup. Open Search Server is a stable, high-performance piece of software.

Lucene - A high-performance, full-featured text search engine library

  •    Java

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

elasticsearch-learning-to-rank - Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch

  •    Java

Rank Elasticsearch results using tree based (LambdaMART, Random Forest, MART) and linear models. Models are trained using the scores of Elasicsearch queries as features. You train offline using tooling such as with xgboost or ranklib. You then POST your model to a to Elasticsearch in a specific text format (the custom "ranklib" language, documented here). You apply a model using this plugin's ltr query. See blog post and the full demo (training and searching).Models are stored using an Elasticsearch script plugin. Tree-based models can be large. So we recommend increasing the script.max_size_in_bytes setting. Don't worry, just because tree-based models are verbose, doesn't nescesarilly imply they'll be slow.

reindexer - Embeddable, in-memory, document-oriented database with a high-level Query builder interface

  •    C++

Reindexer is an embeddable, in-memory, document-oriented database with a high-level Query builder interface. Reindexer's goal is to provide fast search with complex queries. We at Restream weren't happy with Elasticsearch and created Reindexer as a more performant alternative.

geocoder - Complete Ruby geocoding solution.

  •    Ruby

Geocoder is a complete geocoding solution for Ruby. With Rails, it adds geocoding (by street or IP address), reverse geocoding (finding street address based on given coordinates), and distance queries. It's as simple as calling geocode on your objects, and then using a scope like Venue.near("Billings, MT"). Please note that this README is for the current HEAD and may document features not present in the latest gem release. For this reason, you may want to instead view the README for your particular version.

LibreOffice - The Document foundation

  •    C

LibreOffice is the free power-packed Open Source personal productivity suite for Windows, Macintosh and Linux. LibreOffice is the perfect choice for home users, businesses, government and other organizations. It's native file format is the ISO standardized ODF (Open Document Format), but LibreOffice can open and save Microsoft Word, PowerPoint and Excel files, as well as many other formats, bringing you the widest-available compatibility with other products.

FLIRTDB - A community driven collection of IDA FLIRT signature files

  •    Max

Fast Library Identification and Recognition Technology, also known as FLIRT, is IDA's internal symbols identifier that searches through disassembled binaries in order to locate, rename, and highlight known library subroutines. FLIRT elimates the need to analyze functions that could be understood simply by reading documentation or source code from the library it came from and reduces the amount of work required in order to reverse and understand symbol-stripped binaries by a considerable amount. The input to the system is a library file (.lib on Windows) from a library of choice while the output is a signature file (.sig) stored under /sig (and only there or else IDA won't find it). Using one of the tools (plb/pcf/pelf) (provided here for paying customers) you convert all the functions in the library to signatures stored in a PAT file (.pat). The final stage in creating a signature file involves converting the generated PAT file into a .sig file usable by IDA with the use of sigmake. The problem with this is that sometimes collisions will exist for signatures since the method Hex-Rays uses is not fool proof. When an error occurs an EXC (.exc) file is created. In order to ignore collisions, simply edit this file by removing the first few comments (lines that start with ';') and re-run sigmake.

ambar - :mag: Ambar: Document Search System

  •    

Ambar is an open-source document search and management system with automated crawling, OCR, tagging and instant full-text search.There are two editions available: Community and Enterprise. Enterprise Edition is a full featured document search and management system that can handle terabytes of data.

Seeks - An Open Decentralized Platform for Collaborative Search, Filtering and content Curation

  •    C++

Seeks acts as a personalizing Web server or proxy between you and your data feeds. Connect most search engines, RSS/ATOM feeds, Twitter / Identica, Youtube / Dailymotion, Wikis, and basically any source of data, and Seeks will produce a fused personalized batch / stream of results to your queries. Its specific purpose is to regroup users whose queries are similar so they can share both the query results and their experience on these results.

Easy Full-Text Search Queries

  •    

Very lightweight class to convert user-friendly search queries into Microsoft SQL Server Full-Text Search queries. Gracefully handles errors in input query.

grpc-proxy - gRPC proxy is a Go reverse proxy that allows for rich routing of gRPC calls with minimum overhead

  •    Go

The project now exists as a proof of concept, with the key piece being the proxy package that is a generic gRPC reverse proxy handler. The package proxy contains a generic gRPC reverse proxy handler that allows a gRPC server to not know about registered handlers or their data types. Please consult the docs, here's an exaple usage.

RavenDB - NoSQL Database .NET

  •    CSharp

Raven is an document database for the .NET/Windows platform. Raven offers a flexible data model design to fit the needs of real world systems. Raven stores schema-less JSON documents, allow you to define indexes using Linq queries and focus on low latency and high performance.

Solr - Blazing-fast, open source enterprise search platform

  •    Java

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.