
elasticsearch-dsl-py - High level Python client for Elasticsearch

  •    Python

Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py) and provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure, and exposes the whole range of the DSL from Python, either directly via defined classes or through queryset-like expressions.
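
Because the library mirrors the Elasticsearch JSON DSL, it helps to see the underlying query structure it wraps. A minimal sketch of a `bool` query as plain JSON; the chained call in the comment is based on elasticsearch-dsl's documented queryset-like API (field names here are hypothetical):

```python
import json

# With elasticsearch-dsl installed, roughly the same query could be built as:
#   Search().query("match", title="python").filter("term", published=True)
# (a sketch based on the documented API; "title" and "published" are made-up fields)
query = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "python"}}],
            "filter": [{"term": {"published": True}}],
        }
    }
}
print(json.dumps(query, indent=2))
```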

elasticsearch-py - Official Python low-level client for Elasticsearch.

  •    Python

Official low-level client for Elasticsearch. Its goal is to provide common ground for all Elasticsearch-related code in Python; because of this it tries to be opinion-free and very extensible. For a higher-level client library with a more limited scope, have a look at elasticsearch-dsl, a more Pythonic library sitting on top of elasticsearch-py.

tribler - Privacy enhanced BitTorrent client with P2P content discovery

  •    Python

Tribler aims to make BitTorrent anonymous and impossible to shut down, and to give anonymous access to online (streaming) video. We are trying to make privacy, strong cryptography and authentication the Internet norm.

datasketch - MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++

  •    Python

datasketch gives you probabilistic data structures that can process and search very large amounts of data quickly, with little loss of accuracy. datasketch requires Python 2.7 or above and NumPy 1.11 or above. SciPy is optional, but with it LSH initialization can be much faster.
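
The core trick behind MinHash, one of the structures datasketch provides, is that the fraction of matching slots in two signatures estimates the Jaccard similarity of the underlying sets. A minimal pure-Python sketch of the idea (datasketch itself is NumPy-based and far faster; the seeded-MD5 hashing here is just for illustration):

```python
import hashlib

def minhash_signature(tokens, num_hashes=64):
    # One slot per seeded hash function; keep the minimum hash value
    # seen across all tokens for each seed.
    sig = []
    for seed in range(num_hashes):
        best = None
        for tok in tokens:
            value = int(hashlib.md5(f"{seed}:{tok}".encode()).hexdigest(), 16)
            if best is None or value < best:
                best = value
        sig.append(best)
    return sig

def estimate_jaccard(sig_a, sig_b):
    # Fraction of matching signature slots approximates Jaccard similarity.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

More hash functions tighten the estimate at the cost of signature size, which is the accuracy/space trade-off the description alludes to.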

Memacs - What did I do on February 14th 2007? Visualize your (digital) life in Org-mode

  •    Python

Memacs extracts metadata (subjects, timestamps, contact information, …) from many different existing data sources (file names, emails, tweets, bookmarks, …) on your computer and generates files readable by GNU Emacs with Org-mode. Memacs, as the central component of the system, is a hub for all the connectors that add data from individual data sources. Those connectors are called Memacs modules, or modules for short.

Squzer - Distributed Web Crawler

  •    Python

Squzer is Declum's open-source, extensible, scalable, multithreaded, quality-focused web crawler project, written entirely in Python.

scout - RESTful search server written in Python, powered by SQLite.

  •    Python

scout is a RESTful search server written in Python. Search is powered by SQLite's full-text search extension, and the web application uses the Flask framework. Scout can be installed from PyPI using pip or from source using git; installing from PyPI gives you the latest release, whereas installing from git gives you the latest changes.
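
SQLite's full-text search, which Scout builds on, is available directly from Python's standard library. A minimal sketch using the FTS5 extension (availability depends on how your SQLite was compiled, though Python's bundled SQLite usually includes it; the table and data here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A virtual FTS5 table indexes its columns for full-text queries.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("scout", "RESTful search server powered by SQLite full-text search"),
        ("notes", "an unrelated document about cooking"),
    ],
)
# MATCH runs a full-text query against the indexed columns.
hits = [row[0] for row in
        conn.execute("SELECT title FROM docs WHERE docs MATCH 'search'")]
```

Scout wraps this kind of index behind a RESTful HTTP interface rather than exposing SQL directly.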

sir - Transfer data from a MusicBrainz database to a Solr server

  •    Python

This repository contains the code necessary to index documents from a MusicBrainz database into a Solr search server. The documentation contains detailed information on how to use it.

mongodb-chemistry - Ideas for chemical similarity searches in MongoDB.

  •    Python

Chemical similarity search implementation in MongoDB, with performance analysis. See this blog post for more information.
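
Chemical similarity searches typically rank molecules by the Tanimoto coefficient over molecular fingerprints (sets of "on" bit positions). A minimal sketch of that scoring step, with hypothetical fingerprints standing in for real molecules:

```python
def tanimoto(fp_a, fp_b):
    # Tanimoto (Jaccard) coefficient between two fingerprint bit sets:
    # shared bits divided by total distinct bits.
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical fingerprints: sets of "on" bit positions for two molecules.
mol_a = {1, 4, 9, 16, 25}
mol_b = {1, 4, 9, 36, 49}
score = tanimoto(mol_a, mol_b)  # 3 shared bits out of 7 distinct
```

The performance question the project analyzes is how to prune candidates in MongoDB before computing this score for every document.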

splunk-webframework - Splunk Web Framework

  •    Python

The Splunk Web Framework lets developers quickly create custom Splunk apps by using prebuilt components, styles, templates, and reusable samples, and by adding custom logic, interactions, and UI. Applications developed with the Web Framework work seamlessly side by side with current Splunk applications. The Splunk Web Framework uses the Django web framework, the Splunk SDK for Python, and the Splunk SDK for JavaScript. The Web Framework also depends on a few JavaScript libraries for the client-side code, such as Backbone.js for eventing, and jQuery for working with the document object model (DOM).

greptile - Fast grep implementation in python, with recursive search and replace

  •    Python

Because it exclusively uses generators, greptile never allocates big lists; it processes one line of one file at a time. You can search big files and large directories (even / recursively) without memory overhead.
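
The generator approach described above can be sketched in a few lines; this is an illustration of the technique, not greptile's actual code (the function name and tuple shape are made up):

```python
import re

def grep_lines(pattern, paths):
    # Lazily yield (path, line_number, line) for every match. Because this
    # is a generator pipeline, only one line is held in memory at a time,
    # regardless of file size.
    rx = re.compile(pattern)
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if rx.search(line):
                    yield path, lineno, line.rstrip("\n")
```

Nothing runs until the caller iterates, so piping the generator into further filters keeps the whole search constant-memory.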

algoliasearch-client-python - Algolia Search API Client for Python

  •    Python

Algolia Search is a hosted full-text, numerical, and faceted search engine capable of delivering realtime results from the first keystroke. The Algolia Search API Client for Python lets you easily use the Algolia Search REST API from your Python code.

search - A wrapper around Google's full text search API for App Engine

  •    Python

Thor is a wrapper for Google App Engine's search API that uses Django-like syntax for defining documents, and for searching and filtering search indexes. From a basic standpoint, that's all there is to it. Various filtering and ordering can be applied to search queries; refer to the reference for the Index class for more in-depth example queries.

ffind - A sane replacement for find

  •    Python

ffind allows quick and easy recursive search for files on the command line. It is very convenient for finding a file when you don't know exactly where it is, or what it is called, in a jungle of directories. If you have dealt with Unix find, it replaces the cumbersome find . -name '*FILE_PATTERN*' with ffind FILE_PATTERN (plus more niceties).
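
The core of what `ffind FILE_PATTERN` does, walking a tree and matching basenames against `*PATTERN*`, fits in a short standard-library sketch; this is an illustration of the idea, not ffind's actual implementation:

```python
import fnmatch
import os

def ffind(pattern, root="."):
    # Recursively yield file paths whose basename contains the pattern,
    # roughly the equivalent of: find ROOT -name '*PATTERN*'
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, f"*{pattern}*"):
                yield os.path.join(dirpath, name)
```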

Products.ZCatalog - Zope's indexing and search solution.

  •    Python

The ZCatalog is Zope's built-in search engine. It allows you to categorize and search all kinds of Zope objects. It comes with a variety of indexes for different types of data.

docker_registry_cli - Docker Registry CLI - Provides search functionality for Docker Registry (UI and CLI)

  •    Python

Docker Registry CLI - currently only supports the search capability, via the Catalog API in the new Docker Registry v2.

KB-python-API - Python API for KB data-services

  •    Python

KB-Python-API is a simple API for Python that provides easy access to free and CC-BY-NC-ND datasets provided by the National Library of the Netherlands (KB). It relies on the KB's back-end infrastructure, which consists of an SRU and an OAI-PMH service. The KB Python API makes it easy to interact with historical data; for more information on the provided datasets and data rights, take a look at the KB's DataServices page.