gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing, and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
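To make the Vector Space Model concrete, here is a stdlib-only sketch (it does not use gensim's API) that weights terms by TF-IDF and ranks documents by cosine similarity to a query. The helper names are hypothetical, and the raw log(N/df) idf is just one common variant; gensim's own TfidfModel weighting and normalization defaults may differ.

```python
import math
from collections import Counter

def tfidf(tokens, df, n_docs):
    """Sparse TF-IDF vector {term: weight}; terms unseen in the corpus are dropped."""
    counts = Counter(t for t in tokens if t in df)
    return {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors (0.0 if either is all-zero)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Return document indices ordered by cosine similarity to the query (best first)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for doc in docs for t in set(doc))
    doc_vecs = [tfidf(doc, df, n_docs) for doc in docs]
    q_vec = tfidf(query.lower().split(), df, n_docs)
    scores = [cosine(q_vec, d) for d in doc_vecs]
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)

docs = [
    "human machine interface for lab computer applications".split(),
    "a survey of user opinion of computer system response time".split(),
    "graph minors a survey".split(),
]
print(rank("graph survey", docs))  # → [2, 1, 0]
```

Document 2 matches both query terms, document 1 matches only "survey", and document 0 matches neither, so it scores 0.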

Haystack - Build a natural language interface for your data

  •    Python

Haystack is an end-to-end framework for building powerful, production-ready pipelines for different search use cases. Whether you want to perform question answering or semantic document search, you can use the state-of-the-art NLP models in Haystack to provide unique search experiences and let your users query in natural language. Haystack is built in a modular fashion, so you can combine the best technology from other open-source projects such as Hugging Face's Transformers, Elasticsearch, or Milvus.

ranking - Learning to Rank in TensorFlow

  •    Python

We envision that this library will provide a convenient open platform for hosting and advancing state-of-the-art ranking models based on deep learning techniques, and thus facilitate both academic research and industrial applications. TF-Ranking was presented at two premier Information Retrieval conferences, SIGIR 2019 and ICTIR 2019.

Lemur - Search Engine

  •    Java

The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset.

Terrier - Information Retrieval Platform

  •    Java

Terrier is a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Terrier can index large corpora of documents, and provides multiple indexing strategies, such as multi-pass, single-pass and large-scale MapReduce indexing.
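As a rough illustration of what an indexer produces, the toy below builds an in-memory inverted index (term to a postings list of document id and term frequency) in a single pass over the documents, then answers a Boolean AND query. It is not Terrier's code: Terrier's single-pass and MapReduce indexers also deal with collections that do not fit in memory, which this sketch ignores, and the function names are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Build term -> [(doc_id, term_frequency), ...] in one pass over the collection."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings = index[term]
            postings[doc_id] = postings.get(doc_id, 0) + 1
    # Freeze each postings list into doc-id order.
    return {term: sorted(postings.items()) for term, postings in index.items()}

def conjunctive_query(index, terms):
    """Return the ids of documents that contain every query term (Boolean AND)."""
    doc_sets = [{doc_id for doc_id, _ in index.get(t.lower(), [])} for t in terms]
    if not doc_sets:
        return []
    return sorted(set.intersection(*doc_sets))

docs = ["Terrier indexes large corpora", "MapReduce indexing of large corpora", "single pass indexing"]
index = build_index(docs)
print(conjunctive_query(index, ["large", "corpora"]))  # → [0, 1]
```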

resin - 32-bit vector space search engine

  •    CSharp

A full-text search engine with an HTTP API and programmable read/write pipelines. To provide full-text search, words and phrases are extracted from documents and mapped to a 2-billion-dimensional vector space in which they form clusters of syntactically similar "bag-of-chars" vectors. In this language model, each character (glyph) is encoded as a 32-bit word (an int), and each word or phrase is likewise encoded as a 32-bit-wide (but sparse) array.
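One possible reading of the bag-of-chars idea, sketched under assumptions: each character's Unicode code point (which fits in a 32-bit int) serves as a dimension index, a word or phrase becomes a sparse vector of character counts, and similarity is the cosine between such vectors. This is illustrative only and not resin's actual encoding; the function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_chars(phrase):
    """Sparse vector: dimension = character code point (a 32-bit int), value = count."""
    return Counter(ord(ch) for ch in phrase)

def cosine(u, v):
    """Cosine similarity of two sparse Counter vectors (missing dimensions count as 0)."""
    dot = sum(count * v[dim] for dim, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

print(cosine(bag_of_chars("search"), bag_of_chars("searches")))
print(cosine(bag_of_chars("search"), bag_of_chars("vector")))
```

Because character order is ignored, anagrams such as "listen" and "silent" get identical vectors; that is the trade-off of a bag-of-chars model.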

allRank - allRank is a framework for training learning-to-rank neural models based on PyTorch.

  •    Python

allRank provides an easy and flexible way to experiment with various LTR neural network models and loss functions. It is easy to add a custom loss and to configure the model and the training procedure. We hope that allRank will facilitate both research in neural LTR and its industrial applications. To help you get started, we provide a run_example.sh script, which generates dummy ranking data in libsvm format and trains a Transformer model on that data using the provided example config.json file. Once you run the script, the dummy data can be found in the dummy_data directory and the experiment results in the test_run directory. Docker is required to run the example.
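The sketch below parses libsvm-style ranking data, assuming the common LETOR/SVMrank line layout `<label> qid:<id> <feature>:<value> ... # comment`, and groups rows into one list per query, the unit that pairwise and listwise LTR losses operate on. It is not allRank's own loader; the function names are hypothetical.

```python
from collections import defaultdict

def parse_libsvm_line(line):
    """Parse '<label> qid:<id> <feat>:<value> ...'; a trailing '# comment' is ignored."""
    body = line.split("#", 1)[0].split()
    if len(body) < 2 or not body[1].startswith("qid:"):
        raise ValueError(f"expected '<label> qid:<id> ...', got {line!r}")
    label = float(body[0])
    qid = body[1][len("qid:"):]
    features = {}
    for token in body[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return qid, label, features

def group_by_query(lines):
    """Collect (label, features) pairs per query id, preserving file order."""
    groups = defaultdict(list)
    for line in lines:
        if not line.split("#", 1)[0].strip():
            continue  # skip blank and comment-only lines
        qid, label, features = parse_libsvm_line(line)
        groups[qid].append((label, features))
    return dict(groups)

sample = [
    "2 qid:1 1:0.9 2:0.1 # doc A",
    "0 qid:1 1:0.2 2:0.7",
    "",
    "# comment only",
    "1 qid:2 1:0.5 3:0.4",
]
print(group_by_query(sample))
```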

RankyMcRankFace - Hardened Fork of Ranklib learning to rank library

  •    Java

This project is OpenSource Connections' API-compatible fork of Ranklib, published on Maven, with various improvements that make it easier to integrate with the Elasticsearch Learning to Rank Plugin. It is available under the com.o19s:RankyMcRankFace Maven namespace.

BM25Transformer - (Python) transform a document-term matrix to an Okapi/BM25 representation

  •    Python

This library transforms a document-term matrix into an Okapi/BM25 representation. Its API is inherited from sklearn.feature_extraction.text.TfidfTransformer.
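A minimal sketch of such a transform, assuming a dense list-of-lists count matrix with documents as rows. It applies the standard Okapi BM25 term weight with the non-negative idf variant used by Lucene, idf = ln(1 + (N - df + 0.5) / (df + 0.5)). The k1=1.2 and b=0.75 defaults are assumptions, and this is not the library's own code.

```python
import math

def bm25_transform(matrix, k1=1.2, b=0.75):
    """Turn a dense document-term count matrix (rows = documents) into BM25 weights."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    doc_len = [sum(row) for row in matrix]
    avgdl = sum(doc_len) / n_docs if n_docs else 0.0
    # Document frequency of each term (column).
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    # Non-negative "+1" idf variant; other BM25 idf variants exist.
    idf = [math.log((n_docs - d + 0.5) / (d + 0.5) + 1.0) for d in df]
    weighted = []
    for row, dl in zip(matrix, doc_len):
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1.0 - b + b * dl / avgdl) if avgdl else k1
        weighted.append([
            idf[j] * tf * (k1 + 1.0) / (tf + norm) if tf else 0.0
            for j, tf in enumerate(row)
        ])
    return weighted

counts = [[3, 0, 1], [0, 2, 0], [1, 1, 1]]
for row in bm25_transform(counts):
    print([round(x, 3) for x in row])
```

The tf * (k1 + 1) / (tf + norm) factor saturates as tf grows, which is the main difference from plain TF-IDF weighting.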

tika-similarity - Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features

  •    Python

This project demonstrates using the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features. The script iterates over all files in the current directory, or over files given on the command line, derives their metadata features, and then computes the union of all features. That union becomes the "golden feature set" to which every document's features are compared via intersection. The size of each file's intersection divided by the size of the union is that file's similarity score.
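The scoring rule described above can be written down directly. In this sketch each file's metadata features are assumed to be a set of metadata key names that have already been extracted (for example by Tika-Python, which is not called here); the function names are hypothetical.

```python
def golden_feature_set(features_by_file):
    """Union of every file's metadata feature names."""
    golden = set()
    for features in features_by_file.values():
        golden |= set(features)
    return golden

def similarity_scores(features_by_file):
    """Score each file as |features & golden| / |golden|."""
    golden = golden_feature_set(features_by_file)
    if not golden:
        return {name: 0.0 for name in features_by_file}
    return {
        name: len(set(features) & golden) / len(golden)
        for name, features in features_by_file.items()
    }

files = {
    "a.pdf": {"Author", "Content-Type", "Page-Count"},
    "b.jpg": {"Content-Type", "Image-Width", "Image-Height"},
    "c.txt": {"Content-Type"},
}
print(similarity_scores(files))  # → {'a.pdf': 0.6, 'b.jpg': 0.6, 'c.txt': 0.2}
```

Since every file's features are a subset of the union, the intersection is just the file's own feature set, so the score reduces to the fraction of all observed metadata keys that the file carries.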

cuNVSM - Neural Vector Space Models

  •    Cuda

⚠️ You need a CUDA-compatible GPU (compute capability 5.2+) to use this software. cuNVSM is a C++/CUDA implementation of state-of-the-art NVSM and LSE representation learning algorithms.

pyndri - pyndri is a Python interface to the Indri search engine.

  •    Python

pyndri is a Python interface to the Indri search engine (http://www.lemurproject.org/indri/). During development, we use Python 3.5. Some of the examples require numpy.

pytrec_eval - pytrec_eval is an Information Retrieval evaluation tool for Python, based on the popular trec_eval

  •    C++

pytrec_eval is a Python interface to TREC's evaluation tool, trec_eval. It is an attempt to stop the cultivation of custom implementations of Information Retrieval evaluation measures for the Python programming language. The module was developed using Python 3.5. You need a Python distribution that comes with development headers. In addition to the default Python modules, numpy and scipy are required.
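For readers new to these measures, the sketch below computes average precision (AP) and its mean over queries (MAP) in plain Python. It uses the nested-dictionary layout that pytrec_eval accepts (query id to document id to relevance grade for qrels, and to retrieval score for runs) but does not call the library. Assumptions: documents with grade > 0 count as relevant, ties keep insertion order rather than following trec_eval's tie-breaking, and only queries present in both dictionaries are averaged. The function names are hypothetical.

```python
def average_precision(qrel, scores):
    """AP for one query: qrel maps doc -> relevance grade, scores maps doc -> retrieval score."""
    relevant = {doc for doc, grade in qrel.items() if grade > 0}
    if not relevant:
        return 0.0
    ranking = sorted(scores, key=lambda doc: scores[doc], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    return precision_sum / len(relevant)

def mean_average_precision(qrels, run):
    """Return (MAP, per-query AP) over queries that appear in both qrels and run."""
    per_query = {q: average_precision(qrels[q], run[q]) for q in qrels if q in run}
    if not per_query:
        return 0.0, per_query
    return sum(per_query.values()) / len(per_query), per_query

qrels = {"q1": {"d1": 1, "d2": 0, "d3": 2}, "q2": {"d4": 1}}
run = {"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.1}, "q2": {"d4": 0.5, "d5": 0.7}}
map_score, per_query = mean_average_precision(qrels, run)
print(map_score, per_query)
```

Here q1 has AP (1/1 + 2/3) / 2 = 5/6 and q2 has AP 1/2, so MAP is 2/3.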

SERT - Semantic Entity Retrieval Toolkit

  •    Python

The Semantic Entity Retrieval Toolkit (SERT) is a collection of neural entity retrieval algorithms. SERT requires Python 3.5 and assorted modules. The trec_eval utility is required for evaluation and the end-to-end scripts. If you wish to train your models on GPGPUs, you will need a GPU compatible with Theano.

pke - Python Keyphrase Extraction module

  •    Python

pke is an open source Python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new approaches. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction approaches, and ships with supervised models trained on the SemEval-2010 dataset. pke works only with Python 2.x at the moment.
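To illustrate the candidate-selection and candidate-weighting stages of such a pipeline, here is a deliberately simple stand-in written for Python 3: candidates are maximal runs of non-stopwords, and they are ranked by frequency. This is not one of pke's models, the stopword list is a toy assumption, and the function names are hypothetical.

```python
import re
from collections import Counter

# Toy stopword list (assumption); real toolkits use full lists and POS patterns.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

def select_candidates(text, max_len=3):
    """Candidate selection: maximal runs of non-stopwords; runs longer than max_len are discarded."""
    candidates = []
    # Punctuation (anything other than word chars, whitespace, hyphens) breaks phrases.
    for chunk in re.split(r"[^\w\s-]+", text.lower()):
        run = []
        for word in chunk.split() + [None]:  # None flushes the final run
            if word is None or word in STOPWORDS:
                if 0 < len(run) <= max_len:
                    candidates.append(" ".join(run))
                run = []
            else:
                run.append(word)
    return candidates

def weight_candidates(candidates, n=3):
    """Candidate weighting: rank by raw frequency and keep the top n."""
    return [phrase for phrase, _ in Counter(candidates).most_common(n)]

text = ("Keyphrase extraction is the task of selecting phrases. "
        "Keyphrase extraction is useful for search, and keyphrase extraction is hard.")
print(weight_candidates(select_candidates(text), n=1))  # → ['keyphrase extraction']
```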

indonesian-nlp-playground - Personal repository for research on Indonesian-language linguistics

  •    Python

As the name suggests, this is a personal repository for research on Indonesian-language linguistics. Everything in this repository is experimental and may change at any time, following the guidance of the swaying grass or the lunch menu at the Mbah Jingkrak restaurant.

Mimir - OSINT Threat Intel Interface

  •    Python

OSINT Threat Intel Interface. Named after the Old Norse god of knowledge, Mimir functions as a CLI to HoneyDB, which, in short, is an aggregated OSINT threat intel pool. Starting the program brings you to a menu of options.
