
gensim - Topic Modelling for Humans


Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Its target audience is the natural language processing (NLP) and information retrieval (IR) community. If these terms are unfamiliar, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
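
For readers new to the Vector Space Model mentioned above, the following is a minimal sketch of similarity retrieval over bag-of-words vectors. It uses only the Python standard library and is not Gensim's API; the helper names `bow`, `cosine` and `most_similar` are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def bow(text):
    """Turn a text into a bag-of-words vector (term -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, docs):
    """Return (score, document index) pairs, best match first."""
    q = bow(query)
    scored = [(cosine(q, bow(d)), i) for i, d in enumerate(docs)]
    return sorted(scored, reverse=True)


docs = [
    "human computer interaction",
    "graph of trees",
    "computer system survey",
]
ranking = most_similar("computer interaction", docs)
```

Each document becomes a vector of term counts, and the query is ranked against every document by the angle between the vectors. A real library would typically add weighting (for example TF-IDF) and an index that avoids comparing against every document.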

Lemur - Search Engine


The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset.

Terrier - Information Retrieval Platform


Terrier is a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Terrier can index large corpora of documents, and provides multiple indexing strategies, such as multi-pass, single-pass and large-scale MapReduce indexing.

RankyMcRankFace - Hardened Fork of Ranklib learning to rank library


This project is OpenSource Connections' API-compatible fork of Ranklib, deployed on Maven, with various improvements that make it easier to integrate with the Elasticsearch Learning to Rank Plugin. It is published under the com.o19s:RankyMcRankFace Maven namespace.




BM25Transformer - (Python) transform a document-term matrix to an Okapi/BM25 representation


This library transforms a document-term matrix to an Okapi/BM25 representation. Its API inherits from sklearn.feature_extraction.text.TfidfTransformer.
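
To make the transformation concrete, here is a minimal sketch of Okapi BM25 weighting applied to a small dense count matrix. It is not this library's implementation: the function name `bm25_transform`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant (with `1 +` inside the logarithm so weights stay non-negative) are assumptions made for illustration.

```python
import math


def bm25_transform(counts, k1=1.2, b=0.75):
    """Re-weight a document-term count matrix (list of rows) with BM25.

    Hypothetical helper: rows are documents, columns are terms.
    """
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    doc_len = [sum(row) for row in counts]
    avg_len = sum(doc_len) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF; an assumed variant, other formulations exist.
    idf = [math.log(1.0 + (n_docs - d + 0.5) / (d + 0.5)) for d in df]

    weighted = []
    for row, length in zip(counts, doc_len):
        # Length normalisation: longer-than-average documents are penalised.
        norm = k1 * (1.0 - b + b * length / avg_len) if avg_len else k1
        weighted.append([
            idf[j] * tf * (k1 + 1.0) / (tf + norm) if tf else 0.0
            for j, tf in enumerate(row)
        ])
    return weighted


counts = [
    [2, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
weights = bm25_transform(counts)
```

Zero counts stay zero, and within a document a higher term count gives a higher weight, but with diminishing returns controlled by `k1`. The `b` parameter sets how strongly document length affects the result; `b=0` disables length normalisation.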

pke - Python Keyphrase Extraction module


pke is an open source Python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new approaches. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction approaches, and ships with supervised models trained on the SemEval-2010 dataset. pke works only for Python 2.x at the moment.

indonesian-nlp-playground - Personal repository for Indonesian linguistics research


As the name suggests, this is a personal repository for Indonesian linguistics research. Everything in this repository is experimental and may change at any time, following the guidance of the swaying grass or the lunch menu at the Mbah Jingkrak restaurant.