pyndri - pyndri is a Python interface to the Indri search engine.

  •        33

pyndri is a Python interface to the Indri search engine (http://www.lemurproject.org/indri/). During development, we use Python 3.5. Some of the examples require numpy.

http://ilps.science.uva.nl
https://github.com/cvangysel/pyndri

Tags
Implementation
License
Platform

   




Related Projects

Lemur - Search Engine

  •    Java

The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset.

The Lemur Project

  •    Java

The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software, including the Indri search engine and ClueWeb09 dataset.

Terrier - Information Retrieval Platform

  •    Java

Terrier is a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Terrier can index large corpora of documents, and provides multiple indexing strategies, such as multi-pass, single-pass and large-scale MapReduce indexing.

Xapian - Search Engine Library

  •    C++

Xapian is an Open Source Search Engine Library. It is written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby. Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.

Information Retrieval Toolkit

  •    C++

High-performance software for information retrieval research. Emphasis on semi-structured text retrieval, especially for HTML and XML. The goal is to facilitate information retrieval research by providing an interchangable toolkit of functions.


Ferret - The extensible information retrieval library for ruby.

  •    Ruby

Ferret is an information retrieval library in the same vein as Apache Lucene. Originally it was a full port of Lucene but it now uses it's own file format and indexing algorithm although it is still very similar in many ways to Lucene. Everything you can do in Lucene you should be able to do in Ferret.

resin - 32-bit vector space search engine

  •    CSharp

A full-text search engine with HTTP API and programmable read/write pipelines. To provide full-text search words and phrases are extracted from documents and mapped to a 2 billion dimensional vector-space that form clusters of syntactically similar "bag-of-chars". In this language model, each character (glyph) is encoded as a 32-bit word (an int), and each word or phrase alike encoded as a 32-bit wide (but sparse) array.

Mutis Full Text Search Engine

  •    ASPNET

Mutis is a Delphi port of the Lucene Search Engine. Provide a flexible API for index, catalog and search text-based information with great performance. Excelent for implement custom search engines, researching, text retrieval, data mining and more.

Zebra - Indexing and Retrieval Engine

  •    searchengine

Zebra is a high-performance, general-purpose structured text indexing and retrieval engine. It can index records in XML/SGML, MARC, e-mail archives and many other formats and allows access to them through exact boolean search expressions and relevance-ranked free-text queries.

essentia - C++ library for audio and music analysis, description and synthesis, including Python bindings

  •    Jupyter

Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval released under the Affero GPL license. It contains an extensive collection of reusable algorithms which implement audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. The library is also wrapped in Python and includes a number of predefined executable extractors for the available music descriptors, which facilitates its use for fast prototyping and allows setting up research experiments very rapidly. Furthermore, it includes a Vamp plugin to be used with Sonic Visualiser for visualization purposes. Essentia is designed with a focus on the robustness of the provided music descriptors and is optimized in terms of the computational cost of the algorithms. The provided functionality, specifically the music descriptors included in-the-box and signal processing algorithms, is easily expandable and allows for both research experiments and development of large-scale industrial applications. If you use example extractors (located in src/examples), or your own code employing Essentia algorithms to compute descriptors, you should be aware of possible incompatibilities when using different versions of Essentia.

Bobo - Faceted search library based on Lucene

  •    Java

Bobo Browse is an information retrieval technology that provides navigational browsing into a semi-structured dataset. Beyond the result set from queries and selections, Bobo Browse also provides the facets from this point of browsing. It provides support to sort documents on fields that have multiple values. It is stable and used by LinkedIn.

susper.com - Susper Decentralised Search Engine http://susper.com

  •    TypeScript

Susper is a decentralized Search Engine that uses the peer to peer system yacy and Apache Solr to crawl and index search results. This is a front-end for Susper running on Yacy server. The retrieval of search results is done using YaCy search API.

tntsearch - A fully featured full text search engine written in PHP

  •    PHP

We created also some demo pages that show tolerant retrieval with n-grams in action. The package has bunch of helper functions like jaro-winkler and cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built in stemmers aren't enough, the engine lets you easily plugin any compatible snowball stemmer. Some forks of the package even support Chinese. Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.

Jumper - Collaborative search engine in PHP

  •    PHP

Jumper 2.0 is a collaborative community search platform that revolutionizes search by crowdsourcing knowledge management powered by a shared bookmarking engine. It is easily and quickly deployed into a community of practice that benefits users with complex and specialized search requirements. Jumper delivers universal search of any databases, flat files, fileshares, content systems, web pages, blogs and wikis, even people - through one simple search box.

FSSearchIndexFX - A cross platform information retrieval API framework

  •    

FSSearchIndexFX is a cross platform Information Retrieval (IR) framework written in C# and supports both Windows and Mac OSX OSes It aims at developers writing or looking for some basic infrastructure API needed to perform IR tasks such as searching and indexing of text content.

madmom - Python audio and music signal processing library

  •    Python

Madmom is an audio signal processing library written in Python with a strong focus on music information retrieval (MIR) tasks. The library is internally used by the Department of Computational Perception, Johannes Kepler University, Linz, Austria (http://www.cp.jku.at) and the Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria (http://www.ofai.at).

awesome-deep-learning-music - List of articles related to deep learning applied to music

  •    TeX

By Yann Bayle (Website, GitHub) from LaBRI (Website, Twitter), Univ. Bordeaux (Website, Twitter), CNRS (Website, Twitter) and SCRIME (Website). The role of this curated list is to gather scientific articles, thesis and reports that use deep learning approaches applied to music. The list is currently under construction but feel free to contribute to the missing fields and to add other resources! To do so, please refer to the How To Contribute section. The resources provided here come from my review of the state-of-the-art for my PhD Thesis for which an article is being written. There are already surveys on deep learning for music generation, speech separation and speaker identification. However, these surveys do not cover music information retrieval tasks that are included in this repository.

cider - "Content Integration Framework: Document Extraction and Retrieval" - A document parser framework that stores parsed entities into jena ( http://jena

  •    Java

"Content Integration Framework: Document Extraction and Retrieval" - A document parser framework that stores parsed entities into jena ( http://jena.sourceforge.net/ ) RDF vocabularies and provides knowledge-base enhanced semantic ananlysis of content. Annotated content can be used by search engines to present content navigation which will be implemented in the YaCy Search Engine

MG4J - Managing Gigabytes for Java

  •    Java

MG4J (Managing Gigabytes for Java) is a free full-text search engine for large document collections written in Java. MG4J is a highly customisable, high-performance, full-fledged search engine providing state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms. The main points of MG4J are Powerful indexing, Multi-index interval semantics, Virtual fields, Clustering and lot more.

MEAnalyzer - Intel Engine Firmware Analysis Tool

  •    Python

ME Analyzer is a tool which parses Intel Engine & PMC firmware images from the (Converged Security) Management Engine, (Converged Security) Trusted Execution Engine, (Converged Security) Server Platform Services & Power Management Controller families. It can be used by end-users who are looking for all relevant firmware information such as Family, Version, Release, Type, Date, SKU, Platform etc. It is capable of detecting new/unknown firmware, checking firmware health, Updated/Outdated status and many more. ME Analyzer is also a powerful Engine firmware research analysis tool with multiple structures which allow, among others, full parsing and unpacking of Converged Security Engine (CSE) code & file system, Flash Partition Table (FPT), Boot Partition Descriptor Table (BPDT/IFWI), CSE Layout Table (LT), advanced Size detection etc. Moreover, with the help of its extensive database, ME Analyzer is capable of uniquely categorizing all supported Engine firmware as well as check for any firmware which have not been stored at the Intel Engine Firmware Repositories yet. ME Analyzer allows end-users and/or researchers to quickly analyze and/or report new firmware versions without the use of special Intel tools (FIT/FITC, FWUpdate) or Hex Editors. To do that effectively, a database had to be built. The Intel Engine Firmware Repositories is a collection of every (CS)ME, (CS)TXE & (CS)SPS firmware we have found. Its existence is very important for ME Analyzer as it allows us to continue doing research, find new types of firmware, compare same major version releases for similarities, check for updated firmware etc. Bundled with ME Analyzer is a file called MEA.dat which is required for the program to run. It includes entries for all Engine firmware that are available to us. This accommodates primarily three actions: a) Detect each firmware's Family via unique identifier keys, b) Check whether the imported firmware is up to date and c) Help find new Engine firmware sooner by reporting them at the Intel Management Engine: Drivers, Firmware & System Tools or Intel Trusted Execution Engine: Drivers, Firmware & System Tools threads respectively.