Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
gensim topic-modeling information-retrieval machine-learning natural-language-processing nlp data-science data-mining word2vec word-embeddings text-summarization neural-network document-similarity word-similarity fasttext
Deep neural network to extract intelligent information from invoice documents. The InvoiceNet logo was designed by Sidhant Tibrewal. Check out his work for some more beautiful designs.
information-retrieval deep-neural-networks deep-learning invoices keras information-extraction classification invoice billing deeplearning keras-neural-networks invoice-pdf invoice-management keras-tensorflow invoice-software invoice-insight invoice-parser
Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want to perform Question Answering or semantic document search, you can use the State-of-the-Art NLP models in Haystack to provide unique search experiences and allow your users to query in natural language. Haystack is built in a modular fashion so that you can combine the best technology from other open-source projects like Huggingface's Transformers, Elasticsearch, or Milvus.
search nlp search-engine elasticsearch information-retrieval pytorch question-answering summarization transfer-learning ann language-model semantic-search squad bert dpr retriever neural-search natural-language
We envision that TF-Ranking will provide a convenient open platform for hosting and advancing state-of-the-art ranking models based on deep learning techniques, and thus facilitate both academic research and industrial applications. TF-Ranking was presented at two premier conferences in Information Retrieval, SIGIR 2019 and ICTIR 2019. The slides are available here.
machine-learning information-retrieval deep-learning ranking learning-to-rank recommender-systems
Accelerated deep learning R&D
infrastructure machine-learning natural-language-processing information-retrieval research reinforcement-learning computer-vision deep-learning text-classification distributed-computing image-processing pytorch image-classification metric-learning recommender-system object-detection image-segmentation reproducibility text-segmentation
The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri search engine, Lemur Toolbar, and ClueWeb09 dataset.
searchengine search-engine full-text-search lucene-alternative search information-retrieval
Terrier is a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Terrier can index large corpora of documents, and provides multiple indexing strategies, such as multi-pass, single-pass and large-scale MapReduce indexing.
searchengine search-engine full-text-search lucene-alternative search information-retrieval
A full-text search engine with HTTP API and programmable read/write pipelines. To provide full-text search, words and phrases are extracted from documents and mapped to a 2-billion-dimensional vector space, where they form clusters of syntactically similar "bag-of-chars". In this language model, each character (glyph) is encoded as a 32-bit word (an int), and each word or phrase is likewise encoded as a 32-bit wide (but sparse) array.
information-retrieval search-engine dotnet-core vector-space-model lsm-tree
allRank provides an easy and flexible way to experiment with various LTR neural network models and loss functions. It is easy to add a custom loss, and to configure the model and the training procedure. We hope that allRank will facilitate both research in neural LTR and its industrial applications. To help you get started, we provide a run_example.sh script which generates dummy ranking data in libsvm format and trains a Transformer model on the data using the provided example config.json file. Once you run the script, the dummy data can be found in the dummy_data directory and the results of the experiment in the test_run directory. To run the example, Docker is required.
machine-learning information-retrieval deep-learning pytorch transformer ranking learning-to-rank ndcg click-model
This project is OpenSource Connections' API-compatible fork of RankLib, deployed on Maven, with various improvements that make it easier to integrate with the Elasticsearch Learning to Rank Plugin. It is published under the com.o19s:RankyMcRankFace Maven namespace.
machine-learning search information-retrieval
This library transforms a document-term matrix into an Okapi/BM25 representation. Its API inherits from sklearn.feature_extraction.text.TfidfTransformer.
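To make the transformation concrete, here is a minimal pure-Python sketch of BM25 weighting applied to a small dense document-term count matrix. It is not this library's implementation: the function name `bm25_transform`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))` are all assumptions chosen for illustration, and the library itself may use different defaults or a different IDF formula.

```python
import math


def bm25_transform(doc_term_counts, k1=1.2, b=0.75):
    """Return BM25 weights for a dense document-term count matrix.

    doc_term_counts is a list of rows (one per document), each row a list
    of raw term counts. k1, b and the IDF variant are illustrative
    assumptions, not values taken from any particular library.
    """
    n_docs = len(doc_term_counts)
    n_terms = len(doc_term_counts[0]) if n_docs else 0

    # Document lengths and their average drive the length normalisation.
    doc_lens = [sum(row) for row in doc_term_counts]
    avg_len = sum(doc_lens) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in doc_term_counts if row[t] > 0) for t in range(n_terms)]

    # Smoothed, always non-negative IDF (assumed variant).
    idf = [math.log(1.0 + (n_docs - d + 0.5) / (d + 0.5)) for d in df]

    weighted = []
    for row, doc_len in zip(doc_term_counts, doc_lens):
        # Longer-than-average documents get a larger denominator term.
        norm = k1 * (1.0 - b + b * doc_len / avg_len) if avg_len else k1
        weighted.append([
            idf[t] * tf * (k1 + 1.0) / (tf + norm) if tf else 0.0
            for t, tf in enumerate(row)
        ])
    return weighted


counts = [
    [2, 0, 1],  # document 0
    [0, 1, 1],  # document 1
    [0, 0, 3],  # document 2
]
weights = bm25_transform(counts)
```

In document 1 both present terms occur once, but the term that appears in only one document receives a higher weight than the term that appears in all three, which is the behaviour the IDF factor is meant to produce.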
machine-learning information-retrieval scikit-learn natural-language-processing
A curated list of NLP resources for Hungarian
nlp natural-language-processing text-mining information-retrieval information-extraction hungarian hungarian-language awesome awesome-list nlu natural-language-understanding opinion-mining named-entity-recognition tagger dataset nlp-resources parser corpus-linguistics computational-linguistics corpus
This project demonstrates using the Tika-Python package (Python port of Apache Tika) to compute file similarity based on metadata features. The script iterates over all files in the current directory, or over files given on the command line, and derives their metadata features, then computes the union of all features. This union becomes the "golden feature set" that every document's features are compared to via intersection. The size of that intersection per file, divided by the size of the union, becomes the similarity score.
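The scoring rule described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the project's actual script: the function name `golden_set_similarity`, the file names, and the metadata keys are hypothetical, and the feature sets are hard-coded instead of being extracted with Tika.

```python
def golden_set_similarity(file_features):
    """Score each file by |features ∩ golden set| / |golden set|.

    file_features maps a file name to the set of metadata feature names
    found for that file (hypothetical input; no Tika call is made here).
    """
    # The "golden feature set" is the union of every file's features.
    golden = set().union(*file_features.values())
    if not golden:
        return {name: 0.0 for name in file_features}
    return {
        name: len(set(features) & golden) / len(golden)
        for name, features in file_features.items()
    }


# Hypothetical metadata keys, for illustration only.
features = {
    "report.pdf": {"Content-Type", "Author", "title"},
    "photo.png": {"Content-Type", "width"},
}
scores = golden_set_similarity(features)
print(scores)  # {'report.pdf': 0.75, 'photo.png': 0.5}
```

Because every file's feature set is a subset of the union, the score reduces to the fraction of all observed metadata keys that a given file carries.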
similarity-score machine-learning clustering information-retrieval cosine-similarity cosine-distance tika jaccard-similarity tika-similarity metadata-features tika-python
⚠️ You need a CUDA-compatible GPU (compute capability 5.2+) to use this software. cuNVSM is a C++/CUDA implementation of state-of-the-art NVSM and LSE representation learning algorithms.
vector-space-model information-retrieval machine-learning neural-networks representation-learning
pyndri is a Python interface to the Indri search engine (http://www.lemurproject.org/indri/). During development, we use Python 3.5. Some of the examples require numpy.
indri-search-engine research information-retrieval
pytrec_eval is a Python interface to TREC's evaluation tool, trec_eval. It is an attempt to stop the cultivation of custom implementations of Information Retrieval evaluation measures for the Python programming language. The module was developed using Python 3.5. You need a Python distribution that comes with development headers. In addition to the default Python modules, numpy and scipy are required.
information-retrieval evaluation
The Semantic Entity Retrieval Toolkit (SERT) is a collection of neural entity retrieval algorithms. SERT requires Python 3.5 and assorted modules. The trec_eval utility is required for evaluation and the end-to-end scripts. If you wish to train your models on GPGPUs, you will need a GPU compatible with Theano.
representation-learning deeplearning neural-network information-retrieval
pke is an open source Python-based keyphrase extraction toolkit. It provides an end-to-end keyphrase extraction pipeline in which each component can be easily modified or extended to develop new approaches. pke also allows for easy benchmarking of state-of-the-art keyphrase extraction approaches, and ships with supervised models trained on the SemEval-2010 dataset. pke works only for Python 2.x at the moment.
keyphrase-extraction natural-language-processing information-retrieval computational-linguistics semeval-2010 topicrank tf-idf kea wingnus
As the name suggests, this is a personal repository related to research on Indonesian linguistics. Everything in this repository is experimental and may change at any time, depending on which way the grass sways or on the lunch menu at the Mbah Jingkrak restaurant.
natural-language-processing information-retrieval
OSINT Threat Intel Interface, named after the old Norse god of knowledge. Mimir functions as a CLI to HoneyDB, which in short is an aggregated OSINT threat intel pool. Starting the program brings you to a menu of options.
osint threatintel intel honeypot honeydb cli interface information-retrieval ioc nmap scan-tool