We would like to thank all participants for their submissions. We hope that this challenge gave practitioners and researchers insight into the difficulties of semantic code search and motivated new research. We encourage everyone to continue using the dataset and the human evaluations, which we now provide publicly. Please see below for details, specifically the Evaluation section. No new submissions to the challenge will be accepted.
nlp data-science data machine-learning natural-language-processing deep-learning tensorflow ml cnn open-data neural-networks rnn datasets representation-learning nlp-machine-learning bert programming-language-theory self-attention machine-learning-on-source-code
Code and checkpoints for training the transformer-based Table QA models introduced in the paper TAPAS: Weakly Supervised Table Parsing via Pre-training. The easiest way to try out TAPAS with free GPU/TPU is in our Colab, which shows how to do predictions on SQA.
tensorflow question-answering nlp-machine-learning table-parsing
A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and Easy Install.
tika-server tika-python tika-server-jar parser-interface parse translation-interface usc text-extraction mime buffer memex text-recognition detection recognition nlp nlp-machine-learning nlp-library
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 20 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, and user simulation learning, among other tasks in large-scale virtual assistants. In addition, the evaluation set contains unseen domains and services to quantify performance in zero-shot or few-shot settings. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of this dataset.
dialogue assistant dataset nlp-machine-learning dialogue-systems
Melusine is a high-level Python library for email classification and feature extraction, capable of running on top of Scikit-Learn, TensorFlow 2 and Keras. Integrated models run with TensorFlow 2.2. It is developed with a focus on emails written in French. Melusine is compatible with Python >= 3.6.
emails datascience nlp-machine-learning
package lingo provides the data structures and algorithms required for natural language processing. Specifically, it provides a POS Tagger (lingo/pos), a Dependency Parser (lingo/dep), and a basic tokenizer (lingo/lexer) for English. It also provides data structures for holding corpora (lingo/corpus) and treebanks (lingo/treebank).
natural-language-processing nlp nlp-library nlp-parsing nlp-dependency-parsing nlp-machine-learning language-model part-of-speech-tagger part-of-speech inflection conll-u
I was rewatching The Da Vinci Code the other day and came across an incredible scene near the start where Robert and Sophie, the two leading protagonists playing detective roles, stumble across an anagram puzzle in the Louvre Museum in Paris. It was a dark night and their lives depended on them cracking the code quickly! Silas, the ruthless Opus Dei zealot, was out for blood. An anagram is a word, phrase, or name formed by rearranging the letters of another word. For example: car => rac. These two words are anagrams of each other.
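The definition above can be checked mechanically. The following is a minimal sketch, not the project's own code: the function name `is_anagram` is hypothetical, and it assumes that case, spaces, and punctuation should be ignored when comparing letters.

```python
from collections import Counter


def is_anagram(a: str, b: str) -> bool:
    """Return True if `a` and `b` use exactly the same letters.

    Assumption: comparison ignores case and any non-alphanumeric
    characters, so multi-word phrases can be compared too.
    """
    def letter_counts(text: str) -> Counter:
        # Count each letter or digit, lowercased; skip spaces and punctuation.
        return Counter(ch for ch in text.lower() if ch.isalnum())

    return letter_counts(a) == letter_counts(b)


if __name__ == "__main__":
    print(is_anagram("car", "rac"))
    print(is_anagram("car", "cat"))
```

Comparing letter counts takes time linear in the length of the inputs; sorting both strings and comparing them would work as well, at a slightly higher cost.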
nlp-machine-learning nlp combinatorics interview-questions
Marseille learns to predict argumentative proposition types and the support relations between them, as inference in an expressive factor graph. Vlad Niculae, Joonsuk Park, Claire Cardie. Argument Mining with Structured SVMs and RNNs. In: Proceedings of ACL, 2017.
nlp nlp-machine-learning argumentation discourse-analysis structured-learning deep-learning machine-learning natural-language-processing
Reuters-21578 multi-class multi-label Classification with Keras
keras nlp-machine-learning reuters-corpus multi-class multi-label classification
An implementation of a geonames.org-based Gazetteer using Apache Lucene.
gazetteer geonames lucene irds geoindex allcountries opennlp nlp nlp-machine-learning apache
Abstract: Portmanteaus are a word formation phenomenon where two words are combined to form a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end-trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsupervised word lists, improving performance over a standard source-to-target model. This model is made possible by an exhaustive candidate generation strategy specifically enabled by the features of the portmanteau task. Experiments find our approach superior to a state-of-the-art FST-based baseline with respect to ground truth accuracy and human evaluation. Code/ contains the code. Data/ contains the dataset.
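To make the idea of exhaustive candidate generation concrete, here is a rough sketch. The abstract does not spell out the strategy, so this assumes that every candidate is a non-empty prefix of the first word joined to a non-empty suffix of the second; the function name `portmanteau_candidates` is hypothetical and this is not the repository's code.

```python
def portmanteau_candidates(first: str, second: str) -> list[str]:
    """Enumerate prefix-of-first + suffix-of-second combinations.

    Assumption: a portmanteau is a non-empty prefix of `first`
    followed by a non-empty suffix of `second`. The candidate set
    has at most len(first) * len(second) entries, which is small
    enough to score exhaustively with a model.
    """
    candidates = set()
    for i in range(1, len(first) + 1):      # prefix first[:i]
        for j in range(len(second)):        # suffix second[j:]
            candidates.add(first[:i] + second[j:])
    return sorted(candidates)


if __name__ == "__main__":
    options = portmanteau_candidates("breakfast", "lunch")
    print(len(options), "brunch" in options)
```

Because the candidate set is this small, a model can score every option and pick the best one rather than relying on approximate decoding.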
nlp nlp-machine-learning nlg seq2seq nlproc sequence-to-sequence char-rnn char-embedding
A PureScript, browser-based implementation of latent Dirichlet allocation (LDA) topic modeling. Able to take in two or more documents and soft cluster them by up to four topics. Try it at lettier.com/lda-topic-modeling. Read more about LDA.
lda topic-modeling data-science natural-language-processing nlp nlp-machine-learning purescript thermite machine-learning bayesian gibbs-sampling latent-dirichlet-allocation functional-programming clustering reactive-programming reactive machine-learning-algorithms bulma bulma-css text-mining
The opencog-ull package will be installed to your virtual environment. Command line scripts from src/cli-scripts are copied to the /bin subdirectory of your virtual environment, so they can be run from any location. In an activated virtual environment, type the name of the script you need to run.
nlp nlp-machine-learning
This package provides a sequence tagger implementation customized for Arabic features, including a named entity detection model especially intended for Arabic Wikipedia. It was trained on labeled ACE and ANER data as well as an unlabeled Wikipedia corpus. Learning is with the structured perceptron, optionally in a cost-augmented fashion. Feature extraction is handled as a preprocessing step prior to learning/decoding. The Java tagger was adapted from Michael Heilman's supersense tagger implementation for English (http://www.ark.cs.cmu.edu/mheilman/questions/). It requires a minimum Java version of 1.6. Feature extraction uses Python and depends on the MADA toolkit (http://www1.ccls.columbia.edu/MADA/; version 3.1 was used for the Named Entity Corpus).
arabic arabic-language arabic-nlp arabic-wikipedia tagger named-entities nlp sequence-tagger nlp-machine-learning
MiniCat is short for Mini Text Categorizer. Using a virtual environment is recommended but not required. Installing the above dependencies in a new virtual environment allows you to run the sample without changing the global Python packages on your system.
machinelearning text-classifier convolutional-neural-networks tensorflow nlp-machine-learning
While writing my bachelor's thesis, Sequence-to-Sequence Learning of Financial Time Series in Algorithmic Trading (in which I used LSTM-based RNNs to model the thesis problem), I became interested in natural language processing. After reading Andrej Karpathy's blog post titled The Unreasonable Effectiveness of Recurrent Neural Networks, I decided to give text generation using LSTMs a go. Although slightly trivial, the project still comprises an interesting program and demo, and gives really interesting (and sometimes very funny) results. I implemented the program over the course of a weekend in Hy (a LISP built on top of Python) using Keras and TensorFlow. You can train the model on any text sources you like. Remember to give it enough time to go over at least fifty epochs; otherwise the generated text will not be very interesting and will look like random garbage.
lstm lstm-neural-networks rnn tensorflow tensorflow-experiments keras text-generation natural-language-processing nlp-machine-learning machine-learning lisp hylang keras-neural-networks artificial-intelligence artificial-neural-networks recurrent-neural-networks
Try it here. This repo contains code for training and running inference with an intent classification model implemented as a shallow-and-wide Convolutional Neural Network [1].
intent-classification natural-language-processing natural-language-understanding neural-networks nlp-machine-learning
In this repo you can find several neural network architectures for named entity recognition from the paper "Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition" https://arxiv.org/pdf/1709.09686.pdf, which is inspired by the LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf. The NER class from ner/network.py provides methods for constructing, training, and running inference with neural networks for Named Entity Recognition.
nlp-machine-learning named-entity-recognition neural-network deep-learning natural-language-understanding natural-language-processing
RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:
stt openstt voice nlu nlp nlp-machine-learning speech-recognition speech-to-text speech-processing
This repository consists of examples and assignments discussed in the data science/analytics course at algorithmica. It also helps us build solutions to assignment problems collaboratively. You can push solutions to the solutions branch created inside the assignments section.
algorithms datastructures problem-solving coding-interview-challenges deep-learning machine-learning nlp-machine-learning