
CodeSearchNet - Datasets, tools, and benchmarks for representation learning of code.

  •    Jupyter

We would like to thank all participants for their submissions and we hope that this challenge provided insights to practitioners and researchers about the challenges in semantic code search and motivated new research. We would like to encourage everyone to continue using the dataset and the human evaluations, which we now provide publicly. Please, see below for details, specifically the Evaluation section. No new submissions to the challenge will be accepted.

tapas - End-to-end neural table-text understanding models.

  •    Python

Code and checkpoints for training the transformer-based Table QA models introduced in the paper TAPAS: Weakly Supervised Table Parsing via Pre-training. The easiest way to try out TAPAS with free GPU/TPU is in our Colab, which shows how to do predictions on SQA.

dstc8-schema-guided-dialogue - The Schema-Guided Dialogue Dataset

The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 20 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, user simulation learning, among other tasks in large-scale virtual assistants. Besides these, the dataset has unseen domains and services in the evaluation set to quantify the performance in zero-shot or few shot settings. The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of this dataset.




melusine - Melusine is a high-level library for email classification and feature extraction, dedicated to French emails

  •    Jupyter

Melusine is a high-level Python library for email classification and feature extraction, capable of running on top of Scikit-Learn, TensorFlow 2, and Keras. The integrated models run with TensorFlow 2.2. It is developed with a focus on emails written in French. Melusine is compatible with Python >= 3.6.

lingo - package lingo provides the data structures and algorithms required for natural language processing

  •    Go

Package lingo provides the data structures and algorithms required for natural language processing. Specifically, it provides a POS tagger (lingo/pos), a dependency parser (lingo/dep), and a basic tokenizer (lingo/lexer) for English. It also provides data structures for holding corpora (lingo/corpus) and treebanks (lingo/treebank).

cracking-the-da-vinci-code-with-google-interview-problems-and-nlp-in-python - A guide on how to crack combinatorics puzzles shown in The Da Vinci Code movie using CS fundamentals and NLP

  •    Python

I was rewatching The Da Vinci Code the other day and came across an incredible scene near the start where Robert and Sophie, the two leading protagonists playing detective roles, stumble across an anagram puzzle in the Louvre Museum in Paris. It was a dark night and their lives depended on them cracking the code quickly! Silas, the ruthless Opus Dei Zealot, was out for blood. An anagram is a word, phrase, or name formed by rearranging the letters of another word. For example: car => rac. These two words are anagrams of each other.
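The anagram definition above can be checked mechanically. The following is a minimal sketch, not the guide's own code: the helper names `normalize`, `is_anagram`, and `find_anagrams` are hypothetical. It compares letter counts after dropping case, spaces, and punctuation.

```python
from collections import Counter


def normalize(text):
    """Keep only letters, lowercased, so spacing and punctuation are ignored."""
    return "".join(ch.lower() for ch in text if ch.isalpha())


def is_anagram(a, b):
    """Two strings are anagrams when they use exactly the same letters."""
    return Counter(normalize(a)) == Counter(normalize(b))


def find_anagrams(scrambled, candidates):
    """Return the candidate phrases that are anagrams of `scrambled`."""
    return [c for c in candidates if is_anagram(scrambled, c)]
```

With these helpers, `is_anagram("car", "rac")` holds, while `is_anagram("car", "cat")` does not. Comparing letter counts avoids enumerating every permutation of the input, which grows factorially with its length.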

marseille - Mining Argument Structures with Expressive Inference (Linear and LSTM Engines)

  •    Python

Marseille learns to predict argumentative proposition types and the support relations between them, as inference in an expressive factor graph. Vlad Niculae, Joonsuk Park, Claire Cardie. Argument Mining with Structured SVMs and RNNs. In: Proceedings of ACL, 2017.


Charmanteau-CamReady - Code for "CharManteau: Character Embedding Models For Portmanteau Creation"

  •    Python

Abstract: Portmanteaus are a word formation phenomenon where two words are combined to form a new word. We propose character-level neural sequence-to-sequence (S2S) methods for the task of portmanteau generation that are end-to-end-trainable, language independent, and do not explicitly use additional phonetic information. We propose a noisy-channel-style model, which allows for the incorporation of unsupervised word lists, improving performance over a standard source-to-target model. This model is made possible by an exhaustive candidate generation strategy specifically enabled by the features of the portmanteau task. Experiments find our approach superior to a state-of-the-art FST-based baseline with respect to ground truth accuracy and human evaluation. Code/ contains the code. Data/ contains the dataset.
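To illustrate what exhaustive candidate generation can look like for this task, here is a minimal sketch. It assumes a candidate is any non-empty prefix of the first word joined to any non-empty suffix of the second. The function name `portmanteau_candidates` is hypothetical, and the paper's actual candidate set and scoring model may differ.

```python
def portmanteau_candidates(first, second):
    """Enumerate prefix-of-first + suffix-of-second strings (assumed scheme)."""
    candidates = set()
    for i in range(1, len(first) + 1):      # non-empty prefixes of `first`
        for j in range(len(second)):        # non-empty suffixes of `second`
            candidates.add(first[:i] + second[j:])
    return sorted(candidates)
```

Under this assumption the candidate count is at most `len(first) * len(second)`, which is small enough that every candidate can be scored by a model. For example, "brunch" appears among the candidates for "breakfast" and "lunch".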

language-learning - OpenCog Unsupervised Language Learning

  •    Jupyter

The opencog-ull package will be installed into your virtual environment. Command line scripts from src/cli-scripts are copied to the /bin subdirectory of your virtual environment, so they can be run from any location. In an activated virtual environment, type the name of the script you need to run.

arabic-tagger - AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training

  •    Java

This package provides a sequence tagger implementation customized for Arabic features, including a named entity detection model especially intended for Arabic Wikipedia. It was trained on labeled ACE and ANER data as well as an unlabeled Wikipedia corpus. Learning is with the structured perceptron, optionally in a cost-augmented fashion. Feature extraction is handled as a preprocessing step prior to learning/decoding. The Java tagger was adapted from Michael Heilman's supersense tagger implementation for English (http://www.ark.cs.cmu.edu/mheilman/questions/). It requires a minimum Java version of 1.6. Feature extraction uses Python and depends on the MADA toolkit (http://www1.ccls.columbia.edu/MADA/; version 3.1 was used for the Named Entity Corpus).

MiniCat - Custom Text Classifier

  •    Python

MiniCat is short for Mini Text Categorizer. Using a virtual environment is recommended but not required. Installing the dependencies in a new virtual environment allows you to run the sample without changing the global Python packages on your system.

LSTM-Text-Generation - Tons of fun with text and recurrent neural networks! Let your computer read a book and tell you its own story

  •    Hy

While I was writing my bachelor's thesis, Sequence-to-Sequence Learning of Financial Time Series in Algorithmic Trading (in which I used LSTM-based RNNs to model the thesis problem), I became interested in natural language processing. After reading Andrej Karpathy's blog post titled The Unreasonable Effectiveness of Recurrent Neural Networks, I decided to give text generation using LSTMs a go. Although slightly trivial, the project still comprises an interesting program and demo, and gives really interesting (and sometimes very funny) results. I implemented the program over the course of a weekend in Hy (a Lisp built on top of Python) using Keras and TensorFlow. You can train the model on any text sources you like. Remember to give it enough time to go over at least fifty epochs; otherwise the generated text will look like random garbage rather than anything interesting.

intent_classifier

  •    Python

This repo contains code for training and running inference with an intent classification model, implemented as a shallow-and-wide convolutional neural network [1].

ner - Named Entity Recognition

  •    Python

In this repo you can find several neural network architectures for named entity recognition from the paper "Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition" (https://arxiv.org/pdf/1709.09686.pdf), which is inspired by the LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf. The NER class from ner/network.py provides methods for constructing, training, and running inference with neural networks for named entity recognition.

ZZZ-RETIRED_openstt - RETIRED - OpenSTT is now retired

RETIRED - OpenSTT is now retired. If you would like more information on Mycroft AI's open source STT projects, please visit:

datascience - Examples and assignments discussed in the data science course at Algorithmica.

  •    Python

This repo consists of examples and assignments discussed in the data science/analytics course at Algorithmica. It also helps us build solutions to assignment problems collaboratively. You can push solutions to the solutions branch created inside the assignments section.