Chinese-Word-Vectors - 100+ Chinese Word Vectors (over a hundred pre-trained Chinese word vectors)

  •    Python

This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse), context features (word, ngram, character, and more), and corpora. One can easily obtain pre-trained vectors with different properties and use them for downstream tasks. Moreover, we provide a Chinese analogical reasoning dataset CA8 and an evaluation toolkit for users to evaluate the quality of their word vectors.
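CA8 evaluates word vectors with analogy questions of the form a : b :: c : d. A common way to answer them is 3CosAdd: pick the vocabulary word whose vector is most cosine-similar to b - a + c, excluding the three question words. The NumPy sketch below illustrates that scoring rule on plain arrays. The function names are hypothetical, and the project's own evaluation toolkit may differ in details, for example in how it normalizes vectors or which scoring rules it reports.

```python
import numpy as np


def normalize(matrix):
    # Scale each row to unit L2 norm so dot products become cosine similarities.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.maximum(norms, 1e-12)


def solve_analogy(words, vectors, a, b, c):
    """Return the word d that best completes a : b :: c : d using 3CosAdd."""
    index = {w: i for i, w in enumerate(words)}
    unit = normalize(np.asarray(vectors, dtype=float))
    # 3CosAdd target vector: b - a + c, built from unit-length vectors.
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ target
    # Exclude the three question words, as is conventional in analogy evaluation.
    for w in (a, b, c):
        scores[index[w]] = -np.inf
    return words[int(np.argmax(scores))]


def analogy_accuracy(words, vectors, questions):
    """Fraction of (a, b, c, d) questions whose predicted answer equals d."""
    correct = sum(solve_analogy(words, vectors, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)
```

Loading the pre-trained vectors into the words list and the vectors matrix is left out, since the sketch makes no assumption about the file layout.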

lightly - A Python library for self-supervised learning on images.

  •    Python

Lightly is a computer vision framework for self-supervised learning. We, at Lightly, are passionate engineers who want to make deep learning more efficient. That's why, together with our community, we want to popularize the use of self-supervised methods to understand and curate raw image data. Our solution can be applied before any data annotation step, and the learned representations can be used to visualize and analyze datasets. This allows you to select the best core set of samples for model training through advanced filtering.
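The description does not say which filtering strategy Lightly uses to pick a core set. One widely used technique for coreset selection on learned embeddings is greedy k-center selection (also called farthest-point sampling), sketched below in NumPy. It assumes each image has already been mapped to an embedding vector by a self-supervised model; the function name is hypothetical and this is not Lightly's API.

```python
import numpy as np


def k_center_greedy(embeddings, k, first=0):
    """Select up to k row indices by farthest-point sampling.

    Each step adds the sample farthest from its nearest already-selected
    sample, which greedily shrinks the worst-case coverage distance.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    selected = [first]
    # Distance from every sample to its closest selected sample so far.
    min_dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    while len(selected) < min(k, len(embeddings)):
        nxt = int(np.argmax(min_dist))  # the sample worst covered by the current set
        selected.append(nxt)
        dist_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_new)
    return selected
```

Selecting samples this way favors diversity: near-duplicates of an already chosen image have a small coverage distance and are picked last.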

magnitude - A fast, efficient universal vector embedding utility package.

  •    Python

A feature-packed Python package and vector storage file format, developed by Plasticity, for using vector embeddings in machine learning models in a fast, efficient, and simple manner. It is primarily intended to be a simpler and faster alternative to Gensim, but it can be used as a generic key-vector store for domains outside NLP. Vector space embedding models have become increasingly common in machine learning and have traditionally been popular for natural language processing applications, yet a fast, lightweight tool for consuming these large models efficiently has been lacking.
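To illustrate the general idea of a key-vector store that lives in a single file, here is a minimal sketch backed by SQLite and NumPy. It is not Magnitude's file format or API; the class and method names are hypothetical, and a real tool would add batching, caching, and faster similarity search than the linear scan used here.

```python
import sqlite3

import numpy as np


class KeyVectorStore:
    """Minimal key -> vector store backed by SQLite (illustrative sketch)."""

    def __init__(self, dim, path=":memory:"):
        self.dim = dim
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS vectors (name TEXT PRIMARY KEY, vec BLOB)")

    def add(self, name, vector):
        vec = np.asarray(vector, dtype=np.float32)
        if vec.shape != (self.dim,):
            raise ValueError(f"expected shape ({self.dim},), got {vec.shape}")
        # Store raw float32 bytes; INSERT OR REPLACE overwrites an existing key.
        self.conn.execute(
            "INSERT OR REPLACE INTO vectors (name, vec) VALUES (?, ?)", (name, vec.tobytes())
        )
        self.conn.commit()

    def query(self, name):
        row = self.conn.execute("SELECT vec FROM vectors WHERE name = ?", (name,)).fetchone()
        if row is None:
            raise KeyError(name)
        return np.frombuffer(row[0], dtype=np.float32)

    def most_similar(self, name, topn=1):
        """Rank the other keys by cosine similarity using a linear scan."""
        target = self.query(name)
        target = target / np.linalg.norm(target)
        scored = []
        for other, blob in self.conn.execute("SELECT name, vec FROM vectors WHERE name != ?", (name,)):
            vec = np.frombuffer(blob, dtype=np.float32)
            scored.append((other, float(vec @ target / np.linalg.norm(vec))))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:topn]
```

Passing a file path instead of ":memory:" keeps the vectors on disk, so lookups do not require loading the whole model into RAM.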

hub - A library for transfer learning by reusing parts of TensorFlow models.

  •    Python

TensorFlow Hub is a library to foster the publication, discovery, and consumption of reusable parts of machine learning models. In particular, it provides modules, which are pre-trained pieces of TensorFlow models that can be reused on new tasks. If you'd like to contribute to TensorFlow Hub, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code.

PyTorch-NLP - Supporting Rapid Prototyping with a Toolkit (incl. Datasets and Neural Network Layers)

  •    Python

PyTorch-NLP, or torchnlp for short, is a library of neural network layers, text processing modules and datasets designed to accelerate Natural Language Processing (NLP) research. Join our community, add datasets and neural network layers! Chat with us on Gitter and join the Google Group; we're eager to collaborate with you.

tensorflow-triplet-loss - Implementation of triplet loss in TensorFlow

  •    Python

This repository contains a triplet loss implementation in TensorFlow with online triplet mining. Please check the blog post for a full description. The code structure is adapted from code I wrote for CS230 in this repository at tensorflow/vision. A set of tutorials for this code is also available.
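With online triplet mining, triplets are formed inside each mini-batch from the embeddings just computed, instead of being precomputed offline. A common variant, often called "batch hard", pairs each anchor with its farthest positive and its closest negative. The NumPy sketch below computes that loss for illustration only; it is not the repository's TensorFlow code, and the default margin is an arbitrary assumption.

```python
import numpy as np


def pairwise_distances(embeddings):
    # Euclidean distances via ||a||^2 - 2 a.b + ||b||^2, clipped at 0 for numerical safety.
    sq = np.sum(embeddings ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * embeddings @ embeddings.T + sq[None, :]
    return np.sqrt(np.maximum(d2, 0.0))


def batch_hard_triplet_loss(labels, embeddings, margin=0.5):
    labels = np.asarray(labels)
    dist = pairwise_distances(np.asarray(embeddings, dtype=float))
    same = labels[:, None] == labels[None, :]
    # Hardest positive: farthest sample sharing the anchor's label.
    hardest_pos = np.where(same, dist, 0.0).max(axis=1)
    # Hardest negative: closest sample with a different label (inf if none exists).
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    # Hinge on the margin and average over all anchors in the batch.
    return float(np.mean(np.maximum(hardest_pos - hardest_neg + margin, 0.0)))
```

Because only the hardest pairs contribute, well-separated batches produce zero loss, while a single confusing negative is enough to produce a gradient.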

whatlies - Toolkit to help understand "what lies" in word embeddings. Also benchmarking!

  •    Python

A library that tries to help you to understand (note the pun). This small library offers tools that make it easier to visualise both word embeddings and operations on them.
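Visualising embeddings usually means projecting them to two dimensions first. The sketch below uses principal component analysis (PCA) via NumPy's SVD as one such projection. It does not use whatlies' own API, and the function name is hypothetical; results of operations such as b - a + c can simply be appended as extra rows before projecting.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    x = np.asarray(vectors, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

The two output columns can be passed directly to any scatter-plot function, with the words as point labels.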

go2vec - Read and use word2vec vectors in Go

  •    Go

This is a package for reading word2vec vectors in Go and finding similar words and analogies.
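go2vec itself is written in Go; to keep all examples in this listing in one language, the sketch below uses Python to show the layout of the original word2vec binary format that such readers parse: a text header with the vocabulary size and dimensionality, then each word followed by a space and its vector as raw float32 values. It assumes little-endian floats (as written on common x86 machines) and UTF-8 words, and the function name is hypothetical; it is not go2vec's API.

```python
import numpy as np


def read_word2vec_binary(stream):
    """Parse the original word2vec binary format from a binary file-like object."""
    # Header: "<vocab_size> <dim>\n" as ASCII text.
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        # Each word is a run of bytes terminated by a single space.
        word = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise ValueError("unexpected end of file while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline written after the previous vector
                word.extend(ch)
        data = stream.read(4 * dim)
        if len(data) != 4 * dim:
            raise ValueError("unexpected end of file while reading a vector")
        # "<f4" is little-endian float32; copy() gives the array its own writable memory.
        vectors[word.decode("utf-8")] = np.frombuffer(data, dtype="<f4").copy()
    return vectors
```

The vector bytes are read as one block rather than scanned for delimiters, because raw float32 data can legitimately contain space or newline bytes.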

graph-pattern-learner - Evolutionary Graph Pattern Learner that learns SPARQL queries for a given set of source-target-pairs from an endpoint

  •    Python

In this repository you will find the code for a graph pattern learner. Given a list of source-target pairs and a SPARQL endpoint, it will try to learn SPARQL patterns. Given a source, the learned patterns will try to lead you to the right target. Associations don't always follow a single pattern, and our algorithm is designed to deal with this: it will try to learn several patterns which, in combination, model your input list of source-target pairs. If your list of source-target pairs is less complicated, the algorithm will happily terminate earlier.
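A learned graph pattern can be applied to new sources as a SPARQL query in which ?source is bound to the given IRIs and ?target collects the results. The sketch below builds such a query string from a list of triples. This triple-list representation and the function name are hypothetical choices made for illustration, not necessarily how the repository stores patterns, and the DBpedia IRIs in the usage comment are only examples.

```python
def pattern_to_sparql(triples, sources):
    """Build a SPARQL SELECT query that applies one graph pattern to given sources.

    triples: (subject, predicate, object) strings in SPARQL syntax that mention
             the variables ?source and ?target (a hypothetical representation).
    sources: IRIs to bind to ?source via a VALUES clause.
    """
    terms = {term for triple in triples for term in triple}
    if "?source" not in terms or "?target" not in terms:
        raise ValueError("pattern must mention both ?source and ?target")
    values = " ".join(f"<{iri}>" for iri in sources)
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in triples)
    return (
        "SELECT ?source ?target WHERE {\n"
        f"  VALUES ?source {{ {values} }}\n"
        f"{body}\n"
        "}"
    )


# Example: a one-hop pattern from a country to its capital.
# pattern_to_sparql([("?source", "<http://dbpedia.org/ontology/capital>", "?target")],
#                   ["http://dbpedia.org/resource/Germany"])
```

Combining several learned patterns then amounts to running one such query per pattern and merging the returned targets.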

deep-scite - A simple recommendation engine (by way of convolutions and embeddings) written in TensorFlow

  •    HTML

DeepScite takes in papers (titles and abstracts) and emits recommendations on whether or not they should be scited by the particular users whose data was used for training (in the case of this repo, me). As output, it also gives a "goodness" score for each word: when this number is high, the word contributed strongly to the paper being recommended for sciting; when it is negative, the word contributed strongly to the paper not being recommended.

bier - Cleaned up reference implementation of BIER: Boosting Independent Embeddings Robustly.

  •    Python

To run the code, see ./run.sh and ./run_eval.sh. The train-images file is a NumPy file consisting of images of size 256x256, and train-labels contains the corresponding labels. The label indices should lie in the range [0, total-number-of-labels), i.e. they should be non-negative and contiguous.
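If the raw labels are arbitrary integers (for example, class IDs with gaps), they can be remapped onto the required range [0, total-number-of-labels) before the labels file is written. A minimal NumPy sketch, with a hypothetical helper name:

```python
import numpy as np


def to_contiguous_labels(raw_labels):
    """Map arbitrary label values onto contiguous indices 0..n-1.

    Returns the remapped labels and the sorted original class values, so that
    classes[new_label] recovers the original label.
    """
    classes, indices = np.unique(np.asarray(raw_labels), return_inverse=True)
    return indices.astype(np.int64), classes
```

The remapped array could then be saved with np.save; the exact file names the scripts expect should be checked in ./run.sh.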

knn4qa - k-nearest neighbor search for question answering (QA) and information retrieval (IR)

  •    Java

This is a learning-to-rank pipeline, which is part of a project where we study the applicability of k-nearest neighbor search methods in IR and QA applications. This project is supported primarily by the NSF grant #1618159: "Matching and Ranking via Proximity Graphs: Applications to Question Answering and Beyond". For more details, please check the Wiki page.
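For reference, exact k-nearest neighbor search by cosine similarity can be written as a brute-force scan; approximate methods such as proximity graphs aim to return nearly the same neighbors much faster. The NumPy sketch below is that brute-force baseline with a hypothetical function name, not code from knn4qa.

```python
import numpy as np


def knn_cosine(queries, corpus, k):
    """Exact k-nearest-neighbor search by cosine similarity (brute force).

    Returns, for each query row, the indices of the k most similar corpus rows,
    ordered from most to least similar.
    """
    q = np.asarray(queries, dtype=float)
    c = np.asarray(corpus, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    sims = q @ c.T  # shape (num_queries, num_docs)
    k = min(k, c.shape[0])
    # argpartition finds the top-k in arbitrary order; then sort only those k.
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    order = np.argsort(-np.take_along_axis(sims, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)
```

The cost is one dot product per query-document pair, which is exactly what graph-based indexes try to avoid on large collections.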

bimu - Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

  •    Python

The individual similarity scores, presented as averages in the paper, are reported in the appendix. Run python3.4 examples/run_bimu.py --help for the full list of options, and set the Theano flags as THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32.

dna2vec - dna2vec: Consistent vector representations of variable-length k-mers

  •    Python

Dna2vec is an open-source library to train distributed representations of variable-length k-mers. Note that this implementation has only been tested on Python 3.5.3, but we welcome any contributions or bug reports to make it more accessible.
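A variable-length k-mer is a substring of length k taken from a DNA sequence, with k allowed to range between a lower and an upper bound. The sketch below enumerates every such k-mer at every start position. It illustrates the concept only; the function name is hypothetical, and dna2vec's own preprocessing may differ (for instance, it may sample lengths at random instead of enumerating all of them).

```python
def variable_kmers(sequence, k_low, k_high):
    """Return every k-mer with k_low <= k <= k_high, ordered by start position, then length."""
    seq = sequence.upper()
    kmers = []
    for start in range(len(seq)):
        for k in range(k_low, k_high + 1):
            if start + k <= len(seq):  # skip k-mers that would run past the end
                kmers.append(seq[start:start + k])
    return kmers
```

Word-embedding style training then treats these k-mers like words, with neighboring k-mers acting as context.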

cofactor - CoFactor: Regularizing Matrix Factorization with Item Co-occurrence

  •    Jupyter

This repository contains the source code to reproduce the experimental results as described in the paper "Factorization Meets the Item Embedding: Regularizing Matrix Factorization with Item Co-occurrence" (RecSys'16). Note: the code is mostly written for Python 2.7. It is still usable with Python 3.x after minor modifications. If you run into any problems with Python 3.x, feel free to contact me and I will try to get back to you with a helpful solution.
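As described in the CoFactor paper, item co-occurrence is derived from the user-item matrix (two items co-occur when the same user consumed both) and summarized as a shifted positive pointwise mutual information (SPPMI) matrix, which is factorized jointly with the user-item matrix. The NumPy sketch below computes only the SPPMI part on a dense toy matrix; the function names are hypothetical, and the repository's code may differ in detail, for example by using sparse matrices.

```python
import numpy as np


def item_cooccurrence(user_item):
    """Count, for every item pair, how many users interacted with both items."""
    x = (np.asarray(user_item) > 0).astype(float)
    counts = x.T @ x
    np.fill_diagonal(counts, 0.0)  # ignore an item's co-occurrence with itself
    return counts


def sppmi(counts, shift=1.0):
    """SPPMI(i, j) = max(log(#(i,j) * D / (#(i) * #(j))) - log(shift), 0)."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    # Pairs that never co-occur get PMI of -inf, so they clip to 0 below.
    pmi = np.where(counts > 0, pmi, -np.inf)
    return np.maximum(pmi - np.log(shift), 0.0)
```

Larger shift values keep only the strongest item associations, which makes the matrix sparser.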

nlp-augment - A collection of utilities used in exploring data augmentation of low-resource parallel corpora

  •    Python

A collection of utilities used in exploring data augmentation of low-resource parallel corpora. My experiments suggest that augmenting only the source side of the parallel data with rare words is more beneficial.

GloVe-experiments - GloVe word vector embedding experiments (similar to Word2Vec)

  •    Python

This repository contains a few brief experiments with Stanford NLP's GloVe, an unsupervised learning algorithm for obtaining vector representations of words. Similar to Word2Vec, GloVe creates a continuous N-dimensional representation of a word that is learned from the context words surrounding it in a training corpus. When GloVe is trained on a large text corpus, these co-occurrence statistics, captured in an N-dimensional vector embedding, cause semantically similar words to appear near each other in the resulting embedding space (e.g. "dog" and "cat" may appear near a region of other pet-related words, because the context words that surround both "dog" and "cat" in the training corpus are similar). All three scripts use the GloVe.6B pre-trained word embeddings created from the combined Wikipedia 2014 and Gigaword 5 datasets. These embeddings were trained on 6 billion tokens and have a vocabulary of 400,000 lowercase words. Trained embeddings are provided in 50, 100, 200, and 300 dimensions (822 MB download).
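GloVe's pre-trained files use a plain-text format with one word per line followed by its vector components, separated by spaces. The sketch below parses that format and ranks neighbors by cosine similarity. The helper names are hypothetical, and the file name in the usage comment assumes the 50-dimensional file from the GloVe.6B download.

```python
import numpy as np


def load_glove(lines):
    """Parse GloVe's text format: a word followed by its space-separated components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        vectors[parts[0]] = np.array(parts[1:], dtype=np.float32)
    return vectors


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest(vectors, word, topn=3):
    """Return the topn (word, similarity) pairs closest to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Usage (assumed file name):
# with open("glove.6B.50d.txt", encoding="utf-8") as f:
#     vectors = load_glove(f)
# print(nearest(vectors, "dog"))
```

A linear scan over all 400,000 words is slow per query; stacking the vectors into one normalized matrix and using a single matrix product is the usual speed-up.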
