Gensim is a Python library for topic modelling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list leaves you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
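As a minimal sketch of what working with Gensim looks like, here is a tiny Word2Vec example; the toy corpus and parameter values are illustrative assumptions, and the keyword names assume Gensim 4.x (older releases use size instead of vector_size):

```python
# Minimal sketch: train a small Word2Vec model with Gensim and query it.
# The toy corpus and hyperparameters are illustrative assumptions.
from gensim.models import Word2Vec

corpus = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
    ["graph", "minors", "trees"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=2, min_count=1, epochs=20)
print(model.wv.most_similar("computer", topn=3))
```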
gensim topic-modeling information-retrieval machine-learning natural-language-processing nlp data-science data-mining word2vec word-embeddings text-summarization neural-network document-similarity word-similarity fasttext

This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse), context features (word, n-gram, character, and more), and corpora. One can easily obtain pre-trained vectors with different properties and use them for downstream tasks. Moreover, we provide a Chinese analogical reasoning dataset, CA8, and an evaluation toolkit for users to evaluate the quality of their word vectors.
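A hedged sketch of how such pre-trained vectors are typically consumed: the file name below is a hypothetical placeholder for whichever download you choose, and word2vec text format is an assumption.

```python
# Sketch: load a downloaded pre-trained vector file (word2vec text format assumed)
# with Gensim and run a CA8-style analogy query. The file name is hypothetical.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("sgns.weibo.word.txt", binary=False)
# Analogy: China (中国) is to Beijing (北京) as Japan (日本) is to ? (expect Tokyo, 东京)
print(vectors.most_similar(positive=["北京", "日本"], negative=["中国"], topn=1))
```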
chinese chinese-word-segmentation embeddings word-embeddings vectors-trained embedding

text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP). To learn how to use this package, see text2vec.org and the package vignettes. See also the text2vec articles on my blog.
word2vec text-mining natural-language-processing glove vectorization topic-modeling word-embeddings latent-dirichlet-allocation

Magnitude is a feature-packed Python package and vector-storage file format, developed by Plasticity, for using vector embeddings in machine learning models in a fast, efficient, and simple manner. It is primarily intended as a simpler and faster alternative to Gensim, but it can also be used as a generic key-vector store for domains outside NLP. Vector space embedding models have become increasingly common in machine learning and have traditionally been popular for natural language processing applications; a fast, lightweight tool to consume these large embedding models efficiently has been lacking.
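A minimal sketch of the pymagnitude usage pattern; the .magnitude file name is a hypothetical placeholder for an embedding file converted to Magnitude's format.

```python
# Sketch of the Magnitude key-vector store: vectors are memory-mapped from the
# .magnitude file, so only what is queried gets loaded. File name is hypothetical.
from pymagnitude import Magnitude

vectors = Magnitude("GoogleNews-vectors-negative300.magnitude")
print(vectors.dim)                       # dimensionality of the stored vectors
print(vectors.query("cat")[:5])          # vector lookup for a single key
print(vectors.similarity("cat", "dog"))  # cosine similarity between two keys
```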
natural-language-processing nlp machine-learning vectors embeddings word2vec fasttext glove gensim fast memory-efficient machine-learning-library word-embeddings

Flair is a very simple framework for state-of-the-art NLP, developed by Zalando Research. A powerful syntactic-semantic tagger and classifier, Flair allows you to apply state-of-the-art models for named entity recognition (NER), part-of-speech (PoS) tagging, frame sense disambiguation, chunking, and classification to your text.
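A minimal sketch of tagging text with one of Flair's pre-trained models; the "ner" identifier refers to Flair's standard English NER model, and the sentence is an illustrative assumption.

```python
# Sketch: run a pre-trained Flair NER model over one sentence and print the
# entity spans it finds.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")  # downloads the English NER model on first use
sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)
for entity in sentence.get_spans("ner"):
    print(entity)
```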
pytorch nlp named-entity-recognition sequence-labeling chunking semantic-role-labeling word-embeddings

Doxygen documentation can be found here. We have walkthroughs for a few different parts of MeTA on the MeTA homepage.
nlp nlp-parsing search-engine inverted-index pos-tag text-analysis text-analytics text-classification language-modeling graph-algorithms c-plus-plus word-embeddings

We created biomedical word and sentence embeddings using PubMed and the clinical notes from the MIMIC-III Clinical Database. Both PubMed and MIMIC-III texts were split and tokenized using NLTK, and all words were lowercased. The statistics of the two corpora are shown below. We applied fastText to compute 200-dimensional word embeddings, setting the window size to 20, the learning rate to 0.05, the sampling threshold to 1e-4, and the number of negative examples to 10. Both the word vectors and the model with hyperparameters are available for download below. The model file can be used to compute vectors for words that are not in the dictionary (i.e., out-of-vocabulary terms). This work extends the original BioWordVec, which provides fastText word embeddings trained using PubMed and MeSH; we used the same parameters as the original BioWordVec, which has been thoroughly evaluated in a range of applications.
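A hedged sketch of the fastText training call with the hyperparameters quoted above, using the official fastText Python bindings; the corpus path and the skipgram model choice are assumptions, not taken from the description.

```python
# Sketch: train 200-d fastText embeddings with the hyperparameters listed above.
# corpus.txt is a hypothetical pre-tokenized, lowercased corpus; the skipgram
# choice is an assumption.
import fasttext

model = fasttext.train_unsupervised(
    "corpus.txt",
    model="skipgram",
    dim=200,   # 200-dimensional vectors
    ws=20,     # window size
    lr=0.05,   # learning rate
    t=1e-4,    # sampling threshold
    neg=10,    # negative examples
)
# Because fastText composes subword n-grams, out-of-vocabulary terms still get vectors:
print(model.get_word_vector("pseudohypoparathyroidism")[:5])
```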
natural-language-processing word-embeddings pubmed fasttext mimic-iii bionlp sentence-similarity sentence-embeddings sent2vec

Deep learning is one of the most popular domains in the artificial intelligence (AI) space, and it allows you to develop multi-layered models of varying complexity. This book is designed to help you grasp everything from basic deep learning algorithms to the more advanced ones. Each algorithm is first explained intuitively; once you have that basic understanding, the book walks you through the underlying math, and then shows you how to implement the algorithm in TensorFlow, step by step. The book covers almost all of the state-of-the-art deep learning algorithms. First, you will get a good understanding of the fundamentals of neural networks and several variants of gradient descent. Later, you will explore RNNs, bidirectional RNNs, LSTM, GRU, seq2seq, CNNs, capsule networks, and more. Finally, you will master GANs and their many variants, along with several different autoencoders.
tensorflow word-embeddings gru autoencoder gans doc2vec skip-thoughts adagrad cyclegan deep-learning-mathematics capsule-network few-shot-learning quick-thought deep-learning-scratch nadam deep-learning-math lstm-math cnn-math rnn-derivation contractive-autonencoders

This real-world scenario focuses on how a large corpus of unstructured, unlabeled data, such as PubMed article abstracts, can be analyzed to train a domain-specific word embedding model. The output embeddings are then treated as automatically generated features for training a neural entity extraction model, using Keras with the TensorFlow deep learning framework as the backend and a small amount of labeled data. Detailed, step-by-step documentation for this scenario is available at: https://review.docs.microsoft.com/en-us/azure/machine-learning/preview/scenario-tdsp-biomedical-recognition.
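A hedged sketch of the architecture this scenario describes: a Keras sequence tagger whose embedding layer is initialized from the domain-specific word vectors. All shapes, sizes, and layer choices here are illustrative assumptions, not the scenario's actual configuration.

```python
# Hedged sketch: Keras (TensorFlow backend) entity tagger with a frozen
# embedding layer filled from domain-trained word vectors. All dimensions
# below are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

vocab_size, embed_dim, max_len, n_tags = 20000, 200, 50, 8
embedding_matrix = np.zeros((vocab_size, embed_dim))  # fill from the trained word vectors

model = Sequential([
    Embedding(vocab_size, embed_dim, weights=[embedding_matrix],
              input_length=max_len, trainable=False),
    Bidirectional(LSTM(128, return_sequences=True)),   # contextual features per token
    TimeDistributed(Dense(n_tags, activation="softmax")),  # entity tag per token
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```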
deep-learning deep-neural-networks natural-language-processing word-embeddings keras-neural-networks tensorflow-tutorials azure-machine-learning

Dna2vec is an open-source library for training distributed representations of variable-length k-mers. Note that this implementation has only been tested on Python 3.5.3, but we welcome contributions and bug reports to make it more accessible.
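To make the idea concrete, here is a sketch of the underlying concept, not dna2vec's actual implementation: split sequences into overlapping k-mers of several lengths and train embeddings over them (shown with Gensim; the toy sequences and parameters are assumptions).

```python
# Illustration of the idea behind dna2vec (NOT its actual implementation):
# turn DNA sequences into overlapping k-mers of several lengths, then train
# word2vec over the k-mer "sentences".
from gensim.models import Word2Vec

def kmers(seq, k):
    """Overlapping k-mers of a sequence, e.g. ACGT with k=3 -> ['ACG', 'CGT']."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

sequences = ["ACGTACGTGACC", "TTGACCGTACGA"]  # toy data, purely illustrative
corpus = [kmers(s, k) for s in sequences for k in (3, 4)]  # variable-length k-mers
model = Word2Vec(corpus, vector_size=16, window=4, min_count=1, epochs=50)
print(model.wv.most_similar("ACG", topn=2))
```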
computational-biology bioinformatics word-embeddings embeddings machine-learning neural-network word2vec ml nlp

Also, check out this link to download the final .bin model and the preprocessed dataset.
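A hedged sketch of using such a downloaded .bin model for language detection with the fastText Python bindings; the file name is a placeholder for the actual download.

```python
# Sketch: load the downloaded .bin model with fastText and predict a language
# label for a sentence. The file name is a hypothetical placeholder.
import fasttext

model = fasttext.load_model("lang_detect.bin")
labels, probs = model.predict("this is a short english sentence")
print(labels[0], probs[0])  # e.g. ('__label__en', 0.99)
```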
fasttext language-detection word-embeddings text-classification

Programs covering word vectors, RNNs, and other NLP topics.
tensorflow deep-learning word-embeddings natural-language-processing

An R wrapper for the fastText C++ code from Facebook. fastText is a library for efficient learning of word representations and sentence classification.
fasttext rstats machine-learning nlp classification word-embeddings text-classification prediction embedded

Project dense vector representations of texts onto a 2D plane to better understand neural models applied to NLP. Since the famous word2vec, embeddings are everywhere in NLP (and in close areas like IR). The main idea behind embeddings is to represent texts (made of characters, words, sentences, or even larger blocks) as numeric vectors. This works very well and provides capabilities unreachable with the classic bag-of-words (BoW) approach. However, embeddings (i.e., vector representations) are difficult for humans to understand, analyze, and debug, because they consist of far more than just 3 dimensions.
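The project itself is an R/Shiny app (see the tags below); as a language-agnostic illustration of the projection idea it implements, here is a scikit-learn sketch with random stand-in vectors.

```python
# Illustration of the projection idea only (the project itself is R/Shiny):
# reduce high-dimensional embeddings to 2D with PCA, then plot and label them.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
words = ["king", "queen", "man", "woman", "paris", "france"]
embeddings = rng.normal(size=(len(words), 300))  # stand-in for real vectors

points = PCA(n_components=2).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1])
for word, (x, y) in zip(words, points):
    plt.annotate(word, (x, y))
plt.show()
```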
word-embeddings pca tsne plot graph shiny rstats

For more info on Azure ML Workbench compute targets, see the documentation. Download the word2vec embeddings and the emoji2vec embeddings, and update the respective paths in config.py.
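A heavily hedged sketch of why both downloads are needed: emoji2vec vectors are trained to live in the same space as the word2vec vectors, so the two can be loaded together and compared. The file names, the binary format, and keying emojis by their Unicode character are all assumptions here.

```python
# Sketch: load both embedding sets with Gensim and compare a word to an emoji.
# File names, binary format, and the emoji key format are assumptions.
from gensim.models import KeyedVectors

words = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
emojis = KeyedVectors.load_word2vec_format("emoji2vec.bin", binary=True)
print(words["happy"] @ emojis["😂"])  # raw dot product across the shared space
```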
deep-learning word-embeddings emojis

In this tutorial, we'll see how to convert GloVe embeddings into TensorFlow layers. This approach also works with embeddings generated by word2vec. First, we'll download the embeddings we need.
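A hedged sketch of the conversion step the tutorial describes: read a GloVe text file into a matrix and wrap it in a tf.keras Embedding layer. The file name is a placeholder for the downloaded file.

```python
# Sketch: parse a GloVe text file (one "token v1 v2 ... vN" line per word)
# into a matrix, then expose it as a frozen TensorFlow embedding layer.
import numpy as np
import tensorflow as tf

vocab, rows = [], []
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        token, *values = line.rstrip().split(" ")
        vocab.append(token)
        rows.append(np.asarray(values, dtype=np.float32))

matrix = np.stack(rows)
layer = tf.keras.layers.Embedding(
    input_dim=matrix.shape[0], output_dim=matrix.shape[1],
    weights=[matrix], trainable=False,
)
index = {w: i for i, w in enumerate(vocab)}
vector = layer(tf.constant([index["the"]]))  # look up one word's vector
```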
gpu gpu-acceleration gpu-tensorflow word-embeddings word2vec glove-embeddings tensorflow cosine-similarity glove neural-network tensorflow-layers

This is an implementation of word embedding (a.k.a. word representation) models in Golang. As the example shows, the models generate vectors whose arithmetic combinations (adding and subtracting vectors) capture relations between word meanings.
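The arithmetic referred to is the classic analogy trick; as a plain illustration of the concept (shown with Gensim in Python rather than this Go library, and with a placeholder vector file):

```python
# Illustration of embedding arithmetic, using Gensim rather than the Go library
# described above. The vector file name is a hypothetical placeholder.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)
# king - man + woman ≈ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```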
word2vec glove machine-learning natural-language-processing shallow-neural-network gorgonia dep word-embeddings

ClusterCat induces word classes from unannotated text. It is written in modern C, with no external libraries; a Python wrapper is also provided. Word classes are unsupervised part-of-speech tags that require no manually annotated corpus: words that share syntactic and semantic similarities are grouped together. They are used in many dozens of applications within natural language processing, machine translation, neural-net training, and related fields.
clusters word-clusters machine-learning exchange-algorithm d3js word-embeddings

Demo code in Matlab for S-WMD (Supervised Word Mover's Distance, NIPS 2016) [oral presentation video recording by Matt Kusner]. The only difference from the datasets above is that, because there are pre-defined train-test splits, the variables BOW_xtr, BOW_xte, xtr, xte, ytr, and yte are already provided.
metric-learning wasserstein word-embeddings