gensim - Topic Modelling for Humans

  •    Python

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.

flashtext - Extract Keywords from sentence or Replace keywords in sentences.

  •    Python

This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Documentation can be found at FlashText Read the Docs.
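As a rough illustration of the idea behind trie-based keyword extraction, here is a minimal sketch in plain Python. It is not the flashtext library's API or implementation: the names `build_trie`, `extract_keywords`, and the `END` marker are hypothetical, and the word-boundary rule (ASCII letters, digits and underscore count as word characters) is an assumption made for the example.

```python
import string

# Assumption for this sketch: these characters are part of a word,
# anything else counts as a word boundary.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")
END = "_end_"  # hypothetical marker key meaning "a keyword ends here"


def build_trie(keywords):
    """Build a character trie from a {keyword: clean_name} mapping."""
    root = {}
    for keyword, clean_name in keywords.items():
        node = root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[END] = clean_name
    return root


def extract_keywords(text, trie):
    """Return clean names in order of appearance, preferring the longest match."""
    found = []
    lowered = text.lower()
    i, n = 0, len(lowered)
    while i < n:
        # Only start matching at the beginning of a word.
        if i > 0 and lowered[i - 1] in WORD_CHARS:
            i += 1
            continue
        node, j = trie, i
        match, match_end = None, i
        while j < n and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            # Accept a match only if it ends at a word boundary.
            if END in node and (j == n or lowered[j] not in WORD_CHARS):
                match, match_end = node[END], j
        if match is not None:
            found.append(match)
            i = match_end
        else:
            i += 1
    return found


trie = build_trie({"big apple": "New York", "bay area": "Bay Area"})
print(extract_keywords("I love Big Apple and Bay Area.", trie))
# -> ['New York', 'Bay Area']
```

The text is scanned once from left to right and each position walks the trie at most as deep as the longest keyword, so the cost depends mainly on the length of the text rather than on the number of keywords.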

sense2vec - 🦆 Use NLP to go beyond vanilla word2vec

  •    C++

sense2vec (Trask et al., 2015) is a nice twist on word2vec that lets you learn more interesting, detailed and context-sensitive word vectors. For an interactive example of the technology, see our sense2vec demo that lets you explore semantic similarities across all Reddit comments of 2015. This library is a simple Python/Cython implementation for loading and querying sense2vec models. While it's best used in combination with spaCy, the sense2vec library itself is very lightweight and can also be used as a standalone module. See below for usage details.

wordvectors - Pre-trained word vectors of 30+ languages

  •    Python

This project has two purposes. First, I'd like to share some of my experience with NLP tasks such as segmentation and word vectors. Second, and more importantly, some people are probably searching for pre-trained word vector models for non-English languages. Alas! English has received far more attention than any other language. Check this to see how easily you can get a variety of pre-trained English word vectors. I think it's time to turn our eyes to a multi-language version. Nearing the end of the work, I learned that there is already a similar project named polyglot. I strongly encourage you to check out that great project. How embarrassing! Nevertheless, I decided to release this project. You will find that it has its own flavor, after all.

word2vec - Python interface to Google word2vec

  •    C

Python interface to Google word2vec. Training is done using the original C code; other functionality is pure Python with NumPy.

TensorFlow-Tutorials - Source code for practicing TensorFlow step by step, from the basics to applications

  •    Python

Source code for practicing TensorFlow step by step, from the basics to applications. It covers most of the guides provided on the official TensorFlow site, and because the code is written much more concisely than the source code on the official site, the concepts should be easy to pick up. Also, all comments are written in Korean (!).

text2vec - Fast vectorization, topic modeling, distances and GloVe word embeddings in R.

  •    R

text2vec is an R package which provides an efficient framework with a concise API for text analysis and natural language processing (NLP). To learn how to use this package, see text2vec.org and the package vignettes. See also the text2vec articles on my blog.

magnitude - A fast, efficient universal vector embedding utility package.

  •    Python

A feature-packed Python package and vector storage file format, developed by Plasticity, for utilizing vector embeddings in machine learning models in a fast, efficient, and simple manner. It is primarily intended to be a simpler and faster alternative to Gensim, but can be used as a generic key-vector store for domains outside NLP. Vector space embedding models have become increasingly common in machine learning and have traditionally been popular for natural language processing applications. A fast, lightweight tool to consume these large vector space embedding models efficiently is lacking.

practical-1 - Oxford Deep NLP 2017 course - Practical 1: word2vec

  •    Jupyter

For this practical, you'll be provided with a partially-complete IPython notebook, an interactive web-based Python computing environment that allows us to mix text, code, and interactive plots. We will be training word2vec models on TED Talk and Wikipedia data, using the word2vec implementation included in the Python package gensim. After training the models, we will analyze and visualize the learned embeddings.

go2vec - Read and use word2vec vectors in Go

  •    Go

This is a package for reading word2vec vectors in Go and finding similar words and analogies.
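To make "similar words and analogies" concrete, here is a minimal sketch in plain Python of how such queries are commonly computed over word vectors: cosine similarity for neighbours, and vector arithmetic (b - a + c) for analogies. It does not use go2vec or reflect its API; the function names and the tiny hand-made `TOY_VECTORS` table are hypothetical and stand in for vectors loaded from a real word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, query, exclude=(), topn=3):
    """Rank words by cosine similarity to the query vector."""
    scored = [
        (word, cosine(query, vec))
        for word, vec in vectors.items()
        if word not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def analogy(vectors, a, b, c, topn=1):
    """Answer "a is to b as c is to ?" using the offset b - a + c."""
    target = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    return most_similar(vectors, target, exclude={a, b, c}, topn=topn)


# Hand-made 3-dimensional vectors, purely illustrative (not trained).
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

best_word, score = analogy(TOY_VECTORS, "man", "woman", "king")[0]
print(best_word)  # -> queen
```

The query words are excluded from the candidates because the offset vector usually lies closest to one of its own inputs.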

NWord2Vec - 💬 C# library for working with Word2Vec models

  •    CSharp

C# library for working with Word2Vec models. First build your model with the word2vec command line tools.

wikimark - get a sense of it

  •    Python

wikimark's goal is to give you an idea of what a text is about. You can also use your own corpus.

experiments - Some research experiments

  •    Jupyter

Some research experiments I have done during the years. Most of the notes can be found on City of Wings.

languagecrunch - LanguageCrunch NLP server docker image

  •    Python

sentence: The new twitter is so weird. Seriously. Why is there a new twitter? What was wrong with the old one? Fix it now.