word_cloud - A little word cloud generator in Python

  •        553

A little word cloud generator in Python. Read more about it in the blog post or on the website. The code is written for Python 2 but is Python 3 compatible. wordcloud depends on numpy>=1.5.1, pillow and matplotlib. To install it via pip, you will also need a C compiler.

https://github.com/amueller/word_cloud

Related Projects

Source Code Word Cloud Generator

  •    CSharp

Generate a word cloud from your code to see what your code is about and what it does. A word cloud is a set of randomly arranged keywords, variable and class names etc. used in your code. The size and the color of each word express its usage frequency. Rarely used words are s...
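The frequency-to-size idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the C# project's implementation: the helper names `word_frequencies` and `font_sizes`, the identifier regex, and the 10 to 48 point size range are all hypothetical.

```python
import re
from collections import Counter

# Assumption: an "identifier" is a letter or underscore followed by
# letters, digits, or underscores. The real tool may tokenize differently.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def word_frequencies(source):
    """Count how often each identifier-like token occurs in the source text."""
    return Counter(IDENTIFIER.findall(source))


def font_sizes(freqs, min_size=10, max_size=48):
    """Linearly map each word's count onto a font size range (hypothetical range)."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = hi - lo
    sizes = {}
    for word, count in freqs.items():
        # If every word is equally frequent, give them all the largest size.
        scale = (count - lo) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


sample = "total = price * count\ntotal = total + tax\n"
sizes = font_sizes(word_frequencies(sample))
print(sizes)
```

In the sample, `total` appears three times and every other identifier once, so `total` receives the largest size and the rest the smallest.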

stylecloud - Python package + CLI to generate stylistic wordclouds, including gradients and icon shapes!

  •    Python

This package is a more formal implementation of my stylistic word cloud project from 2016. You can use stylecloud in a Python script or as a standalone CLI app. For example, let's say you have a text of the U.S. Constitution, constitution.txt.

kumo - Kumo - Java Word Cloud

  •    Java

Kumo's goal is to create a powerful and user-friendly Word Cloud API in Java. Kumo directly generates an image file without the need to create an applet (as many other libraries do).

wordcloud2.js - Tag cloud/Wordle presentation on 2D canvas or HTML

  •    Javascript

Create a tag cloud/Wordle presentation on 2D canvas or HTML. This library is a spin-off project from HTML5 Word Cloud.


word-mesh - A context-preserving word cloud generator

  •    Python

A wordcloud/wordmesh generator that allows users to extract keywords from text, and create a simple and interpretable wordcloud. Most popular open-source wordcloud generators (word_cloud, d3-cloud, echarts-wordcloud) focus more on the aesthetics of the visualization than on effectively conveying textual features. word-mesh strikes a balance between the two and uses the various statistical, semantic and grammatical features of the text to inform visualization parameters.

CamelCaseMotion - A vim script to provide CamelCase motion through words (fork of inkarkat's camelcasemotion script)

  •    Vim

This script defines motions similar to w, b, e which do not move word-wise (forward/backward), but Camel-wise; i.e. to word boundaries and uppercase letters. The motions also work on underscore notation, where words are delimited by underscore ('_') characters. From here on, both CamelCase and underscore_notation entities are referred to as "words" (in double quotes). Just like with the regular motions, a [count] can be prepended to move over multiple "words" at once. Outside of "words" (e.g. in non-keyword characters like / or ;), the new motions move just like the regular motions. Vim provides a built-in iw text object called 'inner word', which works in operator-pending and visual mode. Analogous to that, this script defines inner "word" motions which select the "word" (or multiple "words" if a [count] is given) where the cursor is located.
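To make the notion of "words" concrete, here is a rough Python approximation of where such boundaries fall. It is not the Vim script's logic: the regex and the function name `camel_words` are assumptions made for illustration, and the plugin's exact handling of digits and acronyms may differ.

```python
import re

# Assumed boundary rule: a run of capitals not followed by a lowercase letter
# (an acronym), or an optional capital followed by lowercase letters, or digits.
# Underscores act purely as delimiters and are dropped.
CAMEL_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def camel_words(name):
    """Split an identifier into CamelCase / underscore_notation "words"."""
    return CAMEL_WORD.findall(name)


print(camel_words("CamelCaseMotion"))
print(camel_words("underscore_notation"))
print(camel_words("parseHTTPResponse"))
```

Each element of the returned list corresponds to one stop a Camel-wise motion would make under this simplified rule.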

fastText_multilingual - Multilingual word vectors in 78 languages

  •    Jupyter

Facebook recently open-sourced word vectors in 89 languages. However, these vectors are monolingual, meaning that while similar words within a language share similar vectors, translations of those words in other languages do not have similar vectors. In a recent paper at ICLR 2017, we showed how the SVD can be used to learn a linear transformation (a matrix), which aligns monolingual vectors from two languages in a single vector space. In this repository we provide 78 matrices, which can be used to align the majority of the fastText languages in a single space. Word embeddings define the similarity between two words by the normalised inner product of their vectors. The matrices in this repository place languages in a single space, without changing any of these monolingual similarity relationships. When you use the resulting multilingual vectors for monolingual tasks, they will perform exactly the same as the original vectors. To learn more about word embeddings, check out Colah's blog or Sam's introduction to vector representations.
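The claim that alignment leaves monolingual similarities unchanged can be checked on a toy example. The sketch below assumes the alignment matrix is orthogonal (here a plain 2D rotation), which is what makes the normalised inner product invariant; the vectors, the matrix, and the helper names are made up for illustration and are not taken from the repository.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Normalised inner product (cosine similarity)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def apply(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vec) for row in matrix]


# Hypothetical alignment matrix: a rotation by 60 degrees, which is orthogonal.
theta = math.pi / 3
rotation = [
    [math.cos(theta), -math.sin(theta)],
    [math.sin(theta), math.cos(theta)],
]

# Toy 2D "word vectors" from one language.
cat = [1.0, 2.0]
dog = [2.0, 1.5]

before = cosine(cat, dog)
after = cosine(apply(rotation, cat), apply(rotation, dog))
print(math.isclose(before, after))
```

Because an orthogonal matrix preserves inner products and lengths, the similarity between two words of the same language is the same before and after the transformation, up to floating-point rounding.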

wordcloud - HTML5 Word Cloud

  •    Javascript

HTML5 Word Cloud

BioSentVec - BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences

  •    Jupyter

We created biomedical word and sentence embeddings using PubMed and the clinical notes from MIMIC-III Clinical Database. Both PubMed and MIMIC-III texts were split and tokenized using NLTK. We also lowercased all the words. The statistics of the two corpora are shown below. We applied fastText to compute 200-dimensional word embeddings. We set the window size to be 20, learning rate 0.05, sampling threshold 1e-4, and negative examples 10. Both the word vectors and the model with hyperparameters are available for download below. The model file can be used to compute word vectors that are not in the dictionary (i.e. out-of-vocabulary terms). This work extends the original BioWordVec which provides fastText word embeddings trained using PubMed and MeSH. We used the same parameters as the original BioWordVec which has been thoroughly evaluated in a range of applications.

english-words - :memo: A text file containing 479k English words for all your dictionary/word-based projects e

  •    Python

A text file containing 466k English words. While searching for a list of English words (for an auto-complete tutorial) I found: http://stackoverflow.com/questions/2213607/how-to-get-english-language-word-database which refers to http://www.infochimps.com/datasets/word-list-350000-simple-english-words-excel-readable (link no longer available).

inclavare-containers - A novel container runtime, aka confidential container, for cloud-native confidential computing and enclave runtime ecosystem

  •    C

Inclavare, pronounced as [ˈinklɑveə], is the Latin etymology of the word enclave, which means to isolate the user's sensitive workload from the untrusted and uncontrollable infrastructure in order to meet the protection requirement for the data in use. Inclavare Containers is an innovation of container runtime with the novel approach for launching protected containers in hardware-assisted Trusted Execution Environment (TEE) technology, aka Enclave, which can prevent the untrusted entity, such as Cloud Service Provider (CSP), from accessing the sensitive and confidential assets in use.

jQCloud - jQuery plugin for drawing neat word clouds that actually look like clouds

  •    Javascript

jQCloud is a jQuery plugin that builds neat and pure HTML + CSS word clouds and tag clouds that are actually shaped like a cloud (otherwise, why would we call them 'word clouds'?). You can easily substitute jqcloud.css with a custom CSS stylesheet following the guidelines explained later.

Words - Letterpress Word List

The word list behind Letterpress. Loosely based on a collection of other word lists with refinements from real-world feedback. I hope this is useful for other word-based-app makers, and as a way to improve Letterpress. Each language should be in a sorted, UTF-8 encoded file "Words/[ISO 639-1 code].txt".

Word Index helps to build indexes

WordIndex is an add-in for Word 2007 (or 2003) that helps build an index for big documents. It extracts and filters words from the document using regular expressions. The list of words can then be used by the "Insert Index/AutoMark" Word feature. Keywords: brute force, multithreading.

budou - Budou is an automatic organizer tool for beautiful line breaking in CJK (Chinese, Japanese, and Korean)

  •    Python

English uses spacing and hyphenation as cues to allow for beautiful and legible line breaks. Certain CJK languages have none of these, and are notoriously more difficult. Breaks occur randomly, usually in the middle of a word. This is a long-standing issue in typography on the web, and results in degraded readability. Budou automatically translates CJK sentences into organized HTML code with lexical chunks wrapped in non-breaking markup so as to semantically control line breaks. Budou uses the Google Cloud Natural Language API (NL API) to analyze the input sentence, and it concatenates proper words in order to produce meaningful chunks utilizing part-of-speech (pos) tagging and syntactic information. Processed chunks are wrapped with a SPAN tag, so semantic units will no longer be split at the end of a line when their display property is set to inline-block in CSS.
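The final wrapping step can be illustrated with a small Python sketch. It does not call Budou or the NL API: the chunks are supplied by hand, and the function name `wrap_chunks` and the inline style attribute are assumptions made for this example.

```python
from html import escape


def wrap_chunks(chunks):
    """Wrap each pre-computed chunk in a SPAN that is kept on one line.

    Assumption: chunking has already been done elsewhere; this only shows
    the markup step, using an inline style for simplicity.
    """
    style = "display: inline-block"
    return "".join(
        f'<span style="{style}">{escape(chunk)}</span>' for chunk in chunks
    )


# Hand-written chunks standing in for the output of a real analyzer.
chunks = ["今日は", "天気です。"]
html = wrap_chunks(chunks)
print(html)
```

With `inline-block`, the browser may break a line between two spans but not inside one, so each chunk stays intact.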

Text Statistics and More

  •    C++

Text Stats quickly gives you interesting information about any size document. It provides word counts, largest words, average word length, complete word lists, and much more. It will also allow you to easily search large texts for any words.

d3-cloud - Create word clouds in JavaScript.

  •    Javascript

Create word clouds in JavaScript.

LibreOffice - The Document foundation

  •    C

LibreOffice is the free power-packed Open Source personal productivity suite for Windows, Macintosh and Linux. LibreOffice is the perfect choice for home users, businesses, government and other organizations. Its native file format is the ISO standardized ODF (Open Document Format), but LibreOffice can open and save Microsoft Word, PowerPoint and Excel files, as well as many other formats, bringing you the widest-available compatibility with other products.

SenseRelate

  •    Perl

SenseRelate uses measures of semantic similarity to perform word sense disambiguation. AllWords assigns a sense to each word in a text, TargetWord assigns a sense to a given word, and WordToSet assigns the sense of a word most related to a set of words.





