elasticsearch-ruby - Ruby integrations for Elasticsearch

  •    Ruby

For integration with Ruby models and Rails applications, see the elasticsearch-rails project. The Elasticsearch client is compatible with Ruby 1.8.7 and higher. Other libraries in this repository might require a more recent Ruby version.

machine-learning-with-ruby - Curated list: Resources for machine learning in Ruby.

  •    Ruby

Machine Learning is a field of Computational Science, often nested under AI research, with many practical applications because the resulting algorithms can systematically implement a specific solution without explicit instructions from the programmer. Many algorithms need either a definition of the features to look at or a sizeable training set from which to derive the solution. This curated list comprises libraries, data sources, tutorials and presentations about Machine Learning with the Ruby programming language.

nlp-with-ruby - Practical Natural Language Processing done in Ruby.

  •    Ruby

This curated list comprises resources, libraries and information sources about the computational processing of texts in human languages with the Ruby programming language. That field is often referred to as NLP, Computational Linguistics or HLT (Human Language Technology), and it is closely related to Artificial Intelligence, Machine Learning, Information Retrieval, Text Mining, Knowledge Extraction and other disciplines. This list comes from our day-to-day work on Language Models and NLP Tools. Read why this list is awesome. Our FAQ describes the important decisions and useful answers you may be interested in.

ruby-tesseract-ocr - A Ruby wrapper library to the tesseract-ocr API.

  •    Ruby

This wrapper binds the TessBaseAPI object through ffi-inline (which means it will work on JRuby too) and then proceeds to wrap said API in a more ruby-esque Engine class. To make this library work you need tesseract-ocr and leptonica libraries and headers and a C++ compiler.

unicode - Unicode normalization library

  •    C

Unicode normalization library. (Mirror of Yoshida-san's code base to maintain the RubyGem.)
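
To illustrate what Unicode normalization does, here is a minimal sketch. It assumes Ruby 2.2 or later and uses the built-in String#unicode_normalize method rather than this gem's own API, which is not shown in the listing.

```ruby
# Uses Ruby's built-in String#unicode_normalize (Ruby 2.2+), NOT the unicode gem.

# "e" followed by a combining acute accent: two codepoints, one visible character.
decomposed = "e\u0301"
# NFC composes the pair into the single precomposed codepoint U+00E9.
composed = decomposed.unicode_normalize(:nfc)

# A word containing the single-codepoint "fi" ligature (U+FB01).
ligature = "identi\uFB01ed"
# NFKC applies compatibility mappings, expanding the ligature to plain "f" + "i".
expanded = ligature.unicode_normalize(:nfkc)

puts decomposed.length
puts composed.length
puts expanded
```

Normalizing text before comparing or indexing it avoids treating visually identical strings as different.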

fuzzy_tools - Fuzzy document finding in Ruby

  •    Ruby

FuzzyTools is a toolset for fuzzy searches in Ruby. The default algorithm has been tuned for accuracy (and reasonable speed) on 23 different test files gathered from many sources. Because it's mostly Ruby, FuzzyTools is best for searching smaller datasets, say less than 50 KB in size. Data cleaning or auto-complete over known options are potential uses.
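
The general idea of fuzzy finding can be sketched in a few lines of plain Ruby. The example below scores candidates with a Dice coefficient over character bigrams; this is a stand-in technique chosen for brevity, not FuzzyTools' tuned default algorithm, and the method names bigrams, dice and fuzzy_find are hypothetical.

```ruby
# Set of character bigrams of a lowercased string, e.g. "ruby" -> ["ru", "ub", "by"].
def bigrams(str)
  str.downcase.each_char.each_cons(2).map(&:join).uniq
end

# Dice coefficient over bigram sets: 2 * |X ∩ Y| / (|X| + |Y|), in 0.0..1.0.
def dice(a, b)
  x = bigrams(a)
  y = bigrams(b)
  return 0.0 if x.empty? || y.empty?

  2.0 * (x & y).size / (x.size + y.size)
end

# Return the candidate most similar to the query (hypothetical helper).
def fuzzy_find(query, candidates)
  candidates.max_by { |candidate| dice(query, candidate) }
end

puts fuzzy_find("jurasic park", ["Jurassic Park", "Star Wars", "Park Avenue"])
```

Every candidate is scored on each query, which is acceptable for the small datasets described above but scales linearly with the number of candidates.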

punkt-segmenter - Ruby port of the NLTK Punkt sentence segmentation algorithm

  •    Ruby

This code is a Ruby 1.9.x port of the Punkt sentence tokenizer algorithm implemented by the NLTK Project (http://www.nltk.org/). Punkt is a language-independent, unsupervised approach to sentence boundary detection. It is based on the assumption that a large number of ambiguities in the determination of sentence boundaries can be eliminated once abbreviations have been identified. I simply did the Ruby port and some API changes.
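
The core assumption, that knowing the abbreviations removes most boundary ambiguity, can be sketched as follows. This is not the Punkt algorithm: Punkt learns abbreviations from raw text without supervision, whereas this example assumes a small hand-written list. The names ABBREVIATIONS and split_sentences are hypothetical and do not belong to punkt-segmenter.

```ruby
# Hypothetical, hand-supplied abbreviation list (Punkt would learn these from text).
ABBREVIATIONS = %w[dr mr mrs prof etc e.g i.e].freeze

# Split text into sentences at ".", "!" or "?" unless the period
# belongs to a known abbreviation. Whitespace is normalized to single spaces.
def split_sentences(text, abbreviations = ABBREVIATIONS)
  sentences = []
  current = []
  text.split.each do |token|
    current << token
    next unless token.end_with?(".", "!", "?")

    # Strip the trailing punctuation to look the token up as an abbreviation.
    stem = token.downcase.sub(/[.!?]+\z/, "")
    next if token.end_with?(".") && abbreviations.include?(stem)

    sentences << current.join(" ")
    current = []
  end
  sentences << current.join(" ") unless current.empty?
  sentences
end

p split_sentences("Dr. Smith arrived. He was late.")
```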

nlp-pure - Natural language processing algorithms implemented in pure Ruby with minimal dependencies

  •    Ruby

Natural language processing algorithms implemented in pure Ruby with minimal dependencies. NOTE: this is not affiliated with, endorsed by, or in any way connected with Pure NLP, a trademark of John La Valle.

engtagger - English Part-of-Speech Tagger Library; a Ruby port of Lingua::EN::Tagger

  •    Ruby

A Ruby port of Perl Lingua::EN::Tagger, a probability based, corpus-trained tagger that assigns POS tags to English text based on a lookup dictionary and a set of probability values. The tagger assigns appropriate tags based on conditional probabilities--it examines the preceding tag to determine the appropriate tag for the current word. Unknown words are classified according to word morphology or can be set to be treated as nouns or other parts of speech. The tagger also extracts as many nouns and noun phrases as it can, using a set of regular expressions. The set of POS tags used here is a modified version of the Penn Treebank tagset. Tags with non-letter characters have been redefined to work better in our data structures. Also, the "Determiner" tag (DET) has been changed from 'DT', in order to avoid confusion with the HTML tag, <DT>.
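
A minimal sketch of this kind of tagging follows: each word's candidate tags come from a lookup dictionary, and the choice is weighted by the probability of the tag given the preceding tag. The lexicon, the transition table, all probability values and the method name tag are hypothetical toy data for illustration; they are not engtagger's data or API. The sketch picks tags greedily, left to right.

```ruby
# Hypothetical toy lexicon: word => { tag => P(tag | word) }.
LEXICON = {
  "the"   => { "DET" => 1.0 },
  "can"   => { "MD" => 0.7, "NN" => 0.2, "VB" => 0.1 },
  "rusts" => { "VBZ" => 0.8, "NNS" => 0.2 }
}.freeze

# Hypothetical transitions: previous tag => { tag => P(tag | previous tag) }.
TRANSITIONS = {
  "START" => { "DET" => 0.6, "NN" => 0.3, "MD" => 0.1 },
  "DET"   => { "NN" => 0.8, "NNS" => 0.15, "MD" => 0.01, "VB" => 0.01 },
  "NN"    => { "VBZ" => 0.5, "MD" => 0.3, "NNS" => 0.1, "NN" => 0.1 }
}.freeze

# Greedy tagger: for each word, choose the tag that maximizes
# P(tag | word) * P(tag | previous tag). Unknown words get unknown_tag.
def tag(words, unknown_tag: "NN")
  prev = "START"
  words.map do |word|
    candidates = LEXICON.fetch(word.downcase) { { unknown_tag => 1.0 } }
    best, = candidates.max_by do |t, p_word|
      # Unseen transitions get a small floor probability instead of zero.
      p_word * TRANSITIONS.fetch(prev, {}).fetch(t, 0.001)
    end
    prev = best
    [word, best]
  end
end

p tag(%w[The can rusts])
```

On its own "can" is most likely a modal, but after a determiner the transition probabilities favor the noun reading, which is the effect of examining the preceding tag.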

lemmatizer - Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy

  •    Ruby

Lemmatizer for text in English. Inspired by Python's nltk.corpus.reader.wordnet.morphy package. Licensed under the MIT license.
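
The morphy-style approach can be sketched as suffix rules checked against a word list: a suffix is detached and the result is accepted only if it is a known word. The rule subset, the tiny word list and the method name lemma below are hypothetical and only illustrate the idea; they are not this gem's API or data.

```ruby
# Hypothetical subset of noun detachment rules: [suffix, replacement].
NOUN_RULES = [["ies", "y"], ["ses", "s"], ["es", ""], ["s", ""]].freeze

# Hypothetical tiny word list standing in for a real dictionary.
WORDLIST = %w[study bus box cat].freeze

# Return the first rule-derived candidate found in the word list,
# or the lowercased word itself if nothing matches.
def lemma(word, wordlist = WORDLIST)
  w = word.downcase
  return w if wordlist.include?(w)

  NOUN_RULES.each do |suffix, replacement|
    next unless w.end_with?(suffix)

    candidate = w[0...-suffix.size] + replacement
    return candidate if wordlist.include?(candidate)
  end
  w
end

p %w[studies buses boxes cats].map { |word| lemma(word) }
```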

words_counted - A Ruby natural language processor.

  •    Ruby

We are all in the gutter, but some of us are looking at the stars. WordsCounted is a Ruby NLP (natural language processor). WordsCounted lets you implement powerful tokenisation strategies with a very flexible tokeniser class.

tokenizer - A simple tokenizer in Ruby for NLP tasks.

  •    Ruby

A simple multilingual tokenizer – a linguistic tool intended to split a written text into tokens for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, which is an unavoidable preprocessing step for many HLT (Human Language Technology) tasks before further syntactic, semantic and other higher-level processing. The tokenization task involves Sentence Segmentation, Word Segmentation and Boundary Disambiguation for both tasks.
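
Word segmentation in its simplest form can be sketched with one regular expression. The method name tokenize below is hypothetical and is not this gem's API; the sketch assumes UTF-8 input and whitespace-delimited languages, and it does no boundary disambiguation (for example, it splits "e.g." into four tokens).

```ruby
# Split text into runs of letters/digits and single punctuation marks.
# \p{L} and \p{N} are Unicode-aware, so non-ASCII letters stay inside words.
def tokenize(text)
  text.scan(/[\p{L}\p{N}]+|[^\p{L}\p{N}\s]/)
end

p tokenize("Hallo, Welt!")
```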

wlapi - Ruby based API for the project Wortschatz Leipzig.

  •    Ruby

WLAPI is a programmatic API for web services provided by the project Wortschatz, University of Leipzig. These services are a great source of linguistic knowledge for morphological, syntactic and semantic analysis of German both for traditional and Computational Linguistics (CL). Use this API to gain data on word frequencies, left and right neighbours, collocations and semantic similarity. Check it out if you are interested in Natural Language Processing (NLP) and Human Language Technology (HLT).

monkeylearn-ruby - Official Ruby client for the MonkeyLearn API

  •    Ruby

Official Ruby client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Ruby apps.

textoken - Simple and customizable text tokenization gem.

  •    Ruby

Textoken is a Ruby library for text tokenization. This gem extracts words from text with many customizations. It can be used in many fields like Web Crawling and Natural Language Processing. The only_regexp option accepts any regexp, but only one regexp can be given.

Rley - An Earley parser written in Ruby

  •    Ruby

A Ruby library for constructing general parsers for any context-free language. Rley uses the Earley algorithm, a general parsing algorithm that can handle any grammar that is context-free, including ambiguous and left-recursive ones. That's why Earley parsers find their place in so many NLP (Natural Language Processing) libraries/toolkits.
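
To show how the Earley algorithm works, here is a minimal recognizer sketch in plain Ruby. It is not Rley's API: the names Item, earley_recognize and GRAMMAR are hypothetical, the grammar format (symbols for nonterminals, strings for terminals) is an assumption, and empty productions are not supported. It only answers whether the input is accepted; it does not build a parse tree.

```ruby
require "set"

# An Earley item: a rule (lhs -> rhs), a dot position, and the chart index where it began.
Item = Struct.new(:lhs, :rhs, :dot, :origin) do
  def next_symbol
    rhs[dot]
  end

  def complete?
    dot == rhs.size
  end

  def advance
    Item.new(lhs, rhs, dot + 1, origin)
  end
end

# grammar: { nonterminal_symbol => [[symbol_or_string, ...], ...] }
def earley_recognize(grammar, start, tokens)
  chart = Array.new(tokens.size + 1) { [] }
  seen  = Array.new(tokens.size + 1) { Set.new }
  # Add an item to chart[i] only once; this keeps left recursion from looping.
  add = ->(i, item) { chart[i] << item if seen[i].add?(item) }

  grammar.fetch(start).each { |rhs| add.call(0, Item.new(start, rhs, 0, 0)) }

  (0..tokens.size).each do |i|
    j = 0
    while j < chart[i].size # chart[i] may grow while it is processed
      item = chart[i][j]
      j += 1
      sym = item.next_symbol
      if item.complete?
        # Completer: advance every item that was waiting for item.lhs.
        chart[item.origin].each do |parent|
          add.call(i, parent.advance) if parent.next_symbol == item.lhs
        end
      elsif grammar.key?(sym)
        # Predictor: expand the expected nonterminal at this position.
        grammar[sym].each { |rhs| add.call(i, Item.new(sym, rhs, 0, i)) }
      elsif i < tokens.size && tokens[i] == sym
        # Scanner: the expected terminal matches the next token.
        add.call(i + 1, item.advance)
      end
    end
  end

  chart.last.any? { |it| it.lhs == start && it.complete? && it.origin.zero? }
end

# A left-recursive toy grammar: sum -> sum "+" num | num ; num -> "1" | "2"
GRAMMAR = { sum: [[:sum, "+", :num], [:num]], num: [["1"], ["2"]] }.freeze

p earley_recognize(GRAMMAR, :sum, %w[1 + 2])
p earley_recognize(GRAMMAR, :sum, %w[1 +])
```

The toy grammar is left-recursive on purpose: the chart-based bookkeeping handles it without the infinite loop a naive recursive-descent parser would enter.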

ruby-interoperability - Ruby Mixture with other Programming Languages


Ruby Interoperability by Andrei Beliankou and Contributors. To the extent possible under law, the person who associated CC0 with Ruby Interoperability has waived all copyright and related or neighboring rights to Ruby Interoperability.

