markdown-js - A Markdown parser for javascript

  •    Javascript

If you want to use it in the browser, go to the releases page on GitHub and download the version you want (minified or not). We only officially support Node >= 0.10, as the libraries we use for building and testing don't work on older versions of Node. That said, since this module is so simple and doesn't use any parts of the Node API, if you use the pre-built version and find a bug, let us know and we'll try to fix it.

Command-line-text-processing - :zap: From finding text to search and replace, from sorting to beautifying text and more :art:

  •    Shell

Learn about various commands available for common and exotic text processing needs. Examples have been tested on GNU/Linux; other distributions may vary in syntax and features, so consult their respective man pages for details. ⚠️ 🚧 Work in progress, stay tuned...

Gate - General Architecture for Text Engineering

  •    Java

GATE excels at text analysis of all shapes and sizes. It supports diverse language processing tasks through parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many other components. It also supports measuring, evaluating, modeling and persisting its data structures. It can analyze text or speech. It has built-in support for machine learning and supports other machine learning implementations via plugins.

OpenPipe - Document Pipeline

  •    Java

OpenPipe is an open source scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps / operations performed on a document to convert it from its raw form into something ready to be put into the index. The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction, and submitting the document to a search engine.
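The pipeline idea (an ordered list of steps, each taking a document and handing its result to the next) can be sketched in a few lines of Python. This is not OpenPipe's API: modeling a document as a dict and the step names `lowercase_title`, `detect_language` and `run_pipeline` are hypothetical, and the "language detection" step is a crude placeholder, not real detection.

```python
from typing import Callable, Dict, List

# Hypothetical document model: a plain dict of named fields.
Document = Dict[str, object]
Step = Callable[[Document], Document]


def lowercase_title(doc: Document) -> Document:
    # Field manipulation: normalize the title.
    doc["title"] = str(doc.get("title", "")).lower()
    return doc


def detect_language(doc: Document) -> Document:
    # Placeholder only: tags ASCII-only bodies as "en"; this is NOT real language detection.
    doc["lang"] = "en" if str(doc.get("body", "")).isascii() else "unknown"
    return doc


def run_pipeline(doc: Document, steps: List[Step]) -> Document:
    # Steps run in order; each receives the previous step's output.
    for step in steps:
        doc = step(doc)
    return doc


index: List[Document] = []  # stands in for the search index
ready = run_pipeline({"title": "Hello World", "body": "Some text"},
                     [lowercase_title, detect_language])
index.append(ready)  # the final "submit to the index" step
print(index)
```

Because every step has the same signature, steps can be reordered, added or removed without touching the others.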

TextTeaser - Automatic Summarization Algorithm

  •    Scala

TextTeaser is an automatic summarization algorithm that combines natural language processing and machine learning to produce good results. It can provide the gist of an article and better previews in news readers.
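TextTeaser's own scoring features are not described here, so the sketch below swaps in a plain word-frequency extractive summarizer, a common baseline: sentences whose content words occur often across the whole text are assumed to carry its gist. The function names and the tiny stop-word list are assumptions for illustration.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list; real summarizers use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "on", "for"}


def split_sentences(text: str) -> List[str]:
    # Naive split after ".", "!" or "?" followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> List[str]:
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, n: int = 2) -> List[str]:
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average (not total) frequency, so long sentences are not automatically favored.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Take the n best-scoring sentences, then restore their original order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

For example, `summarize(article, n=3)` returns the three highest-scoring sentences in the order they appear in the article.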

pyparsing

  •    Python

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. The pyparsing module provides a library of classes that client code uses to construct the grammar directly in Python code. The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operator definitions.
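To show why operator definitions make such grammars readable, here is a from-scratch toy in Python, not pyparsing itself: the classes `Lit`, `Chars`, `Seq` and `Alt` are hypothetical. Here `+` builds a sequence and `|` an ordered alternative; pyparsing's `^` (longest-match alternation) is left out.

```python
import string
from typing import List, Optional, Tuple

# A parse attempt yields (matched tokens, next position) or None on failure.
Result = Optional[Tuple[List[str], int]]


def skip_ws(text: str, pos: int) -> int:
    # Whitespace between tokens is ignored.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


class Parser:
    def parse_at(self, text: str, pos: int) -> Result:
        raise NotImplementedError

    def __add__(self, other: "Parser") -> "Parser":
        return Seq(self, other)  # a + b: match a, then b

    def __or__(self, other: "Parser") -> "Parser":
        return Alt(self, other)  # a | b: try a first, fall back to b

    def parse(self, text: str) -> List[str]:
        result = self.parse_at(text, 0)
        if result is None or skip_ws(text, result[1]) != len(text):
            raise ValueError(f"cannot parse {text!r}")
        return result[0]


class Lit(Parser):
    """Matches an exact string."""
    def __init__(self, s: str):
        self.s = s

    def parse_at(self, text: str, pos: int) -> Result:
        pos = skip_ws(text, pos)
        if text.startswith(self.s, pos):
            return [self.s], pos + len(self.s)
        return None


class Chars(Parser):
    """Matches one or more characters from an allowed set."""
    def __init__(self, allowed: str):
        self.allowed = set(allowed)

    def parse_at(self, text: str, pos: int) -> Result:
        start = end = skip_ws(text, pos)
        while end < len(text) and text[end] in self.allowed:
            end += 1
        return ([text[start:end]], end) if end > start else None


class Seq(Parser):
    def __init__(self, left: Parser, right: Parser):
        self.left, self.right = left, right

    def parse_at(self, text: str, pos: int) -> Result:
        first = self.left.parse_at(text, pos)
        if first is None:
            return None
        second = self.right.parse_at(text, first[1])
        if second is None:
            return None
        return first[0] + second[0], second[1]


class Alt(Parser):
    def __init__(self, left: Parser, right: Parser):
        self.left, self.right = left, right

    def parse_at(self, text: str, pos: int) -> Result:
        return self.left.parse_at(text, pos) or self.right.parse_at(text, pos)


# The grammar reads like its description: a greeting word, a comma, a name, "!".
greeting = (Lit("Hello") | Lit("Hi")) + Lit(",") + Chars(string.ascii_letters) + Lit("!")
print(greeting.parse("Hello, World!"))  # ['Hello', ',', 'World', '!']
```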

CAML.NET

  •    CSharp

A set of .NET language-based tools for creating dynamic, reusable CAML query components. CAML.NET leverages the power and flexibility of the .NET Common Language Runtime (CLR) to build CAML queries dynamically in code while preserving the syntactic structure of the native CAML...
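CAML.NET's own API isn't shown in this listing. As a language-neutral illustration of building CAML dynamically rather than concatenating strings, the Python sketch below composes standard CAML query elements (`Where`, `And`, `Eq`, `FieldRef`, `Value`) with `xml.etree.ElementTree`; the helper names `eq`, `and_` and `where` are hypothetical.

```python
import xml.etree.ElementTree as ET


def eq(field: str, value: str, value_type: str = "Text") -> ET.Element:
    # <Eq><FieldRef Name="field"/><Value Type="value_type">value</Value></Eq>
    node = ET.Element("Eq")
    ET.SubElement(node, "FieldRef", Name=field)
    val = ET.SubElement(node, "Value", Type=value_type)
    val.text = value
    return node


def and_(*conditions: ET.Element) -> ET.Element:
    # CAML's <And> takes exactly two children, so longer lists are nested:
    # and_(a, b, c) -> <And>a<And>b c</And></And>
    if len(conditions) < 2:
        raise ValueError("and_ needs at least two conditions")
    result = conditions[-1]
    for cond in reversed(conditions[:-1]):
        node = ET.Element("And")
        node.append(cond)
        node.append(result)
        result = node
    return result


def where(condition: ET.Element) -> ET.Element:
    node = ET.Element("Where")
    node.append(condition)
    return node


query = where(and_(eq("Status", "Active"), eq("Priority", "1", "Number")))
print(ET.tostring(query, encoding="unicode"))
```

Because each helper returns an element, conditions can be assembled from runtime input (for example, one `eq` per filter a user selects) while the resulting XML stays well-formed.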

Formatted Text to OpenXML Converter

  •    C++

A project for a converter to the OpenXML format, more precisely to formatted text files in the WordprocessingML language, from various other formats. The converter basically consists of two parts: a parser/interpreter for the original format and a...

RazorEngine

A templating engine built on Microsoft's Razor parsing technology. RazorEngine allows you to use Razor syntax to build robust templates. Currently we have integrated the vanilla HTML + code support, but we hope to support other markup languages in the future.

Flat File Parser

  •    CSharp

A flat file parser capable of loading in complete or partial flat text files. It will convert each row in the file into a standard CLR object.
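The listing doesn't say whether rows are delimited or fixed-width. Assuming a fixed-width layout, turning each row into a typed object can be sketched in Python with a dataclass standing in for a CLR object. The `Customer` type, the column layout and `parse_rows` are hypothetical; the `skip`/`limit` parameters illustrate partial loading.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, Optional


@dataclass
class Customer:
    id: int
    name: str
    city: str


# Hypothetical fixed-width layout: (field, start column, end column, converter).
LAYOUT = [("id", 0, 5, int), ("name", 5, 20, str), ("city", 20, 32, str)]


def parse_rows(lines: Iterable[str], skip: int = 0,
               limit: Optional[int] = None) -> Iterator[Customer]:
    # islice gives partial loading: skip the first rows, stop after `limit` rows.
    stop = None if limit is None else skip + limit
    for line in islice(lines, skip, stop):
        line = line.rstrip("\n")
        values = {field: conv(line[start:end].strip())
                  for field, start, end, conv in LAYOUT}
        yield Customer(**values)


rows = [
    f"{1:<5}{'Ana Silva':<15}{'Lisbon':<12}",
    f"{2:<5}{'Bo Chen':<15}{'Oslo':<12}",
]
for customer in parse_rows(rows):
    print(customer)
```

Since `parse_rows` is a generator over any iterable of lines, an open file object can be passed directly and only the requested rows are read.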

TextGenerator

A simple tool for quick, polymorphic text generation based on a variable input pattern.
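The listing doesn't specify TextGenerator's pattern syntax. Assuming the common "spintax" convention, where `{a|b|c}` marks a set of alternatives, a generator can either pick one alternative per group at random or enumerate every combination. The syntax and the function names `expand_all` and `generate` are assumptions, not TextGenerator's API.

```python
import itertools
import random
import re
from typing import List, Optional

# Assumed syntax: {option1|option2|...}; nested groups are not supported here.
GROUP = re.compile(r"\{([^{}]*)\}")


def expand_all(pattern: str) -> List[str]:
    # With one capture group, re.split alternates literal text and group contents.
    parts = GROUP.split(pattern)
    choices = [[p] if i % 2 == 0 else p.split("|") for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def generate(pattern: str, rng: Optional[random.Random] = None) -> str:
    # Replace each group with one randomly chosen alternative.
    rng = rng or random.Random()
    return GROUP.sub(lambda m: rng.choice(m.group(1).split("|")), pattern)


pattern = "{Hello|Hi}, {world|there}!"
print(expand_all(pattern))
print(generate(pattern, random.Random(42)))
```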

Regex Batch Replacer (Multi-File)

Regex Batch Replacer uses regular expressions to find and replace text in multiple files.
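A minimal Python sketch of the same idea, not the tool itself: one compiled pattern is applied to every file, and only files with at least one match are rewritten. `batch_replace` is a hypothetical name, and UTF-8 text files are assumed.

```python
import re
from pathlib import Path
from typing import Dict, Iterable


def batch_replace(paths: Iterable[Path], pattern: str, replacement: str) -> Dict[Path, int]:
    """Apply one regex substitution to every file; return replacements per file."""
    regex = re.compile(pattern)  # compile once, reuse for every file
    counts: Dict[Path, int] = {}
    for path in paths:
        text = path.read_text(encoding="utf-8")
        new_text, n = regex.subn(replacement, text)
        if n:  # skip the write when the pattern did not match
            path.write_text(new_text, encoding="utf-8")
        counts[path] = n
    return counts
```

For example, `batch_replace(Path("docs").glob("*.txt"), r"colou?r", "color")` would normalize both spellings across every `.txt` file in a `docs` folder and report how many replacements each file received.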

fotelo: A formatted text loader library

fotelo (foe-tell-o): A formatted text loader library. Fotelo will allow you to import text files of various formats into a strongly-typed .NET DataTable for use within your applications.
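Python has no DataTable, so the sketch below uses a list of dicts instead. It assumes delimited text (comma, semicolon, tab or pipe), lets the standard library's `csv.Sniffer` guess the delimiter, and converts each column through a caller-supplied schema to get typed values. `load_typed` and the schema format are hypothetical, not fotelo's API.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical schema: column name -> converter (int, float, str, ...).
Schema = Dict[str, Callable[[str], object]]


def load_typed(text: str, schema: Schema) -> List[Dict[str, object]]:
    # Guess the delimiter from a sample; csv.Error is raised if none fits.
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    # Convert every schema column; a bad value raises ValueError instead of passing through.
    return [{col: conv(row[col]) for col, conv in schema.items()} for row in reader]


print(load_typed("id;price;name\n1;2.5;pen\n2;10;ink\n",
                 {"id": int, "price": float, "name": str}))
```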

OpenTextSummarizer C# Port

This is a C# port of the fantastic Open Text Summarizer (http://libots.sourceforge.net/). It uses the same dictionary files and algorithms as the original OTS, though all of the code was rewritten.

pynlpl - PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing

  •    Python

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation). The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.
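Extracting n-grams and frequency lists, and estimating a simple bigram language model from them, can be sketched with the standard library alone. These functions are illustrative and hypothetical, not PyNLPl's own API; the bigram probability is the unsmoothed maximum-likelihood estimate.

```python
from collections import Counter
from typing import Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    # Slide a window of width n over the tokens.
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def frequency_list(tokens: Sequence[str], n: int = 1) -> Counter:
    # Count how often each n-gram occurs.
    return Counter(ngrams(tokens, n))


def bigram_prob(tokens: Sequence[str], w1: str, w2: str) -> float:
    # Maximum-likelihood estimate: count(w1 w2) / count(w1 followed by any word).
    bigrams = frequency_list(tokens, 2)
    history = sum(c for (first, _), c in bigrams.items() if first == w1)
    return bigrams[(w1, w2)] / history if history else 0.0


tokens = "the cat sat on the mat".split()
print(frequency_list(tokens).most_common(1))  # [(('the',), 2)]
print(bigram_prob(tokens, "the", "cat"))      # 0.5
```

Unseen bigrams get probability 0 here; a usable language model would add smoothing.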

python-nameparser - A simple Python module for parsing human names into their individual components

  •    Python

A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. The supported name structure is generally "Title First Middle Last Suffix", where all pieces are optional. The comma-separated format "Last, First" is also supported.
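A toy version of this parsing logic fits in a few lines of Python. It is not python-nameparser's implementation or API: `parse_name`, the `Name` dataclass and the short title/suffix lists are hypothetical, and multi-word last names such as "de la Vega" are not handled.

```python
from dataclasses import dataclass

# Short illustrative lists; a real parser needs far more entries.
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "phd"}


@dataclass
class Name:
    title: str = ""
    first: str = ""
    middle: str = ""
    last: str = ""
    suffix: str = ""


def _norm(word: str) -> str:
    # Compare case-insensitively and ignore trailing periods ("Dr." matches "dr").
    return word.lower().rstrip(".")


def parse_name(raw: str) -> Name:
    name = Name()
    if "," in raw:
        before, after = (part.strip() for part in raw.split(",", 1))
        if _norm(after) in SUFFIXES:
            raw, name.suffix = before, after   # "First Last, Jr."
        else:
            raw = f"{after} {before}"          # "Last, First" -> "First Last"
    words = raw.split()
    if words and _norm(words[0]) in TITLES:
        name.title = words.pop(0)
    if words and not name.suffix and _norm(words[-1]) in SUFFIXES:
        name.suffix = words.pop()
    if words:
        name.first = words.pop(0)
    if words:
        name.last = words.pop()
    name.middle = " ".join(words)  # whatever remains between first and last
    return name


print(parse_name("Dr. John Q. Public Jr."))
```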

whatlanggo - Natural language detection library for Go

  •    Go

Natural language detection for Go. Thanks to greyblake (Potapov Sergey) for creating whatlang-rs, from which I got the idea and logic.
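Comparing character trigrams is a common technique for language detection and the one this sketch assumes. The profiles below are toy data built from a single sentence per language, not real language models, and `trigrams`, `detect` and `PROFILES` are hypothetical names rather than whatlanggo's API.

```python
from collections import Counter


def trigrams(text: str) -> Counter:
    # Pad with spaces so word starts and ends form trigrams too.
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# Toy profiles from one sample sentence each; real detectors use large trained profiles.
SAMPLES = {
    "eng": "the quick brown fox jumps over the lazy dog and the cat",
    "spa": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
    "deu": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def detect(text: str) -> str:
    grams = trigrams(text)
    # Score = number of shared trigram occurrences (Counter & keeps the minimum counts).
    return max(PROFILES, key=lambda lang: sum((grams & PROFILES[lang]).values()))


print(detect("the dog and the fox"))
```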

tif - Text Interchange Formats

  •    R

This package describes and validates formats for storing common objects arising in text analysis as native R objects. Representations of a text corpus, a document-term matrix, and tokenized text are included. The tokenized text format is extensible to include other annotations. There are two versions of the corpus and tokens objects; packages should accept both and return or coerce to at least one of these.

corpus (data frame) - A valid corpus data frame object is a data frame with at least two columns. The first column is called doc_id and is a character vector with UTF-8 encoding. Document ids must be unique. The second column is called text and must also be a character vector in UTF-8 encoding. Each individual document is represented by a single row in the data frame. Additional document-level metadata columns and corpus-level attributes are allowed but not required.
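tif itself is an R package, so the following is only a Python restatement of the corpus rules above: a dict of equal-length lists stands in for a data frame, and `validate_corpus` is a hypothetical function, not part of tif.

```python
from typing import Dict, List


def validate_corpus(df: Dict[str, List[object]]) -> None:
    """Raise ValueError unless df follows the corpus data frame rules."""
    # df maps column name -> values; dict order stands in for data frame column order.
    columns = list(df)
    if len(columns) < 2 or columns[:2] != ["doc_id", "text"]:
        raise ValueError("first two columns must be doc_id and text")
    if len({len(values) for values in df.values()}) != 1:
        raise ValueError("all columns must have the same length")
    for col in ("doc_id", "text"):
        if not all(isinstance(v, str) for v in df[col]):
            raise ValueError(f"{col} must be a character (str) column")
    if len(set(df["doc_id"])) != len(df["doc_id"]):
        raise ValueError("doc_id values must be unique")
    # Python str values are already Unicode, so the UTF-8 rule is not checked here.
    # Extra metadata columns after text are allowed and not inspected.
```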

simple-markdown - JavaScript markdown parsing, made simple

  •    Javascript

simple-markdown is a markdown-like parser designed for simplicity and extensibility. Most markdown-like parsers aim for speed or edge-case handling; simple-markdown aims for extensibility and simplicity.
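One way to get that kind of extensibility is to make the parser nothing more than an ordered list of rules, each pairing a regex with a function that builds a node. The Python sketch below assumes that design for illustration; `Rule`, `parse` and the node dicts are hypothetical and are not simple-markdown's API.

```python
import re
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    name: str
    pattern: re.Pattern
    build: Callable[[re.Match], Dict[str, str]]


def parse(source: str, rules: List[Rule]) -> List[Dict[str, str]]:
    nodes: List[Dict[str, str]] = []
    pos = 0
    while pos < len(source):
        for rule in rules:  # rules are tried in order; the first match wins
            m = rule.pattern.match(source, pos)
            if m:
                nodes.append(rule.build(m))
                pos = m.end()
                break
        else:
            # No rule matched: treat one character as plain text, merging adjacent text.
            if nodes and nodes[-1]["type"] == "text":
                nodes[-1]["content"] += source[pos]
            else:
                nodes.append({"type": "text", "content": source[pos]})
            pos += 1
    return nodes


RULES = [
    Rule("strong", re.compile(r"\*\*(.+?)\*\*"), lambda m: {"type": "strong", "content": m.group(1)}),
    Rule("em", re.compile(r"\*(.+?)\*"), lambda m: {"type": "em", "content": m.group(1)}),
]

# Extending the syntax means adding a rule, not editing the parser.
STRIKE = Rule("del", re.compile(r"~~(.+?)~~"), lambda m: {"type": "del", "content": m.group(1)})
print(parse("a **b** ~~c~~", [STRIKE] + RULES))
```

Rule order matters: `strong` must come before `em`, or `**b**` would be read as emphasis around `*b*`.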