
CloverETL - Rapid Data Integration

  •    Java

A Java-based data integration framework that can be used to transform, map, and manipulate data in various formats (CSV, FIXLEN, XML, XBASE, COBOL, LOTUS, etc.). It can be used standalone or embedded (as a library). Connects to RDBMS/JMS/SOAP/LDAP/S3/HTTP/FTP/ZIP/TAR.

tabula - Tabula is a tool for liberating data tables trapped inside PDF files

  •    CSS

Repo Note: The master branch is an in-development version of Tabula. It may differ substantially from the latest releases of Tabula. As of August 2015, the master branch (and Tabula 1.1.X+) uses tabula-java instead of tabula-extractor under the hood. Previous versions of Tabula use tabula-extractor.

flashtext - Extract Keywords from sentence or Replace keywords in sentences.

  •    Python

This module can be used to replace keywords in sentences or extract keywords from sentences. It is based on the FlashText algorithm. Documentation can be found at FlashText Read the Docs.
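To make the idea of keyword extraction and replacement concrete, here is a minimal sketch of a word-level trie with longest-match lookup. It is not the flashtext module's code or API: the class `KeywordTrie`, its method names, and the whitespace/`\w+` tokenization are assumptions chosen for illustration.

```python
import re


class KeywordTrie:
    """Hypothetical word-level trie mapping keyword phrases to a clean name."""

    _END = object()  # sentinel key marking the end of a keyword

    def __init__(self):
        self.root = {}

    def add_keyword(self, phrase, clean_name=None):
        node = self.root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[self._END] = clean_name or phrase

    def _longest_match(self, words, start):
        """Return (clean_name, end_index) for the longest keyword at `start`, or None."""
        node, best = self.root, None
        for i in range(start, len(words)):
            node = node.get(words[i].lower())
            if node is None:
                break
            if self._END in node:
                best = (node[self._END], i + 1)
        return best

    def extract_keywords(self, sentence):
        words = re.findall(r"\w+", sentence)
        found, i = [], 0
        while i < len(words):
            match = self._longest_match(words, i)
            if match:
                found.append(match[0])
                i = match[1]  # skip past the matched phrase
            else:
                i += 1
        return found

    def replace_keywords(self, sentence):
        # Keep character spans so punctuation and spacing survive replacement.
        spans = [m.span() for m in re.finditer(r"\w+", sentence)]
        words = [sentence[a:b] for a, b in spans]
        out, cursor, i = [], 0, 0
        while i < len(words):
            match = self._longest_match(words, i)
            if match:
                name, end = match
                out.append(sentence[cursor:spans[i][0]])
                out.append(name)
                cursor = spans[end - 1][1]
                i = end
            else:
                i += 1
        out.append(sentence[cursor:])
        return "".join(out)


trie = KeywordTrie()
trie.add_keyword("Big Apple", "New York")
trie.add_keyword("Bay Area")
print(trie.extract_keywords("I love Big Apple and Bay Area."))
print(trie.replace_keywords("I love Big Apple and Bay Area."))
```

Each sentence is scanned once, and the cost of a lookup depends on the length of the sentence rather than on the number of keywords stored, which is the usual motivation for a trie over a list of regular expressions.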

Scriptella - ETL (Extract-Transform-Load) and Script Execution Tool


Scriptella is an ETL (Extract-Transform-Load) and script execution tool. Its primary focus is simplicity. It doesn't require the user to learn another complex XML-based language to use it, but allows the use of SQL or another scripting language suitable for the data source to perform required transformations.

phearjs - PhearJS - render dynamic Javascript webpages to JSON with PhantomJS

  •    CoffeeScript

PhearJS renders webpages. It runs a server which supervises a set number of PhantomJS workers that do the actual parsing and evaluation. Many websites rely on AJAX and front-end rendering. When a machine requests a page from such a website it sees a completely different page than you would see when viewing it in a browser.

flash - Golang Keyword extraction/replacement Datastructure using Tries instead of regexes

  •    Go

Fast keyword extraction using the Aho–Corasick algorithm and tries. Flash is meant as a replacement for regular expressions, which can be extremely slow when matching many keywords.
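For readers unfamiliar with Aho–Corasick, the following is a compact sketch of the textbook algorithm: a character trie plus failure links, searched in a single pass over the text. It is written in Python for illustration and is not the flash library's Go implementation; the function names `build_automaton` and `find_keywords` are assumptions.

```python
from collections import deque


def build_automaton(keywords):
    """Build trie transitions, failure links, and output lists for the keywords."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> keywords that end at this state

    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Breadth-first pass: a state's failure link depends only on shallower states.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_keywords(text, automaton):
    """Return (start_index, keyword) pairs for every occurrence in text."""
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_keywords("ushers", automaton))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is read once regardless of how many keywords are loaded, whereas trying a separate pattern per keyword rescans the text for each one.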

hacker-news-digest - :newspaper: A responsive interface of Hacker News with summaries and illustrations

  •    Python

This service extracts summaries and illustrations from Hacker News articles for people who want to get the most out of Hacker News while cutting down the time spent deciding which articles to read and which to skip.

infoboxer - Wikipedia information extraction library

  •    Ruby

Infoboxer is a pure-Ruby Wikipedia (and generic MediaWiki) client and parser, targeting information extraction (hence the name). The whole idea is: you can have any Wikipedia page as a parsed tree with an obvious structure, you can navigate that tree easily, and you have a bunch of high-level helper methods, so typical information extraction tasks should be very easy, one-liners in the best cases.
