
mtail - extract whitebox monitoring data from application logs for collection in a timeseries database

  •    Go

mtail is a tool for extracting metrics from application logs to be exported into a timeseries database or timeseries calculator for alerting and dashboarding. It aims to fill a niche between applications that do not export their own internal state and existing monitoring systems, without patching those applications or rewriting the same framework for custom extraction glue code.
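The idea of turning log lines into metrics can be sketched in a few lines of Python. This is not mtail's own program syntax; the log format (`<method> <path> <status>`) and the function name `count_statuses` are assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumed log format: "<method> <path> <status>", e.g. "GET /index 200".
STATUS_RE = re.compile(r"\s(\d{3})$")

def count_statuses(lines):
    """Count log lines per HTTP status code (a simple counter metric)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line.rstrip())
        if match:
            counts[match.group(1)] += 1
    return counts
```

A real exporter would publish these counters to the timeseries database rather than return them.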

textract - node

  •    HTML

A text extraction node module. In almost all cases above, what textract cares about is the mime type. So .html and .htm, both possessing the same mime type, will be extracted. Other extensions that share mime types with those above should also extract successfully. For example, application/vnd.ms-excel is the mime type for .xls, but also for 5 other file types.
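Dispatching on MIME type rather than file extension might look like the sketch below. It uses Python's standard `mimetypes` module for illustration; textract itself is a Node module, and the `EXTRACTORS` table and `pick_extractor` helper are hypothetical names, not part of its API.

```python
import mimetypes

# Hypothetical handlers, keyed by MIME type rather than by extension.
EXTRACTORS = {
    "text/html": "html-extractor",
    "application/vnd.ms-excel": "spreadsheet-extractor",
}

def pick_extractor(filename):
    """Return the handler registered for the file's guessed MIME type, or None."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return EXTRACTORS.get(mime_type)
```

Because `.html` and `.htm` resolve to the same MIME type, both file names select the same handler without a separate table entry for each extension.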

aubio - a library for audio and music analysis

  •    C

aubio is a library to label music and sounds. It listens to audio signals and attempts to detect events: for instance, when a drum is hit, at which frequency a note sounds, or at what tempo a rhythmic melody plays. Its features include segmenting a sound file before each of its attacks, performing pitch detection, tapping the beat, and producing MIDI streams from live audio.




meyda - Audio feature extraction for JavaScript.

  •    Javascript

Meyda is a Javascript audio feature extraction library. Meyda supports both offline feature extraction as well as real-time feature extraction using the Web Audio API. We wrote a paper about it, which is available here. Please see the documentation for setup and usage instructions.

Document Generation Utility

A utility that extracts XML entities, wraps them with jumbled letters of the English alphabet, and writes the output in the file format defined in settings.

unrpa - A program to extract files from the RPA archive format.

  •    Python

unrpa is a script to extract files from the RPA archive format created for the Ren'Py Visual Novel Engine. You will need Python 3.x in order to run it (either install through your package manager or directly from python.org).

yakaa - Yet Another Keep Alive Agent for Node

  •    Javascript

This is an extracted copy of Node 0.12's keep-alive Agent implementation with some small changes intended to make it work with older versions of Node. It also has one extra feature, which I needed. The HTTP Agent is used for pooling sockets used in HTTP client requests.


creditcard - Creditcard number parsing, validation and information extraction

  •    Javascript

Creditcard number parsing, validation and information extraction. The source code has been commented using JSDoc and converted to documentation which can be found in the docs folder. The module is available in the NPM registry. It can be installed using the npm command line utility.
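One common step in card-number validation is the Luhn checksum. The sketch below shows that check in Python; whether this module uses exactly this approach is an assumption, and `luhn_valid` is an illustrative name rather than part of its API.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

Spaces and dashes are ignored, so grouped input such as `4111 1111 1111 1111` can be checked directly.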

jarchivelib - A simple archiving and compression library for Java

  •    Java

A simple library that facades org.apache.commons.compress, to provide an easy-to-use API for archiving and compressing into and out of File objects.

schenkerian - HTML keyword analyzer

  •    Javascript

Schenkerian analysis is a method of musical analysis that interprets the underlying structure of a tonal work and helps in reading the score according to that structure. This library does the same for HTML. It is built on top of Natural Node, which provides term frequency, string similarity, and tokenizing. Given that most webpages (attempt to) use the semantics of HTML, it takes into account not only term frequency but also the weight of an HTML tag, placement in the document, and other useful signals of significance (like Open Graph).
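Tag-weighted term frequency can be sketched as follows. This is a simplified Python illustration, not the library's implementation: the weights in `TAG_WEIGHTS` and the names `WeightedTermCounter` and `weighted_terms` are assumptions, and placement and Open Graph signals are left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights; text outside these tags counts 1.0 per occurrence.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "h2": 2.0}

class WeightedTermCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags
        self.scores = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the heaviest enclosing tag as the weight for this text.
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.stack), default=1.0)
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += weight

def weighted_terms(html):
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.scores
```

A word that appears once in the title and once in body text therefore outranks a word that appears only in body text.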

node-scrappy - Extract rich metadata from URLs

  •    TypeScript

Extract rich metadata from URLs. Scrappy uses a simple two step process to extract the metadata from any URL or file. First, it runs through plugin-able scrapeStream middleware to extract metadata about the file itself. With the result in hand, it gets passed on to a plugin-able extract pipeline to format the metadata for presentation and extract additional metadata about related entities.

full-text-rss - Full-Text RSS can transform partial feeds to deliver the full content stripped of clutter and ads

  •    PHP

This is our public version of Full-Text RSS, available to download for free from http://code.fivefilters.org. For best extraction results, and to help us sustain the project, you can purchase the most up-to-date version at http://fivefilters.org/content-only/#download - so if you like this free version, please consider supporting us by purchasing the latest release.

autolink-java - Java library to extract links (URLs, email addresses) from plain text; fast, small and smart

  •    Java

Java library to extract links (URLs, email addresses) from plain text; fast, small and smart about recognizing where links end
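Deciding where a link ends is the hard part: punctuation that follows a URL in running text should not become part of it. The Python sketch below shows a deliberately naive version of that idea; the pattern, the `TRAILING` set, and `extract_links` are assumptions for illustration and do not reflect how autolink-java is implemented.

```python
import re

# Naive patterns: an http(s) URL up to the next whitespace, or an email address.
LINK_RE = re.compile(r"https?://\S+|[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Characters that usually belong to the sentence rather than the link.
TRAILING = ".,;:!?)\"'"

def extract_links(text):
    """Return URLs and email addresses found in text, minus trailing punctuation."""
    return [m.group(0).rstrip(TRAILING) for m in LINK_RE.finditer(text)]
```

Stripping a closing parenthesis unconditionally breaks URLs that legitimately end in one, which is the kind of case a smarter extractor has to handle.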

mokolo - Collection of Machine Learning Algorithms for Node

  •    Javascript

mokolo intends to become a collection of machine learning algorithms for Node.js. The current release supports only the Non-Negative Matrix Factorization (NMF) algorithm; more are coming soon. Feedback and contributions are greatly appreciated. The easiest way to install mokolo is through npm, the Node.js package manager.

node-goldwasher - Extraction of text and related metadata.

  •    Javascript

Alternatively, the output can be configured as XML, Atom or RSS format with the output option. The reason redundant information is included, such as the source, is that each returned nugget is supposed to be an atomic piece of information. As such, each nugget is to contain the information that "somewhere, at some point in time, something was written (with a link to some place)".

webdext - Intelligent Web Data Extractor

  •    HTML

Webdext is a JavaScript library for web data extraction (web scraping). Currently, it only supports extracting data records from a list page (a web page containing two or more data records). Its extraction algorithm is heavily based on AutoRM [1] and DAG-MTM [2], though it is not an exact implementation of either.

YouTubeExtractor - A helper to extract the metadata, including streaming video URLs, from a YouTube video

  •    Kotlin

This library was only created to extract video stream URLs from YouTube, not provide a video player. ExoMedia is a great library for playing the video streams to the user. See the sample app for an example. This library uses OkHttp, Moshi and Rhino under the hood, so you may need to apply their ProGuard rules.

stegextract - Detect hidden files and text in images

  •    Shell

Bash script to extract hidden files and strings from images. Stegextract extracts any trailing data after the image's closing bytes, and any hidden files (or other images) embedded within the image. Short byte combinations such as JPEG's FFD8 FFE0 might sometimes create false positives. Manually reviewing the hexdump is sometimes inevitable in cases of highly complex embedded files. Stegextract is not the solution for any color/pixel/filter/LSB related Steganography, nor does it try to be. It relies on magic numbers, hexdumps and binary data alone. Currently supports PNG, JPG, and GIF.
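Finding trailing data after an image's closing bytes can be sketched for PNG as below. This is a simplified Python illustration of the magic-number idea, not the script's actual logic: `trailing_data` is a hypothetical helper, and it assumes the first occurrence of the end marker is the real end of the image.

```python
# A well-formed PNG ends with the "IEND" chunk type followed by its fixed CRC.
PNG_END = b"IEND\xae\x42\x60\x82"

def trailing_data(blob: bytes) -> bytes:
    """Return whatever bytes follow the first PNG end marker, or b"" if none."""
    end = blob.find(PNG_END)
    if end == -1:
        return b""
    return blob[end + len(PNG_END):]
```

JPEG and GIF use much shorter end markers, which is one reason short byte combinations can produce false positives.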