python-machine-learning-book - The "Python Machine Learning (1st edition)" book code repository and info resource

  •    Jupyter

This GitHub repository contains the code examples of the 1st edition of the Python Machine Learning book. If you are looking for the code examples of the 2nd edition, please refer to that repository instead. What you can expect are 400 pages rich in useful material covering just about everything you need to know to get started with machine learning ... from theory to the actual code that you can directly put into action! This is not just another "this is how scikit-learn works" book. I aim to explain all the underlying concepts, tell you everything you need to know in terms of best practices and caveats, and we will put those concepts into action mainly using NumPy, scikit-learn, and Theano.

software-analytics - A repository with my data analysis results of software artifacts

  •    Jupyter

In this repository, I present some examples of mining valuable information out of software artifacts.

mli-resources - Machine Learning Interpretability Resources

  •    Jupyter

Machine learning algorithms can create more accurate models than linear models, but any increase in accuracy over more traditional, better-understood, and more easily explainable techniques may not be practical for those who must explain their models to regulators or customers. For many decades, the models created by machine learning algorithms were generally taken to be black boxes. However, a recent flurry of research has introduced credible techniques for interpreting complex, machine-learned models. The materials presented here illustrate applications or adaptations of these techniques for practicing data scientists. Want to contribute your own examples? Just make a pull request.
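
As a concrete illustration of the kind of technique catalogued in this repository, the sketch below computes permutation importance with scikit-learn. It is a minimal example of my own, not material from the repository: the dataset and model are illustrative stand-ins.

```python
# A minimal sketch of one post-hoc interpretation technique: permutation
# importance. The dataset and model here are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature column in turn and record how much held-out accuracy
# drops; larger drops mark features the model relies on more heavily.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean),
                key=lambda pair: -pair[1])
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.4f}")
```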

diabetes_use_case - Sample use case for Xavier AI in Healthcare conference: https://www

  •    Jupyter

Recent advances enable practitioners to break open machine learning’s “black box”. From machine learning algorithms guiding analytical tests in drug manufacture, to predictive models recommending courses of treatment, to sophisticated software that can read images better than doctors, machine learning has promised a new world of healthcare where algorithms can assist, or even outperform, professionals in consistency and accuracy, saving money and avoiding potentially life-threatening mistakes. But what if your doctor told you that you were sick but could not tell you why? Imagine a hospital that admitted and discharged patients but was unable to provide specific justification for those decisions. For decades, this was a roadblock for the adoption of machine learning algorithms in healthcare: they could make data-driven decisions that helped practitioners, payers, and patients, but they couldn’t tell users why those decisions were made.

GWU_data_mining - Materials for GWU DNSC 6279 and DNSC 6290.

  •    Jupyter

DNSC 6279 ("Data Mining") provides exposure to various data preprocessing, statistics, and machine learning techniques that can be used both to discover relationships in large data sets and to build predictive models. Techniques covered will include basic and analytical data preprocessing, regression models, decision trees, neural networks, clustering, association analysis, and basic text mining. Techniques will be presented in the context of data-driven organizational decision making using statistical and machine learning approaches. DNSC 6290 ("Machine Learning") is a follow-up course to DNSC 6279 that expands on both the theoretical and practical aspects of subjects covered in the prerequisite course while optionally introducing new material. Techniques covered may include feature engineering, penalized regression, neural networks and deep learning, ensemble models including stacked generalization and super learner approaches, matrix factorization, model validation, and model interpretation. Classes will be taught as workshops where groups of students apply lecture materials to the ongoing Kaggle Advanced Regression and Digit Recognizer contests.

interpretable_machine_learning_with_python - Practical techniques for interpreting machine learning models

  •    Jupyter

Monotonicity constraints can turn opaque, complex models into transparent, and potentially regulator-approved, models by ensuring that predictions only increase or only decrease for any change in a given input variable. In this notebook, I will demonstrate how to use monotonicity constraints in the popular open source gradient boosting package XGBoost to train a simple, accurate, nonlinear classifier on the UCI credit card default data. Once we have trained a monotonic XGBoost model, we will use partial dependence plots and individual conditional expectation (ICE) plots to investigate the internal mechanisms of the model and to verify its monotonic behavior. Partial dependence plots show how machine-learned response functions change based on the values of one or two input variables of interest, while averaging out the effects of all other input variables. ICE plots can be used to create more localized descriptions of model predictions, and they pair nicely with partial dependence plots. An example of generating regulator-mandated reason codes from high-fidelity Shapley explanations for any model prediction is also presented. The combination of monotonic XGBoost, partial dependence, ICE, and Shapley explanations is likely the most direct way to create an interpretable machine learning model today.
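
To make the workflow concrete, here is a minimal sketch of monotonic XGBoost verified with partial dependence and ICE plots, assuming recent versions of xgboost and scikit-learn. It uses a synthetic two-feature dataset rather than the UCI credit card data, and the constraint directions are illustrative assumptions, not the notebook's actual code.

```python
# Minimal sketch: train a monotonic XGBoost classifier, then verify its
# behavior with partial dependence and ICE plots. Synthetic data stands in
# for the UCI credit card default data used in the actual notebook.
import numpy as np
import xgboost as xgb
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
# The outcome rises with feature 0 and falls with feature 1, plus noise.
y = ((X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=1000)) > 0).astype(int)

# "(1,-1)" forces predictions to be non-decreasing in feature 0 and
# non-increasing in feature 1, regardless of how the trees are grown.
model = xgb.XGBClassifier(monotone_constraints="(1,-1)",
                          n_estimators=200, max_depth=3)
model.fit(X, y)

# Partial dependence (average effect) overlaid with ICE curves (per-row
# effects) for feature 0; a monotonic model should show only rising curves.
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="both")
```

Shapley explanations for individual rows could then come from a package such as shap (e.g., shap.TreeExplainer(model)); consult the notebook itself for the actual reason-code generation.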

text-summarization-and-visualization-using-watson-studio - Can we quickly summarize & visualize text to get the details of unstructured data? Yes, we can! Please review this code pattern for all the steps involved in quickly summarizing & visualizing the data

  •    Jupyter

We will demonstrate a methodology to summarize and visualize text using Watson Studio. Text summarization is the process of creating a short, coherent version of a longer document. There are two approaches, extractive and abstractive summarization. We will focus on extractive summarization, which selects phrases and sentences from the source document to make up the new summary; techniques involve ranking the relevance of phrases in order to choose only those most relevant to the meaning of the source. We will also demonstrate different methods to visualize the data, which can provide a quick peek at it. Some of the advantages of text summarization are below.

  •    Summaries reduce reading time.
  •    When researching documents, summaries make the selection process easier.
  •    Text summarization improves the effectiveness of indexing.
  •    Text summarization algorithms are less biased than human summarizers.
  •    Personalized summaries are useful in question-answering systems, as they provide personalized information.
  •    Automatic or semi-automatic summarization systems enable commercial abstract services to increase the number of texts they can process.
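
As a rough illustration of the extractive idea described above (not the Watson Studio pipeline this code pattern actually uses), the sketch below ranks sentences by the frequency of their words and keeps the top-ranked ones; the function name and scoring rule are my own simplifications.

```python
# Toy frequency-based extractive summarizer: rank sentences by the average
# corpus frequency of their words and keep the top few, in original order.
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Re-emit the chosen sentences in document order to keep the summary coherent.
    return " ".join(s for s in sentences if s in top)
```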

data-science-toolkit - Collection of stats, modeling, and data science tools in Python and R.

  •    Jupyter

Welcome! The purpose of this repository is to serve as a stockpile of statistical methods, modeling techniques, and data science tools. The content itself includes everything from educational vignettes on specific topics to tailored functions built to enhance and optimize analyses. This is and will remain a work in progress, and I welcome all contributions and constructive criticism. If you have a suggestion or request, please make use of the "Issues" tab and I will respond expeditiously. All are welcome and encouraged to contribute to this repository. My only request is that you include a detailed description of your contribution, that your code be thoroughly commented, and that you test your contribution locally with the most recent version of the Master branch integrated prior to submitting the PR.
