We have collection of more than 1 Million open source products ranging from Enterprise product to
small libraries in all platforms. We aggregate information from all open source repositories.
Search and find the best for your needs. Check out projects section.
Java based data integration framework can be used to transform/map/manipulate data in various formats (CSV,FIXLEN,XML,XBASE,COBOL,LOTUS, etc.); can be used standalone or embedded(as a library). Connects to RDBMS/JMS/SOAP/LDAP/S3/HTTP/FTP/ZIP/TAR.
Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
Storm is a distributed real time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real time processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.
SiteWhere is an open source platform for capturing, storing, integrating, and analyzing data from IoT devices. SiteWhere is a multi-tenant, application enablement platform for the Internet of Things (IoT) providing device management, complex event processing (CEP) and integration through a modern, scalable architecture. SiteWhere provides REST APIs for all system functionality.
In today's corporate world, data is spread across multiple data sources (ex: DB2, Oracle, Sybase etc) and business users wish to generate types of business reports from a single source, no matter how the data is distributed. This scenario calls for data integration from multip...
Logstash is part of the Elastic Stack along with Beats, Elasticsearch and Kibana. Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash." (Ours is Elasticsearch, naturally.). Logstash has over 200 plugins, and you can write your own very easily as well.The license is Apache 2.0, meaning you are pretty much free to use it however you want in whatever way.
No 1 in Business Analytics: Data Mining, Predictive Analytics, ETL, Reporting, Dashboards in One Tool. 1000+ methods: data mining, business intelligence, ETL, data mining, data analysis + Weka + R, forecasting, visualization, business intelligence
Apache Tajo is a robust big data relational and distributed data warehouse system for Apache Hadoop. Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation, and ETL (extract-transform-load process) on large-data sets stored on HDFS (Hadoop Distributed File System) and other data sources.
Most data warehousing is done using an ETL pattern. ETL stands for Extract, Transform, and Load and usually refers to extracting data, modifying it "in flight" - so either on a file system or in memory, and loading that data into a different data store. This pattern is mostly useful for converting incompatible data formats, such as CSV and XML into something more structured such as SQL.
* This API uses a WSSE authentication header on every call, each example class has a method called getWSSEHeader for this purpose * The code examples can not be run without Partner API credentials (a Username and Secret). These must be obtained through an Adobe Partner Integration Manager after appropriate agreements are in place.* Each example passes JSON encoded data as a String and received JSON encoded data as a String * parsing the JSON data is left as an exercise for the developer so you
GATE excels at text analysis of all shapes and sizes. It provides support for diverse language processing tasks such as parsers, morphology, tagging, Information Retrieval tools, Information Extraction components for various languages, and many others. It provides support to measure, evaluate, model and persist the data structure. It could analyze text or speech. It has built-in support for machine learning and also adds support for different implementation of machine learning via plugin.
Conjecture is a framework for building machine learning models in Hadoop using the Scalding DSL. The goal of this project is to enable the development of statistical models as viable components in a wide range of product settings. Applications include classification and categorization, recommender systems, ranking, filtering, and regression (predicting real-valued numbers). Conjecture has been designed with a primary emphasis on flexibility and can handle a wide variety of inputs. Integration with Hadoop and scalding enable seamless handling of extremely large data volumes, and integration with established ETL processes. Predicted labels can either be consumed directly by the web stack using the dataset loader, or models can be deployed and consumed by live web code. Currently, binary classification (assigning one of two possible labels to input data points) is the most mature component of the Conjecture package.There are a few stages involved in training a machine learning model using Conjecture.
The Open XML SDK provides open-source libraries for working with Open XML Documents (DOCX, XLSX, and PPTX). It supports scenarios such as: - High-performance generation of word-processing documents, spreadsheets, and presentations - Document modification, such as removing tracked revisions or removing unacceptable content from documents - Data and content querying and extraction, such as transformation from DOCX to HTML, or extraction of data from spreadsheets