
Zingg - Scalable fuzzy matching for data mastering, deduplication and entity resolution

  •    Java

Zingg is a scalable fuzzy matching tool for data mastering, deduplication and entity resolution. Real-world data often contains multiple records belonging to the same customer. These records can sit in a single system or in several, and their fields vary from record to record, which makes them hard to combine, especially as data volumes grow. Zingg links the different records of an entity (customer, patient, supplier, product, etc.) within the same data source or across disparate ones.
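
To illustrate what fuzzy matching of records means, here is a minimal Python sketch. It is not Zingg's API or algorithm: the `similarity` and `find_duplicates` helpers, the field names and the 0.85 threshold are all assumptions made for this example, and the string comparison uses the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def find_duplicates(records, fields, threshold=0.85):
    """Return index pairs whose average field similarity meets the threshold.

    Hypothetical helper: compares every pair of records, so it only suits
    small inputs.
    """
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            scores = [similarity(records[i][f], records[j][f]) for f in fields]
            if sum(scores) / len(scores) >= threshold:
                pairs.append((i, j))
    return pairs


customers = [
    {"name": "Jon Smith", "city": "New York"},
    {"name": "John Smith", "city": "New York"},
    {"name": "Alice Brown", "city": "Boston"},
]
print(find_duplicates(customers, ["name", "city"]))
```

The all-pairs loop grows quadratically with the number of records, which is why it would not hold up at the data volumes described above.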

Chaos Genius - ML powered analytics engine for outlier detection and root cause analysis

  •    Python

Chaos Genius is an open source, ML-powered analytics engine for outlier detection and root cause analysis. It can be used to monitor and analyse high-dimensionality business, data and system metrics at scale. Users can segment large datasets by key performance metrics (e.g. Daily Active Users, Cloud Costs, Failure Rates) and by the important dimensions (e.g. countryID, DeviceID, ProductID, DayofWeek) across which they want to monitor and analyse those metrics.
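
As a rough illustration of monitoring one metric across a dimension, the sketch below flags segments whose latest value deviates strongly from their own history. It is a plain z-score check, not the models Chaos Genius uses; the `flag_outliers` name, the sample data and the cutoff `k=3.0` are assumptions made for this example.

```python
from statistics import mean, stdev


def flag_outliers(series_by_segment, k=3.0):
    """Return the segments whose latest value lies more than k standard
    deviations from the mean of that segment's earlier values."""
    flagged = []
    for segment, values in series_by_segment.items():
        history, latest = values[:-1], values[-1]
        if len(history) < 2:
            continue  # too little history to estimate spread
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as an outlier.
            if latest != mu:
                flagged.append(segment)
        elif abs(latest - mu) / sigma > k:
            flagged.append(segment)
    return sorted(flagged)


# Daily Active Users per country; the last entry is the newest day.
daily_active_users = {
    "US": [100, 102, 98, 101, 99, 180],
    "DE": [50, 51, 49, 50, 52, 51],
}
print(flag_outliers(daily_active_users))
```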

deequ - Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets

  •    Scala

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. Deequ depends on Java 8 and Apache Spark 2.2. We will make it available as a Maven artifact soon.
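
The idea of "unit tests for data" can be sketched in a few lines of plain Python. This is not Deequ's API, which is written in Scala and runs on Spark; the `is_complete`, `is_unique` and `run_checks` helpers and the sample rows are assumptions made for this example.

```python
def is_complete(rows, column):
    """True if no row has a missing (None) value in the column."""
    return all(row.get(column) is not None for row in rows)


def is_unique(rows, column):
    """True if no value in the column appears more than once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def run_checks(rows, checks):
    """Run each named check against the rows and report pass or fail."""
    return {name: check(rows) for name, check in checks.items()}


rows = [
    {"id": 1, "name": "Thingy A"},
    {"id": 2, "name": None},
    {"id": 3, "name": "Thingy C"},
]
checks = {
    "id is complete": lambda r: is_complete(r, "id"),
    "id is unique": lambda r: is_unique(r, "id"),
    "name is complete": lambda r: is_complete(r, "name"),
}
for name, passed in run_checks(rows, checks).items():
    print(f"{name}: {'ok' if passed else 'FAILED'}")
```

Each check is a constraint on a column rather than on code, so a failing result points at bad data instead of a bad function.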








