Always know what to expect from your data. Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
data-science pipeline exploratory-data-analysis eda data-engineering data-quality data-profiling datacleaner exploratory-analysis cleandata dataquality datacleaning mlops pipeline-tests pipeline-testing dataunittest data-unit-tests exploratorydataanalysis pipeline-debt data-profilers
Zingg is a scalable fuzzy matching tool for data mastering, deduplication and entity resolution. Real-world data often contains multiple records belonging to the same customer. These records can sit in a single system or across several, and variations across their fields make them hard to combine, especially as data volumes grow. Zingg integrates the different records of an entity (customer, patient, supplier, product, etc.) in the same or disparate data sources.
data-science identity-resolution spark etl dedupe entity-resolution data-transformation ml fuzzy-matching deduplication masterdata dataengineering fuzzymatch dataquality datapreparation analytics-engineering data-transformations
Chaos Genius is an open source, ML-powered analytics engine for outlier detection and root cause analysis. It can be used to monitor and analyse high-dimensionality business, data and system metrics at scale. Users can segment large datasets by key performance metrics (e.g., Daily Active Users, Cloud Costs, Failure Rates) and by the important dimensions (e.g., countryID, DeviceID, ProductID, DayofWeek) across which they want to monitor and analyse those metrics.
machine-learning alert ai monitoring deep-learning time-series analytics ml data-visualization business-intelligence outlier-detection alert-messages observability monitoring-tool dataengineering anomaly-detection dataquality seasonality rootcauseanalysis
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. Deequ depends on Java 8 and Apache Spark 2.2. We will make it available as a Maven artifact soon.
dataquality spark unit-testing
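To make the idea of "unit tests for data" concrete, here is a minimal sketch in plain Python. It is not Deequ's API and does not use Spark: the helper names (`is_complete`, `is_unique`, `run_checks`) and the sample rows are hypothetical, chosen only to show what declaring checks on a dataset and collecting pass/fail results might look like.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]


def is_complete(rows: List[Row], column: str) -> bool:
    """True when no row has a missing (None) value in `column`."""
    return all(row.get(column) is not None for row in rows)


def is_unique(rows: List[Row], column: str) -> bool:
    """True when every value in `column` occurs exactly once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def run_checks(rows: List[Row], checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run each named check against the rows and record whether it passed."""
    return {name: check(rows) for name, check in checks.items()}


# Hypothetical sample data: a duplicated id and a missing country.
rows = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": None},
    {"id": 2, "country": "FR"},
]

# Hypothetical checks, declared up front like unit tests.
checks = {
    "id is complete": lambda r: is_complete(r, "id"),
    "id is unique": lambda r: is_unique(r, "id"),
    "country is complete": lambda r: is_complete(r, "country"),
}

results = run_checks(rows, checks)
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

A library such as Deequ evaluates constraints of this kind on distributed Spark data rather than on in-memory lists, so this sketch illustrates only the concept, not the scale.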