Stop plotting your data - annotate your data and let it visualize itself. HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting.
visualization analysis bokeh matplotlib interactive data-science exploratory-data-analysis pandasAlways know what to expect from your data. Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling.
data-science pipeline exploratory-data-analysis eda data-engineering data-quality data-profiling datacleaner exploratory-analysis cleandata dataquality datacleaning mlops pipeline-tests pipeline-testing dataunittest data-unit-tests exploratorydataanalysis pipeline-debt data-profilersLux is a Python library that facilitate fast and easy data exploration by automating the visualization and data analysis process. By simply printing out a dataframe in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset. Visualizations are displayed via an interactive widget that enables users to quickly browse through large collections of visualizations and make sense of their data. Here is a 1-min video introducing Lux, and slides from a more extended talk.
visualization data-science jupyter exploratory-data-analysis pandas visualization-toolsThese series of tutorials on Data Science engineering will try to compare how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as posible for job market or to start personal project.
data-science data-science-engineering tutorial data-frame exploratory-data-analysis r jupyter notebook machine-learningThis project is about building a music recommendation system for users who want to listen to happy songs. Such a system can not only be used to brighten up one's mood on a rainy weekend; especially in hospitals, other medical clinics, or public locations such as restaurants, the MusicMood classifier could be used to spread positive mood among people.
song-dataset exploratory-data-analysis lyrics mood machine-learningInitially inspired by csv-fingerprint, vis_dat helps you visualise a dataframe and "get a look at the data" by displaying the variable classes in a dataframe as a plot with vis_dat, and getting a brief look into missing data patterns using vis_miss.The name visdat was chosen as I think in the future it could be integrated with testdat. The idea being that first you visualise your data (visdat), then you run tests from testdat to fix them.
missingness visualisation r exploratory-data-analysisIf you are a Data Journalist looking to improve your coding skills, or you work as a developer giving support in a newsroom, you arrived to the right place. This is a repository of articles and tutorials, as IPython/Jupyter notebooks or web products, about doing data journalism. The articles presented here, apart from analysing data to present some facts about the current, past, and sometimes future world situation, will show programming instructions explaining how to repeat the analysis by yourself. We live in a world where governments and the media, more often than not, serve the interests of a few. Our belief is that to empower people to do their own analysis and arrive to conclusions based on facts (data), is a way to make us all more aware and strong as a society.
data-journalism data-analysis notebook exploratory-data-analysis data-visualisation data-visualizationThis is a collection of Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the R language. If your are interested in being introduced to some basic Data Science Engineering concepts and applications, you might find these series of tutorials interesting. There we explain different concepts and applications using Python and R. Additionally, if you are interested in using Python with Spark, you can have a look at our pySpark notebooks.
sparkr notebook jupyter-notebook exploratory-data-analysis jupyter r data-science data-analysis big-data bigdataFunctions useful for exploratory data analysis using random forests. This package extends the functionality of random forests fit by party (multivariate, regression, and classification), randomForestSRC (regression and classification,), randomForest (regression and classification), and ranger (classification and regression).
random-forest r rstats exploratory-data-analysis machine-learningThis package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data as well as performing different kinds of analyses such as univariate, bivariate and multivariate investigation which is the first step of any predictive modeling pipeline. This package can be used to get a good sense of any dataset before jumping on to building predictive models. More functions to be added soon.
r exploratory-data-analysis data-science data-analysisPredict whether or not someone is an alien.
r machine-learning exploratory-data-analysisThis repository aims to provide tutorials for implementing various visualisations using Seaborn, Plotly, Bokeh, Networkx and even a sample report built using Tableau. 01 - Winter Olympics Analysis - Tableau.pdf : This file has a demo of the kind of plots you can make using Tableau. The data used for this tutorial is the Winter Olympics data.
visualisation plotly tutorial exploratory-data-analysisThe model takes as input one or more views for a study of an upper extremity. On each view, our 169-layer convolutional neural network predicts the probability of abnormality. We compute the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image. The model implemented in model.py takes as input 'all' the views for a study of an upper extremity. On each view the model predicts the probability of abnormality. The Model computes the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilites output by the network for each image.
densenet mura-dataset radiology pytorch standford-ml-group exploratory-data-analysisStack Overflow and Hacker News are portals mainly (but not only) read and used by programmers and other people who occupy their (professional or free) time with writing code. Stack Overflow lets their users easily Stack Overflow (SO), an established in 2008 portal on which programmers help each other by asking and answering coding questions, lets their users easily find questions related to a certain programming language/framework/library etc. by tags. The questions and replies/comments are evaluated in a form of points so it is usually instantly obvious which answer was rated the highest (and therefore is considered as the best one by the community) or whether a described problem is reproducible, i.e. you can replicate it with a piece of code prepared by a person asking a question.
hackernews stackoverflow eda exploratory-data-analysis granger-causality relationshipA Jupyter Notebook kernel written in D. Currently supports an echo "interpreter" and D Repl. Tested on Arch Linux; should work on other distributions and Windows with minor changes.
jupyter-kernels jupyter dlang data-science exploratory-data-analysis hedgefund fastcodeIn indentifying outliers I will cover both visual inspection as well a machine learning method called Isolation Forests. Since I will completing this project over multiple days and using Google Cloud, I will go over the basics of using BigQuery for storing the datasets so I won't have to start all over again each time I work on it. At the end of this blogpost I will summarize the findings, and give some specific recommendations to reduce mulitfamily and office building energy usage. In this second post I cover imputations techniques for missing data using Scikit-Learn's impute module using both point estimates (i.e. mean, median) using the SimpleImputer class as well as more complicated regression models (i.e. KNN) using the IterativeImputer class. The later requires that the features in the model are correlated. This is indeed the case for our dataset and in our particular case we also need to transform the feautres in order to discern a more meaningful and predictive relationship between them. As we will see, the transformation of the features also gives us much better results for imputing missing values.
data-science scikit-learn exploratory-data-analysis regression xgboost bokeh outlier-detection missing-data google-app-engine regression-models energy-efficiency outlier-removal missing-values
We have large collection of open source products. Follow the tags from
Tag Cloud >>
Open source products are scattered around the web. Please provide information
about the open source projects you own / you use.
Add Projects.