holoviews - Stop plotting your data - annotate your data and let it visualize itself.

  •    Python

HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. With HoloViews, you can usually express what you want to do in very few lines of code, letting you focus on what you are trying to explore and convey, not on the process of plotting.

lux - Python API for Intelligent Visual Data Discovery

  •    Python

Lux is a Python library that facilitates fast and easy data exploration by automating the visualization and data analysis process. By simply printing out a dataframe in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset. Visualizations are displayed via an interactive widget that enables users to quickly browse through large collections of visualizations and make sense of their data. The project also provides a 1-minute video introducing Lux and slides from a more extended talk.

data-science-your-way - Ways of doing Data Science Engineering and Machine Learning in R and Python

  •    Jupyter

This series of tutorials on Data Science engineering compares how different concepts in the discipline can be implemented in the two dominant ecosystems nowadays: R and Python. We will do this from a neutral point of view. Our opinion is that each environment has good and bad things, and any data scientist should know how to use both in order to be as prepared as possible for the job market or for starting a personal project.




musicmood - A machine learning approach to classify songs by mood.

  •    OpenEdge

This project is about building a music recommendation system for users who want to listen to happy songs. Such a system can not only be used to brighten up one's mood on a rainy weekend; especially in hospitals, other medical clinics, or public locations such as restaurants, the MusicMood classifier could be used to spread positive mood among people.

visdat - Preliminary Exploratory Visualisation of Data

  •    R

Initially inspired by csv-fingerprint, vis_dat helps you visualise a dataframe and "get a look at the data" by displaying the variable classes in a dataframe as a plot with vis_dat, and getting a brief look into missing data patterns using vis_miss. The name visdat was chosen as I think in the future it could be integrated with testdat. The idea is that first you visualise your data (visdat), then you run tests from testdat to fix them.

data-journalism - Data journalism and easy to replicate notebooks using Python, R, and Web visualisations

  •    HTML

If you are a Data Journalist looking to improve your coding skills, or you work as a developer giving support in a newsroom, you have arrived at the right place. This is a repository of articles and tutorials, as IPython/Jupyter notebooks or web products, about doing data journalism. The articles presented here, apart from analysing data to present some facts about the current, past, and sometimes future world situation, will show programming instructions explaining how to repeat the analysis by yourself. We live in a world where governments and the media, more often than not, serve the interests of a few. Our belief is that empowering people to do their own analysis and arrive at conclusions based on facts (data) is a way to make us all more aware and stronger as a society.

spark-r-notebooks - R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

  •    Jupyter

This is a collection of Jupyter notebooks intended to train the reader on different Apache Spark concepts, from basic to advanced, by using the R language. If you are interested in being introduced to some basic Data Science Engineering concepts and applications, you might find this series of tutorials interesting. There we explain different concepts and applications using Python and R. Additionally, if you are interested in using Python with Spark, you can have a look at our pySpark notebooks.


edarf - exploratory data analysis using random forests

  •    R

Functions useful for exploratory data analysis using random forests. This package extends the functionality of random forests fit by party (multivariate, regression, and classification), randomForestSRC (regression and classification), randomForest (regression and classification), and ranger (classification and regression).

xda - R package for exploratory data analysis

  •    R

This package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data as well as for performing different kinds of analyses, such as univariate, bivariate, and multivariate investigation, which is the first step of any predictive modeling pipeline. This package can be used to get a good sense of any dataset before moving on to building predictive models. More functions to be added soon.

DataVisualization - Tutorials on visualizing data using python packages like bokeh, plotly, seaborn and igraph

  •    Jupyter

This repository aims to provide tutorials for implementing various visualisations using Seaborn, Plotly, Bokeh, and NetworkX, and even a sample report built using Tableau. 01 - Winter Olympics Analysis - Tableau.pdf: This file has a demo of the kind of plots you can make using Tableau. The data used for this tutorial is the Winter Olympics data.

DenseNet-MURA-PyTorch - Implementation of DenseNet model on Stanford's MURA dataset using PyTorch

  •    Python

The model takes as input one or more views for a study of an upper extremity. On each view, our 169-layer convolutional neural network predicts the probability of abnormality. We compute the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image. The model implemented in model.py takes as input 'all' the views for a study of an upper extremity. On each view, the model predicts the probability of abnormality. The model computes the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image.
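
The study-level aggregation described above can be sketched in a few lines of plain Python. This is only an illustration of the averaging step, not the code in model.py: the function name, the example probabilities, and the 0.5 decision threshold are all assumptions made for the sketch.

```python
from statistics import fmean


def study_abnormality_probability(view_probabilities):
    """Average per-view abnormality probabilities into one study-level score."""
    if not view_probabilities:
        raise ValueError("a study needs at least one view")
    return fmean(view_probabilities)


# Hypothetical network outputs for a three-view study (made-up numbers).
view_probs = [0.2, 0.7, 0.9]
study_prob = study_abnormality_probability(view_probs)

# The 0.5 cut-off is an assumption for illustration, not taken from the project.
is_abnormal = study_prob > 0.5
```

Because the mean is taken over however many views a study has, the same function handles single-view and multi-view studies.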

HN_SO_analysis - Is there a relationship between popularity of a given technology on Stack Overflow (SO) and Hacker News (HN)? And a few words about causality

  •    Python

Stack Overflow and Hacker News are portals mainly (but not only) read and used by programmers and other people who occupy their (professional or free) time with writing code. Stack Overflow (SO), a portal established in 2008 on which programmers help each other by asking and answering coding questions, lets its users easily find questions related to a certain programming language, framework, library, etc. by tags. The questions and replies/comments are evaluated in the form of points, so it is usually instantly obvious which answer was rated the highest (and is therefore considered the best one by the community) or whether a described problem is reproducible, i.e. you can replicate it with a piece of code prepared by the person asking the question.

jupyterd - WIP Jupyter notebook for the D programming language / DSLs written in D

  •    D

A Jupyter Notebook kernel written in D. Currently supports an echo "interpreter" and D Repl. Tested on Arch Linux; should work on other distributions and Windows with minor changes.

NYCBuildingEnergyUse - Creating Regression Models Of Building Emissions On Google Cloud

  •    Jupyter

In identifying outliers I will cover both visual inspection and a machine learning method called Isolation Forests. Since I will be completing this project over multiple days and using Google Cloud, I will go over the basics of using BigQuery for storing the datasets so I won't have to start all over again each time I work on it. At the end of this blog post I will summarize the findings and give some specific recommendations to reduce multifamily and office building energy usage. In this second post I cover imputation techniques for missing data using Scikit-Learn's impute module, using both point estimates (e.g. mean, median) with the SimpleImputer class and more complicated regression models (e.g. KNN) with the IterativeImputer class. The latter requires that the features in the model are correlated. This is indeed the case for our dataset, and in our particular case we also need to transform the features in order to discern a more meaningful and predictive relationship between them. As we will see, the transformation of the features also gives us much better results for imputing missing values.
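
To make the point-estimate idea concrete, here is a minimal plain-Python sketch of median imputation. It is not Scikit-Learn's SimpleImputer and not the project's code; the function name, the use of None for missing values, and the numbers are assumptions chosen for illustration.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    columns = list(zip(*rows))
    # One fill value per column, computed from the non-missing entries only.
    fills = [median(v for v in col if v is not None) for col in columns]
    return [
        [fills[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical two-feature dataset with missing entries (made-up numbers).
rows = [
    [100.0, 5.0],
    [None, 7.0],
    [300.0, None],
    [200.0, 9.0],
]
filled = impute_median(rows)
```

A point estimate like this ignores relationships between features, which is the gap the regression-based approach mentioned above is meant to address.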





