python-machine-learning-book - The "Python Machine Learning (1st edition)" book code repository and info resource

  •    Jupyter

This GitHub repository contains the code examples of the 1st edition of the Python Machine Learning book. If you are looking for the code examples of the 2nd edition, please refer to that repository instead. What you can expect are 400 pages rich in useful material covering just about everything you need to know to get started with machine learning ... from theory to the actual code that you can directly put into action! This is not just another "this is how scikit-learn works" book. I aim to explain all the underlying concepts, tell you everything you need to know in terms of best practices and caveats, and we will put those concepts into action mainly using NumPy, scikit-learn, and Theano.

software-analytics - A repository with my data analysis results of software artifacts

  •    Jupyter

In this repository, I present some examples of mining valuable information out of software artifacts.

mli-resources - Machine Learning Interpretability Resources

  •    Jupyter

Machine learning algorithms can create more accurate models than linear models, but any increase in accuracy over more traditional, better-understood, and more easily explainable techniques may not be practical for those who must explain their models to regulators or customers. For many decades, the models created by machine learning algorithms were generally taken to be black boxes. However, a recent flurry of research has introduced credible techniques for interpreting complex, machine-learned models. The materials presented here illustrate applications or adaptations of these techniques for practicing data scientists. Want to contribute your own examples? Just make a pull request.
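
As a concrete illustration of the kind of technique catalogued in this repository, the sketch below computes permutation importance with scikit-learn. It is a minimal example of my own, not material from the repository: the dataset and model are illustrative stand-ins.

```python
# A minimal sketch of one post-hoc interpretation technique: permutation
# importance. The dataset and model here are illustrative placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature column in turn and record how much held-out accuracy
# drops; larger drops mark features the model relies on more heavily.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean),
                key=lambda pair: -pair[1])
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.4f}")
```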

diabetes_use_case - Sample use case for Xavier AI in Healthcare conference: https://www

  •    Jupyter

Recent advances enable practitioners to break open machine learning’s “black box”. From machine learning algorithms guiding analytical tests in drug manufacture, to predictive models recommending courses of treatment, to sophisticated software that can read images better than doctors, machine learning has promised a new world of healthcare where algorithms can assist, or even outperform, professionals in consistency and accuracy, saving money and avoiding potentially life-threatening mistakes. But what if your doctor told you that you were sick but could not tell you why? Imagine a hospital that admitted and discharged patients but was unable to provide specific justification for those decisions. For decades, this was a roadblock for the adoption of machine learning algorithms in healthcare: they could make data-driven decisions that helped practitioners, payers, and patients, but they couldn’t tell users why those decisions were made.

GWU_data_mining - Materials for GWU DNSC 6279 and DNSC 6290.

  •    Jupyter

DNSC 6279 ("Data Mining") provides exposure to various data preprocessing, statistics, and machine learning techniques that can be used both to discover relationships in large data sets and to build predictive models. Techniques covered will include basic and analytical data preprocessing, regression models, decision trees, neural networks, clustering, association analysis, and basic text mining. Techniques will be presented in the context of data-driven organizational decision making using statistical and machine learning approaches. DNSC 6290 ("Machine Learning") is a follow-up course to DNSC 6279 that expands on both the theoretical and practical aspects of subjects covered in the prerequisite course while optionally introducing new material. Techniques covered may include feature engineering, penalized regression, neural networks and deep learning, ensemble models including stacked generalization and super learner approaches, matrix factorization, model validation, and model interpretation. Classes will be taught as workshops where groups of students apply lecture materials to the ongoing Kaggle Advanced Regression and Digit Recognizer contests.

interpretable_machine_learning_with_python - Practical techniques for interpreting machine learning models

  •    Jupyter

Monotonicity constraints can turn opaque, complex models into transparent, and potentially regulator-approved, models by ensuring that predictions only increase or only decrease for any change in a given input variable. In this notebook, I will demonstrate how to use monotonicity constraints in the popular open source gradient boosting package XGBoost to train a simple, accurate, nonlinear classifier on the UCI credit card default data. Once we have trained a monotonic XGBoost model, we will use partial dependence plots and individual conditional expectation (ICE) plots to investigate the internal mechanisms of the model and to verify its monotonic behavior. Partial dependence plots show how machine-learned response functions change based on the values of one or two input variables of interest, while averaging out the effects of all other input variables. ICE plots can be used to create more localized descriptions of model predictions, and they pair nicely with partial dependence plots. An example of generating regulator-mandated reason codes from high-fidelity Shapley explanations for any model prediction is also presented. The combination of monotonic XGBoost, partial dependence, ICE, and Shapley explanations is likely the most direct way to create an interpretable machine learning model today.
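
To make the workflow concrete, here is a minimal sketch of monotonic XGBoost verified with partial dependence and ICE plots, assuming recent versions of xgboost and scikit-learn. It uses a synthetic two-feature dataset rather than the UCI credit card data, and the constraint directions are illustrative assumptions, not the notebook's actual code.

```python
# Minimal sketch: train a monotonic XGBoost classifier, then verify its
# behavior with partial dependence and ICE plots. Synthetic data stands in
# for the UCI credit card default data used in the actual notebook.
import numpy as np
import xgboost as xgb
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
# The outcome rises with feature 0 and falls with feature 1, plus noise.
y = ((X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=1000)) > 0).astype(int)

# "(1,-1)" forces predictions to be non-decreasing in feature 0 and
# non-increasing in feature 1, regardless of how the trees are grown.
model = xgb.XGBClassifier(monotone_constraints="(1,-1)",
                          n_estimators=200, max_depth=3)
model.fit(X, y)

# Partial dependence (average effect) overlaid with ICE curves (per-row
# effects) for feature 0; a monotonic model should show only rising curves.
PartialDependenceDisplay.from_estimator(model, X, features=[0], kind="both")
```

Shapley explanations for individual rows could then come from a package such as shap (e.g., shap.TreeExplainer(model)); consult the notebook itself for the actual reason-code generation.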

text-summarization-and-visualization-using-watson-studio - Can we quickly summarize & visualize text to get the details of unstructured data? Yes, we can! Please review this code pattern for all the steps involved in quickly summarizing & visualizing the data

  •    Jupyter

We will demonstrate a methodology to summarize and visualize text using Watson Studio. Text summarization is the process of creating a short, coherent version of a longer document. There are two approaches, extractive and abstractive summarization. We will focus on extractive summarization, which selects phrases and sentences from the source document to make up the new summary; techniques involve ranking the relevance of phrases in order to choose only those most relevant to the meaning of the source. We will also demonstrate different methods to visualize the data, which can provide a quick peek at it. Some of the advantages of text summarization are below.

  •    Summaries reduce reading time.
  •    When researching documents, summaries make the selection process easier.
  •    Text summarization improves the effectiveness of indexing.
  •    Text summarization algorithms are less biased than human summarizers.
  •    Personalized summaries are useful in question-answering systems, as they provide personalized information.
  •    Automatic or semi-automatic summarization systems enable commercial abstract services to increase the number of texts they can process.
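
As a rough illustration of the extractive idea described above (not the Watson Studio pipeline this code pattern actually uses), the sketch below ranks sentences by the frequency of their words and keeps the top-ranked ones; the function name and scoring rule are my own simplifications.

```python
# Toy frequency-based extractive summarizer: rank sentences by the average
# corpus frequency of their words and keep the top few, in original order.
import re
from collections import Counter

def summarize(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Re-emit the chosen sentences in document order to keep the summary coherent.
    return " ".join(s for s in sentences if s in top)
```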

data-science-toolkit - Collection of stats, modeling, and data science tools in Python and R.

  •    Jupyter

Welcome! The purpose of this repository is to serve as a stockpile of statistical methods, modeling techniques, and data science tools. The content itself includes everything from educational vignettes on specific topics to tailored functions built to enhance and optimize analyses. This is and will remain a work in progress, and I welcome all contributions and constructive criticism. If you have a suggestion or request, please make use of the "Issues" tab and I will respond expeditiously. All are welcome and encouraged to contribute to this repository. My only request is that you include a detailed description of your contribution, that your code be thoroughly commented, and that you test your contribution locally with the most recent version of the Master branch integrated prior to submitting the PR.
