For more details, please refer to Features. Experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. What's more, the experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.
gbdt gbm machine-learning data-mining kaggle efficiency distributed lightgbm gbrt

Google Cloud Dataflow SDK for Java is a distribution of Apache Beam designed to simplify usage of Apache Beam on Google Cloud Dataflow service. This artifact includes the parent POM for other Dataflow SDK artifacts.
google-cloud-dataflow data-science data-analysis data-mining big-data data-processing

Apache Mahout has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.
machine-learning classification data-mining fuzzy

scikit-learn is a Python module for machine learning built on top of SciPy. It provides simple and efficient tools for data mining and data analysis. It supports automatic classification, clustering, model selection, preprocessing, and a lot more.
machine-learning data-mining data-analysis classification

A curated list of amazingly awesome tools and resources related to the use of machine learning for cyber security. Please read CONTRIBUTING if you wish to add tools or resources.
machine-learning cyber-security data-mining awesome-list

Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community. If this feature list left you scratching your head, you can first read more about the Vector Space Model and unsupervised document analysis on Wikipedia.
gensim topic-modeling information-retrieval machine-learning natural-language-processing nlp data-science data-mining word2vec word-embeddings text-summarization neural-network document-similarity word-similarity fasttext

Dex: The Data Explorer is a data visualization tool written in Java/JavaFX capable of powerful ETL and data visualization. There are two main ways to install Dex.
data-science data-visualization visualization data-analysis data-mining javafx d3 dataviz datavis datavisualization d3js

Extract text from any document. No muss. No fuss. Full documentation.
natural-language-processing data-mining text-mining

Mlxtend (machine learning extensions) is a Python library of useful tools for day-to-day data science tasks.
machine-learning data-science data-mining association-rules supervised-learning unsupervised-learning

This GitHub repository contains the code examples of the 1st Edition of the Python Machine Learning book. If you are looking for the code examples of the 2nd Edition, please refer to this repository instead. What you can expect are 400 pages rich in useful material, covering just about everything you need to know to get started with machine learning, from theory to the actual code that you can directly put into action! This is not just another "this is how scikit-learn works" book. I aim to explain all the underlying concepts, tell you everything you need to know in terms of best practices and caveats, and we will put those concepts into action mainly using NumPy, scikit-learn, and Theano.
machine-learning machine-learning-algorithms logistic-regression data-science data-mining scikit-learn neural-network

An open source Data Science repository to learn and apply towards solving real world problems. First of all, Data Science is one of the hottest topics in the computer and Internet world nowadays. People have been gathering data from applications and systems until today, and now is the time to analyze it. The next steps are producing suggestions from the data and creating predictions about the future. Here you can find the biggest questions in Data Science and hundreds of answers from experts. Our favorite data scientist is Clare Corthell. She is an expert in data-related systems and a hacker, and has been working at a company as a data scientist. Clare's blog. This website helps you to understand the exact way to study as a professional data scientist.
data-science machine-learning data-visualization science data-mining awesome-list deep-learning analytics data-scientists

Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The purpose of this project is not to produce as optimized and computationally efficient algorithms as possible but rather to present the inner workings of them in a transparent and accessible way.
machine-learning deep-learning deep-reinforcement-learning machine-learning-from-scratch data-science data-mining genetic-algorithm

A continuously updated collection of machine learning resources. Discussion of machine learning in messengers (groups, channels, chats, communities).
machine-learning data-science collections university mooc data-mining nlp neural-networks deep-learning russian

CleverCSV provides a drop-in replacement for the Python csv package with improved dialect detection for messy CSV files. It also provides a handy command line tool that can standardize a messy file or generate Python code to import it. Click here to go to the introduction with more details about CleverCSV. If you're in a hurry, below is a quick overview of how to get started with the CleverCSV Python package and the command line interface.
csv-converter data-science data-mining csv csv-files python-library python3 datascience csv-format csv-reading csv-parser csv-reader csv-export csv-import csv-parsing

For deep learning, see our companion package: sktime-dl. The package is actively being developed and some features may not be stable yet.
data-science machine-learning data-mining time-series scikit-learn forecasting time-series-analysis time-series-classification time-series-regression

This repository contains a curated list of awesome open source libraries that will help you deploy, monitor, version, scale, and secure your production machine learning.
machine-learning data-mining awesome deep-learning awesome-list interpretability privacy-preserving production-machine-learning mlops privacy-preserving-machine-learning explainability responsible-ai machine-learning-operations ml-ops ml-operations privacy-preserving-ml large-scale-ml production-ml large-scale-machine-learning

Novel deep learning research works with PaddlePaddle.
nlp data-mining computer-vision deep-learning knowledge-graph spatial-temporal

Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
machine-learning data-mining statistics kafka graph-algorithms clustering word2vec regression xgboost classification recommender recommender-system apriori feature-engineering flink fm flink-ml flink-machine-learning

ferret is a web scraping system. It aims to simplify data extraction from the web for UI testing, machine learning, analytics and more. ferret allows users to focus on the data. It abstracts away the technical details and complexity of underlying technologies using its own declarative language. It is extremely portable, extensible, and fast. It has the ability to scrape JS-rendered pages, handle all page events and emulate user interactions.
query-language data-mining scraping scraping-websites dsl cdp crawling scraper crawler chrome web-scrapping

MLlib is a Spark implementation of some common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and a lot more.
machine-learning data-mining data-analysis classification