
DataScienceVM - Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

  •    HTML

The Data Science Virtual Machine (DSVM) is a customized VM image on Microsoft’s Azure cloud built specifically for doing data science. It has many popular data science and other tools pre-installed and pre-configured to jump-start building intelligent applications for advanced analytics. It is available on Windows Server 2016, Windows Server 2012, and Linux. The Linux edition of the DSVM is offered on either Ubuntu 16.04 LTS or the OpenLogic 7.2 CentOS-based distribution. You can try the Data Science VM for free for 30 days (with $200 in credits) with a free Azure Trial. The Linux (Ubuntu-based) DSVM also provides a test drive through a button on the product page. The Test Drive provides full access to your own instance of the VM with just a free Microsoft account (no Azure subscription or credit card needed). In this repo, we will feature tools, tips and extensions (see below) for the Data Science VM. We invite the DSVM user community to contribute any useful tools, scripts or extensions you may have written to enhance the user experience on the DSVM.

ethz-web-scale-data-mining-project - ETH Zurich - Web Scale Data Processing and Mining Project

  •    HTML

This is the main repository for the web scale data mining project, which took place in summer 2014 as a research project. One of the results is a set of visualized topics, which were learned autonomously from terabytes of raw HTML data.

big-data-upf - RECSM-UPF Summer School: Social Media and Big Data Research

  •    HTML

Citizens across the globe spend an increasing proportion of their daily lives online. Their activities leave behind granular, time-stamped footprints of human behavior and personal interactions that represent a new and exciting source of data to study standing questions about political and social behavior. At the same time, the volume and heterogeneity of web data present unprecedented methodological challenges. The goal of this course is to introduce participants to new computational methods and tools required to explore and analyze Big Data from online sources using the R programming language. We will focus in particular on data collected from social networking sites, such as Facebook and Twitter, whose use is becoming widespread in the social sciences. There are two ways you can follow the course and run the code contained in this GitHub repository. The recommended method is to connect to the provided RStudio server where all the R packages have already been installed, and all the R code is available. To access the server, visit bigdata.pablobarbera.com and log in with the information provided during class.




bigstatsr - R package for statistical tools with big matrices stored on disk.

  •    HTML

R package {bigstatsr} provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory thanks to memory-mapping to binary files on disk. This is very similar to the format big.matrix provided by R package {bigmemory}, which is no longer used by this package (see the corresponding vignette). Note that most of the algorithms of this package don't handle missing values.
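
To make the memory-mapping idea concrete, here is a minimal sketch in Python using only the standard library. It is not {bigstatsr}'s API or its file format: the file name, the column-major layout, and the helper functions `create_matrix_file` and `column_means` are all assumptions made for illustration. It only shows the general technique of computing a statistic over a matrix stored on disk while touching one column at a time.

```python
import mmap
import os
import struct
import tempfile

ROWS, COLS = 4, 3
ITEM = struct.calcsize("<d")  # 8 bytes per little-endian float64


def create_matrix_file(path, rows, cols):
    """Write a rows x cols float64 matrix to disk in column-major order.

    Hypothetical layout chosen for this sketch, not the bigstatsr format.
    """
    with open(path, "wb") as f:
        for j in range(cols):
            for i in range(rows):
                f.write(struct.pack("<d", float(i + j * rows)))


def column_means(path, rows, cols):
    """Compute column means through a read-only memory map.

    Only one column is decoded into Python objects at a time; the
    operating system pages the file in as the map is accessed.
    """
    means = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for j in range(cols):
            offset = j * rows * ITEM  # byte offset where column j starts
            column = struct.unpack_from(f"<{rows}d", mm, offset)
            means.append(sum(column) / rows)
    return means


path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
create_matrix_file(path, ROWS, COLS)
means = column_means(path, ROWS, COLS)
print(means)
```

Storing the matrix column by column means that each column occupies one contiguous byte range, so a per-column statistic needs a single offset computation and never requires the whole matrix in memory.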




