the-stack - Website and datasets for The Stack, Daily Bruin's data journalism and newsroom tech blog


Daily Bruin's data journalism and newsroom tech blog. Follow these instructions. When given the choice, install Rouge instead of Pygments for syntax highlighting. Here are some other considerations when using Jekyll on Windows.



Related Projects

awesome-datascience - :memo: An awesome Data Science repository to learn and apply for real world problems


An open source Data Science repository to learn and apply towards solving real world problems. First of all, Data Science is one of the hottest topics in computing and on the Internet nowadays. People have been gathering data from applications and systems for years, and now is the time to analyze it. The next steps are producing suggestions from the data and creating predictions about the future. Here you can find the biggest question for Data Science and hundreds of answers from experts. Our favorite data scientist is Clare Corthell. She is an expert in data-related systems and a hacker, and has been working at a company as a data scientist. Clare's blog. This website helps you to understand the exact way to study as a professional data scientist.

Agile_Data_Code_2 - Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

  •    Jupyter

Like my work? I am Principal Consultant at Data Syndrome, a consultancy offering assistance and training with building full-stack analytics products, applications and systems. Find us on the web at There is now a video course using code from chapter 8, Realtime Predictive Analytics with Kafka, PySpark, Spark MLlib and Spark Streaming. Check it out now at

data-science-with-ruby - Practical Data Science with Ruby based tools.

  •    Ruby

Data Science is a new "sexy" buzzword without specific meaning but often used to substitute Statistics, Scientific Computing, Text and Data Mining and Visualization, Machine Learning, Data Processing and Warehousing as well as Retrieval Algorithms of any kind. This curated list comprises awesome tutorials, libraries, information sources about various Data Science applications using the Ruby programming language.

elastiflow - Network flow Monitoring (Netflow, sFlow and IPFIX) with the Elastic Stack

  •    Shell

ElastiFlow™ provides network flow data collection and visualization using the Elastic Stack (Elasticsearch, Logstash and Kibana). It supports Netflow v5/v9, sFlow and IPFIX flow types (1.x versions support only Netflow v5/v9). The following dashboards are provided.

urban-data-science - Course materials, Jupyter notebooks, tutorials, guides, and demos for a Python-based urban data science course

  •    Jupyter

This repo is my workspace for developing a cycle of course materials, IPython notebooks, and tutorials towards an academic urban data science course based on Python. Between Fall 2013 and Fall 2016, I was the grad student instructor (3 years) and co-lead instructor (1 year) for CP255, Urban Informatics and Visualization, at UC Berkeley. This course was developed by Paul Waddell and is ongoing at Berkeley with the fantastic contributions of @Arezoo-bz. If you're interested in these topics at all, you owe it to yourself to check out the latest iterations of Paul's excellent pedagogy in his CP255 repo. A couple years ago, I wrote this blog post describing our efforts for the course.

earthdata-search - Earthdata Search is a web application developed by NASA EOSDIS to enable data discovery, search, comparison, visualization, and access across EOSDIS' Earth Science data holdings

  •    Ruby

Earthdata Search is a web application developed by NASA EOSDIS to enable data discovery, search, comparison, visualization, and access across EOSDIS' Earth Science data holdings. It builds upon several public-facing services provided by EOSDIS, including the Common Metadata Repository (CMR) for data discovery and access, EOSDIS User Registration System (URS) authentication, the Global Imagery Browse Services (GIBS) for visualization, and a number of OPeNDAP services hosted by data providers. In addition to the main project, we have open sourced stand-alone components built for Earthdata Search as separate projects with the "edsc-" (Earthdata Search components) prefix.

datascience-box - Data Science Course in a Box

  •    HTML

This introductory data science course is our (working) answer to these questions. The course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, and effective communication, and approaches statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consistent syntax (with tools from the tidyverse), reproducibility (with R Markdown), and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like learnr tutorials). By the end of the semester, teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This repository serves as a "data science course in a box" containing all materials required to teach (or learn from) the course described above.

awesome-interactive-journalism - A list of awesome interactive journalism projects.


An opinionated list of best practice examples of data journalism and visualization projects. The goal of this list is not to list all projects out there, but to list the most outstanding examples of visual and interactive journalism. It is sorted by alphabet, so that there is no ranking.

f-stack - F-Stack is a user-space network development kit with high performance based on DPDK, the FreeBSD TCP/IP stack and a coroutine API

  •    C

With the rapid development of Network Interface Cards, the poor performance of data packet processing in the Linux kernel has become the bottleneck in modern network systems. Yet the Internet's growth demands a higher-performance network processing solution, and kernel bypass has attracted more and more attention. There are various similar technologies, such as DPDK, NETMAP and PF_RING. The main idea of kernel bypass is that Linux is only used to deal with control flow; all data streams are processed in user space. Therefore, kernel bypass can avoid performance bottlenecks caused by kernel packet copying, thread scheduling, system calls, and interrupts. Furthermore, kernel bypass can achieve higher performance with multiple optimization methods. Among these techniques, DPDK has been widely used because of its more thorough isolation from kernel scheduling and its active community support.

To deal with increasingly severe DDoS attacks, the authoritative DNS server of Tencent Cloud DNSPod switched from Gigabit Ethernet to 10-Gigabit at the end of 2012. We faced several options: one was to continue to use the original network stack in the Linux kernel, another was to use kernel bypass techniques. After several rounds of investigation, we finally chose to develop our next generation of DNS server based on DPDK. The reason is that DPDK provides ultra-high performance and can be seamlessly extended to 40G, or even 100G, NICs in the future.

scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects.

  •    Python

Scikit-plot is the result of an unartistic data scientist's dreadful realization that visualization is one of the most crucial components in the data science process, not just a mere afterthought. Gaining insights is simply a lot easier when you're looking at a colored heatmap of a confusion matrix complete with class labels rather than a single-line dump of numbers enclosed in brackets. Besides, if you ever need to present your results to someone (virtually any time anybody hires you to do data science), you show them visualizations, not a bunch of numbers in Excel.
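To illustrate the contrast described above between a raw dump of numbers and a labeled confusion matrix, here is a minimal sketch in plain Python. It does not use scikit-plot or its API; the function names `confusion_matrix` and `format_matrix` and the sample labels are hypothetical and exist only for this example.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def format_matrix(matrix, labels):
    """Render the matrix as a text table with class labels on both axes."""
    cells = [str(c) for row in matrix for c in row]
    width = max(len(s) for s in list(labels) + cells)
    header = " " * width + " " + " ".join(label.rjust(width) for label in labels)
    lines = [header]
    for label, row in zip(labels, matrix):
        body = " ".join(str(c).rjust(width) for c in row)
        lines.append(label.rjust(width) + " " + body)
    return "\n".join(lines)


# Hypothetical sample data for illustration only.
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
labels = ["cat", "dog"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)                         # the "single-line dump of numbers"
print(format_matrix(matrix, labels))  # the same counts with class labels
```

A plotting library would replace `format_matrix` with a colored heatmap, but the underlying counts are the same.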

h2o-tutorials - Tutorials and training material for the H2O Machine Learning Platform

  •    Jupyter

This document contains tutorials and training materials for H2O-3. If you find any problems with the tutorial code, please open an issue in this repository. For general H2O questions, please post those to Stack Overflow using the "h2o" tag or join the H2O Stream Google Group for questions that don't fit into the Stack Overflow format.

StackExchange.DataExplorer - Stack Exchange Data Explorer

  •    JavaScript

The Stack Exchange Data Explorer is a tool for executing arbitrary SQL queries against data from the various question and answer sites in the Stack Exchange network. The database can be brought up to date by running the migrate.local.bat file in the Migrations directory. This assumes an existing SQL Server database named DataExplorer with integrated security enabled. If your environment is configured differently, you will need to modify the connection string in your batch file and web.config file to reflect your setup.

stack-blog - Stack Overflow Blog

  •    HTML

This blog runs on Jekyll. Posts are written in markdown. If you are actively involved in improving the infrastructure of this project, you should read the documentation for these tools thoroughly (they're pretty short as it is). If you are simply contributing, this guide should be enough to get you going.

docker-elk - The ELK stack powered by Docker and Compose.

  •    Dockerfile

Run the latest version of the Elastic stack with Docker and Docker Compose. It will give you the ability to analyze any data set by using the searching/aggregation capabilities of Elasticsearch and the visualization power of Kibana.

longjohn - Long stack traces for node.js inspired by

  •    CoffeeScript

I wrote this while trying to add long-stack-traces to my server and realizing that there were issues with support of EventEmitter::removeListener. The node HTTP Server will begin to leak callbacks and any of your own code that relies on removing listeners would not work as anticipated. Longjohn collects a large amount of data in order to provide useful stack traces. While it is very helpful in development and testing environments, it is not recommended to use longjohn in production. The data collection puts a lot of strain on V8's garbage collector and can greatly slow down heavily-loaded applications.

python3_with_pleasure - A short guide on features of Python 3


Python became a mainstream language for machine learning and other scientific fields that work heavily with data; it boasts various deep learning frameworks and a well-established set of tools for data processing and visualization. However, the Python ecosystem is split between Python 2 and Python 3, and Python 2 is still used among data scientists. By the end of 2019 the scientific stack will stop supporting Python 2. As for numpy, after 2018 any new feature releases will only support Python 3.

- R Weekly

  •    R

R Weekly provides weekly updates from the R community. You are welcome to contribute as long as you follow our code of conduct and our contributing guide. Update the draft post, and create a pull request.

Prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth

  •    Python

Prophet is a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers. Prophet is open source software released by Facebook's Core Data Science team. It is available for download on CRAN and PyPI.
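The idea of an additive model can be sketched in a few lines of plain Python. This is not Prophet's API or its fitting procedure: it assumes a simple linear trend (Prophet's trends may be non-linear), a single weekly seasonal term, and no holiday component. The names `fit_additive` and `predict` are hypothetical and used only for illustration.

```python
from statistics import mean


def fit_additive(values, period=7):
    """Fit y[t] ~ intercept + slope * t + seasonal[t % period].

    The trend is an ordinary least-squares line; the seasonal term is the
    mean residual at each position within the period. Illustrative only.
    """
    ts = list(range(len(values)))
    t_bar, y_bar = mean(ts), mean(values)
    slope = (
        sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, values))
        / sum((t - t_bar) ** 2 for t in ts)
    )
    intercept = y_bar - slope * t_bar
    residuals = [y - (intercept + slope * t) for t, y in zip(ts, values)]
    seasonal = [mean(residuals[p::period]) for p in range(period)]
    return intercept, slope, seasonal


def predict(model, t, period=7):
    """Add the trend and seasonal components back together at time step t."""
    intercept, slope, seasonal = model
    return intercept + slope * t + seasonal[t % period]


# Hypothetical daily series: four weeks with a weekly bump and no trend.
weekly = [0, 1, 2, 3, 2, 1, 0]
series = [5 + weekly[t % 7] for t in range(28)]
model = fit_additive(series)
print(predict(model, 31))  # forecast for a day past the end of the series
```

The forecast is the sum of independently estimated components, which is what makes an additive model easy to inspect: each component can be examined or plotted on its own.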