XGBOD - Supplementary material for IJCNN paper "XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning"

  •        53

Y. Zhao and M.K. Hryniewicki, "XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning," International Joint Conference on Neural Networks (IJCNN), IEEE, 2018. Accepted, to appear. XGBOD is a three-phase framework (see Figure below). In the first phase, it generates new data representations. Specifically, various unsupervised outlier detection methods are applied to the original data to get transformed outlier scores as new data representations. In the second phase, a selection process is performed on newly generated outlier scores to keep the useful ones. The selected outlier scores are then combined with the original features to become the new feature space. Finally, an XGBoost model is trained on the new feature space, and its output decides the outlier prediction result.




Related Projects

Pyod - A Python Toolkit for Scalable Outlier Detection (Anomaly Detection)

  •    Python

Important Notes: PyOD contains some neural network based models, e.g., AutoEncoders, which are implemented in keras. However, PyOD would NOT install keras and tensorflow automatically. This would reduce the risk of damaging your local installations. You are responsible for installing keras and tensorflow if you want to use neural net based models. An instruction is provided here. Anomaly detection resources, e.g., courses, books, papers and videos.

bell.js - No longer maintained. Use https://github.com/eleme/banshee instead please.

  •    Javascript

Bell.js is a real-time anomalies(outliers) detection system for periodic time series, built to be able to monitor a large quantity of metrics. It collects metrics form statsd, analyzes them with the 3-sigma, once enough anomalies were found in a short time it alerts us via sms/email etc.We eleme use it to monitor our website/rpc interfaces, including api called frequency, api response time(time cost per call) and exceptions count. Our services send these statistics to statsd, statsd aggregates them every 10 seconds and broadcasts the results to its backends including bell, bell analyzes current stats with history data, calculates the trending, and alerts us if the trending behaves anomalous.


  •    Java

A collection of tools for analysis in Pig and Hive.Over the next year we plan to release a handful of our internal user defined functions (UDFs) that have broad adoption across Netflix. The use cases for these functions are varied in nature (e.g. scoring predictive models, outlier detection, pattern matching, etc.) and together extend the analytical capabilities of big data.

NAB - The Numenta Anomaly Benchmark

  •    Python

Welcome. This repository contains the data and scripts comprising the Numenta Anomaly Benchmark (NAB). NAB is a novel benchmark for evaluating algorithms for anomaly detection in streaming, real-time applications. It is comprised of over 50 labeled real-world and artificial timeseries data files plus a novel scoring mechanism designed for real-time applications. Included are the tools to allow you to easily run NAB on your own anomaly detection algorithms; see the NAB entry points info. Competitive results tied to open source code will be posted in the wiki on the Scoreboard. Let us know about your work by emailing us at nab@numenta.org or submitting a pull request.

AnomalyDetection - Anomaly Detection with R

  •    R

AnomalyDetection is an open-source R package to detect anomalies which is robust, from a statistical standpoint, in the presence of seasonality and an underlying trend. The AnomalyDetection package can be used in wide variety of contexts. For example, detecting anomalies in system metrics after a new software release, user engagement post an A/B test, or for problems in econometrics, financial engineering, political and social sciences.

superviseddescent - C++11 implementation of the supervised descent optimisation method

  •    C++

superviseddescent is a C++11 implementation of the supervised descent method, which is a generic algorithm to perform optimisation of arbitrary functions. The library contains an implementation of the Robust Cascaded Regression facial landmark detection and features a pre-trained detection model.

Wazuh - Host and endpoint security

  •    C

Wazuh helps you to gain deeper security visibility into your infrastructure by monitoring hosts at an operating system and application level. This solution, based on lightweight multi-platform agents, provides the capabilities like Log management and analysis, File integrity monitoring, Intrusion and anomaly detection, Policy and compliance monitoring.

morgoth - Metric anomaly detection

  •    Go

Morgoth provides a framework for implementing the smaller pieces of an anomaly detection problem. The basic framework is that Morgoth maintains a dictionary of normal behaviors and compares new windows of data to the normal dictionary. If the new window of data is not found in the dictionary then it is considered anomalous.

Detectron - FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet

  •    Python

Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms, including Mask R-CNN. It is written in Python and powered by the Caffe2 deep learning framework. At FAIR, Detectron has enabled numerous research projects, including: Feature Pyramid Networks for Object Detection, Mask R-CNN, Detecting and Recognizing Human-Object Interactions, Focal Loss for Dense Object Detection, Non-local Neural Networks, Learning to Segment Every Thing, and Data Distillation: Towards Omni-Supervised Learning.

hed - code for Holistically-Nested Edge Detection

  •    C++

We develop a new edge detection algorithm, holistically-nested edge detection (HED), which performs image-to-image prediction by means of a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets. HED automatically learns rich hierarchical representations (guided by deep supervision on side responses) that are important in order to resolve the challenging ambiguity in edge and object boundary detection. We significantly advance the state-of-the-art on the BSD500 dataset (ODS F-score of .790) and the NYU Depth dataset (ODS F-score of .746), and do so with an improved speed (0.4s per image). Detailed description of the system can be found in our paper. If you have downloaded the previous version (testing code) of HED, please note that we updated the code base to the new version of Caffe. We uploaded a new pretrained model with better performance. We adopted the python interface written for the FCN paper instead of our own implementation for training and testing. The evaluation protocol doesn't change.

semi-supervised-pytorch - Implementations of different VAE-based semi-supervised and generative models in PyTorch

  •    Python

A PyTorch-based package containing useful models for modern deep semi-supervised learning and deep generative models. Want to jump right into it? Look into the notebooks. 2018.04.17 - The Gumbel softmax notebook has been added to show how you can use discrete latent variables in VAEs. 2018.02.28 - The β-VAE notebook was added to show how VAEs can learn disentangled representations.

Apache Metron - Real-time Big Data Security

  •    Java

Metron integrates a variety of open source big data technologies in order to offer a centralized tool for security monitoring and analysis. Metron provides capabilities for log aggregation, full packet capture indexing, storage, advanced behavioral analytics and data enrichment, while applying the most current threat intelligence information to security telemetry within a single platform.

Deeplearning4J - Neural Net Platform in Java and Scala

  •    Java

Deeplearning4J is an open source, distributed neural net library written in Java and Scala. It integrates with Hadoop and Spark and runs on several backends that enable use of CPUs and GPUs. It provides versatile n-dimensional array class for Java and Scala.

sensey - :zap: [Android Library] Play with sensor events & detect gestures in a breeze.

  •    Java

The library is built for simplicity and ease of use. It eliminates most boilerplate code for dealing with setting up sensor based event and gesture detection on Android. Starting with 1.0.1, Changes exist in the releases tab.

Pigo - Go implementation of Pico face detection library (Pico)

  •    Go

Pigo is a pure Go face detection library based on Pixel Intensity Comparison-based Object detection paper. The only existing solution for face detection in the Go ecosystem is using bindings to OpenCV, but installing OpenCV on various platforms is sometimes daunting. This library does not require any third party modules to be installed. However in case you wish to try the real time, webcam based face detection you might need to have Python2 and OpenCV installed, but the core API does not require any third party module or external dependency.

ImageAI - A python library built to empower developers to build applications and systems with self-contained Computer Vision capabilities

  •    Python

A python library built to empower developers to build applications and systems with self-contained Deep Learning and Computer Vision capabilities using simple and few lines of code. Built with simplicity in mind, ImageAI supports a list of state-of-the-art Machine Learning algorithms for image prediction, custom image prediction, object detection, video detection, video object tracking and image predictions trainings. ImageAI currently supports image prediction and training using 4 different Machine Learning algorithms trained on the ImageNet-1000 dataset. ImageAI also supports object detection, video detection and object tracking using RetinaNet, YOLOv3 and TinyYOLOv3 trained on COCO dataset. Eventually, ImageAI will provide support for a wider and more specialized aspects of Computer Vision including and not limited to image recognition in special environments and special fields.

nips14-ssl - Code for reproducing results of NIPS 2014 paper "Semi-Supervised Learning with Deep Generative Models"

  •    Python

Code for reproducing some key results of our NIPS 2014 paper on semi-supervised learning (SSL) with deep generative models. Please cite this paper when using this code for your research.


  •    Perl

devialog is a behavior/anomaly-based syslog intrusion detection system which detects unknown attacks via anomalies in syslog. It can generate signatures for ease of management, act upon anomalies in a predefined fashion or perform as a standard log parser


  •    C

vSentinel is a customizable 3D mapping of your network monitoring or security data for real-time or trend-based attack and anomaly detection and analysis.

Jubatus - Framework and Library for Distributed Online Machine Learning

  •    C++

Jubatus is a distributed processing framework and streaming machine learning library. Jubatus includes these functionalities: Online Machine Learning Library: Classification, Regression, Recommendation (Nearest Neighbor Search), Graph Mining, Anomaly Detection, Clustering, Feature Vector Converter (fv_converter): Data Preprocess and Feature Extraction, Framework for Distributed Online Machine Learning with Fault Tolerance.