dlp-dataflow-deidentification - Data Tokenization PoC Using Dataflow/Beam and DLP API

You can see there is some sensitive information in the blob. This program inspects and de-identifies all four info types in the example. This is useful when chat logs or other log files may contain sensitive information. Currently it only supports reading files from a GCS bucket and writing the output to another GCS bucket.

https://github.com/GoogleCloudPlatform/dlp-dataflow-deidentification
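A minimal sketch of the inspect-then-de-identify idea described above, in Python. It does not call the DLP API. The following are all assumptions made for illustration:

- the two regex detectors,
- their `EMAIL_ADDRESS`/`PHONE_NUMBER` labels (borrowed from DLP infoType naming),
- the helper names `inspect`, `token_for` and `deidentify`.

The real service's detectors and transformations are far more capable. Each finding is replaced either by its info type name or, when a key is supplied, by a deterministic HMAC-based token. Deterministic tokens are one common way to tokenize data, because equal inputs map to equal surrogates.

```python
import hashlib
import hmac
import re

# Hypothetical regex detectors standing in for DLP infoTypes; the real
# DLP detectors are far more thorough than these two patterns.
DETECTORS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def inspect(text):
    """Return (info_type, start, end) findings ordered by position."""
    findings = [
        (info_type, m.start(), m.end())
        for info_type, pattern in DETECTORS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: (f[1], f[2]))


def token_for(value, key):
    """Deterministic surrogate: the same value and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def deidentify(text, key=None):
    """Replace each finding with [INFO_TYPE] or, if a key is given,
    with INFO_TYPE(token) so equal values stay joinable after masking."""
    out, pos = [], 0
    for info_type, start, end in inspect(text):
        if start < pos:  # skip a finding that overlaps the previous one
            continue
        out.append(text[pos:start])
        if key is None:
            out.append(f"[{info_type}]")
        else:
            out.append(f"{info_type}({token_for(text[start:end], key)})")
        pos = end
    out.append(text[pos:])
    return "".join(out)


log = "Call 555-123-4567 or mail jane@example.com."
print(deidentify(log))  # Call [PHONE_NUMBER] or mail [EMAIL_ADDRESS].
```

Deterministic tokens keep records joinable after masking, at the cost of making them linkable across records. Which trade-off is acceptable depends on the use case.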

Related Projects

DataflowJavaSDK - Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines

  •    Java

Google Cloud Dataflow SDK for Java is a distribution of Apache Beam designed to simplify usage of Apache Beam on Google Cloud Dataflow service. This artifact includes the parent POM for other Dataflow SDK artifacts.

scio - A Scala API for Apache Beam and Google Cloud Dataflow.

  •    Scala

Scio (Latin verb: "I can, know, understand, have knowledge") is a Scala API for Apache Beam and Google Cloud Dataflow inspired by Apache Spark and Scalding.

Cloakify - CloakifyFactory - Data Exfiltration & Infiltration In Plain Sight; Convert any filetype into list of everyday strings; Evade DLP/MLS Devices; Defeat Data Whitelisting Controls; Social Engineering of Analysts; Evade AV Detection

  •    Python

CloakifyFactory & the Cloakify Toolset - Data Exfiltration & Infiltration In Plain Sight; Evade DLP/MLS Devices; Social Engineering of Analysts; Defeat Data Whitelisting Controls; Evade AV Detection. Text-based steganography using lists. Convert any file type (e.g. executables, Office, Zip, images) into a list of everyday strings. Very simple tools, powerful concept, limited only by your imagination. For a quick start on CloakifyFactory, see the cleverly titled file "README_GETTING_STARTED.txt" in the project for a walkthrough.

Apache Beam - Unified model for defining both batch and streaming data-parallel processing pipelines

  •    Java

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
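The split between defining a pipeline and executing it can be illustrated with a toy model in Python. It is not the Beam SDK. `ToyPipeline`, `flat_map`, `map_each` and `count_per_key` are hypothetical names, and the single in-process `run` stands in for a distributed back-end. In the real Python SDK, the corresponding transforms would be along the lines of `beam.FlatMap`, `beam.Map` and `beam.combiners.Count.PerElement`.

```python
from typing import Callable, Iterable, List

Step = Callable[[Iterable], Iterable]


class ToyPipeline:
    """Records transforms first; nothing executes until run() is called."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def apply(self, step: Step) -> "ToyPipeline":
        self.steps.append(step)
        return self

    def run(self, source: Iterable) -> list:
        # A real runner could distribute these steps; this one runs them in-process.
        data = source
        for step in self.steps:
            data = step(data)
        return list(data)


def flat_map(fn):
    return lambda xs: (y for x in xs for y in fn(x))


def map_each(fn):
    return lambda xs: (fn(x) for x in xs)


def count_per_key():
    def step(xs):
        counts = {}
        for x in xs:
            counts[x] = counts.get(x, 0) + 1
        return sorted(counts.items())
    return step


pipeline = (
    ToyPipeline()
    .apply(flat_map(str.split))
    .apply(map_each(str.lower))
    .apply(count_per_key())
)
print(pipeline.run(["the cat", "The dog"]))  # [('cat', 1), ('dog', 1), ('the', 2)]
```

Because the pipeline is only a description, the same definition can be run again on new input. That reuse of one pipeline definition is the property that lets Beam hand it to different runners.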

timely-dataflow - A modular implementation of timely dataflow in Rust

  •    Rust

Timely dataflow is a low-latency cyclic dataflow computational model, introduced in the paper Naiad: a timely dataflow system. This project is an extended and more modular implementation of timely dataflow in Rust. This project is something akin to a distributed data-parallel compute engine, which scales the same program up from a single thread on your laptop to distributed execution across a cluster of computers. The main goals are expressive power and high performance. It is probably strictly more expressive and faster than whatever you are currently using, assuming you aren't yet using timely dataflow.


differential-dataflow - An implementation of differential dataflow using timely dataflow in Rust.

  •    Rust

An implementation of differential dataflow over timely dataflow in Rust. Differential dataflow is a data-parallel programming framework designed to efficiently process large volumes of data and to quickly respond to arbitrary changes in input collections.
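Differential dataflow's core idea is to react to input changes with output changes instead of recomputing from scratch. A toy incremental count in Python can illustrate it. This is not the Rust library's API: the `IncrementalCount` class is hypothetical. Real differential dataflow also attaches logical timestamps to each `(data, time, diff)` update, which this sketch omits. Following the same convention, a changed result is reported as a retraction of the old value plus an assertion of the new one.

```python
from collections import Counter


class IncrementalCount:
    """Counts occurrences per key, driven by (key, diff) input changes."""

    def __init__(self):
        self.counts = Counter()

    def update(self, changes):
        """Apply input changes; return output changes as ((key, count), diff)
        pairs that retract the old count and assert the new one."""
        before = {}
        for key, diff in changes:
            before.setdefault(key, self.counts[key])  # count before this batch
            self.counts[key] += diff
        output = []
        for key, old in before.items():
            new = self.counts[key]
            if new == 0:
                del self.counts[key]  # drop keys whose count returned to zero
            if old == new:
                continue  # net change of zero: nothing to report
            if old:
                output.append(((key, old), -1))
            if new:
                output.append(((key, new), +1))
        return output


counter = IncrementalCount()
print(counter.update([("a", 1), ("b", 1), ("a", 1)]))  # [(('a', 2), 1), (('b', 1), 1)]
print(counter.update([("a", -1)]))  # [(('a', 2), -1), (('a', 1), 1)]
```

Each update does work proportional to the keys it touches, not to the size of the whole collection.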

SSIS Dataflow Discoverer (DFLD)

Dataflow Discoverer (DFLD) is a command-line discovery utility that detects and writes the metadata of SSIS dataflow columns to a SQL Server lookup table. DFLD detects Dataflows nested within containers up to any level of nesting.

TPL DataFlow Debugger Visualizer

A graphical debugger visualizer for TPL Dataflow networks that lets you see the live state of blocks and the links between them. Compatible with VS 2012 RC.

F# Dataflow for ViewModels

F# library for creating bindable variables and dataflow computations. Computed variables can be ordered in a way that prevents unnecessary recomputations.
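Computed variables can be ordered so that nothing is recomputed unnecessarily. The sketch below shows one way to do that, in Python rather than F#, and it is not this library's API: after an input changes, only the cells downstream of it are recomputed, in topological order, so each affected cell is evaluated exactly once per update. The `Cells` class and its methods are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


class Cells:
    """Input variables plus computed cells that depend on other cells."""

    def __init__(self):
        self.values = {}
        self.formulas = {}    # name -> (function, dependency names)
        self.recomputed = []  # log of recomputations, for demonstration

    def var(self, name, value):
        self.values[name] = value

    def computed(self, name, deps, fn):
        self.formulas[name] = (fn, tuple(deps))
        self.values[name] = fn(*(self.values[d] for d in deps))

    def set(self, name, value):
        """Update an input, then refresh only the cells that depend on it."""
        self.values[name] = value
        dirty = {name}
        graph = {cell: deps for cell, (_, deps) in self.formulas.items()}
        # static_order() yields every dependency before its dependents,
        # so each affected cell sees fully updated inputs and runs once.
        for cell in TopologicalSorter(graph).static_order():
            if cell not in self.formulas:
                continue
            fn, deps = self.formulas[cell]
            if dirty.intersection(deps):
                self.values[cell] = fn(*(self.values[d] for d in deps))
                self.recomputed.append(cell)
                dirty.add(cell)


c = Cells()
c.var("a", 1)
c.var("b", 2)
c.computed("total", ["a", "b"], lambda a, b: a + b)
c.computed("doubled", ["total"], lambda t: 2 * t)
c.computed("b_squared", ["b"], lambda b: b * b)
c.set("a", 10)
print(c.values["doubled"], c.recomputed)  # 24 ['total', 'doubled']
```

Changing `a` leaves `b_squared` untouched, and in a diamond-shaped graph the shared descendant is still evaluated only once.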

nodeeditor - Qt Node Editor. Dataflow programming framework

  •    C++

NodeEditor is conceived as a general-purpose Qt-based library aimed at graph-controlled data processing. Nodes represent algorithms with certain inputs and outputs. Connections transfer data from the output (source) of the first node to the input (sink) of the second one. The NodeEditor framework is a visual dataflow programming tool. A library client defines models and registers them in the data model registry. Further work is driven by events taking place in DataModels and Nodes. Model computation is triggered upon the arrival of any new input data, and the computed result is propagated to the output connections. Each new connection fetches the available data and propagates it further.
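The push-based propagation described above can be sketched in Python. This is not C++ and does not use NodeEditor's actual classes: the `Node` class and its `connect`, `push` and `receive` methods are hypothetical. The sketch also makes one assumption the description does not: a node recomputes on each new input only after every input port has received a value.

```python
class Node:
    """A node with numbered input ports and one output."""

    def __init__(self, name, n_inputs, fn=None):
        self.name = name
        self.fn = fn
        self.inputs = [None] * n_inputs
        self.connections = []  # (target node, target port)
        self.output = None

    def connect(self, target, port):
        self.connections.append((target, port))
        # A new connection immediately fetches data that is already available.
        if self.output is not None:
            target.receive(port, self.output)

    def push(self, value):
        """Set this node's output and propagate it along every connection."""
        self.output = value
        for target, port in self.connections:
            target.receive(port, value)

    def receive(self, port, value):
        self.inputs[port] = value
        # Recompute on each new input once every port holds a value.
        if all(v is not None for v in self.inputs):
            self.push(self.fn(*self.inputs))


a, b = Node("a", 0), Node("b", 0)
add = Node("add", 2, lambda x, y: x + y)
square = Node("square", 1, lambda x: x * x)
a.connect(add, 0)
b.connect(add, 1)
add.connect(square, 0)

a.push(3)
b.push(4)
print(square.output)  # 49
a.push(5)
print(square.output)  # 81

sink = Node("sink", 1, lambda x: x)
square.connect(sink, 0)  # the late connection picks up the existing result
print(sink.output)  # 81
```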

pythonect - A general-purpose dataflow programming language based on Python, written in Python

  •    Python

A general-purpose dataflow programming language based on Python, written in Python

Amarok Framework Library

  •    DotNet

This framework library is an attempt to take advantage of the actor/agent programming model for standalone desktop applications. Most of the concepts are inspired by the actor model, Microsoft Robotics CCR and the TPL Dataflow library.

wireit - A javascript wiring library to create web wirable interfaces for dataflow applications, visual programming languages or graphical modeling

  •    Javascript

WireIt is an open-source javascript library to create web wirable interfaces for dataflow applications, visual programming languages, graphical modeling, or graph editors.

Magento Mass Importer

  •    PHP

This project is an attempt to provide a product importer for Magento that can handle several thousand products at a reasonable pace compared to Magento Dataflow. It may evolve into a Dataflow alternative as new features are implemented.

javelin - Spreadsheet-like dataflow programming in ClojureScript.

  •    Clojure

Spreadsheet-like dataflow programming in ClojureScript. There are many more examples in the Javelin tests.

event-gateway - The Event Gateway combines both API Gateway and Pub/Sub functionality into a single event-driven experience

  •    Go

Use the Event Gateway right now by running the Event Gateway Example Application locally with the Serverless Framework. The Event Gateway is an L7 proxy and real-time dataflow engine, intended for use with Functions-as-a-Service on AWS, Azure, Google & IBM.

fractalide - Reusable Reproducible Composable Software

  •    Racket

Fractalide is a free and open source service programming platform using dataflow graphs. Graph nodes represent computations, while graph edges represent typed data (which may also describe tensors) communicated between them. This flexible architecture can be applied to many different computation problems; initially the focus will be microservices, later expanding into the Internet of Things. Fractalide is in the same vein as the NSA’s Niagarafiles (now known as Apache NiFi) or Google’s TensorFlow, but stripped of all Java, Python and GUI bloat. Fractalide faces big corporate players like Ab Initio, a company that charges a lot of money for dataflow solutions.

rete - JavaScript framework for visual programming and creating node editor

  •    Javascript

Rete is a modular framework for visual programming. Rete allows you to create a node-based editor directly in the browser. You can define nodes and workers that let users create instructions for processing data in your editor without writing a single line of code. Check the docs to learn about the components and capabilities.

SpillGuard

  •    C++

SpillGuard is a Data Loss/Leak Prevention (DLP) plugin for Microsoft Office designed to help prevent the opening, saving or printing of Microsoft Office files containing classification markings higher than the classification of the user's computer.
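A minimal sketch of the kind of check SpillGuard is described as performing: find the highest classification marking in a document's text and compare it with the computer's level. The level names follow the common US scheme. The plain-text, uppercase-only matching and the `highest_marking`/`allowed` helpers are assumptions. This is not SpillGuard's implementation, which runs as a plugin inside Microsoft Office.

```python
import re

# Ordered lowest to highest; level names follow the common US scheme.
LEVELS = ["UNCLASSIFIED", "CONFIDENTIAL", "SECRET", "TOP SECRET"]
RANK = {level: i for i, level in enumerate(LEVELS)}
# Word boundaries keep words such as "SECRETARY" from counting as markings.
MARKING = re.compile(r"\b(TOP SECRET|SECRET|CONFIDENTIAL|UNCLASSIFIED)\b")


def highest_marking(text):
    """Return the highest marking found in text, or None if there is none."""
    found = [m.group(1) for m in MARKING.finditer(text)]
    return max(found, key=RANK.__getitem__, default=None)


def allowed(text, machine_level):
    """True if no marking in text exceeds the machine's classification."""
    marking = highest_marking(text)
    return marking is None or RANK[marking] <= RANK[machine_level]


doc = "CONFIDENTIAL cover page\nTOP SECRET annex"
print(highest_marking(doc))    # TOP SECRET
print(allowed(doc, "SECRET"))  # False
```

A real tool would also need to inspect headers, footers, metadata and embedded objects, which plain-text matching like this does not cover.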