data-federation-project - A project focused on tools and best practices to support federated data collection efforts

Federated data refers to data that are aggregated across a number of organizations, departments or agencies at various scales. Managing the volume, quality and completeness of data coming in from multiple sources is a common challenge for government agencies charged with collecting and maintaining domain-specific information. This project aims to address this challenge by providing ways to ingest data more easily and by giving data contributors real-time feedback so that they can make corrections before the data are submitted to a central repository. Federated data efforts are increasingly seen as an engine for transparency, economic growth, and accountability, yet collecting this kind of data remains difficult. Although such efforts are becoming more frequent, each new one still improvises its own processes, tooling, and compliance infrastructure.
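The idea of giving contributors feedback before submission can be illustrated with a minimal sketch. This is not the project's actual code: the schema, the field names (`agency_id`, `report_year`, `amount`), the `Issue` record, and `validate_rows` are all hypothetical, chosen only to show how a batch might be checked and the problems reported row by row.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    row: int      # 1-based index of the offending row
    field: str    # name of the field that failed the check
    message: str  # human-readable explanation for the contributor


# Hypothetical schema: field name -> (required?, type converter)
SCHEMA = {
    "agency_id": (True, str),
    "report_year": (True, int),
    "amount": (False, float),
}


def validate_rows(rows):
    """Return a list of Issue objects; an empty list means the batch is clean."""
    issues = []
    for i, row in enumerate(rows, start=1):
        for field, (required, convert) in SCHEMA.items():
            value = row.get(field, "")
            if value == "":
                # Empty values are only a problem for required fields.
                if required:
                    issues.append(Issue(i, field, "required value is missing"))
                continue
            try:
                convert(value)
            except ValueError:
                issues.append(
                    Issue(i, field, f"cannot read {value!r} as {convert.__name__}")
                )
    return issues


# Example batch: the first row is valid, the second has two problems.
rows = [
    {"agency_id": "A1", "report_year": "2017", "amount": "10.5"},
    {"agency_id": "", "report_year": "20x7", "amount": ""},
]
issues = validate_rows(rows)
for issue in issues:
    print(f"row {issue.row}, {issue.field}: {issue.message}")
```

A contributor would fix the reported rows and re-run the check, submitting to the central repository only once the list of issues is empty.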

https://github.com/18F/data-federation-project

Related Projects

SQL Azure Federation Data Migration Wizard

SQL Azure Federation Data Migration Wizard simplifies the process of migrating data from a single database to multiple federation members in SQL Azure Federation.

kubernetes-cluster-federation - Kubernetes cluster federation tutorial

This tutorial will walk you through setting up a Kubernetes cluster federation composed of four Kubernetes clusters across multiple GCP regions. This guide is not for people looking for a fully automated command to bring up a Kubernetes cluster federation. If that's you, then check out Setting up Cluster Federation with Kubefed.

DataNucleus

  •    Java

DataNucleus provides a Java data persistence and management platform that allows federation of data and offers JDO, JPA, and web services interfaces. It supports heterogeneous datastores (RDBMS, MongoDB, LDAP, Excel, XML, NeoDatis, JSON, ODF, BigTable, HBase, Cassandra).

CGI Open Source Federation

  •    PHP

COSF stands for the CGI Open Source Federation. It is an umbrella project for many CGI scripts, and new members are always welcome. Visit our homepage http://cosf.sourceforge.net for more information.

STS Federation Metadata Editor

This is a federation metadata editor for Security Token Services (STS). STSs can be created on any platform (as long as it's based on the OASIS standard). The original scope of this project was to edit the FederationMetadata.xml created by the Windows Identity Foundation (WIF).


Relying Party Federation Metadata Editor

This is a federation metadata editor for relying party trust applications. RPs can be created on any platform (as long as it's based on the OASIS standard).

Pixelfed - Federated Image Sharing (WIP)

  •    PHP

PixelFed is a federated social image sharing platform, similar to Instagram. Federation is done using the ActivityPub protocol, which is used by Mastodon, PeerTube, Pleroma, and more. Through ActivityPub, PixelFed can share and interact with these platforms, as well as with other instances of PixelFed.

z-i - Register of Internet Addresses filtered in Russian Federation

Register of Internet addresses filtered in the Russian Federation.

passport-azure-ad - Azure Active Directory Authentication Strategies using Node and Passportjs

  •    Javascript

passport-azure-ad is a collection of Passport Strategies to help you integrate with Azure Active Directory. It includes OpenID Connect, WS-Federation, and SAML-P authentication and authorization. These providers let you integrate your Node app with Microsoft Azure AD so you can use its many features, including web single sign-on (WebSSO), Endpoint Protection with OAuth, and JWT token issuance and validation. passport-azure-ad has been tested to work with both Microsoft Azure Active Directory and with Microsoft Active Directory Federation Services.

streamalert - StreamAlert is a serverless, realtime data analysis framework which empowers you to ingest, analyze, and alert on data from any environment, using datasources and alerting logic you define

  •    Python

StreamAlert is a serverless, realtime data analysis framework which empowers you to ingest, analyze, and alert on data from any environment, using datasources and alerting logic you define.

cstore_fdw - Columnar store for analytics with Postgres, developed by Citus Data

  •    C

Cstore_fdw is an open source columnar store extension for PostgreSQL. Columnar stores provide notable benefits for analytics use cases where data is loaded in batches. Cstore_fdw’s columnar nature delivers performance by reading only relevant data from disk, and it may compress data 6x-10x to reduce space requirements for data archival. Cstore_fdw is developed by Citus Data and can be used in combination with Citus, a PostgreSQL extension that distributes your data and queries across many nodes so your database can scale and your queries stay fast. If you have any questions about how Citus can help you scale or how to use Citus in combination with cstore_fdw, please let us know.
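Why a columnar layout helps analytics can be shown with a small conceptual sketch. This is plain Python and says nothing about cstore_fdw's actual on-disk format; the sample records and the `to_columns` helper are hypothetical. The point is only that an aggregate over one field touches a single column rather than every record.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "sales": 120.0},
    {"id": 2, "region": "west", "sales": 80.5},
    {"id": 3, "region": "east", "sales": 99.5},
]


def to_columns(records):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of a field are stored contiguously.
columns = to_columns(rows)

# An aggregate over "sales" reads only that column; "id" and "region"
# are never touched, which is what saves I/O in a real columnar store.
total_sales = sum(columns["sales"])
print(total_sales)
```

Storing similar values next to each other is also what tends to make column data compress well.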

Blueflood - A distributed system designed to ingest and process time series data

  •    Java

Blueflood is a high-throughput, low-latency, multi-tenant distributed metric processing system behind Rackspace Metrics, which is currently used in production by the Rackspace Monitoring team and Rackspace Public Cloud team to store metrics generated by their systems. Data from Blueflood can be used to construct dashboards, generate reports and graphs, or serve any other use involving time-series data. It focuses on near-realtime data, with data that is queryable mere milliseconds after ingestion.

Open Data Publisher

Open Data Publisher is a C# MVC application that allows you to quickly and easily publish data from SQL Server tables and views using the OData protocol, and present data from OData and static sources to the public for download and exploration.

OpenML - Open Machine Learning

  •    CSS

We are a group of people who are excited about open science, open data and machine learning. We want to make machine learning and data analysis simple, accessible, collaborative and open with an optimal division of labour between computers and humans. OpenML is an online machine learning platform for sharing and organizing data, machine learning algorithms and experiments. It is designed to create a frictionless, networked ecosystem, that you can readily integrate into your existing processes/code/environments, allowing people all over the world to collaborate and build directly on each other’s latest ideas, data and results, irrespective of the tools and infrastructure they happen to use.

Telepat - Awesome modern applications are real time

  •    Javascript

Telepat is an API-centric backend that instantly delivers data, updates and messages to and from web, mobile or IoT apps. Telepat can ingest high-speed data from a variety of sources and display it in real time in rich, easy-to-use dashboards for organizations at all levels. It makes collaboration between people simple, helping them see their input and changes faster than they can blink.

Pinot - A realtime distributed OLAP datastore

  •    Java

Pinot is a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally, so that it can scale to larger data sets and higher query rates as needed.

Shark - Hive on Spark

  •    Scala

Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users. It runs Hive queries up to 100x faster in memory, or 10x on disk. It is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive.

reality - Comprehensive data proxy to knowledge about real world

  •    Ruby

Reality is an experimental Ruby library (or set of libraries) that provides uniform query access to heterogeneous web APIs, with an emphasis on those exposing real-world knowledge. It emphasizes simplicity of data access and interoperability of data from various sources. The ultimate goal is to make the world inspectable and computable.

football

  •    Javascript

Free open public domain football data in the JSON (JavaScript Object Notation) data interchange format. Any leagues or tournaments missing? Contributions welcome! For starting your own repo from scratch see the League Quick Starter Kit.

MMLSpark - Microsoft Machine Learning for Apache Spark

  •    Scala

MMLSpark provides a number of deep learning and data science tools for Apache Spark, including seamless integration of Spark Machine Learning pipelines with Microsoft Cognitive Toolkit (CNTK) and OpenCV, enabling you to quickly create powerful, highly scalable predictive and analytical models for large image and text datasets. MMLSpark requires Scala 2.11, Spark 2.1+, and either Python 2.7 or Python 3.5+. See the API documentation for Scala and for PySpark.