Displaying 1 to 5 from 5 results

AzureDataLake - Samples and Docs for Azure Data Lake Store and Analytics

  •    

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

usql - U-SQL Examples and Issue Tracking

  •    CSharp

U-SQL is a new language from Microsoft for processing big data. U-SQL combines the familiar syntax of SQL with the expressiveness of custom code written in C#, on top of a scale-out runtime that can handle any size data.

aws-dbs-refarch-datalake - Reference Architectures for Datalakes on AWS

  •    HTML

A datalake is a data repository that stores data in its raw format until it is used for analytics. It is designed to store massive amount of data at scale. A schema to the dataset in data lake is given as part of transformation while reading it. Below is a pictorial representation of a typical datalake on AWS cloud. Keeping track of all of the raw assets that are loaded into your datalake, and then tracking all of the new data assets and versions that are created by data transformation, data processing, and analytics can be a major challenge. An essential component of an Amazon S3 based data lake is a Data Catalog. A data catalog is designed to provide a single source of truth about the contents of the data lake, and rather than end users reasoning about storage buckets and prefixes, a data catalog lets them interact with more familiar structures of databases, tables, and partitions.