telemetry-server
================

Server for the Mozilla Telemetry project


See the [Project Wiki][3] for more information.

See the [TODO list]( for some outstanding tasks.

Storage Format
--------------

See [StorageFormat][1] for details.

On-disk Storage Structure
-------------------------

See [StorageLayout][2] for details.

Data Converter
--------------

1. Use [RevisionCache](telemetry/ to load the correct Histograms.json for a given payload
    1. Use `revision` if possible
    2. Fall back to `appUpdateChannel` and `appBuildID` or `appVersion` as needed
    3. Use the Mercurial history to export each version of Histograms.json with the date range it was in effect for each repo (mozilla-central, -aurora, -beta, -release)
    4. Keep a local cache of Histograms.json versions to avoid re-fetching
2. Filter out bad submission data
    1. Invalid histogram names
    2. Histogram configs that don't match the expected parameters (histogram type, number of buckets, etc.)
    3. Keep metrics for bad data

MapReduce
---------

We have implemented a lightweight [MapReduce framework][6] that uses the operating system's support for parallelism. It relies on simple Python functions for the Map, Combine, and Reduce phases.

For data stored on multiple machines, each machine runs a combine phase, and the final reduce combines the output for the entire cluster.

MongoDB Importer
----------------

Telemetry data can optionally be imported into MongoDB. The benefit of doing so is a shorter run time when running multiple map-reduce jobs on the same dataset, since MongoDB keeps as much data as possible in memory.

1. Start MongoDB, e.g. `mongod --nojournal`
2. Fetch a dataset from S3, e.g. `aws s3 cp s3://... /mnt/yourdataset --recursive`
3. Import the dataset, e.g. `python3 -m mongodb.importer /mnt/yourdataset`
4. Run a map-reduce job, e.g. `mongo localhost/telemetry mongodb/examples/osdistribution.js`

Plumbing
--------

Once the converter and MapReduce framework are available, we can easily consume data from the existing Telemetry data source.
This will mark the first point at which the new dashboards can be fed with live data.

Integration with the existing pipeline is discussed in more detail on the [Bagheera Integration][7] page.

Data Acquisition
----------------

When everything is ready and productionized, we will route the client (Firefox) submissions directly into the [new pipeline][8].

Code Overview
=============

These are the important parts of the Telemetry Server architecture.

`http/server.js`
----------------

Contains the Node.js HTTP server for receiving payloads. The server's job is simply to write incoming submissions to disk as quickly as possible.

It accepts single submissions using the same type of URLs supported by [Bagheera][7], and expects (but doesn't require) the [partition information][9] to be submitted as part of the URL.

`telemetry/`
------------

Contains the `Converter` class, which is used to convert a JSON payload from the raw form submitted by Firefox to the more compact [storage format][1] for on-disk storage and processing.

You can run the main method in this file to process a given data file (the expected format is one record per line, each line containing an id followed by a tab character, followed by a JSON string).

You can also use the `Converter` class to convert data in a more flexible way.

`telemetry/`
------------

Contains code to export data to Amazon S3.

`telemetry/`
------------

Contains the `StorageLayout` class, which is used to save payloads to disk using the directory structure documented in the [storage layout][2] section above.

`telemetry/`
------------

Contains the `RevisionCache` class, which provides a mechanism for fetching the `Histograms.json` spec file for a given revision URL.
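The lookup order listed under Data Converter (prefer `revision`, otherwise fall back to `appUpdateChannel` with `appBuildID` or `appVersion`), combined with the on-disk and in-memory caching this section describes, might be sketched as follows. This is not the project's `RevisionCache` API: the `HistogramSpecCache` class, the `fetch` callable, the key format, and the assumption that these fields arrive together in one flat `info` dict are all hypothetical. Keying the disk cache by a hash of the lookup key keeps filenames safe no matter what characters a revision URL contains.

```python
import hashlib
import json
import os
from typing import Callable, Dict, Optional


class HistogramSpecCache:
    """Hypothetical two-level (memory + disk) cache for Histograms.json specs."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], dict]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical: retrieves the parsed spec for a lookup key
        self.memory: Dict[str, dict] = {}
        os.makedirs(cache_dir, exist_ok=True)

    @staticmethod
    def lookup_key(info: dict) -> Optional[str]:
        """Pick the most specific key the payload metadata allows."""
        # 1. Prefer the exact revision when the payload provides one.
        if info.get("revision"):
            return info["revision"]
        # 2. Otherwise fall back to channel + build ID, then channel + version.
        channel = info.get("appUpdateChannel")
        if channel and info.get("appBuildID"):
            return f"{channel}/build/{info['appBuildID']}"
        if channel and info.get("appVersion"):
            return f"{channel}/version/{info['appVersion']}"
        return None  # not enough metadata to choose a spec

    def _disk_path(self, key: str) -> str:
        # Hash the key so any revision URL maps to a filesystem-safe filename.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, digest + ".json")

    def get(self, info: dict) -> Optional[dict]:
        key = self.lookup_key(info)
        if key is None:
            return None
        if key in self.memory:  # in-memory hit
            return self.memory[key]
        path = self._disk_path(key)
        if os.path.exists(path):  # on-disk hit from an earlier run
            with open(path, encoding="utf-8") as f:
                spec = json.load(f)
        else:  # miss: fetch once and persist for later runs
            spec = self.fetch(key)
            with open(path, "w", encoding="utf-8") as f:
                json.dump(spec, f)
        self.memory[key] = spec
        return spec
```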
Histogram data is cached locally on disk and in memory as revisions are requested.

`telemetry/`
------------

Contains the `TelemetrySchema` class, which encapsulates logic used by the StorageLayout and MapReduce code.

`process_incoming/`
-------------------

Contains the multi-process version of the data-transformation code. This is used to download incoming data (as received by the HTTP server), validate and convert it, then publish the results back to S3.

`process_incoming/worker`
-------------------------

Contains the C++ data validation and conversion routines.

Prerequisites
-------------

* Clang 3.1, GCC 4.7.0, or Visual Studio 10
* CMake (2.8.7+)
* Boost (1.54.0)
* zlib
* OpenSSL
* Protobuf

Optional (used for documentation)
---------------------------------

* Graphviz (2.28.0)
* Doxygen (1.8+)
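The MapReduce section above describes plain Python functions for the Map, Combine, and Reduce phases, with the operating system providing the parallelism. The sketch below shows that shape using only the standard library's `multiprocessing.Pool`; it is not the project's MapReduce framework. The input is assumed to use the `id<TAB>json` record format described for the `Converter` main method, and the `info`/`OS` field path, the chunking scheme, and every function name are hypothetical.

```python
import json
from collections import Counter
from multiprocessing import Pool
from typing import Iterable, Iterator, List, Tuple


def map_record(line: str) -> Iterator[Tuple[str, int]]:
    """Map: parse one 'id<TAB>json' line and emit (key, 1).
    The 'info' -> 'OS' field path is a hypothetical example key."""
    _id, _, payload = line.rstrip("\n").partition("\t")
    try:
        doc = json.loads(payload)
    except ValueError:
        return  # skip records whose payload is not valid JSON
    info = doc.get("info") if isinstance(doc, dict) else None
    if not isinstance(info, dict):
        info = {}
    yield info.get("OS", "unknown"), 1


def combine(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Combine: pre-aggregate the pairs produced within one chunk."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def process_chunk(lines: List[str]) -> Counter:
    # Each worker process maps and combines its own chunk independently.
    return combine(pair for line in lines for pair in map_record(line))


def reduce_all(partials: Iterable[Counter]) -> Counter:
    """Reduce: merge the combined output of every worker."""
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds counts
    return result


def run(lines: List[str], workers: int = 4) -> Counter:
    # Split the input into roughly equal chunks (ceiling division).
    size = max(1, -(-len(lines) // workers))
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with Pool(workers) as pool:
        return reduce_all(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    sample = [
        'a\t{"info": {"OS": "WINNT"}}',
        'b\t{"info": {"OS": "Linux"}}',
        'c\t{"info": {"OS": "WINNT"}}',
        "d\tnot json",
    ]
    print(run(sample, workers=2))
```

Here each worker process combines its own chunk before the parent reduces the partial counts, mirroring how each machine runs a combine phase before the final cluster-wide reduce.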


