Remotery - Single C file, Realtime CPU/GPU Profiler with Remote Web Viewer


A realtime CPU/GPU profiler hosted in a single C file with a viewer that runs in a web browser. Windows (MSVC): add lib/Remotery.c and lib/Remotery.h to your program, and add the Remotery/lib path to your include directories. The required library ws2_32.lib should be picked up through the #pragma comment(lib, "ws2_32.lib") directive in Remotery.c.
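As a rough illustration of the auto-link mechanism mentioned above, the sketch below shows how an MSVC-only #pragma comment(lib, ...) directive can sit behind a compiler check so the same file still builds elsewhere. This is not Remotery's actual source: the AUTO_LINKED_LIB macro and the auto_linked_library() helper are hypothetical names introduced only for this example.

```c
#include <stdio.h>

/* On MSVC, ask the linker to pull in Winsock automatically.
   Other compilers skip this branch, so the file still builds there. */
#if defined(_MSC_VER)
#pragma comment(lib, "ws2_32.lib")
#define AUTO_LINKED_LIB "ws2_32.lib"   /* hypothetical macro, for illustration */
#else
#define AUTO_LINKED_LIB ""             /* nothing auto-linked; use build flags instead */
#endif

/* Hypothetical helper: returns the name of the auto-linked library,
   or an empty string when the compiler has no such directive. */
static const char *auto_linked_library(void)
{
    return AUTO_LINKED_LIB;
}

/* Hypothetical helper: prints which linking path this build took. */
static void report_linking(void)
{
    const char *lib = auto_linked_library();
    if (lib[0] != '\0')
        printf("auto-linked: %s\n", lib);
    else
        printf("no auto-link directive on this compiler\n");
}
```

Calling report_linking() from a program's main function prints which path was taken. With a non-MSVC toolchain, any required socket library would have to be supplied through the build system instead.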



Related Projects

GPUImage3 - GPUImage 3 is a BSD-licensed Swift framework for GPU-accelerated video and image processing using Metal

  •    Swift

GPUImage 3 is the third generation of the GPUImage framework, an open source project for performing GPU-accelerated image and video processing on Mac and iOS. The original GPUImage framework was written in Objective-C and targeted Mac and iOS, the second iteration was rewritten in Swift using OpenGL to target Mac, iOS, and Linux, and this third generation is redesigned to use Metal in place of OpenGL. The objective of the framework is to make it as easy as possible to set up and perform realtime video processing or machine vision against image or video sources. Previous iterations of this framework wrapped OpenGL (ES), hiding much of the boilerplate code required to render images on the GPU using custom vertex and fragment shaders. This version of the framework replaces OpenGL (ES) with Metal. The change is largely driven by Apple's deprecation of OpenGL (ES) on their platforms in favor of Metal, and it will allow for exploring performance optimizations over OpenGL and a tighter integration with Metal-based frameworks and operations.

scalene - Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python

  •    Python

by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno. Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information.

mshadow - Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning

  •    C++

MShadow is a lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA. The goal of mshadow is to provide an efficient, device-invariant and simple tensor library for machine learning projects that aim for maximum performance and control, while also emphasizing simplicity. MShadow also provides an interface that allows writing multi-GPU and distributed deep learning programs in an easy and unified way.

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

MangoHud - A Vulkan and OpenGL overlay for monitoring FPS, temperatures, CPU/GPU load and more

  •    C

A Vulkan and OpenGL overlay for monitoring FPS, temperatures, CPU/GPU load and more.

cusignal - cuSignal - RAPIDS Signal Processing Library

  •    Python

The RAPIDS cuSignal project leverages CuPy, Numba, and the RAPIDS ecosystem for GPU accelerated signal processing. In some cases, cuSignal is a direct port of Scipy Signal to leverage GPU compute resources via CuPy but also contains Numba CUDA and Raw CuPy CUDA kernels for additional speedups for selected functions. cuSignal achieves its best gains on large signals and compute intensive functions but stresses online processing with zero-copy memory (pinned, mapped) between CPU and GPU. NOTE: For the latest stable ensure you are on the latest branch.

ShaderConductor - ShaderConductor is a tool designed for cross-compiling HLSL to other shading languages

  •    C++

ShaderConductor is a tool designed for cross-compiling HLSL to other shading languages. Note that this project is still in an early stage, and it is under active development.

herebedragons - A basic 3D scene implemented with various engines, frameworks or APIs.

  •    C

Hic sunt dracones. This repository contains multiple implementations of the same 3D scene, using different APIs and frameworks on various platforms. The goal is to provide a comparison between multiple rendering methods. This is inherently biased due to the variety of algorithms used and available CPU/GPU configurations, but can hopefully still provide interesting insights on 3D rendering.

emu - a language for programming GPUs, with a focus on ergonomics first and performance second

  •    Rust

⚠ Please note that while Emu 0.2.0 is quite usable, it suffers from 2 key issues. First, it does nothing to minimize CPU-GPU data transfer, and second, its compiler is not well-tested. These can be reasons not to use Emu 0.2.0. A new version of Emu is in the works, however, with significant improvements in the language, compiler, and compile-time checker. This new version of Emu should be released some time in Q4 of 2019. But unlike OpenCL/CUDA/Halide/Futhark, Emu is embedded in Rust. This lets it take advantage of the ecosystem in ways...

RustaCUDA - Rusty wrapper for the CUDA Driver API

  •    Rust

RustaCUDA helps you bring GPU-acceleration to your projects by providing a flexible, easy-to-use interface to the CUDA GPU computing toolkit. RustaCUDA makes it easy to manage GPU memory, transfer data to and from the GPU, and load and launch compute kernels written in any language. RustaCUDA is intended to provide a programmer-friendly library for working with the host-side CUDA Driver API. It is not intended to assist in compiling Rust code to CUDA kernels (though see rust-ptx-builder for that) or to provide device-side utilities to be used within the kernels themselves.

pathfinder - A fast, practical GPU rasterizer for fonts and vector graphics

  •    Rust

Pathfinder 3 is a fast, practical, GPU-based rasterizer for fonts and vector graphics using OpenGL 3.0+, OpenGL ES 3.0+, or Metal. Please note that Pathfinder is under heavy development and is incomplete in various areas.

cuda-api-wrappers - Thin C++-flavored wrappers for the CUDA Runtime API

  •    C++

nVIDIA's Runtime API for CUDA is intended for use both in C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions of templated function overloads). This library of wrappers around the Runtime API is intended to allow us to embrace many of the features of C++ (including some C++11) for using the runtime API - but without reducing expressivity or increasing the level of abstraction (as in, e.g., the Thrust library). Using cuda-api-wrappers, you still have your devices, streams, events and so on - but they will be more convenient to work with in more C++-idiomatic ways.

bgfx - Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library

  •    C++

Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library. AirMech is a free-to-play futuristic action real-time strategy video game developed and published by Carbon Games.

fgprof - 🚀 fgprof is a sampling Go profiler that allows you to analyze On-CPU as well as Off-CPU (e

  •    Go

fgprof is a sampling Go profiler that allows you to analyze On-CPU as well as Off-CPU (e.g. I/O) time together. Go's builtin sampling CPU profiler can only show On-CPU time, but it's better than fgprof at that. Go also includes tracing profilers that can analyze I/O, but they can't be combined with the CPU profiler.

renderdoc - RenderDoc is a stand-alone graphics debugging tool.

  •    C++

RenderDoc is a frame-capture based graphics debugger, currently available for Vulkan, D3D11, D3D12, OpenGL, and OpenGL ES development on Windows 7 - 10, Linux, and Android. It is completely open-source under the MIT license. If you have any questions, suggestions or problems, you can create an issue on GitHub, email me directly or come into IRC to discuss it.

neanderthal - Fast Clojure Matrix Library

  •    Clojure

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU. Read the documentation at the Neanderthal web site.

Imogen - GPU Texture Generator

  •    Python

WIP of a GPU Texture generator using dear imgui for UI. Not production ready and a bit messy but really fun to code. Basically, add GPU and CPU nodes in a graph to manipulate and generate images. Nodes are hardcoded now but a discovery system is planned. Currently nodes can be written in GLSL or C or Python. Use CMake and VisualStudio to build it. Only Windows system supported for now.

stackimpact-python - StackImpact Python Profiler - Production-Grade Performance Profiler: CPU, memory allocations, blocking calls, exceptions, metrics, and more

  •    Python

StackImpact is a production-grade performance profiler built for both production and development environments. It gives developers a continuous and historical code-level view of application performance that is essential for locating CPU, memory allocation and I/O hot spots as well as latency bottlenecks. Included runtime metrics and error monitoring complement profiles for extensive performance analysis. Learn more on the features page (with screenshots).

gunrock - High-Performance Graph Primitives on GPUs

  •    Cuda

Gunrock is a CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. For more details, please visit our website, read Why Gunrock, our TOPC 2017 paper Gunrock: GPU Graph Analytics, look at our results, and find more details in our publications. See Release Notes to keep up with our latest changes.
