rmm - RAPIDS Memory Manager


Achieving optimal performance in GPU-centric workflows frequently requires customizing how host and device memory are allocated. Examples include using "pinned" host memory for asynchronous host <-> device memory transfers, or using a device memory pool sub-allocator to reduce the cost of dynamic device memory allocation. For information on the interface RMM provides and how to use RMM in your C++ code, see below.

https://github.com/rapidsai/rmm





Related Projects

cusignal - cuSignal - RAPIDS Signal Processing Library

  •    Python

The RAPIDS cuSignal project leverages CuPy, Numba, and the RAPIDS ecosystem for GPU-accelerated signal processing. In some cases, cuSignal is a direct port of SciPy Signal that uses GPU compute resources via CuPy, but it also contains Numba CUDA and raw CuPy CUDA kernels that give additional speedups for selected functions. cuSignal achieves its best gains on large signals and compute-intensive functions, but it also emphasizes online processing with zero-copy memory (pinned, mapped) between CPU and GPU. NOTE: For the latest stable README.md ensure you are on the latest branch.

HeapInspector-for-iOS - Find memory issues & leaks in your iOS app without instruments

  •    Objective-C

HeapInspector is a debug tool that monitors the memory heap with backtrace recording in your iOS app. You can discover memory leaks, objects that are no longer used, abandoned memory and other issues directly on your device without ever starting Instruments. Since ARC was introduced, we no longer need to manage retain and release ourselves. ARC is very powerful and makes Objective-C more stable: it decreases the number of crashes and improves the memory footprint, and it knows when to retain, autorelease and release. But ARC does not consider the overall architecture or how to design for low memory usage. You can still do a lot of things wrong with your memory, and you can still get memory pressure or peaks, even with ARC.

memory-allocators - Custom memory allocators in C++ to improve the performance of dynamic memory allocation

  •    C++

When applications need more memory, it can be allocated on the heap (rather than on the stack) at runtime. This memory is called 'dynamic memory' because the amount needed can't be known at compile time and changes during execution. Our programs can ask for dynamic memory using 'malloc'. Malloc returns the address of a position in memory where we can store our data. Once we're done with that data, we can call 'free' to release the memory and let other processes use it. For this project I've implemented different ways to manage dynamic memory ourselves in C++. This means that instead of using native calls like 'malloc' or 'free', we use a custom memory allocator that does this for us, but in a more efficient way. The goal, then, is to understand how the most common allocators work and what they offer, and to compare them to see which one performs better.
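The idea of a custom allocator can be illustrated with a linear (bump) allocator, one of the simplest designs. This is a minimal sketch and is not taken from the memory-allocators project: the class name `LinearAllocator` and its methods are hypothetical. It calls 'malloc' once up front and then serves each request by advancing an offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical linear (bump) allocator, for illustration only.
// One up-front malloc; each allocation just advances an offset.
// Individual frees are not supported; reset() releases everything at once.
class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t capacity)
        : base_(static_cast<std::uint8_t*>(std::malloc(capacity))),
          capacity_(base_ ? capacity : 0),
          offset_(0) {}

    ~LinearAllocator() { std::free(base_); }

    LinearAllocator(const LinearAllocator&) = delete;
    LinearAllocator& operator=(const LinearAllocator&) = delete;

    // Returns nullptr when the arena is exhausted.
    // alignment must be a power of two.
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(base_) + offset_;
        const std::uintptr_t aligned =
            (current + alignment - 1) & ~(std::uintptr_t(alignment) - 1);
        const std::size_t padding = aligned - current;
        const std::size_t remaining = capacity_ - offset_;
        if (padding > remaining || size > remaining - padding) {
            return nullptr;
        }
        offset_ += padding + size;
        return reinterpret_cast<void*>(aligned);
    }

    // Makes the whole arena available again; old pointers become invalid.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::uint8_t* base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

Allocation here is a few integer operations instead of a general-purpose heap search, which is where the speedup over 'malloc' comes from. The trade-off is that memory can only be released all at once.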

scalene - Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python

  •    Python

by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno. Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information.


cuspatial - CUDA-accelerated GIS and spatiotemporal algorithms

  •    Python

NOTE: cuSpatial depends on cuDF and RMM from RAPIDS. The rest of the steps assume the environment variable CUDF_HOME points to the root directory of your clone of the cuDF repo, and that the cudf_dev Anaconda environment created in step 3 is active.

scalloc - A Fast, Multicore-Scalable, Low-Fragmentation Memory Allocator

  •    C++

scalloc is a general-purpose memory allocator designed so that allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. The main ideas behind the design of scalloc are uniform treatment of small and big objects through so-called virtual spans, and efficient and effective reclamation of free memory through fast and scalable global data structures.

OOMDetector - OOMDetector is a memory monitoring component for iOS which provides you with OOM monitoring, memory allocation monitoring, memory leak detection and other functions

  •    Objective-C++

OOMDetector is a memory monitoring component for iOS which provides you with OOM monitoring, memory allocation monitoring, memory leak detection and other functions.

php-memory-profiler - Memory usage profiler for PHP

  •    C

php-memprof profiles the memory usage of PHP scripts, and in particular can tell which function allocated every single byte of memory that is currently allocated. In script 1, a before/after approach would designate file_get_contents() as a huge memory consumer, while the memory it allocates is actually freed quickly after it returns. When dumping the memory usage after a() returns, the memprof approach would show that file_get_contents() is a small memory consumer, since the memory it allocated has been freed by the time memprof_dump_array() is called.

mempool


mempool is a library written in C to manage memory allocation. It may be used as a replacement for malloc and free when memory is allocated frequently. mempool allocates its memory blocks at initialization time, using a predefined block size.
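The approach described above, reserving blocks of a predefined size up front and handing them out on request, can be sketched as follows. This is an illustrative C++ sketch and not mempool's actual C API: the class `FixedBlockPool` and all of its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size block pool, for illustration only.
// All memory is reserved once in the constructor; allocate() and
// deallocate() then pop and push blocks on an intrusive free list.
class FixedBlockPool {
public:
    FixedBlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(static_cast<unsigned char*>(
              std::malloc(block_size_ * (block_count ? block_count : 1)))),
          free_list_(nullptr),
          available_(0) {
        if (!storage_) return;  // pool stays empty if the reservation failed
        // Thread every block onto the free list, lowest address on top.
        for (std::size_t i = block_count; i-- > 0;) {
            deallocate(storage_ + i * block_size_);
        }
    }

    ~FixedBlockPool() { std::free(storage_); }

    FixedBlockPool(const FixedBlockPool&) = delete;
    FixedBlockPool& operator=(const FixedBlockPool&) = delete;

    // Returns one block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (!free_list_) return nullptr;
        Node* node = free_list_;
        free_list_ = node->next;
        --available_;
        return node;
    }

    // Returns a block to the pool. The caller must pass a pointer that
    // came from allocate() on this pool; this sketch does not verify it.
    void deallocate(void* block) {
        if (!block) return;
        free_list_ = new (block) Node{free_list_};
        ++available_;
    }

    std::size_t block_size() const { return block_size_; }
    std::size_t available() const { return available_; }

private:
    struct Node { Node* next; };

    // Round up so every block is suitably aligned and can hold a Node.
    static std::size_t round_up(std::size_t size) {
        const std::size_t a = alignof(std::max_align_t);
        const std::size_t n = (size + a - 1) / a * a;
        return n ? n : a;
    }

    std::size_t block_size_;
    unsigned char* storage_;
    Node* free_list_;
    std::size_t available_;
};
```

Because every block has the same size, a free block can store the free-list link inside itself, so the pool needs no per-block bookkeeping and both operations run in constant time.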

cudf - cuDF - GPU DataFrame Library

  •    C++

NOTE: For the latest stable README.md ensure you are on the main branch. Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

Apache Mnemonic - Non-volatile hybrid memory storage oriented library

  •    Java

Apache Mnemonic is a non-volatile hybrid memory storage oriented library. It proposes a non-volatile/durable Java object model and a durable computing service that bring several advantages and significantly improve the performance of massive real-time data processing and analytics. Developers can use this library to design cache-less and SerDe-less high-performance applications.

scala-offheap - Experimental type-safe off-heap memory for Scala.

  •    Scala

Garbage collection is the standard memory management paradigm on the JVM. In theory, it lets one completely forget about the hurdles of memory management and delegate all of it to the underlying runtime. In practice, GC often leads to scalability issues on large heaps and latency-sensitive workloads. The goal of this project is to expose a completely different memory management paradigm to the developers: explicitly annotated region-based memory. This paradigm gives more control over memory management without the need to micro-manage allocations.

memreduct - Lightweight real-time memory management application to monitor and clean system memory on your computer

  •    C++

Lightweight real-time memory management application to monitor and clean system memory on your computer. The program uses undocumented internal system features (Native API) to clear the system cache (system working set, working set, standby page lists, modified page lists) with variable results of ~10-50%. The application is compatible with Windows XP SP3 and higher operating systems, but some general features are available only since Windows Vista.

Memprof - A Ruby gem for memory profiling

  •    Ruby

Memprof is a Ruby-level memory profiler that can help you find reference leaks in your application. Memprof can also do very lightweight function call tracing to help you figure out which system calls and library calls your code causes. It is similar to bleak_house, but works without patches to the Ruby VM.

stackimpact-python - StackImpact Python Profiler - Production-Grade Performance Profiler: CPU, memory allocations, blocking calls, exceptions, metrics, and more

  •    Python

StackImpact is a production-grade performance profiler built for both production and development environments. It gives developers a continuous and historical code-level view of application performance that is essential for locating CPU, memory allocation and I/O hot spots as well as latency bottlenecks. The included runtime metrics and error monitoring complement the profiles for extensive performance analysis. Learn more at stackimpact.com.

memory - STL compatible C++ memory allocator library using a new RawAllocator concept that is similar to an Allocator but easier to use and write

  •    C++

The C++ STL allocator model has various flaws. For example, allocators are fixed to a certain type, because they are almost necessarily required to be templates, so you can't easily share a single allocator for multiple types. In addition, you can only get a copy from the containers and not the original allocator object. At least with C++11 they are allowed to be stateful and so can be made object not instance based. But still, the model has many flaws. Over the years many solutions have been proposed, for example EASTL. This library is another, but instead of trying to change the STL, it works with the current implementation. See example/ for more.

GlideBitmapPool - Glide Bitmap Pool is a memory management library for reusing the bitmap memory

  •    Java

Glide Bitmap Pool is a memory management library for reusing bitmap memory. Because it reuses bitmap memory, the GC is not invoked again and again, so the application runs smoothly. It uses inBitmap while decoding bitmaps on the supported Android versions. The use cases for all versions have been handled to optimize it further. Glide Bitmap Pool can be included in any Android or Java application.

cymem - 💥 Cython memory pool for RAII-style memory management

  •    Python

cymem provides two small memory-management helpers for Cython. They make it easy to tie memory to a Python object's life-cycle, so that the memory is freed when the object is garbage collected. The Pool object saves the memory addresses internally, and frees them when the object is garbage collected. Typically you'll attach the Pool to some cdef'd class. This is particularly handy for deeply nested structs, which have complicated initialization functions. Just pass the Pool object into the initializer, and you don't have to worry about freeing your struct at all; all memory obtained through Pool.alloc is freed automatically when the Pool expires.

cuda-api-wrappers - Thin C++-flavored wrappers for the CUDA Runtime API

  •    C++

NVIDIA's Runtime API for CUDA is intended for use in both C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions of templated function overloads). This library of wrappers around the Runtime API is intended to let us embrace many of the features of C++ (including some C++11) when using the Runtime API, but without reducing expressivity or increasing the level of abstraction (as in, e.g., the Thrust library). With cuda-api-wrappers you still have your devices, streams, events and so on, but they are more convenient to work with in C++-idiomatic ways.





