ArrayFire is a high-performance software library for parallel computing with an easy-to-use API. Its array-based function set makes parallel programming simple. ArrayFire's multiple backends (CUDA, OpenCL and native CPU) make it platform independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving you valuable time and lowering development costs.
http://arrayfire.com/
Tags | parallel-computing parallel cuda library |
Implementation | C++ |
License | Public |
Platform |
Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, CUDA and OpenCL ndarray library on which to build a scientific computing and, in particular, a deep learning ecosystem. The library is inspired by NumPy and PyTorch. It provides ergonomics very similar to NumPy, Julia and MATLAB but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.
tensor nim multidimensional-arrays cuda deep-learning machine-learning cudnn high-performance-computing gpu-computing matrix-library neural-networks parallel-computing openmp linear-algebra ndarray opencl gpgpu iot automatic-differentiation autograd
ttgLib is a C++ library for creating parallel, resource-intensive programs for hybrid architectures such as CPU+GPU. It provides the ttg::pipeline parallel primitive with smart load distribution across different computing APIs such as OpenMP, Intel TBB, NVIDIA CUDA and OpenCL.
Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.
haskell accelerate llvm cuda parallel-computing gpu-computing
Parallel Runtime Library is an optimized library that provides easy-to-use, high-performance parallel computing. It provides an effective parallel runtime, concurrent data structures, task and data parallelism, and producer/consumer and agent models.
parallel-programming
Full documentation on the GitHub wiki is under heavy construction. moderngpu is a productivity library for general-purpose computing on GPUs. It is a header-only C++ library written for CUDA. The unique value of the library is in its accelerated primitives for solving irregularly parallel problems.
Gosl is a Go library to develop Artificial Intelligence and High-Performance Scientific Computations. The library tries to be as general and easy as possible. Gosl considers the use of both Go concurrency routines and parallel computing using the message passing interface (MPI). Gosl has several modules (sub-packages) for a variety of tasks in scientific computing, image analysis, and data post-processing.
scientific-computing visualization linear-algebra differential-equations sparse-systems plotting mkl parallel-computations computational-geometry graph-theory tensor-algebra fast-fourier-transform eigenvalues eigenvectors hacktoberfest machine-learning artificial-intelligence optimization optimization-algorithms linear-programming
MPJ Express is an open source Java message passing library that allows application developers to write and execute parallel applications for multicore processors and compute clusters/clouds. It allows writing parallel Java applications using an MPI-like API.
parallel-programming parallel high-performance-computing hpc
The Stratosphere System is an open-source cluster/cloud computing framework for Big Data analytics. It comprises an extensible higher-level language (Meteor) to quickly compose queries for common and recurring use cases, a parallel programming model (PACT, an extension of MapReduce) to run user-defined operations, and an efficient massively parallel runtime (Nephele) for fault-tolerant execution of acyclic data flows.
cloud-framework cloud big-data parallel information-management
LWJGL (https://www.lwjgl.org) is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL/Vulkan), audio (OpenAL) and parallel computing (OpenCL) applications. This access is direct and high-performance, yet also wrapped in a type-safe and user-friendly layer, appropriate for the Java ecosystem. LWJGL is an enabling technology and provides low-level access. It is not a framework and does not provide higher-level utilities than what the native libraries expose. As such, novice programmers are encouraged to try one of the frameworks or game engines that make use of LWJGL, before working directly with the library.
lwjgl kotlin opengl opencl openal vulkan bindings glfw vr opengl-es jvm
A fast C++ header-only library to help you quickly build parallel programs with complex task dependencies. Cpp-Taskflow lets you quickly build parallel dependency graphs using modern C++17. It supports both static and dynamic tasking, and is much faster, more expressive, and easier to integrate as a drop-in than existing libraries.
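To make the idea of a parallel dependency graph concrete, here is a minimal sketch in Python using only the standard library (`graphlib` and `concurrent.futures`, Python 3.9+). It is not Cpp-Taskflow's API; the function name `run_graph` and the task names are hypothetical, and the sketch covers only static graphs.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(tasks, deps, workers=4):
    """Run the callables in `tasks`, honouring `deps`.

    `deps` maps each task name to the set of names that must finish first.
    Returns the names in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()              # raises CycleError if the graph has a cycle
    finished = []
    running = {}                  # future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites are all done.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                fut.result()      # re-raise any exception from the task
                name = running.pop(fut)
                finished.append(name)
                sorter.done(name) # unlock tasks that depend on this one
    return finished


if __name__ == "__main__":
    # Hypothetical diamond-shaped graph: A runs first, B and C in parallel, D last.
    deps = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    tasks = {name: (lambda n=name: print("running", n)) for name in deps}
    print(run_graph(tasks, deps))
```

In this sketch B and C may run concurrently once A has finished, while D waits for both.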
taskflow task-based-programming cpp17 parallel-programming threadpool concurrent-programming header-only flowgraph high-performance-computing multicore-programming multi-threading taskparallelism multithreading
The purpose of the future package is to provide a very simple and uniform way of evaluating R expressions asynchronously using various resources available to the user. In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available instantaneously. If the value is queried while the future is still unresolved, the current process is blocked until the future is resolved. It is possible to check whether a future is resolved or not without blocking. Exactly how and when futures are resolved depends on what strategy is used to evaluate them. For instance, a future can be resolved using a sequential strategy, which means it is resolved in the current R session. Other strategies may be to resolve futures asynchronously, for instance, by evaluating expressions in parallel on the current machine or concurrently on a compute cluster.
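The unresolved/resolved life cycle described above can be sketched with Python's standard `concurrent.futures` module. This is a stand-in for the concept only, not the R future package's API; `demo_future` and `slow_square` are hypothetical names, and a `threading.Event` is used here solely to control when the work finishes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_future():
    """Walk one future through its unresolved and resolved states."""
    gate = threading.Event()      # lets us decide when the work may finish

    def slow_square(x):
        gate.wait()               # the future stays unresolved until the gate opens
        return x * x

    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(slow_square, 7)   # returns immediately with a future
        resolved_before = fut.done()        # non-blocking check of the state
        gate.set()                          # allow the worker to finish
        value = fut.result()                # blocks until the future is resolved
        resolved_after = fut.done()
    return resolved_before, value, resolved_after


if __name__ == "__main__":
    print(demo_future())
```

Here `done()` plays the role of the non-blocking check and `result()` the blocking query; swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` would loosely correspond to choosing a different evaluation strategy, although a process pool would need a picklable, module-level function.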
r cran parallel-processing parallel-computing distributed-computing hpc-clusters hpc promises futures asynchronous programming parallelization
The goal of QCGPU is to provide a library for the simulation of quantum computers that is fast, efficient and portable. QCGPU is written in Rust and uses OpenCL to run code on the CPU, GPU or any other OpenCL-supported device. This library is meant to be used both independently and alongside established tools, for example compilers or more general, high-level frameworks. If you are interested in using QCGPU with IBM's QISKit framework or QISKit ACQUA, please see the repository qiskit-addon-qcgpu.
quantum-computing arrayfire quantum-computer-simulator cuda gate qubits quantum
This is an attempt at recreating the functionality of GNU Parallel, a work-stealer for the command line, in Rust under an MIT license. The end goal is to support much of the functionality of GNU Parallel and then to extend it further for the next generation of command-line utilities written in Rust. While functionality is important, with the application being developed in Rust, the goal is also to be as fast and efficient as possible. See the to-do list for features and improvements that have yet to be done. If you want to contribute, pull requests are welcome. If you have an idea for improvement which isn't listed in the to-do list, feel free to email me and I will consider implementing that idea.
command-line-app parallel-computing parallel
Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.
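The "one instruction, N operations" structure described above can be modelled in plain Python. This is only an illustration of the control structure: no real SIMD instructions are issued, and the lane count `LANES` and the function names are assumptions made for the sketch.

```python
LANES = 4  # hypothetical SIMD width: values held by one "register"


def simd_add(reg_a, reg_b):
    """Model of a single SIMD add: the same operation applied to every lane."""
    return [x + y for x, y in zip(reg_a, reg_b)]


def scalar_sum(xs, ys):
    """Scalar formulation: one addition per loop iteration."""
    out = []
    for i in range(len(xs)):
        out.append(xs[i] + ys[i])
    return out


def data_parallel_sum(xs, ys):
    """Data-parallel formulation: one modelled instruction per LANES elements."""
    out = []
    for start in range(0, len(xs), LANES):
        reg_a = xs[start:start + LANES]   # "load" up to LANES values
        reg_b = ys[start:start + LANES]
        out.extend(simd_add(reg_a, reg_b))
    return out


if __name__ == "__main__":
    xs = list(range(10))
    ys = [1] * 10
    print(data_parallel_sum(xs, ys) == scalar_sum(xs, ys))
```

Both functions compute the same result; the data-parallel version simply makes explicit that the loop body is identical for every element, which is the property an auto-vectorizer has to rediscover from scalar code.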
vectorization parallel simd-vector simd-instructions simd avx c-plus-plus avx512 sse neon cpp portable cpp11 cpp14 cpp17 avx2 simd-programming data-parallel parallel-computing
Dask is a flexible parallel computing library for analytics. See documentation for more information. New BSD. See License File.
analytics
Asynchronous parallel SSH client library. Run SSH commands asynchronously over many servers (hundreds to hundreds of thousands) with minimal system load on the client host.
ssh library async aio asynchronous python-library parallel ssh-client parallel-ssh libssh2 libev ssh2 non-blocking gevent libssh ssh-library parallelssh non-blocking-io ssh-client-library
NVIDIA's Runtime API for CUDA is intended for use both in C and C++ code. As such, it uses a C-style API, the lowest common denominator (with a few notable exceptions of templated function overloads). This library of wrappers around the Runtime API is intended to allow us to embrace many of the features of C++ (including some C++11) for using the runtime API, but without reducing expressivity or increasing the level of abstraction (as in, e.g., the Thrust library). Using cuda-api-wrappers, you still have your devices, streams, events and so on, but they will be more convenient to work with in more C++-idiomatic ways.
wrapper gpu modern-cpp cuda nvidia gpgpu api-wrapper gpu-memory gpu-computing cuda-toolkit cuda-device cuda-runtime-api gpgpu-computing cuda-api-wrappers
OrangeFS is a scale-out network file system designed for use on high-end computing (HEC) systems that provides very high-performance access to multi-server-based disk storage, in parallel. The OrangeFS server and client are user-level code, making them very easy to install and manage. OrangeFS has optimized MPI-IO support for parallel and distributed applications, and it is leveraged in production installations and used as a research platform for distributed and parallel storage.
filesystem distributed-filesystem storage distributed-storage parallel-virtual-filesystem nas
UACluster2 is a set of manuals and tools for creating and managing a high-performance computing cluster based on Microsoft Hyper-V virtual machines. It requires Microsoft HPC Server 2008 (or Microsoft HPC Server 2008 R2) as the basis for creating the cluster.
computing-cluster hpc mpi openmp parallel-computing parallel-programming
Rayon is a data-parallelism library for Rust. It is extremely lightweight and makes it easy to convert a sequential computation into a parallel one. It also guarantees data-race freedom. Rayon makes it drop-dead simple to convert sequential iterators into parallel ones: usually, you just change your foo.iter() call into foo.par_iter(), and Rayon does the rest.
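The same "swap one call" idea can be sketched in Python with the standard library, as a loose analogue rather than Rayon's API: the built-in `map` is replaced by `ThreadPoolExecutor.map`. The function names below are hypothetical. Unlike Rayon, this sketch offers no data-race guarantee, and CPython threads will not speed up CPU-bound work such as this squaring example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Stand-in for the per-element work."""
    return x * x


def sequential_squares(values):
    """Sequential version: analogous to mapping over foo.iter()."""
    return list(map(square, values))


def parallel_squares(values, workers=4):
    """Parallel version: only the map call changes; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    data = list(range(8))
    print(parallel_squares(data) == sequential_squares(data))
```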
parallelism threads parallel parallel-iterator