clojurecl - ClojureCL is a Clojure library for parallel computations with OpenCL.

  •        7

ClojureCL is a Clojure library for parallell computations with OpenCL. It supports the latest OpenCL 2.0 version and uses fast hand-writen JNI bindings provided by Marco Hutter's for communication with vendor's OpenCL platform drivers. Read the documentation at ClojureCL Web Site.



Related Projects

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

neanderthal - Fast Clojure Matrix Library

  •    Clojure

Neanderthal is a Clojure library for fast matrix and linear algebra computations based on the highly optimized native libraries of BLAS and LAPACK computation routines for both CPU and GPU.. Read the documentation at Neanderthal Web Site.

accelerate - Embedded language for high-performance array computations

  •    Haskell

Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.

compute - A C++ GPU Computing Library for OpenCL

  •    C++

Boost.Compute is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers.

lwjgl3 - LWJGL is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL), audio (OpenAL) and parallel computing (OpenCL) applications

  •    Kotlin

LWJGL ( is a Java library that enables cross-platform access to popular native APIs useful in the development of graphics (OpenGL/Vulkan), audio (OpenAL) and parallel computing (OpenCL) applications. This access is direct and high-performance, yet also wrapped in a type-safe and user-friendly layer, appropriate for the Java ecosystem.LWJGL is an enabling technology and provides low-level access. It is not a framework and does not provide higher-level utilities than what the native libraries expose. As such, novice programmers are encouraged to try one of the frameworks or game engines that make use of LWJGL, before working directly with the library.

ArrayFire - Parallel Computing Library

  •    C++

ArrayFire is a high performance software library for parallel computing with an easy-to-use API. Its array based function set makes parallel programming simple. ArrayFire's multiple backends (CUDA, OpenCL and native CPU) make it platform independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving you valuable time and lowering development costs.

qcgpu-rust - A High Performance, Hardware accelerated, Quantum computer simulator in Rust

  •    Rust

The goal of QCGPU is to provide a library for the simulation of quantum computers that is fast, efficient and portable. QCGPU is written in Rust and uses OpenCL to run code on the CPU, GPU or any other OpenCL supported devices. This library is meant to be used both independently and alongside established tools for example compilers or more general and high level frameworks. If you are interested in using QCGPU with IBM's QISKit framework or QISKit ACQUA, please see the repository qiskit-addon-qcgpu.

Anakin - High performance Cross-platform Inference-engine, you could run Anakin on x86-cpu,arm, nv-gpu, amd-gpu,bitmain and cambricon devices

  •    C++

Welcome to the Anakin GitHub. Anakin is a cross-platform, high-performance inference engine, which is originally developed by Baidu engineers and is a large-scale application of industrial products.

coriander - Build NVIDIA® CUDA™ code for OpenCL™ 1.2 devices

  •    LLVM

Build applications written in NVIDIA® CUDA™ code for OpenCL™ 1.2 devices. Other systems should work too, ideally. You will need at a minimum at least one OpenCL-enabled GPU, and appropriate OpenCL drivers installed, for the GPU. Both linux and Mac systems stand a reasonable chance of working ok.

tf-quant-finance - High-performance TensorFlow library for quantitative finance.

  •    Jupyter

This library provides high-performance components leveraging the hardware acceleration support and automatic differentiation of TensorFlow. The library will provide TensorFlow support for foundational mathematical methods, mid-level methods, and specific pricing models. The coverage is being rapidly expanded over the next few months. Foundational methods. Core mathematical methods - optimisation, interpolation, root finders, linear algebra, random and quasi-random number generation, etc.



ttgLib is a C++ library for parallel resource-intensive programs creation for hybrid architectures like CPU+GPU. This library provides ttg::pipeline parallel primitive with wise load distribution over different computing API like as OpenMP or Intel TBB, NVidia CUDA and OpenCL.

collenchyma - Extendable HPC-Framework for CUDA, OpenCL and common CPU

  •    Rust

Collenchyma is an extensible, pluggable, backend-agnostic framework for parallel, high-performance computations on CUDA, OpenCL and common host CPU. It is fast, easy to build and provides an extensible Rust struct to execute operations on almost any machine, even if it does not have CUDA or OpenCL capable devices. Collenchyma's abstracts over the different computation languages (Native, OpenCL, Cuda) and let's you run highly-performant code, thanks to easy parallelization, on servers, desktops or mobiles without the need to adapt your code for the machine you deploy to. Collenchyma does not require OpenCL or Cuda on the machine and automatically falls back to the native host CPU, making your application highly flexible and fast to build.


  •    C++

Baikal initiative has been started as a sample application demonstrating the usage of AMD® RadeonRays intersection engine, but evolved into a fully functional rendering engine aimed at graphics researchers, educational institutions and open-source enthusiasts in general. Baikal is fast and efficient GPU-based global illumination renderer implemented using OpenCL and relying on AMD® RadeonRays intersection engine. It is cross-platform and vendor independent. The only requirement it imposes on the hardware is OpenCL 1.2 support. Baikal maintains high level of performance across all vendors, but it is specifically optimized for AMD® GPUs and APUs.

gunrock - High-Performance Graph Primitives on GPUs

  •    Cuda

Gunrock is a CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. For more details, please visit our website, read Why Gunrock, our TOPC 2017 paper Gunrock: GPU Graph Analytics, look at our results, and find more details in our publications. See Release Notes to keep up with the our latest changes.

ROCm - ROCm - Open Source Platform for HPC and Ultrascale GPU Computing


The ROCm Platform brings a rich foundation to advanced computing by seamlessly integrating the CPU and GPU with the goal of solving real-world problems. This software enables the high-performance operation of AMD GPUs for computationally oriented tasks in the Linux operating system. ROCm is focused on using AMD GPUs to accelerate computational tasks such as machine learning, engineering workloads, and scientific computing. In order to focus our development efforts on these domains of interest, ROCm supports a targeted set of hardware configurations which are detailed further in this section.

picongpu - Particle-in-cell simulations for the exascale era :sparkles:

  •    C++

PIConGPU is a fully relativistic, manycore, 3D3V particle-in-cell (PIC) code. The Particle-in-Cell algorithm is a central tool in plasma physics. It describes the dynamics of a plasma by computing the motion of electrons and ions in the plasma based on Maxwell's equations. As one of our supported compute platforms, GPUs provide a computational performance of several TFLOP/s at considerable lower invest and maintenance costs compared to multi CPU-based compute architectures of similar performance. The latest high-performance systems (TOP500) are enhanced by accelerator hardware that boost their peak performance up to the multi-PFLOP/s level. With its outstanding performance and scalability to more than 18'000 GPUs, PIConGPU was one of the finalists of the 2013 Gordon Bell Prize.

Julia - Language for Technical Computing

  •    Julia

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. This computation is automatically distributed across all available compute nodes, and the result, reduced by summation (+), is returned at the calling node.