
parallel - Inspired by GNU Parallel, a command-line CPU load balancer written in Rust.

  •    Rust

This is an attempt at recreating the functionality of GNU Parallel, a work-stealer for the command line, in Rust under an MIT license. The end goal is to support much of GNU Parallel's functionality and then to extend it further for the next generation of command-line utilities written in Rust. While functionality is important, the application is written in Rust with the goal of also being as fast and efficient as possible. See the to-do list for features and improvements that have yet to be implemented. If you want to contribute, pull requests are welcome. If you have an idea for an improvement that isn't on the to-do list, feel free to email me and I will consider implementing it.

accelerate - Embedded language for high-performance array computations

  •    Haskell

Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.

ArrayFire - Parallel Computing Library

  •    C++

ArrayFire is a high-performance software library for parallel computing with an easy-to-use API. Its array-based function set makes parallel programming simple. ArrayFire's multiple backends (CUDA, OpenCL and native CPU) make it platform-independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving you valuable time and lowering development costs.
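
To give a flavour of that array-based style, here is a minimal sketch using ArrayFire's C++ interface; the function names (af::randu, af::sin, af::sum, af::mean) follow the documented API but should be checked against the version you install:

    #include <arrayfire.h>
    #include <cstdio>

    int main() {
        // One million random values on the active backend (CUDA, OpenCL or CPU).
        af::array x = af::randu(1000000);

        // Element-wise kernels and reductions are expressed as whole-array calls;
        // ArrayFire schedules the data-parallel work on the selected device.
        af::array y = 2.0f * af::sin(x);
        float total = af::sum<float>(y);
        float avg   = af::mean<float>(y);

        std::printf("sum = %f, mean = %f\n", total, avg);
        return 0;
    }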

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores.

Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.
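
As an illustration of the explicit data-parallel style Vc enables, here is a small sketch of a SIMD scale-and-add loop; the names used (Vc::float_v, float_v::Size, load/store with Vc::Unaligned) follow Vc's documented vector-class interface and should be verified against the Vc version in use:

    #include <Vc/Vc>
    #include <cstddef>
    #include <vector>

    // y[i] += a * x[i], processing float_v::Size lanes per loop iteration.
    void saxpy(std::vector<float>& y, const std::vector<float>& x, float a) {
        using Vc::float_v;
        std::size_t i = 0;
        for (; i + float_v::Size <= x.size(); i += float_v::Size) {
            float_v xv, yv;
            xv.load(&x[i], Vc::Unaligned);   // SIMD load of float_v::Size values
            yv.load(&y[i], Vc::Unaligned);
            yv += a * xv;                    // one SIMD multiply and add per lane group
            yv.store(&y[i], Vc::Unaligned);  // SIMD store back to memory
        }
        for (; i < x.size(); ++i)            // scalar tail for the leftover elements
            y[i] += a * x[i];
    }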

hpat - A compiler-based big data framework in Python

  •    Python

High Performance Analytics Toolkit (HPAT) scales analytics/ML codes in Python to bare-metal cluster/cloud performance automatically. It compiles a subset of Python (Pandas/Numpy) to efficient parallel binaries with MPI, requiring only minimal code changes. HPAT is orders of magnitude faster than alternatives like Apache Spark. See HPAT's documentation for details.

future - R package: future: Unified Parallel and Distributed Processing in R for Everyone

  •    R

The purpose of the future package is to provide a very simple and uniform way of evaluating R expressions asynchronously using various resources available to the user. In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available instantaneously. If the value is queried while the future is still unresolved, the current process is blocked until the future is resolved. It is possible to check whether a future is resolved or not without blocking. Exactly how and when futures are resolved depends on what strategy is used to evaluate them. For instance, a future can be resolved using a sequential strategy, which means it is resolved in the current R session. Other strategies may be to resolve futures asynchronously, for instance, by evaluating expressions in parallel on the current machine or concurrently on a compute cluster.
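
The abstraction the package builds on is language-agnostic; the following minimal sketch illustrates the same unresolved/resolved distinction using C++'s standard std::future rather than the R package's own API:

    #include <chrono>
    #include <future>
    #include <iostream>
    #include <thread>

    int main() {
        using namespace std::chrono_literals;

        // Start an asynchronous computation; the future is unresolved until it finishes.
        std::future<int> f = std::async(std::launch::async, [] {
            std::this_thread::sleep_for(100ms);
            return 42;
        });

        // Non-blocking check for resolution.
        bool resolved = f.wait_for(0ms) == std::future_status::ready;
        std::cout << std::boolalpha << "resolved yet? " << resolved << '\n';

        // Querying the value blocks until the future is resolved.
        std::cout << "value: " << f.get() << '\n';
        return 0;
    }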

UACluster2

  •    

UACluster2 is a set of manuals and tools for creating and managing a high-performance computing cluster based on Microsoft Hyper-V virtual machines. It requires Microsoft HPC Server 2008 (or Microsoft HPC Server 2008 R2) as the basis for cluster creation.


Transactional Entity Framework

  •    C++

Unleash the power of parallel computing with automatic transactional memory.

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, CUDA and OpenCL ndarray library on which to build a scientific computing and, in particular, a deep learning ecosystem. The library is inspired by Numpy and PyTorch. It provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.

ParallelAccelerator

  •    Julia

ParallelAccelerator is a Julia package for speeding up compute-intensive Julia programs. In particular, Julia code that makes heavy use of high-level array operations is a good candidate for speeding up with ParallelAccelerator. With the @acc macro that ParallelAccelerator provides, users may specify parts of a program to accelerate. ParallelAccelerator compiles these parts of the program to fast native code. It automatically eliminates overheads such as array bounds checking when it is safe to do so. It also parallelizes and vectorizes many data-parallel operations.

Paws.js - A JavaScript interpreter for Paws.

  •    CoffeeScript

Hello, friend. This is a JavaScript implementation of the Paws machine, intended both to be included in client-side code executed by browsers and to be embedded into Node.js projects. Paws lends itself well to highly asynchronous programming, meaning it's designed for things involving network requests (web applications, by design) and other tasks where concurrency is desirable. In addition, things built on top of Paws can distribute themselves across multiple environments and machines (this means your database and your users' browsers can all talk among one another). Finally, Paws is designed from the ground up to be concurrency-aware, ensuring tasks can parallelize when they won't affect each other negatively.

schwimmbad - A common interface to processing pools.

  •    Python

schwimmbad provides a uniform interface to parallel processing pools and makes it easy to switch between local development (e.g., serial processing or Python's multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or joblib). See the installation instructions in the documentation for more information.

boxtree - Quad/octree building for FMMs in Python and OpenCL

  •    Python

boxtree is a package that, given some point locations in two or three dimensions, sorts them into an adaptive quad/octree of boxes, efficiently, in parallel, using PyOpenCL. It can also generate traversal lists needed for adaptive fast multipole methods and related algorithms and tree-based look-up tables for geometric proximity.

embb - Embedded Multicore Building Blocks (EMB²): Library for parallel programming of embedded systems

  •    C++

The Embedded Multicore Building Blocks (EMB²) are an easy-to-use yet powerful and efficient C/C++ library for the development of parallel applications. EMB² has been specifically designed for embedded systems and the typical requirements that accompany them, such as real-time capability and constraints on memory consumption. As a major advantage, low-level operations are hidden in the library, which relieves software developers from the burden of thread management and synchronization. This not only improves the productivity of parallel software development, but also results in increased reliability and performance of the applications.

EMB² is independent of the hardware architecture (x86, ARM, ...) and runs on various platforms, from small devices to large systems containing numerous processor cores. It builds on MTAPI, a standardized programming interface for leveraging task parallelism in embedded systems containing symmetric or asymmetric (heterogeneous) multicore processors. A core feature of MTAPI is low-overhead scheduling of fine-grained tasks among the available cores during runtime. Unlike existing libraries, EMB² supports task priorities and affinities, which allows the creation of soft real-time systems. Additionally, the scheduling strategy can be optimized for non-functional requirements such as minimal latency and fairness.
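
As a rough sketch of how the library hides thread management behind high-level calls, the following uses EMB²'s C++ algorithms layer; the header path and the exact embb::algorithms::ForEach signature are assumptions here and should be checked against the EMB² documentation:

    #include <embb/algorithms/algorithms.h>  // assumed header for the algorithms layer
    #include <vector>

    int main() {
        std::vector<int> data(1024, 1);

        // Apply a functor to every element; EMB2's MTAPI-based scheduler splits the
        // range into fine-grained tasks and distributes them across the available cores.
        embb::algorithms::ForEach(data.begin(), data.end(), [](int& x) { x *= 2; });
        return 0;
    }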

openmp-examples - openmp examples

  •    C++

This repository contains OpenMP examples that I create while learning OpenMP. This is a playground repository. I follow Tim Mattson's Introduction to OpenMP video playlist on YouTube.
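
A typical exercise from that kind of introduction is a reduction-based numerical integration; here is a minimal, self-contained sketch (not taken from this repository) that approximates pi with a parallel for loop:

    #include <cstdio>
    #include <omp.h>

    int main() {
        const int n = 1 << 20;
        double sum = 0.0;

        // Each thread accumulates a private partial sum; OpenMP combines them at the end.
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; ++i) {
            double x = (i + 0.5) / n;          // midpoint of the i-th subinterval
            sum += 4.0 / (1.0 + x * x);
        }

        std::printf("pi ~= %.6f (up to %d threads)\n", sum / n, omp_get_max_threads());
        return 0;
    }

Build with an OpenMP-enabled compiler, e.g. g++ -fopenmp.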

Bulk - A modern interface for writing bulk synchronous parallel programs.

  •    C++

Bulk does away with unnecessary boilerplate code and the unsafe APIs found in, for example, MPI or the BSPlib standard. It provides a unified syntax for parallel programming across different platforms and modalities. Our BSP interface supports and encourages the use of modern C++ features, enabling safer and more efficient distributed programming. We have a flexible backend architecture, so that programs written with Bulk work on shared-memory, distributed-memory, or mixed systems. Distributed variables are the easiest way to communicate.
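
To show what a distributed variable looks like in practice, here is a small sketch in the style of Bulk's shared-memory backend; the header paths and helper names (bulk::thread::environment, world.next_rank, world.log) follow Bulk's documented examples but should be verified against the current release:

    #include <bulk/backends/thread/thread.hpp>  // shared-memory (std::thread) backend
    #include <bulk/bulk.hpp>

    int main() {
        bulk::thread::environment env;

        env.spawn(env.available_processors(), [](bulk::world& world) {
            int s = world.rank();
            int p = world.active_processors();

            // A distributed variable: one copy per processor.
            auto a = bulk::var<int>(world);

            // Write our rank into the right neighbour's copy, then start the next superstep.
            a(world.next_rank()) = s;
            world.sync();

            world.log("processor %d/%d received %d", s, p, a.value());
        });
        return 0;
    }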

omega_h - Simplex mesh adaptivity for HPC

  •    C++

Omega_h is a C++11 library that implements tetrahedron and triangle mesh adaptivity, with a focus on scalable HPC performance using (optionally) MPI, OpenMP, or CUDA. It is intended to provide adaptive functionality to existing simulation codes. Mesh adaptivity allows one to minimize both discretization error and the number of degrees of freedom live during the simulation, as well as enabling moving-object and evolving-geometry simulations. Omega_h will do this for you in a way that is fast, memory-efficient, and portable across many different architectures. For a bare-minimum setup with no parallelism, you just need CMake, a C++11 compiler, and preferably ZLib installed.
