This is an attempt at recreating the functionality of GNU Parallel, a work-stealer for the command line, in Rust under an MIT license. The end goal is to support much of the functionality of GNU Parallel and then to extend that functionality further for the next generation of command-line utilities written in Rust. While functionality is important, the application is being developed in Rust with the goal of also being as fast and efficient as possible. See the to-do list for features and improvements that have yet to be done. If you want to contribute, pull requests are welcome. If you have an idea for an improvement which isn't listed in the to-do list, feel free to email me and I will consider implementing it.
command-line-app parallel-computing parallel

Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.
haskell accelerate llvm cuda parallel-computing gpu-computing

ArrayFire is a high performance software library for parallel computing with an easy-to-use API. Its array-based function set makes parallel programming simple. ArrayFire's multiple backends (CUDA, OpenCL and native CPU) make it platform independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving you valuable time and lowering development costs.
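As a rough illustration of what "a few lines" looks like in practice, here is a minimal sketch using ArrayFire's C++ interface; the calls shown (af::info, af::randu, af::matmul, af::sum) are the commonly documented ones, but exact usage should be checked against the ArrayFire documentation for your version and backend.

```cpp
#include <arrayfire.h>
#include <cstdio>

int main() {
    af::info();  // print the active backend/device (CUDA, OpenCL, or CPU)

    // Two random 1000x1000 matrices allocated on the device
    af::array a = af::randu(1000, 1000);
    af::array b = af::randu(1000, 1000);

    // Whole-array expressions; ArrayFire parallelizes them on the backend
    af::array c = af::matmul(a, b) + 0.5f * a;

    // Reduce to a single host-side scalar
    float total = af::sum<float>(c);
    std::printf("sum = %f\n", total);
    return 0;
}
```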
parallel-computing parallel cuda library

A list of computer-science-related readings I'm planning to read. Would love PRs!
reading science academia research computer-science compiler type-system concurrency parallel-computing operating-system static-analysis garbage-collection

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. In particular, larger data-parallel loops spanning multiple functions or even translation units will often not be transformed into efficient SIMD code.
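To make the contrast concrete, here is a small C++ sketch (generic SSE intrinsics, not this library's API) comparing a scalar loop with an explicitly data-parallel version; portable SIMD libraries aim to let you write the second form once, without hand-coded intrinsics for each instruction set.

```cpp
#include <immintrin.h>  // SSE intrinsics (x86); illustrative only
#include <cstddef>

// Scalar version: one addition per iteration. The compiler may or may not
// auto-vectorize this, depending on what it can prove about the code.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Explicit data-parallel version: one SSE register holds 4 floats, so a
// single _mm_add_ps instruction performs 4 additions at once.
void add_simd(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)  // scalar tail for the remaining elements
        out[i] = a[i] + b[i];
}
```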
vectorization parallel simd-vector simd-instructions simd avx c-plus-plus avx512 sse neon cpp portable cpp11 cpp14 cpp17 avx2 simd-programming data-parallel parallel-computing

Kratos is free under the BSD-4 license and can be used even in commercial software as is. Many of its main applications are also free and BSD-4 licensed, but each derived application can have its own proprietary license. Kratos is multiplatform and available for Windows, Linux (several distros) and macOS.
c-plus-plus multi-platform openmp mpi parallel-computing fem bsd-license numerical-methods multiphysics dem kratos kratos-multiphysics

High Performance Analytics Toolkit (HPAT) scales analytics/ML codes in Python to bare-metal cluster/cloud performance automatically. It compiles a subset of Python (Pandas/Numpy) to efficient parallel binaries with MPI, requiring only minimal code changes. HPAT is orders of magnitude faster than alternatives like Apache Spark. HPAT's documentation can be found here.
big-data parallel-computing compilers machine-learning numpy pandas

The purpose of the future package is to provide a very simple and uniform way of evaluating R expressions asynchronously using various resources available to the user. In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available instantaneously. If the value is queried while the future is still unresolved, the current process is blocked until the future is resolved. It is possible to check whether a future is resolved or not without blocking. Exactly how and when futures are resolved depends on what strategy is used to evaluate them. For instance, a future can be resolved using a sequential strategy, which means it is resolved in the current R session. Other strategies may be to resolve futures asynchronously, for instance, by evaluating expressions in parallel on the current machine or concurrently on a compute cluster.
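The same abstraction exists in many languages. As an analogy only (this uses C++'s std::future, not the R package's API), the sketch below shows an unresolved future, a non-blocking readiness check, and a blocking query of the value.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int main() {
    // Start evaluating an expression asynchronously; the returned future is
    // "unresolved" until the computation finishes.
    std::future<int> f = std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
        return 42;
    });

    // Check whether the future is resolved, without blocking.
    bool resolved =
        f.wait_for(std::chrono::seconds(0)) == std::future_status::ready;
    std::cout << "resolved yet? " << std::boolalpha << resolved << "\n";

    // Querying the value of an unresolved future blocks until it resolves.
    std::cout << "value: " << f.get() << "\n";
    return 0;
}
```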
r cran parallel-processing parallel-computing distributed-computing hpc-clusters hpc promises futures asynchronous programming parallelization

UACluster2 is a set of manuals and tools to create and manage a high performance computing cluster based on Microsoft Hyper-V virtual machines. It requires Microsoft HPC Server 2008 (Microsoft HPC Server 2008 R2) as the basis for cluster creation.
computing-cluster hpc mpi openmp parallel-computing parallel-programming

Unleash the power of parallel computing with automatic transactional memory.
actor-model hpc htc multicore parallel-computing stm

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, CUDA and OpenCL ndarray library on which to build a scientific computing and, in particular, a deep learning ecosystem. The library is inspired by NumPy and PyTorch. It provides ergonomics very similar to NumPy, Julia and MATLAB, but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.
tensor nim multidimensional-arrays cuda deep-learning machine-learning cudnn high-performance-computing gpu-computing matrix-library neural-networks parallel-computing openmp linear-algebra ndarray opencl gpgpu iot automatic-differentiation autograd

ParallelAccelerator is a Julia package for speeding up compute-intensive Julia programs. In particular, Julia code that makes heavy use of high-level array operations is a good candidate for speeding up with ParallelAccelerator. With the @acc macro that ParallelAccelerator provides, users may specify parts of a program to accelerate. ParallelAccelerator compiles these parts of the program to fast native code. It automatically eliminates overheads such as array bounds checking when it is safe to do so. It also parallelizes and vectorizes many data-parallel operations.
julia parallel-computing

Official git repository of Elmer FEM software.
finite-element-methods finite-elements fem multiphysics fluid-mechanics structural-mechanics electromagnetics mpi parallel-computing acoustics elmergui elmersolver elmergrid

Hello, friend. This is a JavaScript implementation of the Paws machine, intended both to be included in client-side code executed by browsers and to be embedded in Node.js projects. Paws lends itself well to highly asynchronous programming, meaning it's designed for things involving network requests (by design, web applications) and other tasks where concurrency is desirable. In addition, things built on top of Paws can distribute themselves across multiple environments and machines (this means your database and your users' browsers can all talk amongst one another). Finally, Paws is designed from the ground up to be concurrency-aware, ensuring tasks can parallelize when they won't affect each other negatively.
programming-language distributed-systems parallel-computing

schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib). See the installation instructions in the documentation for more information.
multiprocessing mpi parallel-computing

boxtree is a package that, given some point locations in two or three dimensions, sorts them into an adaptive quad/octree of boxes, efficiently, in parallel, using PyOpenCL. It can also generate traversal lists needed for adaptive fast multipole methods and related algorithms, and tree-based look-up tables for geometric proximity.
opencl pyopencl parallel-computing shared-memory parallel-algorithm quadtree octree fmm fast-multipole-method scientific-computing

Documentation is now up-to-date at job_stream's GitHub page.
job-stream parallel-computing pipeline-processor distributed-computing easy-to-use

The Embedded Multicore Building Blocks (EMB²) are an easy to use yet powerful and efficient C/C++ library for the development of parallel applications. EMB² has been specifically designed for embedded systems and the typical requirements that accompany them, such as real-time capability and constraints on memory consumption. As a major advantage, low-level operations are hidden in the library, which relieves software developers from the burden of thread management and synchronization. This not only improves productivity of parallel software development, but also results in increased reliability and performance of the applications. EMB² is independent of the hardware architecture (x86, ARM, ...) and runs on various platforms, from small devices to large systems containing numerous processor cores. It builds on MTAPI, a standardized programming interface for leveraging task parallelism in embedded systems containing symmetric or asymmetric (heterogeneous) multicore processors. A core feature of MTAPI is low-overhead scheduling of fine-grained tasks among the available cores during runtime. Unlike existing libraries, EMB² supports task priorities and affinities, which allows the creation of soft real-time systems. Additionally, the scheduling strategy can be optimized for non-functional requirements such as minimal latency and fairness.
embedded-systems multicore parallel-computing mtapi algorithms task-scheduler data-structures dataflow c-plus-plus

This repository contains OpenMP examples which I create while learning OpenMP. This is a playground repository. I follow Tim Mattson's Introduction to OpenMP video playlist on YouTube.
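A typical first exercise in that kind of playground looks like the following (a generic example, not taken from this repository): approximate pi with a parallel loop and a reduction clause.

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const long steps = 100000000;
    const double dx = 1.0 / steps;
    double sum = 0.0;

    // Numerically integrate 4/(1+x^2) over [0,1]; the iterations are split
    // across threads and the partial sums are combined by the reduction.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < steps; ++i) {
        double x = (i + 0.5) * dx;
        sum += 4.0 / (1.0 + x * x);
    }

    std::printf("pi ~ %.10f (max threads: %d)\n",
                sum * dx, omp_get_max_threads());
    return 0;
}
```

Compile with your compiler's OpenMP flag (e.g., -fopenmp for GCC/Clang) so the pragma is honored; without it the loop simply runs serially.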
openmp parallel parallel-computing cpp multithreading learning-openmp alternating-least-squares

Bulk does away with unnecessary boilerplate code and the unsafe APIs found in, for example, MPI or the BSPlib standard. It provides a unified syntax for parallel programming across different platforms and modalities. Our BSP interface supports and encourages the use of modern C++ features, enabling safer and more efficient distributed programming. We have a flexible backend architecture, so that programs written with Bulk work for shared memory, distributed memory, or mixed systems. Distributed variables are the easiest way to communicate.
parallel-computing parallel-algorithm distributed-computing high-performance-computing