•        60

UACluster2 is set of manuals and tools to create and manage high performance computing cluster based on Microsoft Hyper-V virtual machines. It needs Microsoft HPC Server 2008 (Microsoft HPC Server 2008 R2) as a basis of cluster creation.



Related Projects

future - :rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone

  •    R

The purpose of the future package is to provide a very simple and uniform way of evaluating R expressions asynchronously using various resources available to the user. In programming, a future is an abstraction for a value that may be available at some point in the future. The state of a future can either be unresolved or resolved. As soon as it is resolved, the value is available instantaneously. If the value is queried while the future is still unresolved, the current process is blocked until the future is resolved. It is possible to check whether a future is resolved or not without blocking. Exactly how and when futures are resolved depends on what strategy is used to evaluate them. For instance, a future can be resolved using a sequential strategy, which means it is resolved in the current R session. Other strategies may be to resolve futures asynchronously, for instance, by evaluating expressions in parallel on the current machine or concurrently on a compute cluster.

MPJ Express - Parallel Programming in Java

  •    Java

MPJ Express is an open source Java message passing library that allows application developers to write and execute parallel applications for multicore processors and compute clusters/clouds. It allows writing parallel Java applications using an MPI-like API.

Parallel Dwarfs

  •    CSharp

The Parallel Dwarfs project is a suite of 13 kernels (as VS projects in C++/C#/F#) parallelized using various technologies such as MPI, OpenMP, TPL, MPI.Net, etc. It also has a driver to run them, collect traces, and visualize the results using Vampir, Jumpshot, Xperf and Excel

StratoSphere - Cloud Computing Framework for Big Data Analytics

  •    Java

The Stratosphere System is an open-source cluster/cloud computing framework for Big Data analytics. It comprises of An extensible higher level language (Meteor) to quickly compose queries for common and recurring use cases, A parallel programming model (PACT, an extension of MapReduce) to run user-defined operations, An efficient massively parallel runtime (Nephele) for fault tolerant execution of acyclic data flows.

Shared Genomics Project MPI Codebase


The Shared Genomics project has developed parallelised statistical applications (MPI/OpenMP) which can analyse large genomic data-sets containing thousands of Single Nucleotide Polymorphisms (SNP). The code is based on the popular PLINK SNP-analysis program.

Automatic Translation from OPENMP to MPI


We intend to develop a tool that can automatically convert programs written in OpenMP sharedmemory parallel applications to MPI for execution in distributed memory systems.This will make it convenient to code in OpenMP and deploy the application to distributed system under MPI.

ArrayFire - Parallel Computing Library

  •    C++

ArrayFire is a high performance software library for parallel computing with an easy-to-use API. Its array based function set makes parallel programming simple. ArrayFire's multiple backends (CUDA, OpenCL and native CPU) make it platform independent and highly portable. A few lines of code in ArrayFire can replace dozens of lines of parallel computing code, saving you valuable time and lowering development costs.

Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU, OpenCL and embedded devices

  •    Nim

Arraymancer is a tensor (N-dimensional array) project in Nim. The main focus is providing a fast and ergonomic CPU, Cuda and OpenCL ndarray library on which to build a scientific computing and in particular a deep learning ecosystem. The library is inspired by Numpy and PyTorch. The library provides ergonomics very similar to Numpy, Julia and Matlab but is fully parallel and significantly faster than those libraries. It is also faster than C-based Torch.


  •    C

The MPC (MultiProcessor Computing) framework provides a unified parallel runtime for clusters of large multiprocessor/multicore NUMA nodes. It supports mixed-mode programming with POSIX Threads, Intel TBB, OpenMP 2.5 and MPI 1.3 standards.

hpat - A compiler-based big data framework in Python

  •    Python

High Performance Analytics Toolkit (HPAT) scales analytics/ML codes in Python to bare-metal cluster/cloud performance automatically. It compiles a subset of Python (Pandas/Numpy) to efficient parallel binaries with MPI, requiring only minimal code changes. HPAT is orders of magnitude faster than alternatives like Apache Spark. HPAT's documentation can be found here.

Transactional Entity Framework

  •    C++


OrangeFS - Scale-out Network File System

  •    C

OrangeFS is a scale-out network file system designed for use on high-end computing (HEC) systems that provides very high-performance access to multi-server-based disk storage, in parallel. The OrangeFS server and client are user-level code, making them very easy to install and manage. OrangeFS has optimized MPI-IO support for parallel and distributed applications, and it is leveraged in production installations and used as a research platform for distributed and parallel storage.

cpp-taskflow - Fast C++ Parallel Programming with Task Dependency Graphs

  •    C++

A fast C++ header-only library to help you quickly build parallel programs with complex task dependencies. Cpp-Taskflow lets you quickly build parallel dependency graphs using modern C++17. It supports both static and dynamic tasking, and is by far faster, more expressive, and easier for drop-in integration than existing libraries.

accelerate - Embedded language for high-performance array computations

  •    Haskell

Data.Array.Accelerate defines an embedded language of array computations for high-performance computing in Haskell. Computations on multi-dimensional, regular arrays are expressed in the form of parameterised collective operations (such as maps, reductions, and permutations). These computations are online-compiled and executed on a range of architectures. Chapter 6 of Simon Marlow's book Parallel and Concurrent Programming in Haskell contains a tutorial introduction to Accelerate.

Julia - Language for Technical Computing

  •    Julia

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. This computation is automatically distributed across all available compute nodes, and the result, reduced by summation (+), is returned at the calling node.

Parallel Runtime Library


Parallel Runtime Library is optimized library that provide Easy to use and High Performance Parallelism Computing. Parallel Runtime Library provide: Effective Parallel Runtime, Concurrent Data Structure, Task and Data Parallel, Producer and Consumer and Agent Model.

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel codes for full efficiency. Data parallelism requires that the same sequence of operations is applied to different input data. CPUs and GPUs can thus reduce the necessary hardware for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions. A single SIMD register can store N values and a single SIMD instruction can execute N operations on those values. On GPU architectures N threads run in perfect sync, fed by a single instruction decoder/scheduler. Each thread has local memory and a given index to calculate the offsets in memory for loads and stores. Current C++ compilers can do automatic transformation of scalar codes to SIMD instructions (auto-vectorization). However, the compiler must reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot vectorize any given code to its most efficient data-parallel variant. Especially larger data-parallel loops, spanning over multiple functions or even translation units, will often not be transformed into efficient SIMD code.



ttgLib is a C++ library for parallel resource-intensive programs creation for hybrid architectures like CPU+GPU. This library provides ttg::pipeline parallel primitive with wise load distribution over different computing API like as OpenMP or Intel TBB, NVidia CUDA and OpenCL.


  •    C++

Brook is an ANSI C like general purpose stream programming language and is designed to incorporate the ideas of data parallel computing and arithmetic intensity into a familiar, efficient language. Has OpenMP CPU, OpenGL, DirectX 9 and AMD CTM backends.