Dambach Multi-Core Library


The Dambach Multi-Core Library makes it easy to create .NET programs that run faster on multi-core machines than their traditionally programmed counterparts.

Related Projects

Multicore-TSNE - Parallel t-SNE implementation with Python and Torch wrappers.

  •    C++

This is a multicore modification of Barnes-Hut t-SNE by L. van der Maaten, with Python and Torch CFFI-based wrappers. It also runs faster than sklearn.TSNE even on a single core. Barnes-Hut t-SNE is done in two steps.

ocaml-multicore - Multicore OCaml

  •    OCaml

OCaml is an implementation of the ML language, based on the Caml Light dialect extended with a complete class-based object system and a powerful module system in the style of Standard ML. OCaml comprises two compilers. One generates bytecode, which is then interpreted by a C program. This compiler runs quickly, generates compact code with moderate memory requirements, and is portable to essentially any 32- or 64-bit Unix platform. The performance of the generated programs is quite good for a bytecoded implementation. This compiler can be used either as a standalone, batch-oriented compiler that produces standalone programs, or as an interactive, toplevel-based system.

MPJ Express - Parallel Programming in Java

  •    Java

MPJ Express is an open source Java message passing library that allows application developers to write and execute parallel applications for multicore processors and compute clusters/clouds. It allows writing parallel Java applications using an MPI-like API.
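MPJ Express itself is a Java library, and the sketch below is not its API. It is a minimal Python illustration, assuming only the standard library, of the MPI-style scatter/gather pattern that such message-passing libraries expose: a root hands each worker ("rank") a slice of the data, each worker sends back a partial result, and the root combines them. Python threads and `queue.Queue` mailboxes stand in for processes and messages, so this shows the communication pattern rather than a multicore speed-up; `worker` and `parallel_sum_of_squares` are hypothetical names.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk from the root, reduce it, and send back (rank, partial)."""
    chunk = inbox.get()                              # blocking "receive"
    outbox.put((rank, sum(x * x for x in chunk)))    # "send" the partial result


def parallel_sum_of_squares(data, n_workers=4):
    """Scatter `data` across workers, then gather and combine partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]  # one mailbox per rank
    results = queue.Queue()                              # mailbox read by the root
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Scatter: rank r receives elements r, r + n_workers, r + 2*n_workers, ...
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])

    # Gather: the root receives exactly one message from every rank.
    partials = [results.get() for _ in range(n_workers)]
    for t in threads:
        t.join()
    return sum(partial for _, partial in partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))  # prints 285
```

In a real MPI-style program each rank would be a separate process, possibly on another machine, and the queues would be replaced by the library's send and receive calls.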

Transactional Entity Framework

  •    C++


cpp-taskflow - Fast C++ Parallel Programming with Task Dependency Graphs

  •    C++

A fast, header-only C++ library that helps you quickly build parallel programs with complex task dependencies. Cpp-Taskflow lets you build parallel dependency graphs using modern C++17. It supports both static and dynamic tasking, and its authors describe it as faster, more expressive, and easier to integrate than existing libraries.
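The core idea of a task dependency graph is that a task may start as soon as all of its prerequisites have finished, so independent branches run concurrently. The following is a minimal, language-neutral sketch of that scheduling rule in Python using `concurrent.futures`; it is not the Cpp-Taskflow API (which is C++), and `run_task_graph` and its `tasks`/`deps` dictionaries are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_task_graph(tasks, deps, max_workers=4):
    """Run zero-argument callables as soon as all of their prerequisites finish.

    tasks: dict mapping task name -> callable
    deps:  dict mapping task name -> iterable of prerequisite names
           (every prerequisite is assumed to be a key of `tasks`)
    Returns a dict mapping task name -> the callable's return value.
    """
    pending = {name: set(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, prereqs in pending.items():
        for p in prereqs:
            dependents[p].append(name)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start every task that has no prerequisites.
        running = {pool.submit(tasks[n]): n for n, pre in pending.items() if not pre}
        while running:
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()
                # Release successors whose last prerequisite just finished.
                for succ in dependents[name]:
                    pending[succ].discard(name)
                    if not pending[succ]:
                        running[pool.submit(tasks[succ])] = succ

    if len(results) != len(tasks):
        raise ValueError("dependency cycle: some tasks could never start")
    return results


if __name__ == "__main__":
    import threading

    log, lock = [], threading.Lock()

    def make(name):
        def task():
            with lock:
                log.append(name)
            return name.lower()
        return task

    # Diamond graph: B and C both need A; D needs both B and C.
    graph = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    print(run_task_graph({n: make(n) for n in "ABCD"}, graph))
    print(log)  # A runs first and D last; B and C may run in either order
```

The scheduler never waits on a fixed stage boundary: each completion immediately releases whichever successors became ready, which is what lets independent branches of the graph overlap.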

Multicore SWARM

  •    C

Multicore SWARM (Software and Algorithms for Running on Multicore Processors) is an open source library for developing efficient and portable implementations that make use of multi-core processors. David A. Bader (Georgia Tech) began SWARM in 1994.

PyMW

  •    Python

PyMW is a Python module for parallel master-worker computing in a variety of environments. With the PyMW module, users can write a single program that scales from multicore machines to global computing platforms.
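The master-worker pattern can be sketched with the standard library alone. The code below does not use PyMW's API; it is an illustrative assumption in which a master splits a job (counting primes below a limit) into independent tasks, workers pull tasks from a shared queue until they receive a `None` shutdown sentinel, and the master gathers the results. Threads keep the sketch self-contained; for CPU-bound pure-Python work, the global interpreter lock in standard CPython means processes (for example `concurrent.futures.ProcessPoolExecutor`) would be needed for a real multicore speed-up. `is_prime`, `worker`, and `master` are hypothetical names.

```python
import queue
import threading


def is_prime(n):
    """Trial division; deliberately simple so each task does some work."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def worker(tasks, results):
    """Pull tasks until the master sends the None sentinel."""
    while True:
        item = tasks.get()
        if item is None:
            break
        task_id, lo, hi = item
        results.put((task_id, sum(1 for n in range(lo, hi) if is_prime(n))))


def master(limit, chunk=1000, n_workers=4):
    """Split [0, limit) into chunks, farm them out, and combine the counts."""
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()

    n_tasks = 0
    for lo in range(0, limit, chunk):
        tasks.put((n_tasks, lo, min(lo + chunk, limit)))
        n_tasks += 1
    for _ in workers:
        tasks.put(None)  # one shutdown sentinel per worker, queued after all tasks

    counts = dict(results.get() for _ in range(n_tasks))
    for w in workers:
        w.join()
    return sum(counts.values())


if __name__ == "__main__":
    print(master(1000, chunk=100))  # prints 168, the number of primes below 1000
```

Because workers pull tasks rather than receiving a fixed share, a worker that finishes early simply takes the next chunk, which balances load when tasks take uneven amounts of time.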

chapel - a Productive Parallel Programming Language

  •    Chapel

Chapel is a modern programming language designed for productive parallel computing at scale. Chapel's design and implementation have been undertaken with portability in mind, permitting Chapel to run on multicore desktops and laptops, commodity clusters, and the cloud, in addition to the high-end supercomputers for which it was originally designed. Chapel is developed and released under the terms of the Apache 2.0 license, though it also makes use of third-party packages under their own licensing terms. See the LICENSE file in the Chapel repository for details.



A suite of Ada 2012 generics to facilitate iterative and recursive parallelism for multicore systems and to provide safer recursion for single-core and multicore systems. The generics include Ravenscar-compatible versions for real-time systems. The suite also includes paraffinalia, a set of useful generics for parallel quicksort, fast Fourier transform, function integration, prefix sum, and red-black trees.

MPC

  •    C

The MPC (MultiProcessor Computing) framework provides a unified parallel runtime for clusters of large multiprocessor/multicore NUMA nodes. It supports mixed-mode programming with POSIX Threads, Intel TBB, OpenMP 2.5 and MPI 1.3 standards.

mTCP - A Highly Scalable User-level TCP Stack for Multicore Systems

  •    C

mTCP is a high-performance user-level TCP stack for multicore systems. Scaling the performance of short TCP connections is fundamentally challenging because of inefficiencies in the kernel. mTCP addresses these inefficiencies from the ground up: from packet I/O and TCP connection management all the way to the application interface. It replaces expensive system calls with shared-memory accesses between two threads running on the same CPU core.
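The hand-off between an application thread and a stack thread can be modeled roughly in Python. This is not mTCP code: it assumes an application thread that, instead of making a system call for each send, writes a request into a queue in shared memory, and a separate stack thread that blocks for the first pending request and then drains everything else queued before processing. The function names are hypothetical, and Python cannot pin both threads to the same core, so the sketch models only the control flow, not mTCP's performance.

```python
import queue
import threading


def stack_thread(requests, wire):
    """Hypothetical stand-in for the per-core TCP stack thread."""
    while True:
        batch = [requests.get()]            # block until at least one request
        while True:                          # then drain everything else pending
            try:
                batch.append(requests.get_nowait())
            except queue.Empty:
                break
        for req in batch:
            if req is None:                  # application asked the stack to stop
                return
            conn_id, payload = req
            wire.setdefault(conn_id, []).append(payload)  # pretend to transmit


def app_send_all(messages):
    """Application side: each send is a queue write, not a system call."""
    requests, wire = queue.Queue(), {}
    stack = threading.Thread(target=stack_thread, args=(requests, wire))
    stack.start()
    for conn_id, payload in messages:
        requests.put((conn_id, payload))
    requests.put(None)                       # stop request, always sent last
    stack.join()
    return wire


if __name__ == "__main__":
    print(app_send_all([(1, b"GET /"), (2, b"PING"), (1, b"GET /a")]))
    # prints {1: [b'GET /', b'GET /a'], 2: [b'PING']}
```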


  •    CSharp

Here you can find the resources required to start building with these new systems today. We have also provided a new forum where you can find more information and share your experiences with these new systems.

Go - Programming Language from Google

  •    C

Go is expressive, concise, clean, and efficient. Its concurrency mechanisms make it easy to write programs that get the most out of multicore and networked machines, while its novel type system enables flexible and modular program construction. Go compiles quickly to machine code yet has the convenience of garbage collection and the power of run-time reflection. It's a fast, statically typed, compiled language that feels like a dynamically typed, interpreted language.

Auto-parallelizing compiler for multicore systems using the Phoenix framework

The objective of this project is to develop plugins for the Phoenix compiler that will divide the intermediate code into partitions. These partitions will be synthesized further in later phases and will eventually be ready to run in parallel on a chip multiprocessor...

Vc - SIMD Vector Classes for C++

  •    C++

Recent generations of CPUs, and GPUs in particular, require data-parallel code for full efficiency. Data parallelism requires that the same sequence of operations be applied to different input data. CPUs and GPUs can thus reduce the hardware needed for instruction decoding and scheduling in favor of more arithmetic and logic units, which execute the same instructions synchronously. On CPU architectures this is implemented via SIMD registers and instructions: a single SIMD register can store N values, and a single SIMD instruction can execute N operations on those values. On GPU architectures, N threads run in perfect sync, fed by a single instruction decoder/scheduler; each thread has local memory and a given index used to calculate the memory offsets for loads and stores. Current C++ compilers can automatically transform scalar code into SIMD instructions (auto-vectorization). However, the compiler must then reconstruct an intrinsic property of the algorithm that was lost when the developer wrote a purely scalar implementation in C++. Consequently, C++ compilers cannot reliably vectorize an arbitrary piece of code into its most efficient data-parallel form. Larger data-parallel loops in particular, especially those spanning multiple functions or even translation units, are often not transformed into efficient SIMD code.
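The loop shape that SIMD code takes can be illustrated even in plain Python, though Python itself will not emit SIMD instructions. The sketch below assumes a hypothetical vector width `LANES = 4` and computes `a * x + y` the way data-parallel code is organized: the bulk of the array is processed `LANES` elements at a time with the same operation applied to every lane, and a scalar epilogue handles any leftover elements. It is not Vc's API; `axpy_scalar` and `axpy_lanes` are hypothetical names, and the point is the structure that explicit vector types or an auto-vectorizer map onto SIMD registers.

```python
LANES = 4  # hypothetical vector width, e.g. four 32-bit floats in a 128-bit register


def axpy_scalar(a, x, y):
    """Reference version: one element per step."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def axpy_lanes(a, x, y, lanes=LANES):
    """Same computation, organized as SIMD code is: `lanes` elements per step."""
    n = len(x)
    out = [0.0] * n
    n_full = n - n % lanes                     # largest multiple of `lanes`
    for i in range(0, n_full, lanes):
        # One "vector operation": the same multiply-add on `lanes` neighbors.
        out[i:i + lanes] = [a * xv + yv
                            for xv, yv in zip(x[i:i + lanes], y[i:i + lanes])]
    for i in range(n_full, n):                 # scalar epilogue for the remainder
        out[i] = a * x[i] + y[i]
    return out


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [1.0] * 10
    print(axpy_lanes(2.0, x, y))
    # prints [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0]
```

With 10 elements and 4 lanes, two full "vector" steps cover elements 0 to 7 and the epilogue handles elements 8 and 9; this remainder handling is one of the details a compiler must get right when it vectorizes a scalar loop on its own.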

MultiCore

MultiCore is a compute-cloud wrapper written in C# that provides SimpleDB-based role, membership, and profile providers. It also simplifies access to Amazon SimpleDB and includes the latest Amazon libraries. Azure support is planned.

scalloc - A Fast, Multicore-Scalable, Low-Fragmentation Memory Allocator

  •    C++

scalloc shows that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. The main ideas behind the design of scalloc are uniform treatment of small and big objects through so-called virtual spans, and efficient, effective reclamation of free memory through fast and scalable global data structures.

SkyEye

  •    C

SkyEye is a very fast full-system simulator that uses LLVM as the IR of its dynamic compilation framework. It can simulate ARM, ColdFire, MIPS, PowerPC, SPARC, x86, and Blackfin DSP processors, and it can simulate multicore systems by using the host's multiple cores.

mtcp - mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems

  •    C

mTCP is a highly scalable user-level TCP stack for multicore systems. The mTCP source code is distributed under the Modified BSD License; for details, see the LICENSE file. The license terms of the io_engine driver and the ported applications may differ from mTCP's. We require the following libraries to run mTCP.