Charm++

Paradigm	Message-driven parallel programming, migratable objects, object-oriented, asynchronous many-tasking
Designed by	Laxmikant Kale
Developer	Parallel Programming Laboratory
First appeared	late 1980s
Stable release	7.0.0 / October 25, 2021
Implementation language	C++, Python
Platform	Cray XC, XK, XE, IBM Blue Gene/Q, InfiniBand, TCP, UDP, MPI, OFI
OS	Linux, Windows, macOS
Website	http://charmplusplus.org

Charm++ is a parallel object-oriented programming paradigm based on C++ and developed in the Parallel Programming Laboratory at the University of Illinois at Urbana–Champaign. It is designed to enhance programmer productivity by providing a high-level abstraction of a parallel program while delivering good performance on a wide variety of hardware platforms. Programs written in Charm++ are decomposed into a number of cooperating message-driven objects called chares. When a programmer invokes a method on a chare, the Charm++ runtime system sends a message to that chare, which may reside on the local processor or on a remote processor in a parallel computation. The message triggers execution of the corresponding code within the chare, which handles it asynchronously.

Chares may be organized into indexed collections called chare arrays, and messages may be sent either to individual chares within a chare array or to the entire chare array at once.
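The message-driven model described above can be illustrated without the Charm++ runtime. The following is a minimal single-process sketch in standard C++, not Charm++ code: the names `Scheduler`, `Counter`, and `CounterArray` are hypothetical, and the FIFO delivery order is a simplifying assumption. It shows only that invoking a method enqueues a message rather than running the handler, and that a message can target one element of an indexed collection or all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the message queue of a single processor.
class Scheduler {
 public:
  // Queue a message. Nothing executes until run() delivers it.
  void send(std::function<void()> msg) { pending_.push(std::move(msg)); }

  // Deliver queued messages in FIFO order (an assumption of this sketch)
  // and return how many were delivered.
  int run() {
    int delivered = 0;
    while (!pending_.empty()) {
      std::function<void()> msg = std::move(pending_.front());
      pending_.pop();
      msg();  // a handler may itself send further messages
      ++delivered;
    }
    return delivered;
  }

 private:
  std::queue<std::function<void()>> pending_;
};

// Hypothetical message-driven object: its state changes only in handlers.
struct Counter {
  int total = 0;
  void add(int amount) { total += amount; }  // plays the role of an entry method
};

// Hypothetical indexed collection, loosely analogous to a chare array.
class CounterArray {
 public:
  CounterArray(Scheduler& sched, std::size_t n) : sched_(sched), elems_(n) {}

  // Send a message to a single element. Returns before the handler runs.
  void add(std::size_t index, int amount) {
    assert(index < elems_.size());
    sched_.send([this, index, amount] { elems_[index].add(amount); });
  }

  // Broadcast: enqueue one message per element.
  void addAll(int amount) {
    for (std::size_t i = 0; i < elems_.size(); ++i) add(i, amount);
  }

  int total(std::size_t index) const { return elems_.at(index).total; }

 private:
  Scheduler& sched_;
  std::vector<Counter> elems_;
};
```

With a `CounterArray` of three elements, calling `add(1, 5)` and then `addAll(2)` leaves every total at zero until `Scheduler::run()` is called; afterwards element 1 holds 7 and the other two hold 2. In Charm++ itself the corresponding calls go through generated proxy classes, and the target object may live on another processor.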

The chares in a program are mapped to physical processors by an adaptive runtime system. The mapping of chares to processors is transparent to the programmer, and this transparency permits the runtime system to dynamically change the assignment of chares to processors during program execution to support capabilities such as measurement-based load balancing, fault tolerance, automatic checkpointing, and the ability to shrink and expand the set of processors used by a parallel program.
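As an illustration of measurement-based load balancing, the sketch below applies a generic greedy heuristic: objects are visited from heaviest to lightest, and each is placed on the processor with the smallest accumulated load so far. It is written in standard C++ and is not the Charm++ runtime's implementation; the function names `greedyAssign` and `maxProcessorLoad` and the `measuredLoad` input are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Returns assignment[i] = processor chosen for object i.
// measuredLoad[i] is a hypothetical per-object cost, for example the time
// the object spent computing during the previous phase.
std::vector<int> greedyAssign(const std::vector<double>& measuredLoad,
                              int numProcessors) {
  assert(numProcessors > 0);

  // Visit objects from heaviest to lightest.
  std::vector<std::size_t> order(measuredLoad.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return measuredLoad[a] > measuredLoad[b];
                   });

  // Min-heap of (accumulated load, processor id).
  using Entry = std::pair<double, int>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (int p = 0; p < numProcessors; ++p) heap.push({0.0, p});

  std::vector<int> assignment(measuredLoad.size(), -1);
  for (std::size_t obj : order) {
    Entry least = heap.top();  // currently least-loaded processor
    heap.pop();
    assignment[obj] = least.second;
    least.first += measuredLoad[obj];
    heap.push(least);
  }
  return assignment;
}

// Largest total load placed on any one processor under an assignment.
double maxProcessorLoad(const std::vector<double>& measuredLoad,
                        const std::vector<int>& assignment,
                        int numProcessors) {
  assert(numProcessors > 0);
  std::vector<double> perProcessor(numProcessors, 0.0);
  for (std::size_t i = 0; i < measuredLoad.size(); ++i) {
    perProcessor[assignment[i]] += measuredLoad[i];
  }
  return *std::max_element(perProcessor.begin(), perProcessor.end());
}
```

For measured loads of 4, 3, 3, 2, 2, 2 on two processors, this heuristic yields a maximum per-processor load of 8, whereas placing the first three objects on one processor and the last three on the other yields 10. Because the mapping is hidden from the programmer, a runtime can apply a new assignment of this kind by migrating objects, without changes to application code.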

Applications implemented using Charm++ include NAMD (molecular dynamics) and OpenAtom (quantum chemistry), ChaNGa and SpECTRE (astronomy), EpiSimdemics (epidemiology), Cello/Enzo-E (adaptive mesh refinement), and ROSS (parallel discrete event simulation). All of these applications have scaled up to a hundred thousand cores or more on petascale systems.

Adaptive MPI (AMPI)[1] is an implementation of the Message Passing Interface standard on top of the Charm++ runtime system. It provides the capabilities of Charm++ within the more traditional MPI programming model. AMPI encapsulates each MPI process within a user-level migratable thread that is bound to a Charm++ object. Because each thread is embedded in a chare, AMPI programs can take advantage of the features of the Charm++ runtime system with few or no changes to the MPI program.

Charm4py allows writing Charm++ applications in Python, supporting migratable Python objects and asynchronous remote method invocation.

Example

The following Charm++ code defines a one-dimensional chare array whose elements greet the user one after another:[2]

Header file (hello.h)

class Hello : public CBase_Hello {
 public:
  Hello(); // C++ constructor

  void sayHi(int from); // Remotely invocable "entry method"
};

Charm++ Interface file (hello.ci)

module hello {
  array [1D] Hello {
    entry Hello();
    entry void sayHi(int);
  };
};

Source file (hello.cpp)

#include "hello.decl.h"
#include "hello.h"

// Both variables are defined elsewhere in the program (not shown here).
extern CProxy_Main mainProxy;
extern int numElements;

Hello::Hello() {
  // No member variables to initialize in this simple example
}

void Hello::sayHi(int from) {

  // Have this chare object say hello to the user.
  CkPrintf("Hello from chare # %d on processor %d (told by %d)\n",
           thisIndex, CkMyPe(), from);

  // Tell the next chare object in this array of chare objects
  // to also say hello. If this is the last chare object in
  // the array of chare objects, then tell the main chare
  // object to exit the program.
  if (thisIndex < (numElements - 1)) {
    thisProxy[thisIndex + 1].sayHi(thisIndex);
  } else {
    mainProxy.done();
  }
}

#include "hello.def.h"

Adaptive MPI (AMPI)

Adaptive MPI is an implementation of MPI (like MPICH, Open MPI, and MVAPICH) on top of the Charm++ runtime system. Users can take existing MPI applications, recompile them using AMPI's compiler wrappers, and begin experimenting with process virtualization, dynamic load balancing, and fault tolerance. AMPI implements MPI "ranks" as user-level threads rather than operating system processes. Context switches between these threads are fast, so several ranks can be co-scheduled on the same core, with scheduling driven by the availability of messages for each rank. AMPI ranks, and all the data they own, can also be migrated at runtime across the cores and nodes of a job. This is useful for load balancing and for fault tolerance schemes based on checkpoint/restart. For more information on AMPI, see the manual: http://charm.cs.illinois.edu/manuals/html/ampi/manual.html

Charm4py

Charm4py[3] is a Python parallel computing framework built on top of the Charm++ C++ runtime, which it uses as a shared library. Charm4py simplifies the development of Charm++ applications and streamlines parts of the programming model. For example, there is no need to write interface files (.ci files) or to use Structured Dagger (SDAG), and programs do not have to be compiled. Users remain free to accelerate their application-level code with technologies like Numba. Standard ready-to-use binary versions can be installed on Linux, macOS and Windows with pip.

It is also possible to write hybrid Charm4py and MPI programs.[4] An example of a supported scenario is a Charm4py program using mpi4py libraries for specific parts of the computation.

References

[1] "Parallel Programming Laboratory". charm.cs.illinois.edu. Retrieved 2018-12-12.
[2] "Array 'Hello World': A Slightly More Advanced 'Hello World' Program: Array 'Hello World' Code". PPL - UIUC Parallel Programming Laboratory. Retrieved 2017-05-08.
[3] "Charm4py — Charm4py 1.0.0 documentation". charm4py.readthedocs.io. Retrieved 2019-09-11.
[4] "Running hybrid mpi4py and Charm4py programs (mpi interop)". Charm++ and Charm4py Forum. 2018-11-30. Retrieved 2018-12-11.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
