CuPy

CuPy is an open source library for GPU-accelerated computing with the Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them.[3] CuPy shares the same API set as NumPy and SciPy, allowing it to serve as a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the NVIDIA CUDA GPU platform and, starting in v9.0, the AMD ROCm GPU platform.[4][5]

CuPy
Original author(s): Seiya Tokui
Developer(s): Community, Preferred Networks, Inc.
Initial release: September 2, 2015[1]
Stable release: v10.5.0 / May 26, 2022[2]
Preview release: v11.0.0b3 / May 26, 2022[2]
Repository: github.com/cupy/cupy
Written in: Python, Cython, CUDA
Operating system: Linux, Windows
Platform: Cross-platform
Type: Numerical analysis
License: MIT
Website: cupy.dev

CuPy was initially developed as a backend of the Chainer deep learning framework, and was later established as an independent project in 2017.[6]

CuPy is part of the NumPy ecosystem of array libraries[7] and is widely adopted for GPU computing with Python,[8] especially in high-performance computing environments such as Summit,[9] Perlmutter,[10] EULER,[11] and ABCI.[12]

CuPy is a NumFOCUS affiliated project.[13]

Features

CuPy implements NumPy/SciPy-compatible APIs, as well as features to write user-defined GPU kernels or access low-level APIs.[14][15]

NumPy-compatible APIs

The same set of APIs defined in the NumPy package (numpy.*) is available under the cupy.* package.
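
As a minimal sketch of what this compatibility means in practice, the function below is written once and runs unchanged on either a NumPy or a CuPy array. The names `standardize`, `xp`, and `HAS_GPU`, and the fallback to NumPy when no usable GPU is found, are illustrative assumptions and not part of CuPy itself.

```python
import numpy as np

try:
    import cupy as cp
    # is_available() reports whether a usable GPU was detected.
    HAS_GPU = cp.cuda.is_available()
except Exception:  # CuPy is not installed or the GPU driver is unusable
    cp, HAS_GPU = None, False

# Pick the array module once; the numerical code below does not change.
xp = cp if HAS_GPU else np


def standardize(a):
    """Scale each column to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


data = xp.arange(12, dtype=xp.float64).reshape(4, 3)
result = standardize(data)

# Copy the result back to host memory when it lives on the GPU.
result_host = cp.asnumpy(result) if HAS_GPU else result
print(result_host.mean(axis=0))
```

Only the array creation and the final copy to host memory depend on which module is selected; `standardize` itself uses methods that both array types provide.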

SciPy-compatible APIs

The same set of APIs defined in the SciPy package (scipy.*) is available under the cupyx.scipy.* package.
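
The following sketch illustrates the parallel between `scipy.sparse` and `cupyx.scipy.sparse` by building a small CSR matrix and multiplying it by a vector. The matrix values, the `HAS_GPU` flag, and the fallback to SciPy when no usable GPU is found are assumptions made for illustration.

```python
try:
    import cupy as cp
    HAS_GPU = cp.cuda.is_available()
except Exception:  # CuPy is not installed or the GPU driver is unusable
    cp, HAS_GPU = None, False

if HAS_GPU:
    import cupy as xp
    from cupyx.scipy import sparse   # GPU counterpart of scipy.sparse
else:
    import numpy as xp
    from scipy import sparse

# 3x3 matrix in CSR form:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 6]]
data = xp.array([1, 2, 3, 4, 5, 6], dtype=xp.float64)
indices = xp.array([0, 2, 2, 0, 1, 2], dtype=xp.int32)
indptr = xp.array([0, 2, 3, 6], dtype=xp.int32)
A = sparse.csr_matrix((data, indices, indptr), shape=(3, 3))

v = xp.ones(3, dtype=xp.float64)
y = A.dot(v)  # sparse matrix-vector product

# Copy the result back to host memory when it lives on the GPU.
y_host = y.get() if HAS_GPU else y
print(y_host)
```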

User-defined GPU kernels

  • Kernel templates for element-wise and reduction operations
  • Raw kernel (CUDA C/C++)
  • Just-in-time transpiler (JIT)
  • Kernel fusion
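
The sketch below shows three of these mechanisms side by side: an element-wise kernel template, a reduction kernel template, and kernel fusion. The kernel bodies follow the patterns used in the CuPy documentation; the wrapper function `run_kernels`, the `HAS_GPU` flag, and the input values are illustrative assumptions. The code needs a GPU and is skipped when none is found.

```python
try:
    import cupy as cp
    HAS_GPU = cp.cuda.is_available()
except Exception:  # CuPy is not installed or the GPU driver is unusable
    cp, HAS_GPU = None, False


def run_kernels():
    # Element-wise template: the body runs once per output element.
    squared_diff = cp.ElementwiseKernel(
        'float32 x, float32 y',    # inputs
        'float32 z',               # output
        'z = (x - y) * (x - y)',   # per-element operation
        'squared_diff')            # kernel name

    # Reduction template: map each element, reduce pairs, post-process.
    l2norm = cp.ReductionKernel(
        'T x',          # input
        'T y',          # output
        'x * x',        # map expression
        'a + b',        # reduce expression
        'y = sqrt(a)',  # post-reduction expression
        '0',            # identity value
        'l2norm')       # kernel name

    # Kernel fusion: the Python expression is compiled into one kernel.
    @cp.fuse()
    def fused_squared_diff(x, y):
        return (x - y) * (x - y)

    x = cp.arange(6, dtype=cp.float32).reshape(2, 3)
    y = cp.ones((2, 3), dtype=cp.float32)
    return (cp.asnumpy(squared_diff(x, y)),
            cp.asnumpy(l2norm(x, axis=1)),
            cp.asnumpy(fused_squared_diff(x, y)))


results = run_kernels() if HAS_GPU else None
```

The element-wise kernel and the fused function compute the same values here; fusion lets the kernel be written as ordinary array code, while the templates take CUDA C/C++ snippets.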

Distributed computing

  • Distributed communication package (cupyx.distributed), providing collective and peer-to-peer primitives

Low-level CUDA features

  • Stream and event
  • Memory pool
  • Profiler
  • Host API binding
  • CUDA Python support[16]
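
As a rough sketch of how streams, events, and the memory pool fit together, the function below queues work on a non-default stream, times it with a pair of events, and then inspects the default memory pool. The function name `timed_sum`, the `HAS_GPU` flag, and the problem size are assumptions made for illustration. The code needs a GPU and is skipped when none is found.

```python
try:
    import cupy as cp
    HAS_GPU = cp.cuda.is_available()
except Exception:  # CuPy is not installed or the GPU driver is unusable
    cp, HAS_GPU = None, False


def timed_sum(n):
    """Sum 0..n-1 on a separate stream and report timing and pool usage."""
    stream = cp.cuda.Stream(non_blocking=True)
    start, stop = cp.cuda.Event(), cp.cuda.Event()

    with stream:  # work issued inside this block is queued on `stream`
        start.record(stream)
        x = cp.arange(n, dtype=cp.float64)
        total = x.sum()
        stop.record(stream)

    stop.synchronize()  # wait until the queued work has finished
    elapsed_ms = cp.cuda.get_elapsed_time(start, stop)

    # The default memory pool caches device allocations for reuse.
    used_bytes = cp.get_default_memory_pool().used_bytes()
    return float(total), elapsed_ms, used_bytes


if HAS_GPU:
    total, elapsed_ms, used_bytes = timed_sum(1000)
    print(total, elapsed_ms, used_bytes)
    # Return cached blocks that are no longer in use to the device.
    cp.get_default_memory_pool().free_all_blocks()
else:
    total = elapsed_ms = used_bytes = None
```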

Interoperability

Examples

Array creation

>>> import cupy as cp
>>> x = cp.array([1, 2, 3])
>>> x
array([1, 2, 3])
>>> y = cp.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Basic operations

>>> import cupy as cp
>>> x = cp.arange(12).reshape(3, 4).astype(cp.float32)
>>> x
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]], dtype=float32)
>>> x.sum(axis=1)
array([ 6., 22., 38.], dtype=float32)

Raw CUDA C/C++ kernel

>>> import cupy as cp
>>> kern = cp.RawKernel(r'''
... extern "C" __global__
... void multiply_elemwise(const float* in1, const float* in2, float* out) {
...     int tid = blockDim.x * blockIdx.x + threadIdx.x;
...     out[tid] = in1[tid] * in2[tid];
... }
... ''', 'multiply_elemwise')
>>> in1 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> in2 = cp.arange(16, dtype=cp.float32).reshape(4, 4)
>>> out = cp.zeros((4, 4), dtype=cp.float32)
>>> kern((4,), (4,), (in1, in2, out))  # grid, block and arguments
>>> out
array([[  0.,   1.,   4.,   9.],
       [ 16.,  25.,  36.,  49.],
       [ 64.,  81., 100., 121.],
       [144., 169., 196., 225.]], dtype=float32)
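
The kernel above launches exactly one thread per element (4 blocks of 4 threads for 16 elements). For sizes that are not a multiple of the block size, a common pattern is to pass the element count and guard the write. The following is a sketch under that assumption; the extra parameter `n`, the wrapper `multiply`, the `HAS_GPU` flag, and the block size of 128 are illustrative choices rather than part of the original example. The code needs a GPU and is skipped when none is found.

```python
import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.is_available()
except Exception:  # CuPy is not installed or the GPU driver is unusable
    cp, HAS_GPU = None, False

KERNEL_SOURCE = r'''
extern "C" __global__
void multiply_elemwise(const float* in1, const float* in2, float* out, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {  // skip threads past the end of the arrays
        out[tid] = in1[tid] * in2[tid];
    }
}
'''


def multiply(in1, in2, block_size=128):
    """Element-wise product of two float32 arrays of equal size."""
    kern = cp.RawKernel(KERNEL_SOURCE, 'multiply_elemwise')
    out = cp.empty_like(in1)
    n = in1.size
    # Round up so that every element is covered by some thread.
    grid_size = (n + block_size - 1) // block_size
    # np.int32 matches the `int n` parameter of the kernel.
    kern((grid_size,), (block_size,), (in1, in2, out, np.int32(n)))
    return out


if HAS_GPU:
    a = cp.arange(1000, dtype=cp.float32)
    product = cp.asnumpy(multiply(a, a))
else:
    product = None
```

With 1000 elements and a block size of 128, eight blocks are launched and the last 24 threads do nothing because of the guard.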

Applications

See also

References

  1. "Release v1.3.0 – chainer/chainer". Retrieved 25 June 2022 via GitHub.
  2. "Releases – cupy/cupy". Retrieved 18 June 2022 via GitHub.
  3. Okuta, Ryosuke; Unno, Yuya; Nishino, Daisuke; Hido, Shohei; Loomis, Crissman (2017). CuPy: A NumPy-Compatible Library for NVIDIA GPU Calculations (PDF). Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS).
  4. "CuPy 9.0 Brings AMD GPU Support To This Numpy-Compatible Library - Phoronix". Phoronix. 29 April 2021. Retrieved 21 June 2022.
  5. "AMD Leads High Performance Computing Towards Exascale and Beyond". 28 June 2021. Retrieved 21 June 2022. Most recently, CuPy, an open-source array library with Python, has expanded its traditional GPU support with the introduction of version 9.0 that now offers support for the ROCm stack for GPU-accelerated computing.
  6. "Preferred Networks released Version 2 of Chainer, an Open Source framework for Deep Learning - Preferred Networks, Inc". 2 June 2017. Retrieved 18 June 2022.
  7. "NumPy". numpy.org. Retrieved 21 June 2022.
  8. Gorelick, Micha; Ozsvald, Ian (April 2020). High Performance Python: Practical Performant Programming for Humans (2nd ed.). O'Reilly Media, Inc. p. 190. ISBN 9781492055020.
  9. Oak Ridge Leadership Computing Facility. "Installing CuPy". OLCF User Documentation. Retrieved 21 June 2022.
  10. National Energy Research Scientific Computing Center. "Using Python on Perlmutter". NERSC Documentation. Retrieved 21 June 2022.
  11. ETH Zurich. "CuPy". ScientificComputing. Retrieved 21 June 2022.
  12. National Institute of Advanced Industrial Science and Technology. "Chainer". ABCI 2.0 User Guide. Retrieved 21 June 2022.
  13. "Affiliated Projects - NumFOCUS". Retrieved 18 June 2022.
  14. "Overview". CuPy documentation. Retrieved 18 June 2022.
  15. "Comparison Table". CuPy documentation. Retrieved 18 June 2022.
  16. "CUDA Python | NVIDIA Developer". Retrieved 21 June 2022.
  17. "Welcome to DLPack's documentation!". DLPack 0.6.0 documentation. Retrieved 21 June 2022.
  18. "CUDA Array Interface (Version 3)". Numba 0.55.2 documentation. Retrieved 21 June 2022.
  19. "NEP 13 — A mechanism for overriding Ufuncs — NumPy Enhancement Proposals". numpy.org. Retrieved 21 June 2022.
  20. "NEP 18 — A dispatch mechanism for NumPy's high level array functions — NumPy Enhancement Proposals". numpy.org. Retrieved 21 June 2022.
  21. Charles R Harris; K. Jarrod Millman; Stéfan J. van der Walt; et al. (16 September 2020). "Array programming with NumPy" (PDF). Nature. 585 (7825): 357–362. arXiv:2006.10256. doi:10.1038/S41586-020-2649-2. ISSN 1476-4687. PMC 7759461. PMID 32939066. Wikidata Q99413970.
  22. "2021 report - Python Data APIs Consortium" (PDF). Retrieved 21 June 2022.
  23. "Purpose and scope". Python array API standard 2021.12 documentation. Retrieved 21 June 2022.
  24. "Install spaCy". spaCy Usage Documentation. Retrieved 21 June 2022.
  25. Patel, Ankur A.; Arasanipalai, Ajay Uppili (May 2021). Applied Natural Language Processing in the Enterprise (1st ed.). O'Reilly Media, Inc. p. 68. ISBN 9781492062578.
  26. "Python Package Introduction". xgboost 1.6.1 documentation. Retrieved 21 June 2022.
  27. "UCBerkeleySETI/turbo_seti: turboSETI -- python based SETI search algorithm". GitHub. Retrieved 21 June 2022.
  28. "Open GPU Data Science | RAPIDS". Retrieved 21 June 2022.
  29. "API Docs". RAPIDS Docs. Retrieved 21 June 2022.
  30. "Efficient Data Sharing between CuPy and RAPIDS". Retrieved 21 June 2022.
  31. "10 Minutes to cuDF and CuPy". Retrieved 21 June 2022.
  32. Rogozhnikov, Alex (2022). Einops: Clear and Reliable Tensor Manipulations with Einstein-like Notation. International Conference on Learning Representations.
  33. "arogozhnikov/einops: Deep learning operations reinvented (for pytorch, tensorflow, jax and others)". GitHub. Retrieved 21 June 2022.
  34. Tokui, Seiya; Okuta, Ryosuke; Akiba, Takuya; Niitani, Yusuke; Ogawa, Toru; Saito, Shunta; Suzuki, Shuji; Uenishi, Kota; Vogel, Brian; Vincent, Hiroyuki Yamazaki (2019). Chainer: A Deep Learning Framework for Accelerating the Research Cycle. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. doi:10.1145/3292500.3330756.