Apache SINGA

Apache SINGA is an Apache top-level project for developing an open source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on health-care applications.

Developer(s): Apache Software Foundation
Initial release: October 8, 2015
Stable release: 4.0.0 / April 7, 2023
Written in: C++, Python, Java
Operating system: Linux, macOS, Windows
License: Apache License 2.0
Website: singa.apache.org

History

The SINGA project was initiated by the DB System Group at the National University of Singapore in 2014, in collaboration with the database group of Zhejiang University, to support complex analytics at scale and to make database systems more intelligent and autonomic.[1] It focused on distributed deep learning, partitioning the model and data across the nodes of a cluster and parallelizing the training.[2][3] The prototype was accepted by the Apache Incubator in March 2015 and graduated as a top-level project in October 2019. The releases are listed in the following table. Since V1.0, SINGA has been general enough to support traditional machine learning models such as logistic regression.

Version  Original release date  Latest version  Release date  Status
4.0.0    2023-04-07             4.0.0           2023-04-07    Current stable version
3.3.0    2022-06-07             3.3.0           2022-06-07    Older version, still maintained
3.2.0    2021-08-15             3.2.0           2021-08-15    Older version, still maintained
3.1.0    2020-10-30             3.1.0           2020-10-30    Older version, still maintained
3.0.0    2020-04-20             3.0.0           2020-04-20    Older version, still maintained
2.0.0    2019-04-20             2.0.0           2019-04-20    Older version, still maintained
1.2.0    2018-06-06             1.2.0           2018-06-06    Older version, still maintained
1.1.0    2017-02-12             1.1.0           2017-02-12    Older version, still maintained
1.0.0    2016-09-08             1.0.0           2016-09-08    Older version, still maintained
0.3.0    2016-04-20             0.3.0           2016-04-20    Old version, no longer maintained
0.2.0    2016-01-14             0.2.0           2016-01-14    Old version, no longer maintained
0.1.0    2015-10-08             0.1.0           2015-10-08    Old version, no longer maintained

Software Stack

SINGA's software stack includes three major components: core, IO, and model. The following figure illustrates these components together with the hardware. The core component provides memory management and tensor operations. The IO component has classes for reading data from, and writing data to, disk and network. The model component provides data structures and algorithms for machine learning models, e.g., layers for neural network models, and optimizers, initializers, metrics, and losses for general machine learning models.

Figure: Apache SINGA software stack
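The division of labour between the three components can be illustrated with a minimal sketch in plain Python. It does not use SINGA's API: every name below (dot, sigmoid, LogisticRegression, sgd_step, read_samples) is hypothetical and only mirrors the roles described above, with a logistic regression model trained on a toy AND dataset.

```python
import math

# "core" role (hypothetical): basic tensor-style operations on plain lists.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# "IO" role (hypothetical): an in-memory iterator standing in for
# readers that would normally pull samples from disk or the network.
def read_samples():
    data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]
    for features, label in data:
        yield features, label

# "model" role (hypothetical): a model plus a loss gradient and an optimizer step.
class LogisticRegression:
    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def forward(self, x):
        return sigmoid(dot(self.w, x) + self.b)

def sgd_step(model, x, y, lr):
    # For binary cross-entropy, the gradient w.r.t. the pre-activation is (p - y).
    err = model.forward(x) - y
    model.w = [w - lr * err * xi for w, xi in zip(model.w, x)]
    model.b -= lr * err

model = LogisticRegression(2)
for _ in range(2000):
    for x, y in read_samples():
        sgd_step(model, x, y, lr=0.5)

predictions = [round(model.forward(x)) for x, _ in read_samples()]
```

In this sketch the model code only touches data through the reader and only does arithmetic through the core helpers, which is the kind of separation that lets each layer be swapped out independently.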

SINGA-Auto

SINGA-Auto (also known as Rafiki,[4] published at VLDB 2018) is a subsystem of Apache SINGA that provides training and inference services for machine learning models. SINGA-Auto frees users from constructing machine learning models, tuning hyper-parameters, and optimizing prediction accuracy and speed. Users can simply upload their datasets, configure the service to conduct training, and then deploy the model for inference. As a cloud service system, SINGA-Auto manages hardware resources, failure recovery, etc. For ease of use, it provides a model zoo, a set of built-in machine learning models for popular tasks such as structured data (e.g., EMR data) analytics, image recognition, and text processing.

The training service uses a general framework for distributed hyper-parameter tuning, together with a collaborative tuning scheme designed specifically for deep learning models. The inference service uses a scheduling algorithm based on reinforcement learning to optimize overall accuracy and reduce latency, and it adapts to changes in request rates.
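As a rough illustration of distributed hyper-parameter tuning, the sketch below runs a plain random search whose trials are evaluated in parallel by a pool of workers. It is not SINGA-Auto's collaborative tuning scheme or its API: the search space, the toy objective in run_trial, and the names sample_config and tune are all assumptions made for the example.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def sample_config(rng):
    # Hypothetical search space: log-uniform learning rate, uniform dropout.
    return {"lr": 10 ** rng.uniform(-4, -1), "dropout": rng.uniform(0.0, 0.5)}

def run_trial(config):
    # Stand-in for training a model and returning its validation score.
    # This toy objective peaks at lr = 0.01 and dropout = 0.2 with value 1.0.
    lr_penalty = (math.log10(config["lr"]) + 2) ** 2 / 10
    dropout_penalty = (config["dropout"] - 0.2) ** 2
    return 1.0 - lr_penalty - dropout_penalty

def tune(num_trials, num_workers, seed=0):
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(num_trials)]
    # Each worker evaluates trials independently; pool.map keeps result order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        scores = list(pool.map(run_trial, configs))
    best = max(range(num_trials), key=lambda i: scores[i])
    return configs[best], scores[best]

best_config, best_score = tune(num_trials=32, num_workers=4)
```

In a real deployment each trial would be a full training job on its own node, and a scheme such as collaborative tuning would additionally share information between trials rather than treating them as independent.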

SINGA-Easy

SINGA-Easy[5] (ACM Multimedia 2021) is an easy-to-use deep learning framework built as a component of Apache SINGA to facilitate the adoption of deep learning algorithms and inference services by users of domain-specific applications (e.g., multimedia, medical image analysis). It provides distributed hyper-parameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia content facilitated by model explanation. To improve accuracy, it supports regularization methods for image and structured data (ACM SIGMOD 2023). To help domain users accept the training results, SINGA-Easy lets users evaluate model performance from a model explanation perspective, based on LIME[6] and Grad-CAM.[7]

MLCask

MLCask[8] (IEEE ICDE 2021) is a pipeline management subsystem that manages machine learning pipelines, from data cleaning to data analytics, and eases the maintenance of evolving, versioned pipelines in collaborative analytics. It aims to reduce cost and facilitate adoption. MLCask supports Git-like end-to-end ML life-cycle management. By leveraging the version history of pipeline components and the workspace, MLCask can skip unchanged preprocessing steps, which addresses the challenge of frequent retraining. Its non-linear version control semantics and merge operation facilitate effective collaborative development of pipelines.
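The idea of skipping unchanged steps can be sketched with a small cache keyed by a step's name, its version, and a fingerprint of its input. This is only an illustration of the general technique, not MLCask's implementation: PipelineCache and the toy steps below are hypothetical, and inputs are assumed to be JSON-serializable.

```python
import hashlib
import json

class PipelineCache:
    """Hypothetical cache keyed by (step name, step version, input data)."""

    def __init__(self):
        self.store = {}
        self.executed = []  # names of steps that actually ran, for inspection

    def run(self, steps, data):
        for name, version, fn in steps:
            payload = json.dumps([name, version, data], sort_keys=True)
            key = hashlib.sha256(payload.encode()).hexdigest()
            if key not in self.store:
                # Cache miss: the step or its input changed, so recompute.
                self.store[key] = fn(data)
                self.executed.append(name)
            data = self.store[key]
        return data

def clean(rows):
    return [r for r in rows if r is not None]

def scale_v1(rows):
    return [r / 10 for r in rows]

def scale_v2(rows):
    return [r / 100 for r in rows]

def summarize(rows):
    return sum(rows)

cache = PipelineCache()
raw = [10, None, 20, 30]

first = cache.run(
    [("clean", 1, clean), ("scale", 1, scale_v1), ("summarize", 1, summarize)], raw
)

# Only the "scale" step is bumped to version 2; "clean" is reused from the cache.
cache.executed.clear()
second = cache.run(
    [("clean", 1, clean), ("scale", 2, scale_v2), ("summarize", 1, summarize)], raw
)
rerun_steps = list(cache.executed)
```

On the second run only the changed step and the steps downstream of it are recomputed, which is why retraining after a small pipeline change can be much cheaper than a full rerun.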

Applications

Apache SINGA[9] is in use at organizations such as NetEase,[10] Carnegie Technologies, CBRE, Citigroup, JurongHealth Hospital, National University of Singapore, National University Hospital, Noblis, Shentilium Technologies, Singapore General Hospital, Tan Tock Seng Hospital, YZBigData, and others. Apache SINGA is used across applications in banking, education, finance, healthcare, real estate, software development, and other categories.

Apache SINGA and Social Good

The Ng Teng Fong General Hospital[11] collaborated with the Apache SINGA team to develop an application for people diagnosed with pre-diabetes, a condition where blood glucose levels are higher than normal, but not high enough to be classified as diabetes.

The application, called the JurongHealth Food Log (JHFoodLg) app, uses Apache SINGA to match photos of food to a database of local dishes, including nasi padang, laksa, and char siew rice, and utilises nutrition data from the Health Promotion Board, JurongHealth Campus, and the Australian Food and Nutrient Database. After comprehensive data cleaning (e.g., consistent formatting, deduplication, foodness classification, human calibration), the database contains 209,861 images covering 13 food groups and 233 food categories.

The app allows users from the hospital's Lifestyle Intervention (Liven) programme to set weight loss and exercise goals. A six-month study shows that almost all 20 patients who used the app lost between 4 and 5 percent of their initial bodyweight.

References

  1. Wei, Wang; Meihui, Zhang; Gang, Chen; H.V., Jagadish; Beng Chin, Ooi; Kian-Lee, Tan (June 2016). "Database Meets Deep Learning: Challenges and Opportunities". SIGMOD Record. 45 (2): 17–22. arXiv:1906.08986. doi:10.1145/3003665.3003669. S2CID 6526411.
  2. Ooi, Beng Chin; Tan, Kian-Lee; Sheng, Wang; Wang, Wei; Cai, Qingchao; Chen, Gang; Gao, Jinyang; Luo, Zhaojing; Tung, Anthony K. H.; Wang, Yuan; Xie, Zhongle; Zhang, Meihui; Zheng, Kaiping (2015). "SINGA" (PDF). Proceedings of the 23rd ACM international conference on Multimedia. pp. 685–688. doi:10.1145/2733373.2807410. ISBN 9781450334594. S2CID 1840240. Retrieved 8 September 2016.
  3. Wei, Wang; Chen, Gang; Anh Dinh, Tien Tuan; Gao, Jinyang; Ooi, Beng Chin; Tan, Kian-Lee; Sheng, Wang (2015). "SINGA" (PDF). Proceedings of the 23rd ACM international conference on Multimedia. pp. 25–34. doi:10.1145/2733373.2806232. ISBN 9781450334594. S2CID 7169465. Retrieved 8 September 2016.
  4. Wang, Wei; Gao, Jinyang; Zhang, Meihui; Sheng, Wang; Chen, Gang; Khim Ng, Teck; Ooi, Beng Chin; Shao, Jie; Reyad, Moaz (2018). "Rafiki" (PDF). Proceedings of the VLDB Endowment. 12 (2): 128–140. arXiv:1804.06087. Bibcode:2018arXiv180406087W. doi:10.14778/3282495.3282499. S2CID 4898729. Retrieved 9 January 2019.
  5. Xing, Naili; Yeung, Sai Ho; Cai, Chenghao; Ng, Teck Khim; Wang, Wei; Yang, Kaiyuan; Yang, Nan; Zhang, Meihui; Chen, Gang; Ooi, Beng Chin (2021). "SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis" (PDF). Proceedings of the 29th ACM International Conference on Multimedia. arXiv:2108.02572. doi:10.1145/1122445.1122456. S2CID 251468334. Retrieved 17 October 2021.
  6. Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos (2017). "'Why Should I Trust You?': Explaining the Predictions of Any Classifier" (PDF). Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. pp. 97–101. arXiv:1602.04938. doi:10.18653/v1/N16-3020. Retrieved 1 August 2016.
  7. Selvaraju, Ramprasaath R.; Cogswell, Michael; Das, Abhishek; Vedantam, Ramakrishna; Parikh, Devi; Batra, Dhruv (2017). "Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization" (PDF). 2017 IEEE International Conference on Computer Vision (ICCV). pp. 618–626. arXiv:1610.02391. doi:10.1109/ICCV.2017.74. ISBN 978-1-5386-1032-9.
  8. Luo, Zhaojing; Yeung, Sai Ho; Zhang, Meihui; Zheng, Kaiping; Zhu, Lei; Chen, Gang; Fan, Feiyi; Lin, Qian; Ngiam, Kee Yuan; Ooi, Beng Chin (2021). "MLCask: Efficient Management of Component Evolution in Collaborative Data Analytics Pipelines" (PDF). 2021 IEEE 37th International Conference on Data Engineering (ICDE). pp. 1655–1666. arXiv:2010.10246. doi:10.1109/ICDE51399.2021.00146. ISBN 978-1-7281-9184-3. S2CID 224802796. Retrieved 22 June 2021.
  9. "THE APACHE SOFTWARE FOUNDATION ANNOUNCES APACHE SINGA AS A TOP-LEVEL PROJECT". news.apache.org. 4 November 2019. Retrieved 4 November 2019.
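  10. 网易 [NetEase] (2 June 2017). "网易携手Apache SINGA角逐人工智能新战场_网易科技" [NetEase joins hands with Apache SINGA to compete on the new battlefield of artificial intelligence, NetEase Tech]. tech.163.com. Retrieved 3 June 2017.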
  11. "New app allows pre-diabetics to use photos of their meal to check if it is healthy". The Straits Times. 24 January 2019. Retrieved 6 April 2019.