Here’s a short tutorial on how to install and configure Apache SINGA, a deep learning library built for training machine learning models at scale.
Apache SINGA is an open source deep learning library for training large-scale machine learning models efficiently across distributed systems. It is a project of the Apache Software Foundation and focuses on scalability, flexibility, and ease of use in both research and production environments. Setting it up involves installing dependencies, building from source (or installing via a package manager), and configuring it for your use case (e.g., single-node or distributed training).

Why Apache SINGA?
Distributed training capabilities
SINGA excels in distributed training across multiple GPUs/nodes, making it ideal for large-scale datasets or models (e.g., deep neural networks, transformers). It supports both data parallelism and model parallelism, and integrates with communication backends like MPI, gRPC, and NCCL for efficient inter-node coordination.
Fault tolerance
It automatically recovers from node failures during distributed training, ensuring robustness in production environments.
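The data-parallel scheme mentioned above can be pictured without SINGA at all. The sketch below is plain Python, not SINGA code: it splits a toy batch across hypothetical workers with equal-sized shards, computes each worker's gradient for a one-parameter linear model, and averages the results, which is the role an all-reduce step plays in synchronous training. The names `shard`, `local_gradient`, and `data_parallel_step` are illustrative only.

```python
# Toy illustration of synchronous data parallelism (plain Python, not SINGA API).
# Model: y_hat = w * x, loss = mean((w * x - y) ** 2).


def local_gradient(w, xs, ys):
    """Gradient of the mean squared error w.r.t. w on one worker's shard."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def shard(seq, num_workers):
    """Split seq into num_workers contiguous shards of equal size."""
    size = len(seq) // num_workers
    return [seq[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(w, xs, ys, num_workers, lr):
    """One SGD step: every worker computes a gradient, then they are averaged."""
    grads = [
        local_gradient(w, sx, sy)
        for sx, sy in zip(shard(xs, num_workers), shard(ys, num_workers))
    ]
    avg_grad = sum(grads) / num_workers  # stands in for an all-reduce
    return w - lr * avg_grad


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by the true weight w = 2
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, xs, ys, num_workers=2, lr=0.01)
print(round(w, 3))
```

With equal-sized shards, the averaged gradient matches the gradient computed on the full batch, so adding workers changes where the work happens rather than what is computed.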
Its key features are listed below.
Horizontal scaling
Efficiently distributes training across multiple nodes using both data parallelism (splitting data) and model parallelism (splitting model layers).
Synchronous/asynchronous training
Supports flexible synchronisation strategies for distributed environments.
Fault tolerance
Checkpointing and recovery mechanisms to handle node failures during long-running tasks.
Diverse neural networks
Built-in support for CNNs, RNNs, GANs, and reinforcement learning models.
Customisable layers
Low-level APIs (C++/Python) for fine-grained control, plus high-level APIs (like Singa-Easy) for rapid prototyping.
Dynamic and static graphs
Hybrid computation graph support for both ease of use and optimisation.
Edge and cloud deployment
Lightweight export options for mobile/IoT devices and cloud platforms.
Model compression
Techniques like quantization to reduce model size for resource-constrained environments.
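As a rough picture of what quantization does, the sketch below maps floating-point weights to 8-bit integers using a single scale and zero point, then maps them back. It is a generic affine-quantization illustration in plain Python, not SINGA's compression or export API, and the function names are hypothetical.

```python
# Generic 8-bit affine quantization (illustrative only; not a SINGA API).


def quantize(weights, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(v - zero_point) * scale for v in q]


weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
q, scale, zp = quantize(weights)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err <= scale)
```

Each weight now needs one byte instead of four, at the cost of a rounding error bounded by the quantization step.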
Installing Apache SINGA
First, ensure your system meets the following requirements.
Operating system
Linux (Ubuntu 20.04/22.04 recommended) or macOS.
Dependencies
- Python 3.6+ and pip
- CMake 3.1+
- GCC 7+ or Clang 5.0+
- OpenMPI 4.0+ (for distributed training)
- CUDA 10.2/11.x and cuDNN 8.x (for GPU support)
- OpenCV (optional, for image processing)
- Protocol Buffers (protobuf)
- BLAS (e.g., OpenBLAS, Intel MKL)
Install dependencies on Ubuntu with the following code:
    sudo apt update
    sudo apt install -y build-essential cmake python3-dev python3-pip \
        libopenblas-dev libopencv-dev protobuf-compiler libprotobuf-dev \
        openmpi-bin libopenmpi-dev
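Before building, it can help to confirm that the tools are actually visible on your PATH. The following is a minimal sketch using only the Python standard library; the tool names are an assumption derived from the dependency list above, so adjust them to your setup.

```python
# Check which build tools are visible on PATH.
# The tool lists are assumptions based on the dependency list; edit as needed.
import shutil

REQUIRED_TOOLS = ["cmake", "gcc", "protoc", "mpirun", "pip3"]
OPTIONAL_TOOLS = ["nvcc"]  # only needed for GPU builds


def find_tools(names):
    """Return a dict mapping each tool name to its path, or None if missing."""
    return {name: shutil.which(name) for name in names}


def report(names, label):
    """Print one line per tool with its location or a NOT FOUND marker."""
    for name, path in find_tools(names).items():
        status = path if path else "NOT FOUND"
        print(f"[{label}] {name}: {status}")


report(REQUIRED_TOOLS, "required")
report(OPTIONAL_TOOLS, "optional")
```

This only checks that an executable exists; it does not verify version numbers.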
Option 1 to install SINGA
Install via pip (CPU-only).
For a quick CPU-only installation, type:
    pip install singa
Option 2 to install SINGA
Build from source (GPU support).
Clone the repository:
    git clone https://github.com/apache/singa.git
    cd singa
Configure the build (enable CUDA, OpenMPI, etc):
    mkdir build
    cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr/local \
          -DUSE_CUDA=ON \
          -DUSE_DIST=ON \
          -DENABLE_TEST=ON \
          ..
Build and install using the following code:
    make -j$(nproc)  # Use all CPU cores
    sudo make install
Install Python bindings:
    cd ../python
    pip install .
Configure environment variables as follows:
Add the following to your ~/.bashrc or ~/.zshrc:
    export SINGA_HOME=/path/to/singa  # Path to SINGA source directory (if built from source)
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
    export PATH=$PATH:/usr/local/bin
Reload the shell:
    source ~/.bashrc  # or: source ~/.zshrc
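To confirm the variables took effect in the new shell, a small check like the one below can be run. It is a sketch using only the standard library; the expected directories are the ones used in this tutorial, and the helper names are hypothetical.

```python
# Sanity-check the environment variables set above (no SINGA needed).
import os


def path_contains(env, var, directory):
    """Return True if directory is one of the colon-separated entries of env[var]."""
    entries = [e.rstrip("/") for e in env.get(var, "").split(":") if e]
    return directory.rstrip("/") in entries


def check_env(env):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if not path_contains(env, "LD_LIBRARY_PATH", "/usr/local/lib"):
        problems.append("LD_LIBRARY_PATH does not include /usr/local/lib")
    if not path_contains(env, "PATH", "/usr/local/bin"):
        problems.append("PATH does not include /usr/local/bin")
    if not env.get("SINGA_HOME"):
        problems.append("SINGA_HOME is not set")
    return problems


for problem in check_env(os.environ):
    print("warning:", problem)
```

If nothing is printed, all three variables are set as described.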
Verifying the installation
Test SINGA in Python as follows:
    import singa
    print(singa.__version__)  # Should output the installed version
Test GPU support:
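    from singa import device

    # Both calls require a CUDA-enabled build of SINGA
    print(device.get_num_gpus())  # Number of available GPUs
    print(device.get_gpu_ids())   # IDs of the available GPUs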
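If the import fails, it is useful to tell "not installed" apart from "installed but broken" (for example, a missing shared library). The sketch below relies only on the standard library and makes no assumption about SINGA's own API beyond the package name `singa`; the helper names are hypothetical.

```python
# Diagnose whether a package is installed and importable (standard library only).
import importlib
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def describe_install(name="singa"):
    """Return a one-line description of the package's install state."""
    if not module_available(name):
        return f"{name} is not installed in this environment"
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # Typically a missing native library; check LD_LIBRARY_PATH
        return f"{name} is installed but failed to import: {exc}"
    version = getattr(module, "__version__", "unknown")
    return f"{name} found, version: {version}"


print(describe_install())
```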
Configuring distributed training
For multi-node training, set up MPI. Ensure OpenMPI is installed and nodes can communicate via SSH without passwords.
Create a hostfile (e.g., hostfile) listing all worker nodes:
    worker1 slots=2  # 2 GPUs on worker1
    worker2 slots=1  # 1 GPU on worker2
Run a distributed job:
    mpirun -np 3 -hostfile hostfile python train.py
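The value passed to -np should match the total number of slots in the hostfile (2 + 1 = 3 here). A small helper, sketched below in plain Python, can derive it so the two never drift apart. The parser handles only the simple "host slots=N" form shown above and treats a missing slots field as 1, which is an assumption about your Open MPI setup.

```python
# Derive the mpirun process count from an OpenMPI-style hostfile.
HOSTFILE_TEXT = """\
worker1 slots=2  # 2 GPUs on worker1
worker2 slots=1  # 1 GPU on worker2
"""


def parse_hostfile(text):
    """Return a list of (host, slots) pairs from hostfile text."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumed default when slots= is omitted
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts.append((fields[0], slots))
    return hosts


hosts = parse_hostfile(HOSTFILE_TEXT)
total_slots = sum(slots for _, slots in hosts)
print(f"mpirun -np {total_slots} -hostfile hostfile python train.py")
```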
Example configuration for training
Create a simple neural network using SINGA’s APIs:
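    # Written against the SINGA 3.x Python API; names may differ in other releases
    from singa import autograd, opt, tensor

    autograd.training = True  # Record operations so gradients can be computed


    # Define a model
    class MLP:
        def __init__(self):
            # stores_grad=True marks the tensors as trainable parameters
            self.w0 = tensor.Tensor((784, 512), dtype=tensor.float32,
                                    requires_grad=True, stores_grad=True)
            self.w1 = tensor.Tensor((512, 10), dtype=tensor.float32,
                                    requires_grad=True, stores_grad=True)
            self.w0.gaussian(0.0, 0.01)
            self.w1.gaussian(0.0, 0.01)

        def forward(self, x):
            x = autograd.matmul(x, self.w0)
            x = autograd.relu(x)
            x = autograd.matmul(x, self.w1)
            return x


    # Initialize model and optimizer
    model = MLP()
    sgd = opt.SGD(lr=0.01)

    # Training loop
    for epoch in range(10):
        for x, y in dataloader:  # Replace with your data loader (yields NumPy arrays)
            x = tensor.from_numpy(x)
            y = tensor.from_numpy(y)
            out = model.forward(x)
            loss = autograd.softmax_cross_entropy(out, y)
            # Backpropagate and apply one SGD update per parameter
            for p, g in autograd.backward(loss):
                sgd.update(p, g)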
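The `dataloader` in the training loop is a placeholder. One possible stand-in is a plain Python generator that yields mini-batches, sketched below. It is not part of SINGA; the name `iterate_minibatches` and its parameters are hypothetical, and it works on any indexable sequences of equal length.

```python
# A minimal mini-batch generator (plain Python; not a SINGA API).
import random


def iterate_minibatches(features, labels, batch_size, shuffle=True, seed=None):
    """Yield (feature_batch, label_batch) lists covering the data exactly once."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # reproducible when seed is given
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [features[i] for i in batch], [labels[i] for i in batch]


features = [[float(i), float(i) + 0.5] for i in range(10)]
labels = [i % 2 for i in range(10)]
for xb, yb in iterate_minibatches(features, labels, batch_size=4, shuffle=False):
    print(len(xb), yb)
```

Each batch is a pair of Python lists; convert them to NumPy arrays of the dtype your model expects before handing them to SINGA.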
Apache SINGA is now configured for your machine! You can adjust the configurations based on your hardware (CPU/GPU) and use case (single-node/distributed).
Apache SINGA is ideal for developers and researchers who need scalable, efficient, and flexible deep learning with strong support for distributed environments. Its unique blend of performance optimisations, multi-modal data handling, and fault tolerance makes it a powerful alternative to mainstream frameworks like TensorFlow and PyTorch, especially in large-scale or resource-constrained scenarios.