This repo is a tutorial on how to train a CNN model in a distributed fashion using Batch AI. The scenario covered is image classification, but the solution can be generalized to other deep learning scenarios such as segmentation and object detection. Image classification is a common task in computer vision applications and is often tackled by training a convolutional neural network (CNN). For particularly large models with large datasets, the training process can take weeks or months on a single GPU. In some situations, the models are so large that it isn't possible to fit reasonable batch sizes onto a single GPU. Distributed training helps shorten the training time in these situations. In this specific scenario, a ResNet50 CNN model is trained using Horovod on the ImageNet dataset as well as on synthetic data. The tutorial demonstrates how to accomplish this using three of the most popular deep learning frameworks: TensorFlow, Keras, and PyTorch.

There are a number of ways to train a deep learning model in a distributed fashion, including data parallel and model parallel approaches based on synchronous or asynchronous updates. Currently the most common scenario is data parallel with synchronous updates, since it is the easiest to implement and sufficient for the majority of use cases. In data parallel distributed training with synchronous updates, the model is replicated across N hardware devices, and a mini-batch of training samples is divided into N micro-batches (see Figure 2). Each device performs the forward and backward pass for its micro-batch, and when it finishes, it shares the resulting gradient updates with the other devices. The gradients are aggregated to compute the weight update for the entire mini-batch, and the updated weights are then synchronized across the model replicas. This is the scenario covered in the GitHub repository, though the same architecture can also be used for model parallel training and asynchronous updates.
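As a rough illustration of this pattern, the sketch below shows a minimal Horovod training loop in PyTorch, one of the three frameworks used in the tutorial. It is not the repository's actual training script: the batch size, the linear learning-rate scaling by world size, and the synthetic ImageNet-shaped tensors (echoing the synthetic-data option mentioned above) are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import horovod.torch as hvd

# Initialize Horovod and pin each process to one GPU
hvd.init()
torch.cuda.set_device(hvd.local_rank())

model = models.resnet50().cuda()

# Assumption: scale the learning rate linearly with the number of workers
optimizer = optim.SGD(model.parameters(), lr=0.1 * hvd.size())

# Broadcast initial state from rank 0 so every replica starts identical
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

# Wrap the optimizer so gradients are averaged across workers via allreduce
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)

criterion = nn.CrossEntropyLoss().cuda()

for step in range(100):
    # Synthetic stand-in for one ImageNet micro-batch per worker
    # (assumed batch size of 32; real code would shard a data loader)
    images = torch.randn(32, 3, 224, 224).cuda()
    labels = torch.randint(0, 1000, (32,)).cuda()

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()    # gradients are allreduced across all replicas here
    optimizer.step()   # every replica applies the same synchronized update
```

Launched with, for example, `horovodrun -np 4 python train.py`, each process works on its own micro-batch while Horovod's allreduce keeps the replicas' weights synchronized after every step, which is exactly the synchronous data parallel scheme described above.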