Deep neural networks (DNNs) have unleashed a new wave of applications on mobile devices, such as intelligent personal assistants. Most of these applications rely on cloud resources to perform deep learning. With increasingly powerful mobile devices, users can perform more deep learning tasks directly on the devices. Moreover, on-device learning offers important advantages, such as personalization, privacy, and responsiveness; however, a good understanding of the capabilities of modern mobile devices for supporting deep learning is generally lacking. To address this gap, this paper presents a comprehensive study of performing training and inference on mobile devices. It develops TensorFlow+, an extension of the widely used TensorFlow framework, to enable training DNNs on devices and to use the available GPUs to accelerate the learning tasks. The study focuses on four aspects: 1) the performance impact of the network architecture; 2) the effectiveness of using accelerators for learning on mobile devices; 3) the resource and battery usage of training and inference; and 4) the performance impact on other applications running on the devices. The results show that both the size (width and depth) of a network and the types of layers it uses matter not only for fitting within the device's capability but also for the performance of learning. The study also shows that hardware acceleration is important both for improving the speed of learning and for reducing its impact on other applications on the device.