The deployment of deep neural network (DNN) models is often hindered by their long training times. DNN training throughput is commonly limited by the fully-connected layers, owing to their large size and low data reuse. Large batch sizes are often used to mitigate the cost of this low reuse, but increasing the batch size can hurt model accuracy, creating a tradeoff between accuracy and efficiency. We tackle the problem of training DNNs entirely in on-chip memory, which allows us to train models without batching. Pruning and quantizing dense layers can greatly reduce network size, allowing models to fit on the chip, but these techniques can only be applied after training. We propose a fully-connected but sparse layer that reduces the memory requirements of DNNs without sacrificing accuracy: each dense weight matrix is replaced with a product of sparse matrices with a predetermined topology. This allows us to (1) train significantly smaller networks without a loss in accuracy, and (2) store weights without having to store connection indices. We thereby achieve significant training speedups due to fast access to on-chip weights, the smaller network size, and the reduced amount of computation per epoch.
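
The abstract does not specify the exact sparse topology; the following is a minimal PyTorch sketch of one way such a layer could be realized, assuming block-diagonal factors joined by a fixed perfect-shuffle permutation (the class name `SparseProductLinear`, the block count, and the shuffle pattern are illustrative assumptions, not necessarily the paper's construction). Because the pattern is derived deterministically from the layer shape, only the nonzero blocks are stored and no connection indices are needed, while the two-factor product still connects every input to every output.

```python
import torch
import torch.nn as nn


class SparseProductLinear(nn.Module):
    """A fully-connected but sparse layer (illustrative sketch): the dense
    n-by-n weight matrix is replaced by the product of two block-diagonal
    factors joined by a fixed perfect-shuffle permutation, so the layer
    remains fully connected while storing far fewer weights and no indices."""

    def __init__(self, n: int, blocks: int = 8):
        super().__init__()
        g = n // blocks
        assert n % blocks == 0 and g >= blocks, "need g = n/blocks >= blocks for full connectivity"
        # Only the nonzero blocks are stored: 2 * blocks * g * g = 2 * n * g
        # weights instead of n * n for a dense layer.
        self.b1 = nn.Parameter(torch.randn(blocks, g, g) / g ** 0.5)
        self.b2 = nn.Parameter(torch.randn(blocks, g, g) / g ** 0.5)
        # The permutation is regenerated from (n, blocks); it is part of the
        # predetermined topology, not a stored set of connection indices.
        self.register_buffer("perm", torch.arange(n).reshape(blocks, g).t().reshape(-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w1 = torch.block_diag(*self.b1)    # first sparse factor
        w2 = torch.block_diag(*self.b2)    # second sparse factor
        h = (x @ w1)[:, self.perm]         # shuffle mixes blocks between factors
        return h @ w2


layer = SparseProductLinear(n=256, blocks=8)
x = torch.randn(1, 256)        # batch size 1: no batching required
print(layer(x).shape)          # torch.Size([1, 256])
```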