Recent breakthroughs in deep neural networks have led to the proliferation of its use in image and speech applications. Conventional deep neural networks (DNNs) are fully-connected multi-layer networks with hundreds or thousands of neurons in each layer. Such a network requires a very large weight memory to store the connectivity between neurons. In this paper, we propose a hardware-centric methodology to design low power neural networks with significantly smaller memory footprint and computation resource requirements. We achieve this by judiciously dropping connections in large blocks of weights. The corresponding technique, termed coarse-grain sparsification (CGS), introduces hardware-aware sparsity during the DNN training, which leads to efficient weight memory compression and significant computation reduction during classification without losing accuracy. We apply the proposed approach to DNN design for keyword detection and speech recognition. When the two DNNs are trained with 75% of the weights dropped and classified with 5-6 bit weight precision, the weight memory requirement is reduced by 95% compared to their fully-connected counterparts with double precision, while maintaining similar performance in keyword detection accuracy, word error rate, and sentence error rate. To validate this technique in real hardware, a time-multiplexed architecture using a shared multiply and accumulate (MAC) engine was implemented in 65nm and 40nm low power (LP) CMOS. In 40nm at 0.6V, the keyword detection network consumes 7μW and the speech recognition network consumes 103μW, making this technique highly suitable for mobile and wearable devices.