Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have been effective in various data-driven applications. Yet, DNNs suffer from several major challenges; in particular, in many applications where the input data is relatively sparse, DNNs face the problems of overfitting to the input data and poor generalizability. This raises several critical questions: "Are all inputs equally important?" "Can we selectively focus on parts of the input data in a way that reduces overfitting to irrelevant observations?" Recently, attention networks have shown some success in helping the overall process focus on parts of the data that carry higher importance in the current context. Yet, we note that current attention network designs are not sufficiently informed by key data characteristics when identifying salient regions in the data. We propose an innovative robust feature learning framework, scale-invariant attention networks (SAN), that identifies salient regions in the input data for the CNN to focus on. Unlike existing attention networks, SAN concentrates attention on parts of the data that exhibit major change across space and scale. We argue, and experimentally show, that the salient regions identified by SAN lead to better network performance than state-of-the-art (attention-based and non-attention-based) approaches, including architectures such as LeNet, VGG, ResNet, and LSTM, on common benchmark datasets (MNIST, FMNIST, CIFAR10/20/100, GTSRB, ImageNet, Mocap, Aviage, and GTSDB) for tasks such as image/time series classification, time series forecasting, and object detection in images.
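To make the core idea concrete, the sketch below illustrates one classical way to find regions with strong change across both space and scale: a difference-of-Gaussians (DoG) pyramid, whose per-pixel maximum over scales serves as a saliency mask. This is not the paper's SAN implementation; it is a minimal, hypothetical illustration of scale-aware saliency, and the function names (`dog_saliency`) and parameters are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_saliency(image, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Saliency map from differences of Gaussians across scales.

    Regions that change strongly across both space and scale receive
    high values; flat regions receive values near zero.
    """
    blurred = [gaussian_filter(image.astype(float), s) for s in sigmas]
    # Differences between consecutive scales approximate the
    # scale-normalized Laplacian, a classic blob/edge detector.
    dogs = [np.abs(b1 - b0) for b0, b1 in zip(blurred, blurred[1:])]
    saliency = np.maximum.reduce(dogs)
    # Normalize to [0, 1] so the map can act as soft attention weights.
    rng = saliency.max() - saliency.min()
    return (saliency - saliency.min()) / rng if rng > 0 else saliency

# Usage: modulate the CNN input (or a feature map) by the saliency mask.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0              # bright square on a dark background
mask = dog_saliency(img)
attended = img * (0.5 + 0.5 * mask)  # soft attention: never fully zero out
```

The soft-attention multiplier (0.5 + 0.5 * mask) is one simple design choice; it downweights low-saliency regions without discarding them entirely, which is the general spirit of attention-based feature weighting.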