We propose a novel technique based on compressive sensing for expression-invariant face recognition. We view the different images of the same subject as an ensemble of intercorrelated signals and assume that changes due to variation in expression are sparse with respect to the whole image. We exploit this sparsity using distributed compressive sensing theory, which enables us to coarsely represent the training images of a given subject by only two feature images: one that captures the holistic (common) features of the face, and another that captures the different expressions across all training samples. We show that a new test image of a subject can be approximated fairly well using only the two feature images of that subject. Hence, we can drastically reduce the storage space and operational dimensionality by keeping only these two feature images or their random measurements. Based on this, we design an efficient expression-invariant classifier. Furthermore, we show that substantially lower-dimensional versions of the training features, such as (i) features extracted from critically downsampled training images or (ii) low-dimensional random projections of the original feature images, still retain sufficient information for good classification. Extensive experiments with publicly available databases show that, on average, our approach outperforms the state of the art despite using only this super-compact feature representation.
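As a rough illustration of the two-feature-image idea, the following minimal sketch works on synthetic vectors rather than real face images. It is not the method of this paper: the joint sparse recovery from distributed compressive sensing is replaced here by a much simpler stand-in (a per-pixel median for the common image, and soft-thresholded residuals summed into one expression image). All function names, the threshold, the image size, and the number of random measurements `m` are assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink entries toward zero; entries with |x| <= t become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def two_feature_images(X, thresh=0.1):
    """X: (n_images, n_pixels) vectorized training images of ONE subject.

    Stand-in for joint sparse recovery (assumption): the per-pixel median
    plays the role of the common image, and the thresholded residuals of
    all training images are summed into a single expression image.
    """
    common = np.median(X, axis=0)
    innovations = soft_threshold(X - common, thresh)
    return common, innovations.sum(axis=0)

def residual(test, common, expression):
    """Least-squares error of approximating `test` by the two feature images."""
    A = np.stack([common, expression], axis=1)        # shape (dim, 2)
    coef, *_ = np.linalg.lstsq(A, test, rcond=None)
    return float(np.linalg.norm(test - A @ coef))

def classify(test, features):
    """Return the subject whose two feature images approximate `test` best."""
    return min(features, key=lambda s: residual(test, *features[s]))

def project(features, Phi):
    """Keep only random measurements of each feature image."""
    return {s: (Phi @ c, Phi @ e) for s, (c, e) in features.items()}

# --- Synthetic demo (hypothetical data, not a face database) ---
rng = np.random.default_rng(0)
n_pixels, n_train, m = 400, 5, 60

def synth_images(base, count):
    """Base image plus small dense noise and a sparse 'expression' change."""
    imgs = np.tile(base, (count, 1)) + 0.01 * rng.standard_normal((count, base.size))
    for img in imgs:                                  # rows are views, edited in place
        idx = rng.choice(base.size, size=20, replace=False)
        img[idx] += rng.standard_normal(20)
    return imgs

bases = {"A": rng.standard_normal(n_pixels), "B": rng.standard_normal(n_pixels)}
features = {s: two_feature_images(synth_images(b, n_train)) for s, b in bases.items()}

test = synth_images(bases["A"], 1)[0]                 # unseen image of subject A
Phi = rng.standard_normal((m, n_pixels)) / np.sqrt(m) # Gaussian random projection

pred_full = classify(test, features)
pred_proj = classify(Phi @ test, project(features, Phi))
print(pred_full, pred_proj)
```

In this sketch each subject is stored as two vectors, or as two `m`-dimensional measurement vectors after projection, and classification picks the subject with the smallest approximation residual. Whether the paper's classifier uses exactly this residual rule is an assumption; only the storage pattern follows the description above.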