Advances in sensory technologies are enabling the capture of a diverse spectrum of real-world data streams. Increasing availability of such data, especially in the form of multivariate time series, allows for new opportunities for applications that rely on identifying and leveraging complex temporal patterns A particular challenge such algorithms face is that complex patterns consist of multiple simpler patterns of varying scales (temporal length). While several recent works (such as multi-head attention networks) recognized the fact complex patterns need to be understood in the form of multiple simpler patterns, we note that existing works lack the ability of represent the interactions across these constituting patterns. To tackle this limitation, in this paper, we propose a novel Multi-scale Multi-head Attention with Cross-Talk (XM2A) framework designed to represent multi-scale patterns that make up a complex pattern by configuring each attention head to learn a pattern at a particular scale and accounting for the co-existence of patterns at multiple scales through a cross-talking mechanism among the heads. Experiments show that XM2A outperforms state-of-the-art attention mechanisms, such as Transformer and MSMSA, on benchmark datasets, such as SADD, AUSLAN, and MOCAP.