Spatial event forecasting from social media is an important problem but encounters critical challenges, such as dynamic patterns of features (keywords) and geographic heterogeneity (e.g., spatial correlations, imbalanced samples, and different populations in different locations). Most existing approaches (e.g., LASSO regression, dynamic query expansion, and burst detection) are designed to address some of these challenges, but not all of them. This paper proposes a novel multi-task learning framework which aims to concurrently address all the challenges. Specifically, given a collection of locations (e.g., cities), we propose to build forecasting models for all locations simultaneously by extracting and utilizing appropriate shared information that effectively increases the sample size for each location, thus improving the forecasting performance. We combine both static features derived from a predefined vocabulary by domain experts and dynamic features generated from dynamic query expansion in a multi-task feature learning framework; we investigate different strategies to balance homogeneity and diversity between static and dynamic terms. Efficient algorithms based on Iterative Group Hard Thresholding are developed to achieve efficient and effective model training and prediction. Extensive experimental evaluations on Twitter data from four different countries in Latin America demonstrated the effectiveness of our proposed approach.