Rumour detection on Twitter is an important problem. Existing studies mainly focus on high detection accuracy, which often requires large volumes of data on content, source credibility or propagation. In this paper, we focus on early detection of rumours, when data on information sources or propagation is scarce. We observe that tweets attract immediate comments from the public, who often express uncertain and questioning attitudes towards rumour tweets. We therefore propose to learn the distribution of user attitudes towards Twitter posts from their comments, and then combine it with content analysis for early detection of rumours. Specifically, we propose convolutional neural network (CNN) and BERT neural language models to learn attitude representations for user comments without human annotation, via transfer learning from external data sources for stance classification. We further propose CNN-BiLSTM-based and BERT-based deep neural models that combine attitude representation and content representation for early rumour detection. Experiments on real-world rumour datasets show that our BERT-based model can achieve effective early rumour detection and significantly outperform state-of-the-art rumour detection models.