To meet real-time performance requirements, intelligent decisions in many IoT applications must be made at the network edge, where the data are generated. Conventional cloud-based learning cannot keep up with these demands. Pushing the artificial intelligence (AI) frontier to achieve edge intelligence is highly nontrivial, however, because of the constrained computing resources and limited training data at the network edge. To tackle these challenges, we develop a distributionally robust optimization (DRO)-based edge learning algorithm, in which an uncertainty model is constructed to foster the synergy of cloud knowledge transfer and local training. Specifically, the knowledge transferred from the cloud takes the form of a Dirichlet process prior distribution over the edge model parameters, and the edge device further constructs an uncertainty set centered at the empirical distribution of its local samples to capture the information in the local data. The resulting edge learning DRO problem, subject to these two distributional uncertainty constraints, is then recast as an equivalent single-layer optimization problem using a duality approach. We then apply an Expectation-Maximization (EM)-inspired method to derive a convex relaxation, based on which we devise algorithms to learn the edge model parameters. Finally, extensive experiments showcase the performance gain over standard learning approaches that use local edge data only.
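As a rough illustration of the problem structure described above (the notation here is ours, not taken from the paper), the edge learning DRO problem with two distributional uncertainty constraints can be sketched as

```latex
\min_{\theta} \; \max_{Q \in \mathcal{U}} \; \mathbb{E}_{x \sim Q}\!\left[\ell(\theta; x)\right],
\qquad
\mathcal{U} \;=\; \Big\{ Q \,:\, D\big(Q \,\|\, \hat{P}_n\big) \le \rho \Big\} \,\cap\, \mathcal{P}_{\mathrm{prior}},
```

where $\ell(\theta; x)$ is the loss of the edge model with parameters $\theta$ on sample $x$, $\hat{P}_n$ is the empirical distribution of the $n$ local samples, $D(\cdot\|\cdot)$ is a distributional distance defining the local uncertainty set of radius $\rho$, and $\mathcal{P}_{\mathrm{prior}}$ denotes the set of distributions consistent with the cloud-transferred Dirichlet process prior. The specific choice of $D$ and the exact form of the prior constraint are assumptions for illustration only; the abstract does not specify them.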