The security of active distribution systems is critical to grid modernization with deep renewable penetration, and protection plays a vital role in that security. Among the security issues in protection, conventional protection clears only 17.5% of staged high impedance faults (HIFs) because it makes limited use of electrical data. To resolve this problem, we present a detection and location scheme based on micro-phasor measurement units (μ-PMUs) that enhances data processing capability for HIF detection through machine learning and big data analytics. To detect HIFs at a reduced data-labeling cost, we choose the expectation-maximization (EM) algorithm for semi-supervised learning (SSL), since it can express complex relationships between the observed and target variables by fitting Gaussian models. As a generative approach, the EM algorithm is compared with two discriminative models to highlight its detection performance. To make HIF location robust to variations in HIF impedance, we adopt a probabilistic model that embeds parameter learning into physical line modeling, and we validate the location accuracy at multiple points along a distribution line. Numerical results show that the proposed EM algorithm substantially reduces labeling cost and outperforms other SSL methods. Hardware-in-the-loop simulation demonstrates superior HIF location accuracy and detection time, complementing the probabilistic HIF model. Given this performance, we have developed software that integrates the proposed scheme for our utility partner.
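The abstract does not specify the feature set or the mixture configuration, so the following is only a minimal sketch of the general idea behind semi-supervised EM with one Gaussian component per class. It assumes hypothetical two-dimensional features extracted from μ-PMU measurements, with class 0 as normal operation and class 1 as an HIF; the synthetic data and the names `semi_supervised_em` and `predict` are illustrative and are not the authors' implementation. Labeled samples keep fixed one-hot class assignments, while unlabeled samples contribute through soft posterior responsibilities recomputed in every E-step.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Mahalanobis term diff^T cov^{-1} diff for every row, via a linear solve.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def class_log_joint(X, weights, means, covs):
    """log p(x, class=k) for every sample (rows) and class (columns)."""
    return np.column_stack([
        np.log(weights[k]) + gaussian_logpdf(X, means[k], covs[k])
        for k in range(len(weights))
    ])


def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=50, reg=1e-6):
    """Fit one Gaussian per class from labeled and unlabeled samples.

    Assumes every class has at least one labeled sample (hypothetical setup).
    """
    X = np.vstack([X_lab, X_unl])
    d = X.shape[1]
    R_lab = np.eye(n_classes)[y_lab]  # fixed one-hot responsibilities

    # Initialize priors and means from the labels, covariances from all data.
    weights = R_lab.mean(axis=0)
    means = np.array([X_lab[y_lab == k].mean(axis=0) for k in range(n_classes)])
    covs = np.array([np.cov(X.T) + reg * np.eye(d)] * n_classes)

    for _ in range(n_iter):
        # E-step: posterior class probabilities for the unlabeled samples only.
        log_p = class_log_joint(X_unl, weights, means, covs)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        R_unl = np.exp(log_p)
        R_unl /= R_unl.sum(axis=1, keepdims=True)
        R = np.vstack([R_lab, R_unl])

        # M-step: responsibility-weighted maximum-likelihood updates.
        Nk = R.sum(axis=0)
        weights = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        for k in range(n_classes):
            diff = X - means[k]
            covs[k] = (R[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(d)
    return weights, means, covs


def predict(X, weights, means, covs):
    """Assign each sample to the class with the highest posterior."""
    return class_log_joint(X, weights, means, covs).argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, hypothetical features: class 0 = normal, class 1 = HIF.
    normal = rng.normal([0.0, 0.0], 0.5, size=(200, 2))
    hif = rng.normal([3.0, 3.0], 0.5, size=(200, 2))
    # Only 5 labels per class; the remaining samples are treated as unlabeled.
    X_lab = np.vstack([normal[:5], hif[:5]])
    y_lab = np.array([0] * 5 + [1] * 5)
    X_unl = np.vstack([normal[5:], hif[5:]])
    y_unl = np.array([0] * 195 + [1] * 195)
    params = semi_supervised_em(X_lab, y_lab, X_unl, n_classes=2)
    acc = (predict(X_unl, *params) == y_unl).mean()
    print(f"accuracy on unlabeled samples: {acc:.3f}")
```

Because the labeled points are clamped to their classes, a handful of labels anchors each Gaussian to the correct class, and the unlabeled majority then refines the means and covariances. This is one way a generative SSL model can reduce labeling cost; real HIF features would likely need more components or a richer feature space than this toy setup.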