Analysis of motion expertise is an important problem in many domains, including sports and surgery. In recent years, surgical simulation has emerged at the forefront of new technologies for improving the education and training of surgical residents. In simulation-based surgical training, a key task is to rate the performance of the operators, which is currently done by senior surgeons. This paper introduces a novel solution to this problem by employing vision-based techniques. We develop an automatic, video-based approach to analyzing the motion skills of a surgeon in simulation-based surgical training, where a surgical action is captured by multiple video cameras with little or no calibration, resulting in multiple video streams with heterogeneous properties. Typical multiple-view vision techniques are inadequate for processing such data. We propose an approach that employs both canonical correlation analysis (CCA) and the bag-of-words model to classify the expertise level of the subject based on the heterogeneous video streams, which capture both the motion of the subject's hands and the resultant motion of the tools. Experiments were designed and performed to validate the proposed approach using realistic data captured from resident surgeons in local hospitals. The results suggest that the proposed approach may provide a promising practical solution to the real-world problem of evaluating motion skills in simulation-based surgical training.