Many organizations are developing autonomous driving systems, which are expected to be deployed at a large scale in the near future. Despite this, there is a lack of agreement on appropriate methods to test, debug, and certify the performance of these systems. One of the main challenges is that many autonomous driving systems have machine learning (ML) components, such as deep neural networks, for which formal properties are difficult to characterize. We present a testing framework that is compatible with test case generation and automatic falsification methods, which are used to evaluate cyber-physical systems. We demonstrate how the framework can be used to evaluate closed-loop properties of an autonomous driving system model that includes the ML components, all within a virtual environment. We demonstrate how to use test case generation methods, such as covering arrays, as well as requirement falsification methods to automatically identify problematic test scenarios. The resulting framework can be used to increase the reliability of autonomous driving systems.