In this paper, we study the problem of 'test-driving' a detector, i.e., allowing a human user to get a quick sense of how well the detector generalizes to their specific requirements. To this end, we present the first system that estimates detector performance interactively, with a human in the loop and without extensive ground truthing. We treat this as a problem of estimating proportions and show that accurate inferences about the proportion of classes or groups within a large data collection can be made by observing only 5-10% of its samples. To estimate false detections (for precision), the samples are chosen so that the overall characteristics of the data collection are preserved. To estimate missed detections (for recall), we apply pooled testing approaches, inspired by their use in estimating disease propagation. The resulting estimates are close to those obtained using ground truth, which reduces the need for extensive labeling that is expensive and time-consuming.
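
To make the two estimation ideas concrete, the following is a minimal sketch and not the system described in this paper. It assumes a simple random sample drawn without replacement as a stand-in for the sample selection above, and the textbook pooled-testing estimator with equal-sized pools, where a pool is positive if and only if it contains at least one positive item. The function names `proportion_interval` and `pooled_prevalence` and all numbers in the usage note are hypothetical.

```python
import math


def proportion_interval(positives, n, population, z=1.96):
    """Estimate a proportion from a sample of size n drawn without
    replacement from `population` items (assumed simple random sample).

    Returns (estimate, lower, upper) using the normal approximation
    with a finite population correction.
    """
    p_hat = positives / n
    # Finite population correction: shrinks the error as n approaches
    # the population size.
    fpc = (population - n) / (population - 1)
    se = math.sqrt(p_hat * (1 - p_hat) / n * fpc)
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


def pooled_prevalence(positive_pools, num_pools, pool_size):
    """Maximum-likelihood estimate of per-item prevalence from pooled tests.

    Assumes equal-sized pools and that a pool is positive if and only if
    it contains at least one positive item.
    """
    if positive_pools == num_pools:
        raise ValueError("all pools positive; use smaller pools")
    negative_fraction = 1 - positive_pools / num_pools
    # P(pool negative) = (1 - p) ** pool_size, solved for p.
    return 1 - negative_fraction ** (1 / pool_size)
```

For illustration, if a user inspects 1,000 of 10,000 detections (10%) and marks 120 as false, `proportion_interval(120, 1000, 10000)` returns an estimate of 0.12 with an approximate 95% interval around it. Similarly, if 19 of 100 pools of size 2 contain a missed detection, `pooled_prevalence(19, 100, 2)` gives a per-item miss rate of about 0.1.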