This paper proposes a stochastic voting scheme for testing a large number of Web Services (WS) under group testing. In the future, a large number of WS will be available, and they will need to be tested and evaluated in real time. While numerous techniques are available to generate test inputs, the oracle, or expected output, for these test inputs is often difficult to obtain. One way to obtain the oracle in this case is to give the same input to multiple WS and to establish the oracle by majority voting. This is based on the assumption that faulty WS often will not produce consistent results; thus, if a majority can be reached, the oracle can be established statistically. However, even correct WS may produce slightly different outputs, so the majority-voting scheme must be carefully designed to distinguish correct but slightly variant output from truly incorrect output. This paper proposes a hierarchical classification based on simulated annealing and multi-dimensional chi-square statistical techniques to analyze the outputs and determine whether a majority can be reached. The algorithm is evaluated on comprehensive simulated data as well as actual data. The results show that the proposed algorithm is effective even in the difficult situation where clusters of WS produce clusters of output.
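To make the basic voting idea concrete, here is a minimal sketch of tolerance-based majority voting over scalar WS outputs. It is not the hierarchical classification with simulated annealing and chi-square analysis proposed in the paper; it swaps in a simple greedy grouping, and the function name `majority_oracle` and the tolerance parameter `tol` are hypothetical. Outputs within `tol` of a group's first member are treated as the same answer, and an oracle is returned only when one group holds a strict majority.

```python
from typing import List, Optional, Sequence


def majority_oracle(outputs: Sequence[float], tol: float) -> Optional[float]:
    """Hypothetical sketch: vote an oracle from outputs of several WS.

    Outputs within `tol` of a group's first member are treated as the same
    (correct but slightly variant) answer. This greedy grouping is a simple
    stand-in, not the paper's simulated-annealing / chi-square method.
    Returns the mean of the majority group, or None if no strict majority.
    """
    groups: List[List[float]] = []
    for value in outputs:
        # Put the value into the first group whose representative is close enough.
        for group in groups:
            if abs(value - group[0]) <= tol:
                group.append(value)
                break
        else:
            # No existing group matched: start a new one.
            groups.append([value])

    if not groups:
        return None

    largest = max(groups, key=len)
    # Require a strict majority of all services that responded.
    if len(largest) * 2 > len(outputs):
        return sum(largest) / len(largest)
    return None
```

For example, with outputs `[10.0, 10.01, 9.99, 42.0, 7.5]` and `tol=0.05`, three of five services agree and the function returns a value close to 10.0; with `[1.0, 2.0, 3.0]` no group reaches a majority and it returns `None`. A fixed greedy threshold like this is fragile when whole clusters of WS agree on different outputs, which is the harder case the paper targets.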