The observation that a significant class of data processing and analysis applications can be expressed in terms of a small set of easily parallelizable primitives has led to the increasing popularity of batch-oriented, highly parallelizable cluster frameworks for supporting data analysis services. These frameworks, however, are known to have shortcomings in certain application domains. For example, in many data analysis applications, the utility of a given data element to a particular analysis task depends on the way the data is collected (e.g., its precision) or interpreted. Because existing batch-oriented data processing frameworks do not consider variations in data utility, they cannot focus on the best results: even if the user is interested in only a relatively small subset of the best result instances, these systems often need to enumerate entire result sets, including low-utility results. RanKloud is an efficient and scalable utility-aware parallel processing system for ranked query processing over large data sets. In this paper, we focus on the uSplit data partitioning and work-allocation strategies of RanKloud for processing top-k join queries to support data analysis services. In particular, we describe how uSplit adaptively samples data from "upstream" operators to help allocate resources in a work-balanced and wasted-work-avoiding manner for top-k join processing. Experimental results show that the proposed sampling, data partitioning, and join processing strategies enable uSplit to return top-k results with high confidence and low overhead (up to ∼9× faster than alternative schemes on 10 servers).