TY - GEN
T1 - AlphaSum
T2 - 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT'09
AU - Candan, Kasim
AU - Cao, Huiping
AU - Qi, Yan
AU - Sapino, Maria Luisa
N1 - Copyright:
Copyright 2009 Elsevier B.V., All rights reserved.
PY - 2009
Y1 - 2009
N2 - Consider a scientist who wants to explore multiple data sets to select the relevant ones for further analysis. Since the visualization real estate may put a stringent constraint on how much detail can be presented to this user in a single page, effective table summarization techniques are needed to create summaries that are both sufficiently small and effective in communicating the available content. In this paper, we first argue that table summarization can benefit from knowledge about acceptable alternatives for clustering the values in the database. We formulate the problem of table summarization with the help of value lattices. We then provide a framework to express alternative clustering strategies and to account for various utility measures (such as information loss) in assessing different summarization alternatives. Based on this interpretation, we introduce three preference criteria, max-min-util (cautious), max-sum-util (cumulative), and pareto-util, for the problem of table summarization. To tackle the inherent complexity, we rely on the properties of the fuzzy interpretation to further develop a novel ranked set cover based evaluation mechanism (RSC). These are brought together in AlphaSum, a table summarization system. Experimental evaluations showed that RSC improves both execution time and summary quality in AlphaSum by pruning the search space more effectively than existing solutions.
AB - Consider a scientist who wants to explore multiple data sets to select the relevant ones for further analysis. Since the visualization real estate may put a stringent constraint on how much detail can be presented to this user in a single page, effective table summarization techniques are needed to create summaries that are both sufficiently small and effective in communicating the available content. In this paper, we first argue that table summarization can benefit from knowledge about acceptable alternatives for clustering the values in the database. We formulate the problem of table summarization with the help of value lattices. We then provide a framework to express alternative clustering strategies and to account for various utility measures (such as information loss) in assessing different summarization alternatives. Based on this interpretation, we introduce three preference criteria, max-min-util (cautious), max-sum-util (cumulative), and pareto-util, for the problem of table summarization. To tackle the inherent complexity, we rely on the properties of the fuzzy interpretation to further develop a novel ranked set cover based evaluation mechanism (RSC). These are brought together in AlphaSum, a table summarization system. Experimental evaluations showed that RSC improves both execution time and summary quality in AlphaSum by pruning the search space more effectively than existing solutions.
UR - http://www.scopus.com/inward/record.url?scp=70349156149&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349156149&partnerID=8YFLogxK
U2 - 10.1145/1516360.1516373
DO - 10.1145/1516360.1516373
M3 - Conference contribution
AN - SCOPUS:70349156149
SN - 9781605584225
T3 - Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT'09
SP - 96
EP - 107
BT - Proceedings of the 12th International Conference on Extending Database Technology
Y2 - 24 March 2009 through 26 March 2009
ER -