TY - JOUR
T1 - GTT
T2 - Leveraging data characteristics for guiding the tensor train decomposition
AU - Li, Mao Lin
AU - Candan, K. Selçuk
AU - Sapino, Maria Luisa
N1 - Funding Information:
This work is an extended version of Mao-Lin Li, K. Selçuk Candan, Maria Luisa Sapino. “GTT: Guiding the Tensor Train Decomposition” published in the International Conference on Similarity Search and Applications (SISAP) 2020. This work is supported by NSF#1610282 “DataStorm: A Data Enabled System for End-to-End Disaster Planning and Response”, NSF#1633381 “BIGDATA: Discovering Context-Sensitive Impact in Complex Systems”, NSF#1909555 “pCAR: Discovering and Leveraging Plausibly Causal (p-causal) Relationships to Understand Complex Dynamic Systems”, and “FourCmodeling”: EU H2020 Marie Sklodowska-Curie grant agreement No 690817. Results were obtained using the ChameleonCloud resources supported by the NSF.
Publisher Copyright:
© 2022
PY - 2022/9
Y1 - 2022/9
N2 - The demand for searching and querying multimedia data such as images, video, and audio is omnipresent, and effectively accessing such data for various applications is a critical task. Nevertheless, these data are usually encoded as multi-dimensional arrays, or tensors, and traditional data mining techniques might be limited by the curse of dimensionality. Tensor decomposition has been proposed to alleviate this issue. Commonly used tensor decomposition algorithms include CP-decomposition (which seeks a diagonal core) and Tucker-decomposition (which seeks a dense core). Naturally, Tucker maintains more information, but due to the denseness of the core, it is also subject to exponential memory growth with the number of tensor modes. Tensor train (TT) decomposition addresses this problem by seeking a sequence of three-mode cores; unfortunately, there are currently no guidelines for selecting the decomposition sequence. In this paper, we propose GTT, a method for guiding the tensor train decomposition in selecting the decomposition sequence. GTT leverages the data characteristics (including the number of modes, the length of the individual modes, density, the distribution of mutual information, and the distribution of entropy) as well as the target decomposition rank to pick a decomposition order that will preserve information. Experiments with various data sets demonstrate that GTT effectively guides the TT-decomposition process towards decomposition sequences that better preserve accuracy.
AB - The demand for searching and querying multimedia data such as images, video, and audio is omnipresent, and effectively accessing such data for various applications is a critical task. Nevertheless, these data are usually encoded as multi-dimensional arrays, or tensors, and traditional data mining techniques might be limited by the curse of dimensionality. Tensor decomposition has been proposed to alleviate this issue. Commonly used tensor decomposition algorithms include CP-decomposition (which seeks a diagonal core) and Tucker-decomposition (which seeks a dense core). Naturally, Tucker maintains more information, but due to the denseness of the core, it is also subject to exponential memory growth with the number of tensor modes. Tensor train (TT) decomposition addresses this problem by seeking a sequence of three-mode cores; unfortunately, there are currently no guidelines for selecting the decomposition sequence. In this paper, we propose GTT, a method for guiding the tensor train decomposition in selecting the decomposition sequence. GTT leverages the data characteristics (including the number of modes, the length of the individual modes, density, the distribution of mutual information, and the distribution of entropy) as well as the target decomposition rank to pick a decomposition order that will preserve information. Experiments with various data sets demonstrate that GTT effectively guides the TT-decomposition process towards decomposition sequences that better preserve accuracy.
KW - Low-rank embedding
KW - Order selection
KW - Tensor train decomposition
UR - http://www.scopus.com/inward/record.url?scp=85132574157&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132574157&partnerID=8YFLogxK
U2 - 10.1016/j.is.2022.102047
DO - 10.1016/j.is.2022.102047
M3 - Article
AN - SCOPUS:85132574157
SN - 0306-4379
VL - 108
JO - Information Systems
JF - Information Systems
M1 - 102047
ER -