Sparse Structure Identification from High-Dimensional Epigenomic

Project: Research project

Description

Evidence is accumulating that different combinations of histone modifications confer different functional specificities. Identifying histone modification patterns and linking them to functional elements of the genome is of great interest in epigenetics. High-throughput experimental techniques, such as ChIP-chip and ChIP-Seq, have produced a wealth of histone modification data. However, current experimental and computational methods have explored these data only to a very limited extent. We propose novel statistical methods for sparse structure identification from histone modification data. Imposing sparsity is an ideal way to handle extremely high-dimensional data with noisy information and small sample sizes. Compared with existing research, our proposed methods have several transformative features: (1) they aim to handle the complete set of histone marks across the entire human genome, an extremely high-dimensional dataset, rather than focusing on only a few histone marks in limited regions; (2) they focus on discovering novel chromatin signatures, which will lead to the identification of a wide array of previously unknown functional elements in the genome, rather than confirming chromatin modification patterns at a handful of known functional sites; (3) they explore the complex relationships among histone marks and how the marks collaborate to deliver a specific regulatory function, rather than just using histone mark profiles as predictors for known functional elements; (4) they integrate other genetic data sources, such as DNA motif data, with the histone modification data to uncover the fundamental molecular mechanisms underlying histone modification, rather than merely collecting epigenomic data.
Status: Finished
Effective start/end date: 9/1/10 to 8/31/15

Funding

  • HHS-NIH: National Institute of General Medical Sciences (NIGMS): $288,880.00

Fingerprint

Genes
Computational methods
Statistical methods
DNA
Throughput