Support Estimation with Sampling Artifacts and Errors

Eli Chien, Olgica Milenkovic, Angelia Nedich

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

The problem of estimating the support of a distribution is of great importance in many areas of machine learning, computer science and molecular biology. Almost all existing work in this area assumes perfectly accurate sampling, an assumption that seldom holds in practice. Here we introduce the first known theoretical approach to support estimation in the presence of sampling artifacts, where each sample is assumed to be observed through a Poisson channel that simultaneously captures repetitions and deletions. The proposed estimator is based on regularized weighted Chebyshev approximations, with weights governed by evaluations of Touchard (Bell) polynomials. The supports in the presence of sampling artifacts are calculated via discretized semi-infinite programming methods. The newly proposed estimation approach is tested on synthetic and textual data, as well as on GISAID data for the purpose of estimating the mutational diversity of genes in the SARS-CoV-2 viral genome. For all experiments performed, we observed significant improvements of our integrated method compared to adequately modified known noiseless support estimation methods.
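As a rough illustration of two ingredients named in the abstract, the sketch below evaluates Touchard polynomials and simulates a Poisson observation channel. It is not the authors' estimator. It relies on the standard identity that the n-th moment of a Poisson variable with mean x equals the Touchard polynomial T_n(x), which is one plausible reason these polynomials appear as weights; that link to the paper is an assumption here. The function names (`touchard_coefficients`, `touchard`, `poisson`) and the reading of the channel as "each sample is observed a Poisson-distributed number of times" are likewise hypothetical.

```python
import math
import random


def touchard_coefficients(n):
    """Return [S(n, 0), ..., S(n, n)], the Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            above = row[k] if k < len(row) else 0
            # Recurrence: S(m, k) = k * S(m-1, k) + S(m-1, k-1)
            new[k] = k * above + row[k - 1]
        row = new
    return row


def touchard(n, x):
    """Evaluate T_n(x) = sum_k S(n, k) * x**k."""
    return sum(c * x**k for k, c in enumerate(touchard_coefficients(n)))


def poisson(rate, rng):
    """Draw one Poisson(rate) count with Knuth's method (fine for small rates).

    Under the assumed channel reading, 0 models a deletion and any value
    above 1 models a repetition of the underlying sample.
    """
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


if __name__ == "__main__":
    # T_n(1) gives the Bell numbers, hence the "Touchard (Bell)" naming.
    print([touchard(n, 1) for n in range(6)])

    # Empirical second moment of the channel output versus T_2(rate).
    rng = random.Random(0)
    rate = 2.0
    draws = [poisson(rate, rng) for _ in range(20000)]
    print(sum(d * d for d in draws) / len(draws), touchard(2, rate))
```

The Monte Carlo estimate should land close to `touchard(2, rate)`, with the gap shrinking as the number of draws grows.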

Original language: English (US)
Title of host publication: 2021 IEEE International Symposium on Information Theory, ISIT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 244-249
Number of pages: 6
ISBN (Electronic): 9781538682098
DOIs
State: Published - Jul 12 2021
Externally published: Yes
Event: 2021 IEEE International Symposium on Information Theory, ISIT 2021 - Virtual, Melbourne, Australia
Duration: Jul 12 2021 to Jul 20 2021

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
Volume: 2021-July
ISSN (Print): 2157-8095

Conference

Conference: 2021 IEEE International Symposium on Information Theory, ISIT 2021
Country/Territory: Australia
City: Virtual, Melbourne
Period: 7/12/21 to 7/20/21

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics
