Abstract

Universal source coding at short blocklengths is considered for an i.i.d. exponential family of distributions. The Type Size code has previously been shown to be optimal up to the third-order rate for universal compression of all memoryless sources over finite alphabets. The Type Size code assigns sequences ordered based on their type class sizes to binary strings ordered lexicographically. To generalize this type class approach for parametric sources, a natural scheme is to define two sequences to be in the same type class if and only if they are equiprobable under any model in the parametric class. This natural approach, however, is shown to be suboptimal. A variation of the Type Size code is introduced, where type classes are defined based on neighborhoods of minimal sufficient statistics. The asymptotics of the overflow rate of this variation are derived and a converse result establishes its optimality up to the third-order term.
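The finite-alphabet Type Size code described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover the parametric variation. It assumes a small finite alphabet, breaks ties between sequences with equal type class size lexicographically (an assumption, since the abstract does not specify a tie-breaking rule), and reads "binary strings ordered lexicographically" as shortest first, starting from the empty string. The names `type_class_size`, `binary_strings`, and `type_size_code` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import factorial


def type_class_size(seq):
    """Number of sequences with the same empirical distribution as seq
    (a multinomial coefficient)."""
    size = factorial(len(seq))
    for count in Counter(seq).values():
        size //= factorial(count)
    return size


def binary_strings():
    """Yield '', '0', '1', '00', '01', ...: shortest first, then lexicographic."""
    length = 0
    while True:
        for bits in product("01", repeat=length):
            yield "".join(bits)
        length += 1


def type_size_code(alphabet, n):
    """Map every length-n sequence to a binary string, giving the shortest
    codewords to sequences from the smallest type classes.
    Tie-breaking by lexicographic order is an assumption of this sketch."""
    sequences = sorted(
        product(alphabet, repeat=n),
        key=lambda s: (type_class_size(s), s),
    )
    return dict(zip(sequences, binary_strings()))


if __name__ == "__main__":
    code = type_size_code("ab", 3)
    for seq, word in code.items():
        print("".join(seq), repr(word))
```

The resulting map is one-to-one but not prefix-free, which matches the one-to-one compression setting named in the title. Enumerating all sequences is only feasible for very small blocklengths and is done here purely for illustration.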

Original language: English (US)
Article number: 8638791
Pages (from-to): 2442-2458
Number of pages: 17
Journal: IEEE Transactions on Information Theory
Volume: 65
Issue number: 4
DOI: 10.1109/TIT.2019.2898659
State: Published - Apr 1 2019

Keywords

  • Data compression
  • finite blocklength
  • parametric statistics
  • universal source coding

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Fine Asymptotics for Universal One-to-One Compression of Parametric Sources. / Iri, Nematollah; Kosut, Oliver.

In: IEEE Transactions on Information Theory, Vol. 65, No. 4, 8638791, 01.04.2019, p. 2442-2458.

Research output: Contribution to journal › Article

@article{0090fbc7ad83416ebbb487aa82284f79,
title = "Fine Asymptotics for Universal One-to-One Compression of Parametric Sources",
abstract = "Universal source coding at short blocklengths is considered for an i.i.d. exponential family of distributions. The Type Size code has previously been shown to be optimal up to the third-order rate for universal compression of all memoryless sources over finite alphabets. The Type Size code assigns sequences ordered based on their type class sizes to binary strings ordered lexicographically. To generalize this type class approach for parametric sources, a natural scheme is to define two sequences to be in the same type class if and only if they are equiprobable under any model in the parametric class. This natural approach, however, is shown to be suboptimal. A variation of the Type Size code is introduced, where type classes are defined based on neighborhoods of minimal sufficient statistics. The asymptotics of the overflow rate of this variation are derived and a converse result establishes its optimality up to the third-order term.",
keywords = "Data compression, finite blocklength, parametric statistics, universal source coding",
author = "Nematollah Iri and Oliver Kosut",
year = "2019",
month = "4",
day = "1",
doi = "10.1109/TIT.2019.2898659",
language = "English (US)",
volume = "65",
pages = "2442--2458",
journal = "IEEE Transactions on Information Theory",
issn = "0018-9448",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",
}

TY - JOUR

T1 - Fine Asymptotics for Universal One-to-One Compression of Parametric Sources

AU - Iri, Nematollah

AU - Kosut, Oliver

PY - 2019/4/1

Y1 - 2019/4/1

N2 - Universal source coding at short blocklengths is considered for an i.i.d. exponential family of distributions. The Type Size code has previously been shown to be optimal up to the third-order rate for universal compression of all memoryless sources over finite alphabets. The Type Size code assigns sequences ordered based on their type class sizes to binary strings ordered lexicographically. To generalize this type class approach for parametric sources, a natural scheme is to define two sequences to be in the same type class if and only if they are equiprobable under any model in the parametric class. This natural approach, however, is shown to be suboptimal. A variation of the Type Size code is introduced, where type classes are defined based on neighborhoods of minimal sufficient statistics. The asymptotics of the overflow rate of this variation are derived and a converse result establishes its optimality up to the third-order term.

KW - Data compression

KW - finite blocklength

KW - parametric statistics

KW - universal source coding

UR - http://www.scopus.com/inward/record.url?scp=85063278685&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063278685&partnerID=8YFLogxK

U2 - 10.1109/TIT.2019.2898659

DO - 10.1109/TIT.2019.2898659

M3 - Article

AN - SCOPUS:85063278685

VL - 65

SP - 2442

EP - 2458

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

SN - 0018-9448

IS - 4

M1 - 8638791

ER -