CIF: Small: A Framework for Low Latency Universal Compression with Privacy Guarantees

Project: Research project


Project Summary. In contrast to traditional data communication models in which large blocks of data are compressed, the evolving contexts of information generation, access, and storage require compressing relatively small blocks of data asynchronously and concurrently from a large number of sources, often with additional security and privacy constraints. The range of applications with these requirements is varied and includes electronic medical records (EMRs), online retailers, search engines, and social network sites that are continuously collecting, storing, and processing user data for a variety of purposes. Traditional compression methods, including the ubiquitous Lempel-Ziv-Welch scheme, that achieve compression rates close to the source entropy for large file sizes are not well suited for these new applications, as they are based on design principles intended for the asymptotic blocklength regime. In contrast, designing and analyzing coding schemes that can achieve optimal performance in low latency application contexts requires non-asymptotic tools. This proposal addresses this growing need by developing a rigorous framework for universal lossy and lossless compression algorithms in the finite blocklength regime with strong theoretical guarantees. The proposed approach also incorporates the need for security and privacy guarantees in low latency applications via new privacy metrics and mechanisms appropriate for the finite blocklength regime.

Intellectual Merit. The proposed research will study, for the first time, universal and finite blocklength source coding. Sophisticated and rigorous tools, including limit theorems and the method of types, will be applied to develop fundamental performance bounds as well as a path toward practical codes.
A key technical contribution of the proposed work is a non-prefix Type Size code, which outperforms all prefix-free codes for short blocklengths by choosing codeword lengths optimally based on type class size. Building on this work, the key features of the proposed research include: (a) quantifiable and computable bounds on the compression rates achievable by finite blocklength schemes; (b) a rigorous approach to deriving third-order approximations for compression rates for universal coding; (c) finite blocklength metrics and universal mechanisms for privacy that allow meaningful comparisons with source-agnostic formalisms such as differential privacy; and (d) practical compression schemes and privacy mechanisms. The compression schemes and privacy mechanisms developed can potentially increase compression efficiency in numerous applications, as well as provide solid privacy assurances in a modern environment in which users are increasingly concerned about their own sensitive data in various electronic forms.

Broader Impact. The 'retail compression' challenge of low latency asynchronous processing and storage of an explosive number of data sources is immediate and ever-growing. This project can have significant impact on universal lossless and lossy compression schemes in the finite blocklength regime, both by quantifying fundamental performance bounds and by developing practical (feasible) compression schemes. A potential impact of the proposed work will be in developing much-needed finite blocklength compression alternatives to Lempel-Ziv-based schemes. Another valuable impact of the proposed work would be meaningful privacy metrics and mechanisms to guarantee privacy in a growing number of 'retail applications' with data mining capabilities.
Furthermore, computational tools to evaluate universal schemes can enable widespread community participation and be applied to a large class of finite blocklength schemes designed for communication, compression, and/or security constraints. The PIs will also integrate the research outcomes into curriculum and engage both undergraduate and graduate students in their research.

Key words: finite blocklength universal codes; privacy mechanisms; computational tools.
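
As an illustration of the Type Size idea named in the summary, the following is a minimal sketch and not the PIs' implementation. It assumes the code orders all length-n sequences by the size of their type class (the number of sequences sharing the same empirical symbol counts) and hands out binary strings from shortest to longest in that order. The function names, the brute-force enumeration, and the lexicographic tie-breaking rule are hypothetical choices made for this example only.

```python
from collections import Counter
from itertools import product
from math import factorial


def type_class_size(seq):
    """Number of sequences with the same symbol counts as seq (a multinomial coefficient)."""
    size = factorial(len(seq))
    for count in Counter(seq).values():
        size //= factorial(count)
    return size


def binary_strings():
    """Yield '', '0', '1', '00', '01', ... in order of increasing length."""
    length = 0
    while True:
        for bits in product("01", repeat=length):
            yield "".join(bits)
        length += 1


def type_size_code(alphabet, n):
    """Assign the shortest binary strings to sequences from the smallest type classes.

    Brute-force enumeration, so only practical for tiny alphabets and blocklengths.
    Ties are broken lexicographically (an assumption made for this sketch).
    """
    sequences = ["".join(s) for s in product(alphabet, repeat=n)]
    sequences.sort(key=lambda s: (type_class_size(s), s))
    return dict(zip(sequences, binary_strings()))


if __name__ == "__main__":
    code = type_size_code("ab", 3)
    for sequence, codeword in code.items():
        print(sequence, repr(codeword), type_class_size(sequence))
```

The resulting mapping is one-to-one but not prefix-free: the empty string is a codeword and is a prefix of every other codeword. This is the sense in which such a code can use shorter codewords than any prefix-free code at short blocklengths.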
Effective start/end date: 9/1/14 to 8/31/18


  • National Science Foundation (NSF): $498,213.00


Electronic medical equipment
Search engines
Data mining