ConceptDoppler: A weather tracker for internet censorship

Jedidiah R. Crandall, Daniel Zinn, Michael Byrd, Earl Barr, Rich East

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

65 Scopus citations

Abstract

The text of this paper has passed across many Internet routers on its way to the reader, but some routers will not pass it along unfettered because of censored words it contains. We present two sets of results: 1) Internet measurements of keyword filtering by the Great "Firewall" of China (GFC); and 2) initial results of using latent semantic analysis (LSA) as an efficient way to reproduce a blacklist of censored words via probing. Our Internet measurements suggest that the GFC's keyword filtering is more a panopticon than a firewall, i.e., it need not block every illicit word, but only enough to promote self-censorship. China's largest ISP, ChinaNET, performed 83.3% of all filtering of our probes, and 99.1% of all filtering that occurred at the first hop past the Chinese border. Filtering occurred beyond the third hop for 11.8% of our probes, and there were sometimes as many as 13 hops past the border to a filtering router. Approximately 28.3% of the Chinese hosts we sent probes to were reachable along paths that were not filtered at all. While more tests are needed to provide a definitive picture of the GFC's implementation, our results disprove the notion that GFC keyword filtering is a firewall strictly at the border of China's Internet. While evading a firewall a single time defeats its purpose, it would be necessary to evade a panopticon almost every time. Thus, in lieu of evasion, we propose ConceptDoppler, an architecture for maintaining a censorship "weather report" about what keywords are filtered over time. Probing with potentially filtered keywords is arduous due to the GFC's complexity and can be invasive if not done efficiently. Just as an understanding of the mixing of gases preceded effective weather reporting, understanding of the relationship between keywords and concepts is essential for tracking Internet censorship. We show that LSA can effectively pare down a corpus of text and cluster filtered keywords for efficient probing, present 122 keywords we discovered by probing, and underscore the need for tracking and studying censorship blacklists by discovering some surprising blacklisted keywords, such as the Chinese terms for "conversion rate", "Mein Kampf", and "International geological scientific federation (Beijing)".
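As a rough illustration of the LSA idea mentioned in the abstract, the sketch below ranks terms by their closeness to a seed concept in a reduced-rank space, so that only the closest terms would be probed. It is not the authors' implementation: the toy corpus, the seed word, the choice of k = 2, and the helper names (build_term_doc_matrix, lsa_term_vectors, related_terms) are all assumptions made for this example, and it uses plain term counts with NumPy's SVD rather than any weighting the paper may have applied.

```python
import numpy as np

# Hypothetical toy corpus: two unrelated topics. The paper's actual corpus
# and keyword list are not reproduced here.
DOCS = [
    "protest square students",
    "protest students march",
    "square march protest",
    "weather rain cloud",
    "rain cloud forecast",
    "weather forecast cloud",
]


def build_term_doc_matrix(docs):
    """Return (vocab, matrix) where matrix[i, j] counts term i in document j."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    index = {word: i for i, word in enumerate(vocab)}
    mat = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            mat[index[word], j] += 1.0
    return vocab, mat


def lsa_term_vectors(mat, k):
    """Project terms into a k-dimensional concept space via truncated SVD."""
    u, s, _vt = np.linalg.svd(mat, full_matrices=False)
    return u[:, :k] * s[:k]  # scale each kept dimension by its singular value


def related_terms(seed, vocab, vectors, top_n):
    """Rank terms by cosine similarity to the seed term in concept space."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(seed)]
    ranked = sorted(zip(vocab, sims), key=lambda pair: -pair[1])
    return [(word, float(sim)) for word, sim in ranked if word != seed][:top_n]


vocab, mat = build_term_doc_matrix(DOCS)
vectors = lsa_term_vectors(mat, k=2)
# Candidate terms to probe first, given a seed concept assumed to be sensitive.
candidates = related_terms("protest", vocab, vectors, top_n=3)
print(candidates)
```

In this toy setting the terms that co-occur with the seed score near the top and the unrelated weather terms score near zero, which is the sense in which a concept space can shrink the list of words worth probing.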

Original language: English (US)
Title of host publication: CCS'07 - Proceedings of the 14th ACM Conference on Computer and Communications Security
Pages: 352-365
Number of pages: 14
State: Published - 2007
Externally published: Yes
Event: 14th ACM Conference on Computer and Communications Security, CCS'07 - Alexandria, VA, United States
Duration: Oct 29 2007 to Nov 2 2007

Publication series

Name: Proceedings of the ACM Conference on Computer and Communications Security
ISSN (Print): 1543-7221

Conference

Conference: 14th ACM Conference on Computer and Communications Security, CCS'07
Country/Territory: United States
City: Alexandria, VA
Period: 10/29/07 to 11/2/07

Keywords

  • Blacklist
  • ConceptDoppler
  • Firewall ruleset discovery
  • Great firewall of China
  • Internet censorship
  • Internet measurement
  • Keyword filtering
  • Latent semantic analysis
  • Latent semantic indexing
  • LSA
  • Panopticon

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
