Hand Collecting and Coding Versus Data-Driven Methods in Technical and Professional Communication Research

Claire Lauer, Eva Brumberger, Aaron Beveridge

Research output: Contribution to journal, Article

Abstract

Background: Qualitative technical communication research often produces datasets that are too large to manage effectively with hand-coded approaches. Text-mining methods, used carefully, may uncover patterns and provide results for larger datasets that are more easily reproduced and scaled.

Research questions:
1. To what degree can hand collection results be replicated by automated data collection?
2. To what degree can hand-coded results be replicated by machine coding?
3. What are the affordances and limitations of each method?

Literature review: We introduce the stages of data collection and analysis that researchers typically discuss in the literature, and show how researchers in technical communication and other fields have discussed the affordances and limitations of hand collection and coding versus automated methods throughout each stage.

Research methodology: We utilize an existing dataset that was hand-collected and hand-coded. We discuss the collection and coding processes, and demonstrate how they might be replicated with web scraping and machine coding.

Results/discussion: We found that web scraping demonstrated an obvious advantage of automated data collection: speed. Machine coding was able to provide comparable outputs to hand coding for certain types of data; for more nuanced and verbally complex data, machine coding was less useful and less reliable.

Conclusions: Our findings highlight the importance of considering the context of a particular project when weighing the affordances and limitations of hand collecting and coding over automated approaches. Ultimately, a mixed-methods approach that relies on a combination of hand coding and automated coding should prove to be the most productive for current and future kinds of technical communication work, in which close attention to the nuances of language is critical, but in which processing large amounts of data would yield significant benefits as well.
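The second research question, how far machine coding can replicate hand coding, can be illustrated with a minimal sketch. This is not the authors' procedure: the codebook categories, keyword patterns, and sample texts below are all invented for illustration, and exact-match agreement is only one simple way such a comparison might be scored.

```python
import re

# Hypothetical codebook (an assumption, not from the article):
# each category maps to keyword patterns a machine coder would search for.
CODEBOOK = {
    "writing": [r"\bwrit\w*", r"\bedit\w*"],
    "design": [r"\bdesign\w*", r"\blayout\w*"],
}


def machine_code(text):
    """Return the set of codebook categories whose patterns match the text."""
    lowered = text.lower()
    return {
        category
        for category, patterns in CODEBOOK.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    }


def agreement(hand_codes, machine_codes):
    """Share of items whose hand-assigned and machine-assigned code sets match exactly."""
    if len(hand_codes) != len(machine_codes):
        raise ValueError("both lists must cover the same items")
    if not hand_codes:
        return 0.0
    matches = sum(1 for h, m in zip(hand_codes, machine_codes) if h == m)
    return matches / len(hand_codes)


# Invented sample items paired with the codes a human coder might assign.
SAMPLE = [
    ("Writes and edits user manuals", {"writing"}),
    ("Designs page layouts for help content", {"design"}),
    ("Crafts clear prose for varied audiences", {"writing"}),  # no keyword present
]

if __name__ == "__main__":
    hand = [codes for _, codes in SAMPLE]
    machine = [machine_code(text) for text, _ in SAMPLE]
    print(f"Exact-match agreement: {agreement(hand, machine):.2f}")
```

In this toy sample the keyword coder agrees with the hand codes on the two items that use explicit vocabulary and misses the third, which expresses the same idea without any listed keyword. That mirrors, in miniature, the abstract's point that machine coding is less reliable for nuanced or verbally complex data.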

Original language: English (US)
Journal: IEEE Transactions on Professional Communication
DOI: 10.1109/TPC.2018.2870632
State: Accepted/In press - Jan 1 2018

Keywords

  • Coding
  • Data analysis
  • Data collection
  • Databases
  • Encoding
  • Machine reading
  • Natural language processing (NLP)
  • Reliability
  • Text mining
  • Web scraping

ASJC Scopus subject areas

  • Industrial relations
  • Electrical and Electronic Engineering

Cite this

@article{b4231514bb6947cf85da6f775405cb40,
title = "Hand Collecting and Coding Versus Data-Driven Methods in Technical and Professional Communication Research",
keywords = "Coding, data analysis, Data collection, Databases, Encoding, machine reading, natural language processing (NLP), Reliability, text mining, web scraping",
author = "Claire Lauer and Eva Brumberger and Aaron Beveridge",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TPC.2018.2870632",
language = "English (US)",
journal = "IEEE Transactions on Professional Communication",
issn = "0361-1434",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Hand Collecting and Coding Versus Data-Driven Methods in Technical and Professional Communication Research

AU - Lauer, Claire

AU - Brumberger, Eva

AU - Beveridge, Aaron

PY - 2018/1/1

Y1 - 2018/1/1

KW - Coding

KW - data analysis

KW - Data collection

KW - Databases

KW - Encoding

KW - machine reading

KW - natural language processing (NLP)

KW - Reliability

KW - Text mining

KW - web scraping

UR - http://www.scopus.com/inward/record.url?scp=85055047208&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055047208&partnerID=8YFLogxK

U2 - 10.1109/TPC.2018.2870632

DO - 10.1109/TPC.2018.2870632

M3 - Article

JO - IEEE Transactions on Professional Communication

JF - IEEE Transactions on Professional Communication

SN - 0361-1434

ER -