TY - GEN
T1 - Language without words
T2 - 2012 Joint 6th International Conference on Soft Computing and Intelligent Systems, SCIS 2012 and 13th International Symposium on Advanced Intelligence Systems, ISIS 2012
AU - Song, Peiyou
AU - Shu, Anhei
AU - Phipps, David
AU - Tiwari, Mohit
AU - Wallach, Dan S.
AU - Crandall, Jedidiah R.
AU - Luger, George F.
PY - 2012
Y1 - 2012
N2 - This paper explores two separate questions: Can we perform natural language processing tasks without a lexicon?; and, Should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, based only on grams? Our own motivation for posing this question is based on our efforts to find popular trends in words and phrases from online Chinese social media. This form of written Chinese uses so many neologisms, creative character placements, and combinations of writing systems that it has been dubbed the 'Martian Language.' Readers must often use visual cues, audible cues from reading out loud, and their knowledge and understanding of current events to understand a post. For analysis of popular trends, the specific problem is that it is difficult to build a lexicon when the invention of new ways to refer to a word or concept is easy and common. For natural language processing in general, we argue in this paper that new uses of language in social media will challenge machines' abilities to operate with words as the basic unit of understanding, not only in Chinese but potentially in other languages.
AB - This paper explores two separate questions: Can we perform natural language processing tasks without a lexicon?; and, Should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, based only on grams? Our own motivation for posing this question is based on our efforts to find popular trends in words and phrases from online Chinese social media. This form of written Chinese uses so many neologisms, creative character placements, and combinations of writing systems that it has been dubbed the 'Martian Language.' Readers must often use visual cues, audible cues from reading out loud, and their knowledge and understanding of current events to understand a post. For analysis of popular trends, the specific problem is that it is difficult to build a lexicon when the invention of new ways to refer to a word or concept is easy and common. For natural language processing in general, we argue in this paper that new uses of language in social media will challenge machines' abilities to operate with words as the basic unit of understanding, not only in Chinese but potentially in other languages.
UR - http://www.scopus.com/inward/record.url?scp=84877798248&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84877798248&partnerID=8YFLogxK
U2 - 10.1109/SCIS-ISIS.2012.6505413
DO - 10.1109/SCIS-ISIS.2012.6505413
M3 - Conference contribution
AN - SCOPUS:84877798248
SN - 9781467327428
T3 - 6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012
SP - 11
EP - 15
BT - 6th International Conference on Soft Computing and Intelligent Systems, and 13th International Symposium on Advanced Intelligence Systems, SCIS/ISIS 2012
Y2 - 20 November 2012 through 24 November 2012
ER -