Integrating Machine Learning and Knowledge Representation for Discovery of Social Goals of Groups and Group Members from their Language Usage.

Project: Research project


A. Innovative claims of the proposed research

The overall goal of the SCIL program is to help analysts identify the goals and motivations of groups and their members by giving them an understanding of the culture-specific social dynamics within a group. Such understanding requires developing correlations (and, where possible, causal relations) between the goals of groups and group members and the language features found in the members' documents and interactions, and then using these correlations together with relevant sociolinguistic theories and other data. With respect to language features, one needs to go beyond the literal meaning of words and sentences and take the associated sociolinguistic aspects into account.

Towards this overall goal of the SCIL program, the goal of our proposed research is to develop software components that will provide analysts with:
(a) language features, and relationships between these features and culture-specific social phenomena, so that analysts can draw conclusions about the goals and motivations of members of social groups; and, if asked,
(b) conclusions regarding the social goals and motivations of members of social groups, together with explanations of how the conclusions were reached in terms of language features and associated background knowledge.

In pursuing this aim we will be guided by the following principles and assumptions:
1. The documents and interactions that the system analyzes may contain ungrammatical sentences, as is often the case in blogs, discussions, chats, and other interactions.
2. Although we will initially focus on documents and interactions in one or two particular languages, the methodology we develop should be easily adaptable to additional languages.
3. Similarly, our methodology should be adaptable to indicators of socio-cultural phenomena beyond the ones we initially focus on.
4. The relationships that are learned should not be black boxes; they should be represented as knowledge in a format that people can understand on its own.
5. Hand coding of knowledge is time-consuming and should therefore be done only for a few cases with wide applicability; most knowledge should be learned.
6. When making conclusions, the system should be able to give explanations.
7. Conclusions need not be simple yes/no answers; they may have associated weights or probabilities.

The two main components needed are a learning system and a knowledge representation and reasoning system. The learning system must learn the relationship between language indicators and the associated social phenomena. It will be provided with annotated examples of text and interactions; besides sociolinguistic mark-ups, the annotation may include some parsing of the text as well as some semantic mark-ups. The learning system will also be provided with some background knowledge and, possibly, a (partial) linguistic theory. From these inputs, the system is expected to learn the relationship between language indicators and the associated social phenomena.
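The split between a learning component and a reasoning component can be illustrated with a minimal sketch. It assumes each annotated example has already been reduced to a set of language-feature labels and a set of social-goal labels (every feature and goal name below is hypothetical), and it swaps in a simple association-rule learner and a noisy-OR combination of rule confidences as stand-ins; it is not the project's actual method. It does show how learned knowledge can stay human-readable (IF-THEN rules), and how a conclusion can carry a weight plus the rules that explain it.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    feature: str        # language indicator (hypothetical label)
    goal: str           # social goal (hypothetical label)
    confidence: float   # estimated P(goal | feature) from annotated examples

    def __str__(self):
        return f"IF {self.feature} THEN {self.goal} (conf={self.confidence:.2f})"


def learn_rules(examples, min_support=2, min_conf=0.6):
    """Learn readable feature -> goal rules from (features, goals) pairs.

    Stand-in learner: keeps a rule when the feature/goal pair co-occurs at
    least `min_support` times and P(goal | feature) >= `min_conf`.
    """
    feature_counts = Counter()
    pair_counts = Counter()
    for features, goals in examples:
        for f in set(features):
            feature_counts[f] += 1
            for g in set(goals):
                pair_counts[(f, g)] += 1
    rules = []
    for (f, g), n in pair_counts.items():
        conf = n / feature_counts[f]
        if n >= min_support and conf >= min_conf:
            rules.append(Rule(f, g, conf))
    # Highest-confidence rules first; ties broken alphabetically for stability.
    return sorted(rules, key=lambda r: (-r.confidence, r.feature, r.goal))


def infer_goals(features, rules):
    """Return {goal: (weight, explanation)} for goals supported by `features`.

    Weights combine all fired rules for a goal by noisy-OR (an assumption);
    the explanation lists the fired rules in readable form.
    """
    fired = defaultdict(list)
    for r in rules:
        if r.feature in features:
            fired[r.goal].append(r)
    results = {}
    for goal, rs in fired.items():
        p_none = 1.0
        for r in rs:
            p_none *= 1.0 - r.confidence
        results[goal] = (1.0 - p_none, [str(r) for r in rs])
    return results


# Toy annotated examples (entirely made up for illustration).
examples = [
    ({"we_pronoun", "praise_leader"}, {"build_solidarity"}),
    ({"we_pronoun", "agreement_marker"}, {"build_solidarity"}),
    ({"imperative", "topic_shift"}, {"gain_control"}),
    ({"imperative", "we_pronoun"}, {"gain_control"}),
]
rules = learn_rules(examples)
for rule in rules:
    print(rule)
print(infer_goals({"we_pronoun", "imperative"}, rules))
```

On these toy examples the learner keeps two rules, `imperative -> gain_control` and `we_pronoun -> build_solidarity`; a richer system would learn rules over combinations of features and background knowledge rather than single indicators.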
Effective start/end date: 8/24/09 to 10/23/11


  • ODNI: Intelligence Advanced Research Projects Activity (IARPA): $1,420,173.00


language usage
group membership