Specifically, the proposed work develops new methodologies and algorithms for integrating low-level visual data with high-level knowledge for video-based activity/event recognition, a key capability of many video-based automated systems. Many long-standing challenges in vision-based activity/event recognition remain unsolved. Among them, one prominent challenge is how to construct vision models that effectively encode domain knowledge about the events/activities under consideration. As a result, most existing approaches assume simplistic models, such as a classifier that maps visual data to a set of action labels. In such a scheme, the task reduces to learning the parameters of the classifier, leaving little room for explicit use of the domain knowledge that should be key to solving the problem. For example, in two recent methods, a support vector machine is trained (Hoai et al. 2012) and a simple linear model is assumed for pedestrian dynamics (Zhou et al. 2012). Such simplistic models, while useful for regression or classification, do not readily support more sophisticated analysis, such as recognizing the plans of the agents, which may be critical for understanding complex events/activities.

Understanding human activity from sensory data has many applications. A video-based approach to this problem has several potential advantages: it can work with uncooperative human subjects (e.g., it does not require them to wear sensors), and it does not require a constrained environment (e.g., it is not confined to a smart environment equipped with abundant ambient sensors). Additionally, automating video-based understanding of human activity can significantly increase the overall autonomy of robotic agents deployed in field scenarios as part of a cooperative team.
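To make the critique concrete, the "simplistic model" baseline described above can be sketched as an SVM that maps fixed-length visual feature vectors directly to action labels. The following is a minimal illustration, not the cited methods: the feature vectors are synthetic stand-ins for real video descriptors, and the class names "walk"/"run" are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two synthetic "action classes", each a well-separated cluster in a
# 16-dimensional feature space (placeholders for pooled video features).
X_walk = rng.normal(loc=0.0, scale=0.5, size=(50, 16))
X_run = rng.normal(loc=2.0, scale=0.5, size=(50, 16))
X = np.vstack([X_walk, X_run])
y = np.array([0] * 50 + [1] * 50)  # 0 = "walk", 1 = "run"

# The entire recognition task collapses to fitting classifier parameters;
# no goals, plans, or domain knowledge are represented anywhere.
clf = SVC(kernel="linear").fit(X, y)

# Predict the label of a new feature vector drawn near the "run" cluster.
pred = clf.predict(rng.normal(loc=2.0, scale=0.5, size=(1, 16)))
```

The sketch shows why such a model cannot answer questions beyond label prediction: the learned decision boundary encodes no structure from which an agent's plan or intent could be inferred.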
Consider, for example, reconnaissance scenarios in which human-robot teams work toward gathering information related to a set of objectives, with the robot acting as an assistant to the human commander. Such scenarios are fast emerging in fields ranging from space travel and exploration to urban search and rescue and surveillance. Endowing the robotic agent with the capability to process incoming video data from the scene using a partial model of its own enables the recognition of intermediate, smaller-scale objectives that humans may not be adept at making explicit. This in turn empowers the robot to prepare to assist with the course of action the commander may be pursuing, without the need for setting up costly communication protocols in addition to the act itself.

The ultimate goal of the project is to develop a systematic approach to video-based action recognition that can overcome the significant challenges identified in Section 1. The key idea is to develop new methods and algorithms that support both learning activity models and selecting the most effective features via semantic feedback from a high-level planning and plan-recognition module that can deal with incomplete models. This demands tight integration of low-level visual data and high-level knowledge. To this end, three technical objectives are defined:

1. To develop a general framework for action modeling that supports fusion of low-level, feature-driven tasks (such as segmentation) and high-level reasoning (such as goal-driven plan recognition) in interpreting action units for activity recognition.
2. To develop specific learning and inference algorithms, under the general framework, that support updating of initial partial models and selection of features.
3. To evaluate the framework and algorithms in the domain of video-based action recognition, with a focus on supporting learning with partially-labeled data from on-line sources.
Effective start/end date: 8/1/14 → 7/31/17
- DOD-ARMY-ARL: Army Research Office (ARO): $295,724.00