Activity recognition has become a very active research area in the past few years. It is a challenging problem that has attracted the attention of the research community because of its practical applications, such as human-computer interfaces, content-based video indexing, video surveillance, and robotics, among others. The task can be described as labeling video segments that contain human motion with activity classes. Here, an activity can be defined as a composition of one or more actions organized temporally.
The literature generally divides the activity recognition task into three main steps: (i) data representation (feature extraction), which maps the image/video content into a more discriminative space, rich enough to allow proper recognition; (ii) activity segmentation, which produces atomic movements by identifying suitable break points that split the video into segments, where these segments can be used to describe the action as a whole or to find the spatial and temporal location of the action; and (iii) activity classification, whose purpose is to learn a function that assigns (discrete) labels to the images/videos.
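To make the three steps concrete, the following minimal sketch chains them on a toy input. It is only an illustration under strong assumptions: each frame is reduced to a single scalar, the feature is a simple frame-to-frame difference, break points are taken to be low-motion frames, and the classifier is a nearest-centroid rule with hand-set centroids. The function names (`extract_features`, `segment`, `classify`, `recognize`) and the `rest_threshold` parameter are hypothetical and do not come from any particular method in the literature.

```python
from statistics import mean


def extract_features(frames):
    """Step (i): map each frame to a feature value.

    Assumption: each frame is a single scalar, and the feature is a toy
    "motion energy", the absolute change from the previous frame.
    """
    return [0.0] + [abs(b - a) for a, b in zip(frames, frames[1:])]


def segment(features, rest_threshold=0.5):
    """Step (ii): split the sequence at break points.

    Assumption: a break point is any frame whose motion energy is at or
    below rest_threshold. Returns (start, end) pairs, end exclusive.
    """
    segments, start = [], None
    for i, value in enumerate(features):
        if value > rest_threshold and start is None:
            start = i  # motion begins: open a segment
        elif value <= rest_threshold and start is not None:
            segments.append((start, i))  # motion stops: close it
            start = None
    if start is not None:
        segments.append((start, len(features)))
    return segments


def classify(features, segments, centroids):
    """Step (iii): assign a discrete label to each segment.

    Assumption: nearest-centroid rule on the segment's mean feature,
    with centroids given as {label: value} instead of being learned.
    """
    labels = []
    for start, end in segments:
        segment_mean = mean(features[start:end])
        labels.append(
            min(centroids, key=lambda label: abs(centroids[label] - segment_mean))
        )
    return labels


def recognize(frames, centroids, rest_threshold=0.5):
    """Run the three steps in order and pair each segment with its label."""
    features = extract_features(frames)
    segments = segment(features, rest_threshold)
    return list(zip(segments, classify(features, segments, centroids)))


# Toy input: slow motion, a pause, then fast motion.
toy_frames = [0, 0, 1, 2, 3, 3, 3, 13, 23, 33, 33]
toy_centroids = {"walk": 1.0, "run": 10.0}
result = recognize(toy_frames, toy_centroids)
```

In a real system each step would be far richer (for example, learned spatio-temporal descriptors in step (i) and a trained classifier in step (iii)), but the data flow between the steps would stay the same: features feed segmentation, and segments feed classification.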