A new doctor in the group: Carlos Antônio Caetano Júnior defended his dissertation

Carlos Antônio Caetano Júnior defended his dissertation, “Motion-based Representations for Activity Recognition”, and earned the title of Doctor from the Graduate Program in Computer Science at the Federal University of Minas Gerais (UFMG), with a sandwich period (a research stay abroad) at the Centre de Recherche Inria Sophia Antipolis – Méditerranée (Advisor: François Brémond).

Human activity recognition (HAR) plays a key role in a number of real-world applications, ranging from searching for videos that contain specific activities to surveillance systems in environments that require a high level of security. As a result, activity recognition has become an extensively researched topic in the scientific community. Over the past decade, a significant portion of the progress in activity recognition has come from the development of discriminative representations known as feature descriptors. Such representations are generally based on appearance, motion analysis or pose information. More recently, efforts have been directed towards convolutional neural networks that learn such representations. These approaches learn hierarchical layers of representations to perform pattern recognition and have demonstrated effective results in activity recognition.

The dissertation proposes four representations for activity recognition based on motion information. The first is a spatio-temporal feature descriptor that extracts a robust set of statistical measures to describe motion patterns. It measures meaningful properties of co-occurrence matrices to capture spatio-temporal characteristics of movement from the magnitude and orientation of neighboring optical flow. The second is a new compact mid-level representation based on co-occurrence matrices of codewords. It expresses the distribution of features at a given offset using a pre-computed codebook and encodes global structures in various local region-based features. The third is a new temporal stream for two-stream convolutional networks, based on images computed from the magnitude and orientation of the optical flow. The method applies non-linear transformations to the vertical and horizontal components of the optical flow to generate input images for the temporal stream. The fourth is a skeleton image representation to be used as input to convolutional networks. The approach encodes temporal dynamics by explicitly calculating the magnitude and orientation values of the skeleton joints. It combines “reference joints” with a skeleton tree algorithm, incorporating different spatial relationships between the joints while preserving the important ones.
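To make the idea of magnitude, orientation and co-occurrence concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the method from the dissertation: the function names (`flow_to_polar`, `quantize`, `cooccurrence`), the number of orientation bins and the toy flow field are all hypothetical choices. It converts the horizontal and vertical flow components into magnitude and orientation, quantizes the orientation into bins, and counts how often pairs of bins occur at a fixed pixel offset.

```python
import math

def flow_to_polar(u, v):
    """Convert horizontal (u) and vertical (v) flow components
    into per-pixel magnitude and orientation (radians in [0, 2*pi))."""
    mag = [[math.hypot(a, b) for a, b in zip(ru, rv)] for ru, rv in zip(u, v)]
    ang = [[math.atan2(b, a) % (2 * math.pi) for a, b in zip(ru, rv)]
           for ru, rv in zip(u, v)]
    return mag, ang

def quantize(values, n_bins, upper):
    """Map each value in [0, upper) to an integer bin index in [0, n_bins)."""
    return [[min(int(x / upper * n_bins), n_bins - 1) for x in row]
            for row in values]

def cooccurrence(labels, n_bins, offset=(0, 1)):
    """Count how often bin i at a pixel co-occurs with bin j at the
    pixel displaced by offset = (dy, dx). Pairs that would fall outside
    the image are skipped."""
    dy, dx = offset
    h, w = len(labels), len(labels[0])
    counts = [[0] * n_bins for _ in range(n_bins)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                counts[labels[y][x]][labels[ny][nx]] += 1
    return counts

# Toy 2x3 flow field (assumed values): motion to the right, up, and left.
u = [[1.0, 0.0, -1.0],
     [1.0, 0.0, -1.0]]
v = [[0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0]]

mag, ang = flow_to_polar(u, v)
ang_bins = quantize(ang, n_bins=4, upper=2 * math.pi)   # 4 bins is an arbitrary choice
C = cooccurrence(ang_bins, n_bins=4, offset=(0, 1))     # right-hand neighbor
```

Statistical measures could then be computed over a normalized version of `C`; which measures the dissertation actually uses is not stated here, so none are shown.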

Experiments on challenging, well-known activity recognition datasets (KTH, UCF Sports, HMDB51, UCF101, NTU RGB+D 60 and NTU RGB+D 120) show that the proposed representations achieve results better than or similar to the state of the art, indicating the suitability of the approaches as video representations.

Examining committee

Professor William Robson Schwartz, coordinator of the Smart Sense Laboratory, supervised Carlos Caetano’s doctoral research. The examining committee was composed of Prof. Jefersson Alex dos Santos – Co-supervisor (DCC – UFMG), Prof. Erickson Rangel do Nascimento (DCC – UFMG), Prof. João Paulo Papa (FC – Unesp), Prof. David Menotti Gomes (DInf – UFPR) and Prof. Anderson de Rezende Rocha (IC – UNICAMP).

Researcher’s curriculum

PhD in Computer Science from the Federal University of Minas Gerais (UFMG) and researcher at the Smart Surveillance Interest Group – SSIG / DCC / ICEx / UFMG. He carried out part of his doctoral studies at the Centre de Recherche Inria Sophia Antipolis, France (CNPq scholarship), as a researcher on the STARS team (under the guidance of Dr. François Brémond). Master's degree in Computer Science from the Federal University of Minas Gerais (UFMG). Bachelor's degree in Information Systems from the Pontifical Catholic University of Minas Gerais (PUC Minas). During his master’s degree, he was a researcher at the Nucleus for Digital Image Processing (NPDI / DCC / ICEx / UFMG). During his undergraduate studies, he took part in the scientific initiation program at PUC Minas as a CNPq fellow, working as a researcher at the Audio-Visual Information Processing Laboratory (VIPLAB). He has research experience in Computer Vision, Digital Image and Video Processing, Image and Video Classification, Image and Video Feature Descriptors, Image Representation and Automatic Video Summarization.