Carlos Antônio Caetano Júnior
PhD in Computer Science from the Universidade Federal de Minas Gerais (UFMG). Completed part of the doctoral studies at the Centre de Recherche INRIA Sophia Antipolis, France (CNPq scholarship), as a researcher on the STARS team (under the supervision of Dr. François Brémond). MSc in Computer Science from the Universidade Federal de Minas Gerais (UFMG). BSc in Information Systems from the Pontifícia Universidade Católica de Minas Gerais (PUC Minas). Has research experience in computer vision, smart surveillance, and machine learning, with a focus on visual pattern recognition.
Doctoral thesis
In this dissertation, we propose four representations based on motion information for activity recognition. The first is a spatiotemporal local feature descriptor that extracts a robust set of statistical measures to describe motion patterns. This descriptor measures meaningful properties of co-occurrence matrices and captures local space-time characteristics of the motion through the neighboring optical flow magnitude and orientation. The second is a compact, novel mid-level representation based on co-occurrence matrices of codewords. This representation expresses the distribution of the features at a given offset over feature codewords from a pre-computed codebook and encodes global structures in various local region-based features. The third is a novel temporal stream for two-stream convolutional networks that uses images computed from the optical flow magnitude and orientation to learn richer motion information. The method applies simple non-linear transformations to the vertical and horizontal components of the optical flow to generate input images for the temporal stream. Finally, the fourth is a novel skeleton image representation to be used as input to convolutional neural networks (CNNs). The proposed approach encodes the temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joints. Moreover, the representation combines reference joints with a tree-structured skeleton, incorporating different spatial relationships between the joints and preserving important spatial relations.
Experimental evaluations on challenging, well-known activity recognition datasets (KTH, UCF Sports, HMDB51, UCF101, NTU RGB+D 60 and NTU RGB+D 120) show that the proposed representations achieve accuracy better than or comparable to the state of the art, indicating their suitability as video representations.
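As an illustration of the magnitude and orientation images mentioned for the temporal stream, the sketch below converts the horizontal and vertical optical flow components into polar form and rescales them to 8-bit images. It is only a minimal sketch under stated assumptions: the abstract does not specify the exact non-linear transformations, so the standard Cartesian-to-polar conversion is used here, and the function names, the fixed value ranges, and the linear rescaling are all hypothetical choices, not the method of the dissertation.

```python
import numpy as np


def flow_to_magnitude_orientation(u, v):
    """Convert horizontal (u) and vertical (v) flow components to polar form.

    Assumption: plain Cartesian-to-polar conversion; the dissertation's
    actual transformations may differ.
    """
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    magnitude = np.hypot(u, v)        # sqrt(u**2 + v**2), element-wise
    orientation = np.arctan2(v, u)    # angle in radians, range [-pi, pi]
    return magnitude, orientation


def to_uint8_image(values, lo, hi):
    """Clip values to [lo, hi] and rescale linearly to [0, 255].

    Hypothetical helper: the range [lo, hi] is chosen by the caller.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (np.clip(values, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)


if __name__ == "__main__":
    # Toy 2x2 flow field standing in for a real optical flow estimate.
    u = np.array([[3.0, 0.0], [0.0, -1.0]])
    v = np.array([[4.0, 0.0], [2.0, 0.0]])
    mag, ori = flow_to_magnitude_orientation(u, v)
    mag_img = to_uint8_image(mag, 0.0, 5.0)          # assumed magnitude cap of 5
    ori_img = to_uint8_image(ori, -np.pi, np.pi)
    print(mag_img)
    print(ori_img)
```

In such a setup, the two images for each frame pair could be stacked and fed to the temporal stream in place of the raw flow components. How frames are stacked and how the value ranges are chosen are design decisions that the abstract leaves open.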