Unlike fixed cameras, which are limited to a static view of the scene, active cameras can change their view by means of, for instance, rotation (pan and tilt) and zoom. Cameras with these capabilities are known as PTZ (pan-tilt-zoom) cameras, and they are widely employed in surveillance for viewing and tracking under dynamic conditions. By panning to a target and zooming in on it, PTZ cameras can capture fine-grained detail, such as faces and hands. However, by focusing on a particular target, the overall view of the scene can be lost. This can be addressed by combining multiple views from both fixed and active cameras. Coordinating these cameras in a robust manner remains a challenging problem.
Camera coordination can be tackled in a robust manner by means of machine learning. This approach learns a mapping from each region in the fixed camera's view to a pan, tilt, and zoom setting of the active camera. In this manner, when a target is detected in the scene overview observed by the fixed camera, the active camera can be instructed to provide a high-resolution zoom on that target, since an adequate pan, tilt, and zoom for this framing was learned during the training stage. This data-driven approach only requires training data, dispensing with more laborious and error-prone tasks such as camera calibration.
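As a minimal sketch of such a learned mapping, the snippet below fits an affine regression from pixel coordinates in the fixed camera's view to (pan, tilt, zoom) settings of the active camera. The training samples, the linear model, and the function names are illustrative assumptions, not the method of any particular system; in practice, a richer model and real recorded correspondences would be used.

```python
import numpy as np

# Hypothetical training data: pixel coordinates (x, y) of targets in the
# fixed camera's view, paired with the (pan, tilt, zoom) settings that
# framed each target well during a training stage.
train_xy = np.array([
    [100.0, 100.0], [500.0, 100.0], [100.0, 400.0],
    [500.0, 400.0], [300.0, 250.0],
])
train_ptz = np.array([
    [-20.0,  10.0, 2.0], [ 20.0,  10.0, 2.0],
    [-20.0, -10.0, 3.0], [ 20.0, -10.0, 3.0],
    [  0.0,   0.0, 2.5],
])

def fit_linear_map(xy, ptz):
    """Fit an affine map (x, y) -> (pan, tilt, zoom) by least squares."""
    A = np.hstack([xy, np.ones((len(xy), 1))])  # append a bias column
    coeffs, *_ = np.linalg.lstsq(A, ptz, rcond=None)
    return coeffs  # shape (3, 3): one row each for x, y, and bias

def predict_ptz(coeffs, x, y):
    """Predict pan, tilt, zoom for a target detected at pixel (x, y)."""
    return np.array([x, y, 1.0]) @ coeffs

coeffs = fit_linear_map(train_xy, train_ptz)
pan, tilt, zoom = predict_ptz(coeffs, 300.0, 250.0)
```

Once trained, a detection in the fixed view is simply fed through `predict_ptz` to command the active camera, with no explicit geometric calibration between the two cameras.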