Scene Dynamics Analysis

A scene is always dynamic. Autonomous agents must be able to detect objects and recognise their motion patterns in both 2D and 3D. As a fundamental task, I have been working on multiple object detection and tracking in challenging scenarios, in particular on unsupervised domain adaptation to ease deployment in any setting, via the EU-funded project PROTECTOR. This research can also benefit other projects for social good, such as Social Distancing Monitoring, whose full details can be found below.

Visual Metric Inference for Social Distancing

The worldwide pandemic emergency reshaped daily-life social behaviours with constraints; among them, maintaining physical distance between people was imposed as an effective measure to reduce the spread of the virus. In this project, we addressed the problem of Visual Social Distancing, i.e. inferring the pairwise metric distance between detected persons, given a single uncalibrated image in unconstrained scenarios. Our solution achieves state-of-the-art performance and was one of the earliest with a published implementation.

This was a collaborative research project during my stay at PAVIS/VGM, IIT.

Related Publications

Single Image Human Proxemics Estimation for Visual Social Distancing

M. Aghaei, M. Bustreo, Y. Wang, P. Morerio, A. Del Bue, WACV, Waikoloa, US, Jan 2021 [Paper] [Code]

In this work, we propose a semi-automatic solution to approximate the homography matrix between the scene ground and the image plane. With the estimated homography, we then leverage an off-the-shelf pose detector to detect body poses in the image and to reason about their inter-personal distances using the lengths of their body parts. Inter-personal distances are further locally inspected to detect possible violations of the social distancing rules.
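The core geometric step can be sketched as follows: once a homography from image coordinates to the metric ground plane is available, two people's foot positions can be projected to the ground and their Euclidean distance compared against a threshold. This is a minimal illustration, not the paper's implementation; the homography matrix, foot coordinates, and 2 m threshold below are all assumed values.

```python
import numpy as np

# Hypothetical homography mapping homogeneous image pixels to metric
# ground-plane coordinates. In the paper it is approximated
# semi-automatically; here it is simply assumed to be given.
H = np.array([[0.01, 0.0,   -3.2],
              [0.0,  0.02,  -4.0],
              [0.0,  0.001,  1.0]])

def to_ground(pt_img, H):
    """Project an image point (u, v) to metric ground-plane coordinates."""
    p = H @ np.array([pt_img[0], pt_img[1], 1.0])
    return p[:2] / p[2]  # de-homogenise

def interpersonal_distance(feet_a, feet_b, H):
    """Ground-plane Euclidean distance between two detected people,
    using their foot (ankle) positions in the image."""
    return float(np.linalg.norm(to_ground(feet_a, H) - to_ground(feet_b, H)))

d = interpersonal_distance((320, 450), (520, 460), H)
violation = d < 2.0  # e.g. a 2 m social-distancing threshold (assumed)
```

In practice, the estimated body-part lengths provide the metric scale that fixes the homography, which is what makes the pipeline work from a single uncalibrated image.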

End-to-end pairwise human proxemics from uncalibrated single images

P. Morerio, M. Bustreo, Y. Wang, A. Del Bue, ICIP, Anchorage, US, Sep 2021 [Paper]

In this work, we estimate the pairwise metric distances between people using only a single uncalibrated image. We propose an end-to-end model, DeepProx, that takes as input two sets of skeletal joints, expressed as 2D image coordinates, and outputs the metric distance between them. We show that increased performance is achieved via a geometrical loss over simplified camera parameters provided at training time. Further, DeepProx achieves remarkable generalisation to novel viewpoints through domain generalisation techniques.
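The input/output interface described above can be sketched as a small regressor over the concatenated joint coordinates of a pair of people. This is only an illustrative stand-in for DeepProx: the joint count (a COCO-style 17-joint skeleton), the layer sizes, and the random weights are assumptions, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N_JOINTS = 17  # assumed COCO-style skeleton; not taken from the paper

def init_mlp(in_dim, hidden=64):
    """Randomly initialised weights for a tiny two-layer regressor."""
    return {
        "W1": rng.normal(0.0, 0.1, (hidden, in_dim)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (1, hidden)),
        "b2": np.zeros(1),
    }

def pairwise_distance_forward(params, joints_a, joints_b):
    """Map two sets of 2D joints, each of shape (N_JOINTS, 2), to a
    scalar metric distance via a ReLU MLP over the flattened pair."""
    x = np.concatenate([joints_a.ravel(), joints_b.ravel()])
    h = np.maximum(0.0, params["W1"] @ x + params["b1"])  # ReLU hidden layer
    return float(params["W2"] @ h + params["b2"])

params = init_mlp(in_dim=2 * 2 * N_JOINTS)  # two people x N_JOINTS x (u, v)
joints_a = rng.uniform(0, 640, (N_JOINTS, 2))
joints_b = rng.uniform(0, 640, (N_JOINTS, 2))
d_hat = pairwise_distance_forward(params, joints_a, joints_b)
```

Being end-to-end, such a model needs no explicit homography at test time; the geometric supervision is injected only during training.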