Active Visual Search

Active object search aims at the automatic search and localisation of specific objects in an environment. A motion policy, driven by the partially observed scene, steers the agent to visually detect a target object located in the still-unobserved part of the scene along the shortest travelled path, avoiding longer trajectories or missing the target entirely. We propose solutions based on the Partially Observable Monte Carlo Planning (POMCP) framework, allowing training-free online policy learning in any environment.

This is an ongoing collaborative research project with PAVIS/VGM (IIT) and the University of Verona.

Related publications

POMP: Pomcp-based Online Motion Planning for active visual search in indoor environments

Y. Wang, F. Giuliari, R. Berra, A. Castellini, A. Del Bue, A. Farinelli, M. Cristani, F. Setti, BMVC, Sep 2020 [Paper]

POMP addresses AVS with POMCP in known environments, i.e. where the floor map is known a priori. POMP takes as input the current pose of an agent (e.g. a robot) and an RGB-D frame, and plans the next move that brings the agent closer to the target object. We model this problem as a Partially Observable Markov Decision Process solved by a Monte Carlo planning approach. This allows us to decide the next move by iterating over the known scenario at hand, exploring the environment and searching for the object at the same time. The method requires no extensive and expensive (in time and computation) labelled data, making it very agile in solving AVS in small and medium real scenarios.
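
Below is a minimal, self-contained sketch of the kind of Monte Carlo planning loop POMCP relies on, applied to a toy search world. All names (`plan`, `step`, the action set, the corridor simulator) are illustrative and not the authors' code; a full POMCP implementation grows a search tree over histories, while this sketch only applies UCB1 at the root with random rollouts below.

```python
import math
import random

ACTIONS = ["forward", "turn_left", "turn_right"]
GAMMA = 0.95          # discount factor
UCB_C = 1.0           # UCB1 exploration constant
N_SIMULATIONS = 500   # Monte Carlo simulations per planning step
MAX_DEPTH = 20        # rollout horizon


def rollout(state, step_fn, depth):
    """Random-policy rollout used to estimate the value below the root."""
    ret, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward, done = step_fn(state, random.choice(ACTIONS))
        ret += discount * reward
        discount *= GAMMA
        if done:
            break
    return ret


def plan(belief_particles, step_fn):
    """Pick the next action by simulating from states sampled from the belief.

    belief_particles: candidate (agent, target) world states consistent
    with what has been observed so far. step_fn(state, action) returns
    (next_state, reward, done) and stands in for a simulator built from
    the known floor map (hypothetical interface).
    """
    visits = {a: 0 for a in ACTIONS}
    values = {a: 0.0 for a in ACTIONS}
    for _ in range(N_SIMULATIONS):
        state = random.choice(belief_particles)  # sample a possible world

        def ucb(a):
            if visits[a] == 0:
                return float("inf")
            return values[a] + UCB_C * math.sqrt(
                math.log(sum(visits.values())) / visits[a])

        action = max(ACTIONS, key=ucb)
        next_state, reward, done = step_fn(state, action)
        ret = reward if done else reward + GAMMA * rollout(next_state, step_fn, MAX_DEPTH)
        visits[action] += 1
        values[action] += (ret - values[action]) / visits[action]  # running mean
    return max(ACTIONS, key=lambda a: values[a])


# Toy usage: a 1-D corridor of 10 cells; state = (agent_cell, target_cell).
def step(state, action):
    agent, target = state
    if action == "forward":
        agent = min(agent + 1, 9)
    elif action == "turn_left":
        agent = max(agent - 1, 0)
    # "turn_right" leaves the agent in place in this toy world.
    found = agent == target
    return (agent, target), (10.0 if found else -1.0), found


belief = [(0, t) for t in range(10)]  # target could be in any cell
print(plan(belief, step))             # likely "forward"
```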

POMP++: Pomcp-based Active Visual Search in unknown indoor environments

F. Giuliari, A. Castellini, R. Berra, A. Del Bue, A. Farinelli, M. Cristani, F. Setti, Y. Wang, IROS, Prague, CZ, Sep 2021 [Paper]

In this work, we address AVS in unknown indoor environments. We propose POMP++, which introduces a novel formulation on top of the classic Partially Observable Monte Carlo Planning (POMCP) framework to allow training-free online policy learning in unknown environments. We present a new belief reinvigoration strategy that enables POMCP to operate with a dynamically growing state space, addressing the online generation of the floor map. We evaluate our method on two public benchmark datasets, AVD (acquired by real robotic platforms) and Habitat ObjectNav (rendered from real 3D scene scans), achieving the best success rate with an improvement of more than 10% over state-of-the-art methods.
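
A hedged sketch of the belief-reinvigoration idea follows: when the online-built map reveals new free space, inject fresh particles hypothesising the target there, while discarding hypotheses the latest observation ruled out. The function and its arguments are illustrative assumptions, not the paper's exact strategy.

```python
import random


def reinvigorate_belief(particles, observed_empty, new_free_cells, n_new=100):
    """Grow a particle belief as the online-built map expands.

    particles: hypothesised target positions consistent with the map so far.
    observed_empty: cells just seen without the target in them.
    new_free_cells: free cells revealed by the latest observation.
    """
    # Remove hypotheses the latest observation ruled out.
    survivors = [p for p in particles if p not in observed_empty]
    # Add uniform hypotheses over the newly revealed free space, so the
    # planner can reason over the dynamically growing state space.
    fresh = random.choices(list(new_free_cells), k=n_new) if new_free_cells else []
    return survivors + fresh


# Toy usage: the map grows by three cells; two old hypotheses are ruled out.
belief = reinvigorate_belief(
    particles=[(0, 0), (0, 1), (1, 1)],
    observed_empty={(0, 0), (0, 1)},
    new_free_cells={(2, 0), (2, 1), (2, 2)},
    n_new=6,
)
print(belief)  # [(1, 1)] plus six samples drawn from the new cells
```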

Spatial Commonsense Graph for Object Localisation in Partial Scenes

F. Giuliari, G. Skenderi, M. Cristani, Y. Wang, A. Del Bue, CVPR, New Orleans, US, June 2022 [Project] [Code] [Paper]

We solve object localisation in partial scenes, a new problem of estimating the unknown position of an object (e.g. where is the bag?) given a partial 3D scan of a scene. The proposed solution is based on a novel scene graph model, the Spatial Commonsense Graph (SCG), where objects are the nodes and edges encode pairwise distances between them, enriched with concept nodes and relationships from a commonsense knowledge base. This allows the SCG to better generalise its spatial inference to unknown 3D scenes. The SCG estimates the unknown position of the target object by first predicting pairwise proximities among objects and then localising the target via circular intersection.
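
Localising a point from predicted pairwise distances is a circle-intersection (trilateration) problem. The sketch below solves it in 2D by linearising the circle equations into a least-squares system; it is an illustrative reconstruction under that assumption, not the paper's implementation, and all names are hypothetical.

```python
import numpy as np


def localise_target(anchors, distances):
    """Least-squares circle intersection (trilateration).

    anchors: (N, 2) known object positions; distances: (N,) predicted
    target-to-object distances. Returns the point whose distances to the
    anchors best match the predictions.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    # Subtract the first circle equation from the others to linearise:
    # ||x - p_i||^2 - ||x - p_0||^2 = d_i^2 - d_0^2
    # becomes  2 (p_i - p_0) . x = ||p_i||^2 - ||p_0||^2 + d_0^2 - d_i^2
    p0, d0 = anchors[0], d[0]
    A = 2.0 * (anchors[1:] - p0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(p0 ** 2)
         + d0 ** 2 - d[1:] ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


# Example: three objects with noiseless predicted distances to a target
# at (1.0, 2.0).
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
target = np.array([1.0, 2.0])
dists = [np.linalg.norm(target - np.array(a)) for a in anchors]
print(localise_target(anchors, dists))  # ~ [1.0, 2.0]
```

With noisy predicted distances the circles no longer meet in a single point, which is why a least-squares fit over all pairs is the natural way to intersect them.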