Reprojection Errors as Prompts for Efficient Scene Coordinate Regression

Ting-Ru Liu1, Hsuan-Kung Yang2, Jou-Min Liu1*, Chun-Wei Huang1*,
Tsung-Chih Chiang1*, Quan Kong2, Norimasa Kobori2, Chun-Yi Lee1
1National Tsing Hua University    2Woven by Toyota, Inc.
ECCV 2024

*Indicates Equal Contribution

Abstract

Scene coordinate regression (SCR) methods have emerged as a promising area of research due to their potential for accurate visual localization. However, many existing SCR approaches train on samples from all image regions, including dynamic objects and texture-less areas. Utilizing these areas for optimization during training can potentially hamper the overall performance and efficiency of the model. In this study, we first perform an in-depth analysis to validate the adverse impacts of these areas. Drawing inspiration from our analysis, we then introduce an error-guided feature selection (EGFS) mechanism, in tandem with the use of the Segment Anything Model (SAM). This mechanism seeds areas with low reprojection error as prompts and expands them into error-guided masks, and then utilizes these masks to sample points and filter out problematic areas in an iterative manner. The experiments demonstrate that our method outperforms existing SCR approaches that do not rely on 3D information on the Cambridge Landmarks and Indoor6 datasets.

Overview

[Figure: overview of the proposed training framework]

The proposed training framework iteratively samples features to train a scene-specific MLP, which consists of a scene coordinate head and a confidence head. In each iteration, the model is trained for k epochs. During the initial iteration, features are randomly sampled from all parts of images in order to derive the first set of reprojection errors. In subsequent iterations, features are selected based on error-guided feature selection (EGFS) masks generated according to reprojection errors and a confidence map.
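The iteration schedule described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `run_egfs_schedule`, `train_one_epoch`, `build_mask`, and `samples_per_epoch` are hypothetical names, features are represented only by integer indices, and falling back to the previous candidates when a mask is empty is an assumption made here for robustness.

```python
import random
from typing import Callable, List, Sequence


def run_egfs_schedule(
    num_features: int,
    num_iterations: int,
    k: int,
    samples_per_epoch: int,
    train_one_epoch: Callable[[List[int]], None],
    build_mask: Callable[[], Sequence[bool]],
    seed: int = 0,
) -> List[List[int]]:
    """Return the candidate feature indices used in each iteration.

    train_one_epoch and build_mask are hypothetical callbacks: the first
    would update the scene-specific MLP on a batch of feature indices, the
    second would turn the current reprojection errors and confidence map
    into a per-feature keep/drop mask.
    """
    rng = random.Random(seed)
    # Initial iteration: every feature from every image region is a candidate.
    candidates = list(range(num_features))
    history: List[List[int]] = []

    for iteration in range(num_iterations):
        history.append(list(candidates))

        # Train for k epochs on random samples drawn from the candidates.
        for _ in range(k):
            batch_size = min(samples_per_epoch, len(candidates))
            train_one_epoch(rng.sample(candidates, batch_size))

        if iteration == num_iterations - 1:
            break  # no further iteration needs a mask

        # Later iterations sample only where the mask keeps the feature.
        mask = build_mask()
        selected = [i for i, keep in enumerate(mask) if keep]
        if selected:  # assumption: keep the old candidates if the mask is empty
            candidates = selected

    return history
```

With stub callbacks, the first entry of the returned history covers all features, and every later entry is restricted to the features the mask kept.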

Results

Comparison: w/ and w/o EGFS masks

We depict the point clouds reconstructed from scene coordinates of the training sequence. After applying the EGFS masks, the point clouds become clearer and contain less noise.

w/o EGFS masks

w/ EGFS masks

Visualization

[Figure: visualization, with panels referenced below as (b) through (f)]

(c) illustrates the point prompts selected from (b) with low reprojection errors, while (d) presents an error-guided mask expanded from the prompted points in (c) using SAM. (f) displays the proposed error-guided feature selection (EGFS), which refines the mask from (d) with the predicted confidence map (e).
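The two steps around SAM, picking prompts in (c) and refining the mask in (f), might look like the sketch below. It is illustrative only. The SAM expansion in (d) is not reproduced: `expanded_mask` stands in for its output. The names `select_point_prompts`, `refine_mask`, `error_threshold`, `max_prompts`, and `confidence_threshold` are hypothetical, and both the thresholding rule for prompts and the combination of mask and confidence map by a thresholded intersection are assumptions, since the text here does not specify them.

```python
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def select_point_prompts(
    errors: Grid, error_threshold: float, max_prompts: int
) -> List[Tuple[int, int]]:
    """Pick up to max_prompts (row, col) locations with the lowest
    reprojection error, considering only errors below error_threshold."""
    candidates = [
        (err, r, c)
        for r, row in enumerate(errors)
        for c, err in enumerate(row)
        if err < error_threshold
    ]
    candidates.sort()  # lowest error first
    return [(r, c) for _, r, c in candidates[:max_prompts]]


def refine_mask(
    expanded_mask: Mask, confidence: Grid, confidence_threshold: float
) -> Mask:
    """Keep a location only if the expanded mask covers it AND the
    predicted confidence reaches confidence_threshold (assumed rule)."""
    return [
        [bool(m) and conf >= confidence_threshold for m, conf in zip(m_row, c_row)]
        for m_row, c_row in zip(expanded_mask, confidence)
    ]
```

In this sketch, the prompts returned by `select_point_prompts` would be passed to SAM as point prompts, and the mask SAM returns would then go through `refine_mask` to produce the final EGFS mask.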

BibTeX

@inproceedings{liu2024reprojection,
  title={Reprojection Errors as Prompts for Efficient Scene Coordinate Regression},
  author={Ting-Ru Liu and Hsuan-Kung Yang and Jou-Min Liu and Chun-Wei Huang and Tsung-Chih Chiang and Quan Kong and Norimasa Kobori and Chun-Yi Lee},
  booktitle={ECCV},
  year={2024}
}