About this challenge

We introduce the first multi-view layout estimation challenge. Its goal is to encourage the community to develop room layout estimation solutions that use only sequences of registered 360-images as input.

Unlike most existing layout datasets, e.g., MP3D [1] and Zillow [2], this challenge provides abundant multi-view images of more complex scenes, with larger rooms, more rooms per scene, and more non-Manhattan spaces, which poses new challenges for room layout estimation without labeled data.

This challenge is part of the Omnidirectional Computer Vision (OmniCV) workshop at CVPR'23.

Why multi-view for layout estimation?

Using a multi-view setting offers an accessible and intuitive constraint that has been broadly used in many computer vision solutions, such as SfM (COLMAP) [3] and SLAM [4], among others. Although significant progress has been made in multi-view layout estimation recently [5] [6] [7], efficiently leveraging more than two views remains largely unexplored.

We believe there is still a need for room layout solutions that leverage multi-view settings robustly and efficiently. Additionally, we assert that an unlabeled multi-view setting may spark important discussions concerning self-training, self-supervision, domain adaptation, fine-tuning, etc., not only for layout estimation but for other geometry tasks as well.
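
To make this constraint concrete, the following minimal sketch shows the geometric step that multi-view layout reasoning builds on: layout boundary points estimated in one panorama are transformed through the registered camera poses and reprojected into another panorama, where they can be compared against that view's own estimate (the idea behind multi-view consistency for self-training, as in 360-MLC [5]). It assumes equirectangular images, 4x4 camera-to-world poses, and an x-right, y-down, z-forward camera convention; the challenge data may use different conventions.

```python
import numpy as np

def unit_sphere_to_uv(xyz, W, H):
    """Project 3D points in a camera frame onto equirectangular pixels.

    Assumes x right, y down, z forward; returns u in [0, W), v in [0, H).
    """
    rays = xyz / np.linalg.norm(xyz, axis=-1, keepdims=True)
    lon = np.arctan2(rays[..., 0], rays[..., 2])  # longitude in [-pi, pi]
    lat = np.arcsin(rays[..., 1])                 # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * W
    v = (lat / np.pi + 0.5) * H
    return u, v

def reproject_boundary(pts_src, T_src, T_ref, W, H):
    """Map 3D layout-boundary points from a source camera frame into the
    equirectangular image of a reference camera.

    T_src, T_ref: 4x4 camera-to-world poses (assumed convention).
    pts_src: (N, 3) boundary points in the source camera frame.
    """
    pts_h = np.hstack([pts_src, np.ones((len(pts_src), 1))])
    pts_world = (T_src @ pts_h.T).T                   # source cam -> world
    pts_ref = (np.linalg.inv(T_ref) @ pts_world.T).T  # world -> reference cam
    return unit_sphere_to_uv(pts_ref[:, :3], W, H)
```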

Dataset

We include the following two datasets in this competition.

  • MP3D-FPE is a synthetic dataset proposed in 360-DFPE [8], collected from the Matterport3D [9] dataset using the MINOS [10] simulator, with 50 scenes and 687 rooms in total. For this competition, only RGB images and their registered camera poses are released. In total, we include 20k and 2.2k samples for training and testing, respectively.
  • HM3D-MVL is a newly collected dataset derived from the HM3D [11] dataset using the Habitat [12] simulator. As with MP3D-FPE, we imitate users' scanning behavior to collect more realistic camera movements. In total, we include 20k and 2.2k samples for training and testing, respectively.

We provide a toolkit to make data processing easier, allowing participants to download and use dataset splits, register frames by camera poses, and format the layout estimation results for the challenge submission.
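
The toolkit defines its own interfaces; the snippet below is only a hypothetical sketch of the kind of per-room loading it enables, assuming a directory of equirectangular frames plus a JSON file mapping frame names to flattened 4x4 camera-to-world poses. Both the file layout and the names here are illustrative assumptions, not the toolkit's actual format.

```python
import json
from pathlib import Path
import numpy as np

def load_registered_room(room_dir):
    """Hypothetical loader: pair each 360-image with its registered pose.

    Assumes <room_dir>/poses.json maps frame filenames to flattened 4x4
    camera-to-world matrices; the real toolkit defines its own format.
    """
    room_dir = Path(room_dir)
    poses = json.loads((room_dir / "poses.json").read_text())
    return [
        {
            "image": room_dir / name,
            "T_cam_to_world": np.asarray(mat, dtype=np.float64).reshape(4, 4),
        }
        for name, mat in sorted(poses.items())
    ]
```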

Note that the dataset is for non-commercial academic use only and is protected by the Matterport Open-Source License.

Challenge

  • Our competition will be hosted on EvalAI.
  • Participants will have access to the following data: training and testing warm-up splits, training and testing challenge splits, and a pilot split. All splits are provided as multiple 360-images per room, registered by camera poses; only the pilot split includes layout annotations.
  • The training and testing warm-up splits will be released at the warm-up phase opening, with evaluation on EvalAI and an unlimited number of submissions.
  • The training and testing challenge splits will be released at the challenge phase opening. For this phase, each participant will have a limited number of submissions per day.
  • For reference purposes, we release and evaluate 360-MLC [5] and HorizonNet as baselines within this competition.
  • The winner will be selected by the best average score evaluated on the challenge set; a sketch of a typical layout metric follows this list. For more details, please refer to Evaluation.
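
The official metric and its exact definition live on the Evaluation page. As a rough illustration only, layout benchmarks such as 360-MLC [5] commonly report 2D IoU between predicted and ground-truth floor polygons; the sketch below assumes that setting, with corners given as (x, z) floor-plane coordinates and shapely used for the polygon operations. Both are illustrative assumptions, not the official protocol.

```python
from shapely.geometry import Polygon

def layout_iou_2d(pred_corners, gt_corners):
    """2D IoU between predicted and ground-truth floor polygons.

    pred_corners, gt_corners: lists of (x, z) floor-plane corner
    coordinates in meters (assumed format, for illustration only).
    """
    pred, gt = Polygon(pred_corners), Polygon(gt_corners)
    # buffer(0) repairs minor self-intersections in estimated polygons
    pred = pred if pred.is_valid else pred.buffer(0)
    gt = gt if gt.is_valid else gt.buffer(0)
    union = pred.union(gt).area
    return pred.intersection(gt).area / union if union > 0 else 0.0
```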

Rules

  • To participate in the competition, individuals and teams (unlimited size) should fill out this registration form.
  • In total, each person/team is limited to 3 submissions per day during the challenge phase period.
  • Each person/team has an unlimited number of submissions during the warm-up phase period.
  • Freely and publicly available external data is allowed, including pre-trained models and other datasets.
  • There are no limits on training time or network capacity.
  • Participants are required to submit a report (at least one page) describing the algorithms and procedures used in their final submission.
  • For more information see Terms and Conditions.

The Prize

The first-place winner will receive a USD 1,000 prize. Please note that only one cash prize will be awarded to each registered team. For more information see Terms and Conditions.

Timeline

  • Competition Begins - March 20, 2023
  • Warm-up Phase Opening - March 20, 2023
  • Challenge Phase Opening - May 1, 2023
  • Challenge Phase Deadline - June 2, 2023
  • Winner informed - June 6, 2023 (winner: Zhijie Shen, BJTU)
  • Winner presentation - June 19, 2023
  • All deadlines are at 11:59 PM CST on the corresponding day unless otherwise noted. The competition organizers reserve the right to update the contest timeline if they deem it necessary.

Community

For public queries, discussion, and free-access tutorials, please join us in our Slack workspace. For private queries, please send an email to enrique.solarte.nthu@gapp.nthu.edu.tw

References

  1. Zou, Chuhang, et al. "Manhattan Room Layout Reconstruction from a Single 360° Image: A Comparative Study of State-of-the-Art Methods." International Journal of Computer Vision 129 (2021): 1410-1431.
  2. Cruz, Steve, et al. "Zillow Indoor Dataset: Annotated Floor Plans with 360° Panoramas and 3D Room Layouts." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
  3. Schonberger, Johannes L., and Jan-Michael Frahm. "Structure-from-motion revisited." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  4. Cadena, Cesar, et al. "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age." IEEE Transactions on Robotics 32.6 (2016): 1309-1332.
  5. Solarte, B., Wu, C. H., Liu, Y. C., Tsai, Y. H., & Sun, M. "360-MLC: Multi-view Layout Consistency for Self-training and Hyper-parameter Tuning." In Advances in Neural Information Processing Systems, 2022.
  6. Hutchcroft, Will, et al. "CoVisPose: Co-visibility Pose Transformer for Wide-Baseline Relative Pose Estimation in 360° Indoor Panoramas." Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXXII. Cham: Springer Nature Switzerland, 2022.
  7. Su, Jheng-Wei, et al. "GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network." arXiv preprint arXiv:2210.11419 (2022).
  8. Solarte, B., Liu, Y. C., Wu, C. H., Tsai, Y. H., & Sun, M. "360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation." IEEE Robotics and Automation Letters, 2022.
  9. A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, Y. Zhang. "Matterport3D: Learning from RGB-D Data in Indoor Environments." International Conference on 3D Vision (3DV 2017).
  10. Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun. "MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments." arXiv:1712.03931 (2017)
  11. Santhosh Kumar Ramakrishnan, et al. "Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI." Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021.
  12. Savva, M., Kadian, A., Maksymets, O., Zhao, Y., Wijmans, E., Jain, B., ... & Batra, D. "Habitat: A Platform for Embodied AI Research." In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.