AutoSweep: Recovering 3D Editable Objects
from a Single Photograph

Xin Chen1,  Yuwei Li1,  Xi Luo1,  Tianjia Shao2,  Jingyi Yu1,  Kun Zhou3,  Youyi Zheng3

1ShanghaiTech University,   2University of Leeds,   3Zhejiang University



This paper presents a fully automatic framework for extracting editable 3D objects directly from a single photograph. Unlike previous methods, which recover depth maps, point clouds, or mesh surfaces, we aim to recover 3D objects that have semantic parts and can be directly edited. Our work is based on the assumption that most human-made objects are composed of parts, and that these parts can be well represented by generalized primitives. We make an attempt toward recovering two types of primitive-shaped objects, namely generalized cuboids and generalized cylinders. Qualitative and quantitative experiments show that our algorithm recovers high-quality 3D models and outperforms existing methods in both instance segmentation and 3D reconstruction.

[Paper] [Video] [BibTex]


figure The whole pipeline. Our method takes as input a single photograph and extracts its semantic part masks labeled as cylinder profile, cuboid profile, cylinder body, etc., which are then used in a sweeping procedure to construct a textured 3D model.

figure The network structure. Our GeoNet consists of an instance segmentation network (Mask R-CNN) and a deformable convolutional network. The network outputs instance masks labeled as semantic parts (profiles and bodies).



You can download our dataset from the Google Drive link or the OneDrive link.

Part 1: Image

This folder contains 11,657 images of cuboids and cylinders: about 6,000 unannotated images from ImageNet, 774 annotated images from Xiao et al., and 4,883 images collected from the Internet.

Part 2: Annotation

Each image is annotated with segmentation masks: color encodes both the semantic label and the instance ID.

For example:

Label                 Color           Instance ID
Cylinder - top face   (10, 10, 200)   1
Cylinder - top face   (20, 20, 200)   2
Cylinder - body       (10, 0, 200)    1
Cube - top face       (10, 10, 255)   1
Cube - body           (10, 0, 255)    1
Grip                  (10, 0, 150)    1
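Given this color encoding, the annotation images can be decoded back into per-instance binary masks by matching pixels against the palette. The sketch below uses only the example colors from the table above; `COLOR_MAP` and `decode_annotation` are illustrative names, and the full dataset palette would need to be filled in.

```python
import numpy as np

# Color -> (label, instance ID) mapping taken from the example table above.
# A sketch only: extend COLOR_MAP with the full palette used in the dataset.
COLOR_MAP = {
    (10, 10, 200): ("cylinder_top", 1),
    (20, 20, 200): ("cylinder_top", 2),
    (10, 0, 200):  ("cylinder_body", 1),
    (10, 10, 255): ("cube_top", 1),
    (10, 0, 255):  ("cube_body", 1),
    (10, 0, 150):  ("grip", 1),
}

def decode_annotation(rgb):
    """Split a color-coded annotation image of shape (H, W, 3) into masks.

    Returns {(label, instance_id): boolean mask of shape (H, W)} for every
    palette color that actually appears in the image.
    """
    masks = {}
    for color, key in COLOR_MAP.items():
        mask = np.all(rgb == np.array(color, dtype=rgb.dtype), axis=-1)
        if mask.any():
            masks[key] = mask
    return masks
```

Loading the PNG into the `rgb` array (e.g. with Pillow) is left out for brevity.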

Part 3: ImageSets

The dataset is further split into 8,183 training images and 3,474 testing images.
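Reading such a split typically amounts to loading a plain-text list of image IDs. The helper below assumes a VOC-style convention (one ID per line in `train.txt` / `test.txt`); the actual file names inside the ImageSets folder may differ.

```python
from pathlib import Path

def load_split(imagesets_dir, split):
    """Read an ImageSets list file, returning one image ID per line.

    Assumes VOC-style plain-text lists named '<split>.txt'; blank lines
    are skipped. The naming convention is an assumption, not confirmed
    by the dataset documentation.
    """
    path = Path(imagesets_dir) / f"{split}.txt"
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]
```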


The code is available on the GitHub page.

The code consists of two modules, as described in the paper: the learning module (image to mask) and the graphics module (mask to 3D mesh). The first module follows the FCIS and Mask R-CNN frameworks and is a standard learning pipeline written in Python. The second module is built on Unity3D and our own framework; it sweeps the extracted profiles and provides an interactive demo.
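The graphics module itself is Unity3D/C#, but the core sweeping idea is easy to illustrate: translate a profile curve along an axis and stitch consecutive rings into a triangle mesh. The numpy sketch below sweeps a circular profile along a straight axis to produce a cylinder; the real module additionally handles curved axes and varying profiles, and `sweep_circle` is an illustrative name, not a function from the released code.

```python
import numpy as np

def sweep_circle(center, radius, axis, length, n_seg=32, n_steps=8):
    """Sweep a circular profile along a straight axis into a cylinder mesh.

    Returns (vertices, faces): vertices of shape (n_steps * n_seg, 3) and
    triangle index rows of shape ((n_steps - 1) * n_seg * 2, 3).
    """
    axis = np.asarray(axis, float)
    axis /= np.linalg.norm(axis)
    # Build an orthonormal frame (u, v) spanning the plane of the profile.
    helper = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(axis, helper); u /= np.linalg.norm(u)
    v = np.cross(axis, u)
    theta = np.linspace(0.0, 2.0 * np.pi, n_seg, endpoint=False)
    ring = radius * (np.outer(np.cos(theta), u) + np.outer(np.sin(theta), v))
    # Place one copy of the profile ring at each step along the axis.
    verts = np.concatenate([
        np.asarray(center, float) + t * axis + ring
        for t in np.linspace(0.0, length, n_steps)
    ])
    # Stitch neighbouring rings with two triangles per quad.
    faces = []
    for s in range(n_steps - 1):
        for i in range(n_seg):
            a = s * n_seg + i
            b = s * n_seg + (i + 1) % n_seg
            faces.append([a, b, b + n_seg])
            faces.append([a, b + n_seg, a + n_seg])
    return verts, np.array(faces)
```

Capping the two end profiles (the "top faces" in the annotations) would close the mesh; that step is omitted here.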

The code may no longer be actively maintained. However, we still hope parts of our work are helpful or inspiring. If you have any questions, please feel free to contact us.

Code scripts for the second module:







Please cite this paper in your publications if it helps your research:

@article{chen2020autosweep,
  title={AutoSweep: Recovering 3D Editable Objects from a Single Photograph},
  author={Chen, Xin and Li, Yuwei and Luo, Xi and Shao, Tianjia and Yu, Jingyi and Zhou, Kun and Zheng, Youyi},
  journal={IEEE Transactions on Visualization and Computer Graphics},
  year={2020}
}


The AutoSweep dataset is freely available for non-commercial use. For commercial use, please contact Xin Chen or Youyi Zheng by email.