Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views


Created by Hao Su, Charles R. Qi, Yangyan Li, Leonidas J. Guibas from Stanford University.


Our work was initially described in an arXiv tech report and will appear as an ICCV 2015 paper. Render for CNN is a scalable image synthesis pipeline for generating millions of training images for high-capacity models such as deep CNNs. We demonstrate how to use this pipeline, together with a specially designed network architecture, to train CNNs to estimate object viewpoints from millions of synthetic images and real images. In this repository, we provide both the rendering pipeline code and an off-the-shelf viewpoint estimator for PASCAL3D+ objects.


Render for CNN is released under the MIT License (refer to the LICENSE file for details).

Citing Render for CNN

If you find Render for CNN useful in your research, please consider citing:

    Title={Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views},
    Author={Su, Hao and Qi, Charles R. and Li, Yangyan and Guibas, Leonidas J.},
    Booktitle={The IEEE International Conference on Computer Vision (ICCV)},
    month = {December},
    Year= {2015}


  1. Render for CNN Image Synthesis Pipeline
  2. Off-the-shelf Viewpoint Estimator
  3. Testing on VOC12 val
  4. Training your Own Models

Render for CNN Image Synthesis Pipeline


  1. Blender (tested with Blender 2.71 on 64-bit Linux). You can get it for free from the Blender website.

  2. MATLAB (tested with 2014b on 64-bit Linux). You also need to compile the external kde package in render_pipeline/kde/matlab_kde_package by following the README.txt file in that folder.

  3. Datasets (ShapeNet, PASCAL3D+, SUN2012) [not required for the demo]. If you have already downloaded the same datasets (from the URLs specified in the shell scripts), you can create symbolic links to them using the same pathnames as specified in the shell scripts. Otherwise, run the following from the project root folder:

    bash dataset/
    bash dataset/
    bash dataset/
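
If the datasets are already on disk, the symbolic-link option mentioned above can be sketched in Python as follows. This is a minimal illustration, not part of the repository: the helper name link_dataset and the example paths are hypothetical, and the link locations must match the pathnames your shell scripts actually expect.

```python
import os


def link_dataset(existing_dir, link_path):
    """Create a symbolic link at link_path that points to existing_dir.

    Returns True if a link was created and False if something already
    exists at link_path (nothing is overwritten).
    """
    if os.path.lexists(link_path):
        return False
    parent = os.path.dirname(link_path)
    if parent:
        # Make sure the folder that will hold the link exists.
        os.makedirs(parent, exist_ok=True)
    os.symlink(os.path.abspath(existing_dir), link_path)
    return True


# Hypothetical usage (both paths are placeholders, not the repository's):
# link_dataset("/data/my_pascal3d_copy", "datasets/pascal3d")
```

The helper refuses to overwrite an existing path, so rerunning it after a partial setup is harmless.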

Set up paths

All data and code paths should be set in We have provided you an example version You only need to copy or rename the example file and modify the Blender and MATLAB paths in it (by default the paths are set to blend and matlab). All other paths are relative to the project root folder and should work as they are.


After setting Blender and MATLAB paths in, run script to set up MATLAB global variable file.
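
Before running the pipeline, it can help to confirm that the Blender and MATLAB commands you configured actually resolve on this machine. The following is a small sketch using only the Python standard library; the function name find_executables and the command names "blender" and "matlab" are assumptions for illustration, so substitute whatever your path settings use.

```python
import shutil


def find_executables(names):
    """Map each command name to its resolved path, or None if not found."""
    return {name: shutil.which(name) for name in names}


# Hypothetical usage (command names are assumptions):
# for name, path in find_executables(["blender", "matlab"]).items():
#     print(name, "->", path if path else "NOT FOUND")
```

shutil.which accepts either a bare command name (looked up on PATH) or a path to an executable, so the same check works for both styles of setting.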


Demo of synthesis pipeline

This small demo at demo_render shows how we get cropped, background-overlaid images of objects from a 3D model. It also helps verify that your environment is set up correctly. To run the demo, cd into the project root folder and follow the steps below.

cd demo_render

Running large scale synthesis

  1. Estimate viewpoint and truncation distributions with KDE (kernel density estimation). If you haven't compiled the kde package, go to render_pipeline/kde/matlab_kde_package/mex, open MATLAB, and run makemex to generate the mex files.

    cd render_pipeline/kde

    Open MATLAB and run the following command (expect to see plots popping up). Viewpoint and truncation statistics will be saved to data/view_statistics and data/truncation_statistics. Samples generated from the estimated distributions will be saved to data/view_distribution and truncation_distribution.

  2. Render images with Blender. This step is computationally heavy and may take a long time, depending on how powerful your computers are. It takes us around 8 hours to render 2.4M images on 6 multi-core servers. If you have multiple servers with a shared filesystem, you can set g_hostname_synset_idx_map in accordingly. Note that the models currently come directly from ShapeNet; deformed models will be released separately later.

    python render_pipeline/

    You can stop rendering at any time and execute the following commands to crop and overlay backgrounds on the images that have already been rendered. By default, rendered images are saved to data/syn_images.

  3. Crop images. This step is I/O heavy and takes around 1 to 2 hours on a multi-core server. An SSD or a high-end HDD can help a lot. By default, cropped images are saved to data/syn_images_cropped.

    python render_pipeline/

  4. Overlay backgrounds. This step takes about as long as the cropping step above. By default, the background-overlaid images (cropped in the step above) are saved to data/syn_images_cropped_bkg_overlaid.

    python render_pipeline/

    If you'd like a file listing all synthesized image filenames and their labels (class, azimuth, elevation, and tilt angles), we provide helper functions for that: see get_one_category_image_label_file and combine_files in view_estimation/, and refer to view_estimation/ for usage examples.
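
To illustrate the idea behind step 1 above, the sketch below draws viewpoint samples from a Gaussian kernel density estimate. It is a stand-in written with the Python standard library, not the MATLAB kde package used by the pipeline, and it handles azimuth only, whereas the pipeline also covers elevation, tilt, and truncation. The function name, the bandwidth value, and the example angles are assumptions.

```python
import random


def sample_azimuths_kde(observed_deg, n_samples, bandwidth_deg=5.0, seed=None):
    """Draw azimuth samples (in degrees) from a Gaussian KDE.

    Sampling from a Gaussian KDE is equivalent to picking one observation
    uniformly at random and adding zero-mean Gaussian noise whose standard
    deviation equals the kernel bandwidth.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        center = rng.choice(observed_deg)
        noisy = center + rng.gauss(0.0, bandwidth_deg)
        samples.append(noisy % 360.0)  # wrap back onto the circle
    return samples


# Hypothetical usage with made-up observed azimuths:
# views = sample_azimuths_kde([0.0, 90.0, 350.0], n_samples=1000, seed=0)
```

A larger bandwidth spreads the synthetic viewpoints further from the observed ones; a bandwidth of zero simply resamples the observations.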

Off-the-shelf Viewpoint Estimator


  1. Caffe (with pycaffe compiled). For testing, we support the new Caffe interface and prototxt files (which use "layer" instead of "layers"). You can follow this webpage for installation details.

  2. Download our pre-trained caffe model (~390MB). The model was trained on rendered images and VOC12 train set real images.

    cd caffe_models

Set up paths

The steps are the same as above in Render for CNN Image Synthesis Pipeline.

Demo of 3D viewpoint estimator

This demo at demo_view shows how to use our off-the-shelf viewpoint estimator. To estimate the viewpoint of an example airplane image, do the following.

cd demo_view

To visualize the estimated 3D viewpoint, run and see a rendered image of the viewpoint.
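
As a rough picture of what an estimated viewpoint means geometrically, the sketch below converts azimuth and elevation into a camera position on a sphere around the object. The axis convention (z up, azimuth measured in the x-y plane) and the function name are assumptions and may differ from the convention used by the repository's renderer. Tilt is an in-plane rotation of the camera, so it does not change the position.

```python
import math


def camera_position(azimuth_deg, elevation_deg, distance=1.0):
    """Return the (x, y, z) camera position for the given viewing angles.

    Assumed convention: the object sits at the origin, z points up,
    azimuth rotates around z, and elevation is measured from the x-y plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.cos(az)
    y = distance * math.cos(el) * math.sin(az)
    z = distance * math.sin(el)
    return (x, y, z)


# Hypothetical usage: a camera 2 units away, 45 degrees around, 30 degrees up.
# print(camera_position(45.0, 30.0, distance=2.0))
```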


Testing on VOC12 val


  1. Caffe with the Python interface and the pre-trained Caffe model, as required in Off-the-shelf Viewpoint Estimator.

  2. PASCAL3D+ dataset - if you haven't downloaded it. It will be used for preparing test images and evaluation.

    bash dataset/

  3. MATLAB for preparing test images.

Set up paths

The steps are the same as above in Render for CNN Image Synthesis Pipeline.

Evaluation of AVP-NV, Acc-pi/6 and MedErr

For AVP-NV (Average Viewpoint Precision), both localization (from R-CNN) and viewpoint estimation (azimuth) are evaluated. For Acc-pi/6 and MedErr, we evaluate on VOC12 val images without truncations and occlusions. For more details on the definitions of the metrics, please refer to the paper.
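
For intuition, Acc-pi/6 and MedErr can be sketched as follows, assuming the definition commonly used for PASCAL3D+: the error between a predicted and a ground-truth rotation is the geodesic angle between them, Acc-pi/6 is the fraction of errors below pi/6, and MedErr is the median error in degrees. The paper remains the authoritative definition; the function names here are hypothetical and this is not the repository's evaluation code.

```python
import math
import statistics


def geodesic_angle(r1, r2):
    """Rotation angle (radians) between two 3x3 rotation matrices.

    Uses trace(r1^T r2) = sum of elementwise products, and
    angle = arccos((trace - 1) / 2).
    """
    trace = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard rounding
    return math.acos(cos_theta)


def acc_and_mederr(pred_rotations, gt_rotations, threshold=math.pi / 6):
    """Return (accuracy at threshold, median error in degrees)."""
    errors = [geodesic_angle(p, g) for p, g in zip(pred_rotations, gt_rotations)]
    acc = sum(1 for e in errors if e < threshold) / len(errors)
    return acc, math.degrees(statistics.median(errors))


def rot_z(deg):
    """Rotation about the z axis, used only to build example inputs."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Made-up example: predictions off by 10, 20, and 90 degrees.
acc, med_err = acc_and_mederr([rot_z(10), rot_z(20), rot_z(90)], [rot_z(0)] * 3)
```

In this made-up example, two of the three errors fall below 30 degrees, so the accuracy is 2/3 and the median error is 20 degrees.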

First, prepare the testing images from VOC12 by running:

python view_estimation/

Then, do evaluation by running:

python view_estimation/

Results are displayed on screen and saved to view_estimation/avp_test_results/avp_nv_results.txt and view_estimation/vp_test_results/acc_mederr_results.txt

Training your Own Models

To be updated.