Transferring Dense Pose to Proximal Animal Classes

In CVPR 2020

arXiv preprint

*Done during an internship at FAIR.

Pipeline: We consider the problem of dense pose labelling for animal classes. We show that, for classes proximal to humans such as chimpanzees (left), we can obtain excellent performance by learning an integrated recognition architecture from existing data sources, including DensePose for humans as well as detection and segmentation information from other COCO classes (right). The key is to establish a common reference (middle), which we obtain by aligning the reference models of the animals. This enables training a model for the target class without labelling a single example image for it.
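The common reference can be pictured as a dense vertex-to-vertex correspondence between the human and animal reference meshes. The sketch below is a minimal illustration, not the paper's alignment procedure. It assumes the two meshes have already been aligned in a shared 3D frame and copies each human vertex's body-part label and UV coordinates to the nearest chimpanzee vertex. The function name `transfer_vertex_labels` and the array layouts are hypothetical.

```python
import numpy as np


def transfer_vertex_labels(src_vertices, src_labels, src_uv, dst_vertices):
    """Copy per-vertex part labels and UV coordinates from an aligned source
    mesh (e.g. a human reference) to a target mesh (e.g. a chimpanzee model).

    Assumption: both vertex sets already live in the same 3D frame.
    Returns (labels, uv, nearest) for every target vertex.
    """
    src = np.asarray(src_vertices, dtype=float)  # (N, 3) source vertices
    dst = np.asarray(dst_vertices, dtype=float)  # (M, 3) target vertices
    # Pairwise squared Euclidean distances via broadcasting, shape (M, N).
    d2 = ((dst[:, None, :] - src[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest source vertex for each target vertex.
    nearest = d2.argmin(axis=1)
    labels = np.asarray(src_labels)[nearest]
    uv = np.asarray(src_uv, dtype=float)[nearest]
    return labels, uv, nearest


if __name__ == "__main__":
    # Toy "human" mesh with three vertices, part labels and UV coordinates.
    human = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
    parts = [1, 2, 3]
    uvs = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]
    # Toy "chimpanzee" vertices, assumed already aligned to the human frame.
    chimp = [[0.9, 0.1, 0.0], [0.1, 0.95, 0.05]]
    labels, uv, _ = transfer_vertex_labels(human, parts, uvs, chimp)
    print(labels.tolist())  # [2, 3]
```

Brute-force search costs O(NM) time and memory. For meshes with thousands of vertices, a KD-tree such as `scipy.spatial.cKDTree` would be the usual substitute.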


Recent contributions have demonstrated that it is possible to recognize the pose of humans densely and accurately given a large dataset of poses annotated in detail. In principle, the same approach could be extended to any animal class, but the effort required to collect new annotations for each case makes this strategy impractical, despite important applications in nature conservation, science and business. We show that, at least for proximal animal classes such as chimpanzees, it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes. We do this by (1) establishing a DensePose model for the new animal that is also geometrically aligned to humans, (2) introducing a multi-head R-CNN architecture that facilitates the transfer of multiple recognition tasks between classes, (3) finding which combination of known classes can be transferred most effectively to the new animal, and (4) using self-calibrated uncertainty heads to generate pseudo-labels, graded by quality, for training a model for this class. We also introduce two benchmark datasets for the chimpanzee class, labelled in the manner of DensePose, and use them to evaluate our approach, showing excellent transfer learning performance.
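Point (4) relies on heads that predict their own uncertainty. A common way to train such self-calibrated heads is a heteroscedastic Gaussian negative log-likelihood. This is shown here as an assumed stand-in, not necessarily the paper's exact loss. The network outputs UV coordinates together with a log-variance s = log σ², and each point is scored with (d/2) log(2πσ²) + ||û − u||² / (2σ²). Minimising this over σ² gives σ² = ||û − u||² / d, so the predicted variance is pushed to track the actual error. The function name `gaussian_uv_nll` and the isotropic-variance assumption are illustrative.

```python
import numpy as np


def gaussian_uv_nll(pred_uv, log_var, target_uv):
    """Per-point negative log-likelihood of target UV coordinates under an
    isotropic Gaussian with mean pred_uv and variance exp(log_var).

    pred_uv, target_uv: arrays of shape (N, d); log_var: shape (N,).
    Assumption: one shared variance per point for all d coordinates.
    """
    pred_uv = np.asarray(pred_uv, dtype=float)
    target_uv = np.asarray(target_uv, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    d = pred_uv.shape[-1]                               # 2 for (u, v)
    sq_err = ((pred_uv - target_uv) ** 2).sum(axis=-1)  # ||u_hat - u||^2
    # (d/2) * log(2*pi*sigma^2) + ||u_hat - u||^2 / (2*sigma^2)
    return 0.5 * (d * np.log(2 * np.pi) + d * log_var + sq_err * np.exp(-log_var))


if __name__ == "__main__":
    pred, target = [[0.0, 0.0]], [[0.3, 0.4]]  # squared error 0.25
    best = np.log(0.25 / 2)                    # optimum: sigma^2 = err / d
    # The loss is smallest at `best` and grows on either side of it.
    for lv in (best - 0.5, best, best + 0.5):
        print(float(gaussian_uv_nll(pred, [lv], target)[0]))
```

The log σ² term stops the head from claiming unlimited uncertainty to hide its errors. This is what makes the predicted variance usable as a quality grade for pseudo-labels.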



Predicted body-part segmentation and UV maps for chimpanzees

Comparison of teacher and student models

Visual results: teacher network predictions (left) vs. predictions of the student network trained using I-sampling (right). The student produces more accurate boundaries and UV maps. Zoom in for details.
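Teacher predictions can be turned into quality-graded pseudo-labels by ranking them with the predicted uncertainty. The sketch below shows a generic confidence-based selection, assumed for illustration. It is not the paper's I-sampling rule, whose details are not given on this page. It keeps the fraction `keep_fraction` of points with the lowest predicted variance. Each retained point gets a weight inversely proportional to its variance. All names are hypothetical.

```python
import numpy as np


def select_pseudo_labels(pred_uv, variance, keep_fraction=0.2):
    """Keep the teacher predictions with the lowest predicted variance.

    pred_uv: (N, 2) teacher UV predictions; variance: (N,) positive values.
    Returns (indices, kept_uv, weights). Weights lie in (0, 1], with 1 for
    the most confident retained point.
    """
    variance = np.asarray(variance, dtype=float)
    n_keep = max(1, int(round(keep_fraction * variance.size)))
    order = np.argsort(variance)   # ascending: most confident first
    keep = order[:n_keep]
    # Inverse-variance grading relative to the most confident kept point.
    weights = variance[keep].min() / variance[keep]
    return keep, np.asarray(pred_uv, dtype=float)[keep], weights


if __name__ == "__main__":
    uv = np.arange(10.0).reshape(5, 2)  # toy teacher predictions
    var = [0.5, 0.1, 0.3, 0.05, 0.9]    # toy predicted variances
    idx, kept, w = select_pseudo_labels(uv, var, keep_fraction=0.4)
    print(idx.tolist(), w.tolist())     # [3, 1] [1.0, 0.5]
```

With inverse-variance weights, a pseudo-label whose predicted variance is twice as large contributes half as much to the student's loss.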


  title={Transferring Dense Pose to Proximal Animal Classes},
  author={Artsiom Sanakoyeu and Vasil Khalidov and Maureen S. McCarthy
          and Andrea Vedaldi and Natalia Neverova},