¹Shanghai Qi Zhi Institute  ²IIIS, Tsinghua University  ³School of Software, Tsinghua University
⁴Shanghai Jiao Tong University  ⁵Shanghai AI Lab
Enabling robotic manipulation that generalizes to out-of-distribution scenes is a crucial step toward open-world embodied intelligence. For human beings, this ability is rooted in an understanding of semantic correspondence among objects, which lets us naturally transfer interaction experience from familiar objects to novel ones. Although robots lack such a reservoir of interaction experience, the vast number of human videos on the Internet can serve as a valuable resource: from them, we extract an affordance memory that includes the contact points.
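The affordance memory can be pictured as a store of (image embedding, contact point) pairs harvested from human videos, queried by visual similarity. The sketch below is purely illustrative; the class and method names are assumptions, not the paper's API, and the embeddings would in practice come from a pretrained visual encoder rather than toy vectors.

```python
import numpy as np

class AffordanceMemory:
    """Hypothetical sketch: pair each source image's embedding with the
    2-D contact point observed in a human video, and retrieve the
    nearest entry by cosine similarity."""

    def __init__(self):
        self.embeddings = []   # 1-D feature vectors, one per source image
        self.contacts = []     # (x, y) contact points in those images

    def add(self, embedding, contact_point):
        self.embeddings.append(np.asarray(embedding, dtype=float))
        self.contacts.append(tuple(contact_point))

    def retrieve(self, query_embedding):
        # Cosine similarity between the query and every stored embedding.
        q = np.asarray(query_embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = [float(e @ q / np.linalg.norm(e)) for e in self.embeddings]
        best = int(np.argmax(sims))
        return self.contacts[best], sims[best]

# Toy usage with 3-D stand-in "embeddings":
memory = AffordanceMemory()
memory.add([1.0, 0.0, 0.0], (120, 85))   # e.g. a mug grasped at the handle
memory.add([0.0, 1.0, 0.0], (40, 200))   # e.g. a drawer pulled at the knob
point, score = memory.retrieve([0.9, 0.1, 0.0])
print(point)  # (120, 85): the mug-like entry is the closest match
```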
Inspired by this natural way of human thinking, we propose Robo-ABC. Through our framework, robots can generalize to manipulate out-of-category objects in a zero-shot manner, without any manual annotation, additional training, part segmentation, pre-coded knowledge, or viewpoint restrictions. Quantitatively, Robo-ABC improves the accuracy of visual affordance retrieval by a large margin of 31.6% over state-of-the-art end-to-end affordance models. In real-world experiments on cross-category object-grasping tasks, Robo-ABC achieves a success rate of 85.7%, demonstrating its capacity for real-world manipulation.
We demonstrate the generalization capabilities of Robo-ABC across object categories and viewpoints in the real world. For deployment on real robots, we select the grasp pose from the candidate poses generated by AnyGrasp.
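One simple way to pick among the candidates, sketched below under assumptions: each candidate is reduced to a 3-D grasp center (AnyGrasp actually returns full 6-DoF poses with quality scores, which this toy omits), and we choose the candidate nearest to the 3-D point corresponding to the predicted affordance pixel.

```python
import numpy as np

def select_grasp(candidates, affordance_point):
    """Return the index of the candidate grasp center closest to the
    predicted affordance point (both in the same 3-D frame).
    Illustrative only; not the AnyGrasp API."""
    candidates = np.asarray(candidates, dtype=float)
    target = np.asarray(affordance_point, dtype=float)
    dists = np.linalg.norm(candidates - target, axis=1)
    return int(np.argmin(dists))

# Toy candidates (meters) and a hypothetical affordance point:
grasps = [(0.30, 0.10, 0.05), (0.28, 0.12, 0.20), (0.60, -0.05, 0.10)]
idx = select_grasp(grasps, (0.29, 0.11, 0.18))
print(idx)  # 1: the second candidate lies nearest the affordance point
```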
We aim to showcase our method's ability to generalize the affordance of a small set of seen objects to diverse objects beyond their category. To this end, we fix one category of source images and provide the contact points derived from human videos. For each object from the other categories, we apply the same semantic correspondence procedure of Robo-ABC to obtain the target affordance.
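The transfer step can be sketched as dense feature matching: the feature at the source contact pixel is compared against every pixel of the target image's feature map, and the best match becomes the inferred affordance point. The feature extractor (e.g. a DINO- or diffusion-based backbone) is not shown, and the function name is an assumption for illustration.

```python
import numpy as np

def transfer_contact(src_feats, tgt_feats, src_xy):
    """Map a contact pixel from a source image to a target image via
    cosine similarity over per-pixel feature maps of shape (H, W, C)."""
    x, y = src_xy
    f = src_feats[y, x]
    f = f / np.linalg.norm(f)
    H, W, C = tgt_feats.shape
    flat = tgt_feats.reshape(-1, C)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    best = int(np.argmax(flat @ f))          # best-matching target pixel
    return (best % W, best // W)             # (x, y) in the target image

# Toy 2x2 feature maps where target pixel (1, 0) matches the source point.
src = np.zeros((2, 2, 3)); src[0, 0] = [1.0, 0.0, 0.0]
tgt = np.zeros((2, 2, 3)) + 1e-6             # small values avoid zero norms
tgt[0, 1] = [0.9, 0.1, 0.0]
print(transfer_contact(src, tgt, (0, 0)))    # (1, 0)
```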
In each group of figures, from left to right, the span of object categories gradually increases. One marker denotes the contact points extracted from human videos, while the other denotes the points inferred by Robo-ABC across object categories.
We report the performance of Robo-ABC and the baselines across object categories on the entire evaluation dataset. In the vast majority of cases, Robo-ABC exhibits superior zero-shot generalization.
If you have any questions, please feel free to contact us:
If you find this project helpful, please cite us: