CAPE: A CLIP-Aware Pointing Ensemble of Complementary Heatmap Cues for Embodied Reference Understanding

  • Authors:

    Eyiokur, F.I.; Yaman, D.; Ekenel, H.K.; Waibel, A.

  • Source:

    Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026