Zanyi Wang

I am a MS student in Electrical and Computer Engineering (ECE) at the University of California, San Diego (UCSD).

Previously, I received my B.Eng. in Automation Engineering from Xi’an Jiaotong University (XJTU). I also spent a wonderful time as an exchange student at CUHK.

My research interests lie in Video Understanding and Generative Modeling.

Email  /  Github

profile photo
News

  • [2026-02] One paper accepted to CVPR 2026!
  • [2026-01] One paper accepted to ICLR 2026!
  • [2025-07] One paper accepted to ACM MM 2025!
  • Selected Publications (* denotes equal contribution)

    FlowRVS Teaser
    RefTON: Person-to-Person Virtual Try-On with Unpaired Visual References
    Liuzhuozheng Li, Yue Gong, Shanyuan Liu, Zanyi Wang, Dengyang Jiang, Liebucha Wu, Bo Cheng, YuhangMa, Dawei Leng, Yuhui Yin
    Conference on Computer Vision and Pattern Recognition (CVPR), 2026
    code / arXiv

    An End-to-End Virtual Try-on model that directly fits the target garment onto the person image.

    FlowRVS Teaser
    Deforming Videos to Masks: Flow Matching for Referring Video Segmentation
    Zanyi Wang, Dengyang Jiang, Liuzhuozheng Li, Sizhe Dang, Chengzu Li, Harry Yang, Guang Dai, Mengmeng Wang, Jingdong Wang
    International Conference on Learning Representations (ICLR), 2026
    code / arXiv / 机器之心

    Reformulated RVOS as learning a continuous, text-conditioned flow that deforms a video's content into its target mask.

    FlowRVS Teaser
    TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP
    Fan Li, Zanyi Wang*, Zeyi Huang, Guang Dai, Jingdong Wang, Mengmeng Wang
    ACM International Conference on Multimedia (ACM MM), 2025
    arXiv

    Developed a unified framework leveraging CLIP’s ViT encoder for efficient tri-modal 3D visual grounding.

    FlowRVS Teaser
    Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
    Chengzu Li, Zanyi Wang*, Jiaang Li, Yi Xu, Han Zhou, Huanyu Zhang, Ruichuan An, Dengyang Jiang, Zhaochong An, Ivan Vulić, Serge Belongie, Anna Korhonen
    Preprint, 2026
    arXiv

    Exploring how visual context enhances video reasoning capabilities.

    FlowRVS Teaser
    Distribution Matching Distillation Meets Reinforcement Learning
    Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Zhen Li, Bo Zhang, Mengmeng Wang, Steven Hoi, Peng Gao, Harry Yang
    Preprint, 2026
    code / arXiv

    Showing that DMD and RL can be trained simultaneously, with RL enabling the student model to surpass the teacher and DMD loss regularizing RL to prevent reward hacking.


    Design and source code from Jon Barron's website.