Junyang Wang (王君阳)

Email: junyangwang@bjtu.edu.cn; junyangwang287@gmail.com

I am a research intern at Tongyi AI Lab of Alibaba Group.

I am a Ph.D. candidate in the School of Computer and Information Technology at Beijing Jiaotong University (BJTU), working with Prof. Jitao Sang.

My current research focuses on Multi-modal Large Language Models (MLLMs), including MLLM hallucination and MLLM-based agents. I have also studied Vision-Language Pre-training (VLP) and social fairness in computer vision.

Recent News

* [09.2025] Our paper Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation has been accepted by NeurIPS 2025.

* [08.2025] We have released Mobile-Agent-v3. It achieves SOTA performance on 10 GUI benchmarks. The Mobile-Agent repository has received 5k+ stars.

* [03.2025] Our paper PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC has been accepted by ICLR 2025 Workshop.

* [09.2024] Our paper Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration has been accepted by NeurIPS 2024.

* [07.2024] Our work Mobile-Agent won the Best Demo Award at the 23rd China National Conference on Computational Linguistics (CCL 2024).

* [03.2024] Our paper Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception has been accepted by ICLR 2024 Workshop.

* [07.2023] Our paper Improved Visual Fine-tuning with Natural Language Supervision has been accepted by ICCV 2023 as an Oral.

* [04.2023] Our paper From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping has been accepted by IJCAI 2023.

* [10.2022] I joined Intelligent Computing of Alibaba Group, Ltd. as a research intern.

* [06.2022] Our paper Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models has been accepted by MM 2022.

Publications

See Google Scholar

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Junyang Wang, Haiyang Xu, Haitao Jia, Xi Zhang, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang.
Annual Conference on Neural Information Processing Systems. NeurIPS 2024 (CCF-A).

Improved Visual Fine-tuning with Natural Language Supervision
Junyang Wang, Yuanhong Xu, Juhua Hu, Jitao Sang, Qi Qian.
IEEE/CVF International Conference on Computer Vision. ICCV 2023 Oral (CCF-A).

From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping
Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang.
International Joint Conference on Artificial Intelligence. IJCAI 2023 (CCF-A).

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang.
ICLR 2024 Workshop on Large Language Model (LLM) Agents. ICLR 2024 Workshop.

Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation
Yuyang Wanyan, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Jiabo Ye, Yutong Kou, Ming Yan, Fei Huang, Xiaoshan Yang, Weiming Dong, Changsheng Xu.
Annual Conference on Neural Information Processing Systems. NeurIPS 2025 (CCF-A).

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang.
ICLR 2025 Workshop on LLM Reason and Plan. ICLR 2025 Workshop.

Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features
Yi Zhang, Jitao Sang, Junyang Wang, Dongmei Jiang, Yaowei Wang.
ACM International Conference on Multimedia. ACM MM 2023 (CCF-A).

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM
Qinghao Ye, Haiyang Xu, Ming Yan, Chenlin Zhao, Junyang Wang, Xiaoshan Yang, Ji Zhang, Fei Huang, Jitao Sang, Changsheng Xu.
ACM International Conference on Multimedia. ACM MM 2023 (CCF-A).

Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models
Yi Zhang, Junyang Wang, Jitao Sang.
ACM International Conference on Multimedia. ACM MM 2022 (CCF-A).

Experience/Education

Research Intern, Tongyi AI Lab of Alibaba Group. 2022.10 - Present
Ph.D. (Computer Science), Beijing Jiaotong University. 2023.9 - Present
M.S. (Computer Science), Beijing Jiaotong University. 2021.9 - 2023.6
B.S. (Computer Science), Beijing Jiaotong University. 2017.9 - 2021.6