I am a member of the technical staff at the Beijing Academy of Artificial Intelligence (BAAI), in the vision and multimodal research center (a.k.a. BAAI Vision). I received my M.S. in EECS from Peking University. My research focuses on multimodal foundation models and efficient scaling technologies.
A breakthrough in multimodal scaling laws and in native unified generation and understanding across text, image, and video modalities through next-token prediction.
Quan Sun*, Jinsheng Wang*, Qiying Yu*, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang
The largest open-source CLIP model, demonstrating efficient weak-to-strong scaling. With optimized training strategies, it achieves 80.7% zero-shot top-1 accuracy averaged across 27 widely recognized image classification benchmarks.
ICLR 2024 (Spotlight) - Pioneering scaling roadmap for 3D foundation models, establishing the first billion-parameter 3D understanding framework.
CVPR 2022 - Efficient lane detection achieving state-of-the-art accuracy at high FPS. Deployed in production with automotive industry partners (Bosch, Horizon Robotics). A demo with YOLO integration is available.
Experience
Beijing Academy of Artificial Intelligence (BAAI)
Technical Staff
Multimodal Foundation Model Research Center
Jul 2023 - Present
Microsoft Research Asia (MSRA)
Research Intern
Visual Computing Group
Jun 2022 - Dec 2022
Baidu Research
Research Intern
Robotics and Autonomous Driving Lab (RAL)
Dec 2021 - May 2022
SenseTime
Research Intern
Intelligent Vehicle Group
Apr 2021 - Nov 2021
DJI
Research Intern
Robotics Department
Sep 2019 - Mar 2020
Selected Awards
[2019] National Scholarship
[2019] RoboMaster Robotics Competition - World Finals Special Prize (Rank 3/173) (Team Lead)