Bo Zhang received the Ph.D. degree in electronic engineering from the School of Information Science and Technology, Fudan University. He is currently a Research Scientist at Shanghai AI Laboratory. His work has led to many awards, including the Shanghai Rising Star Program (Grant No. 23QD1401000) awarded by the Shanghai Municipal Commission of Science and Technology, the China National Scholarship (2020/2021), the 2019 Excellent Doctoral Scholarship of Fudan University, and several awards from VALSE China and the Shanghai government. His research has influenced industrial applications such as airport checkpoint security screening, domain-adaptive face recognition, and localization of concealed dangerous objects.
He has published 30+ papers in top-tier international conferences and journals such as CVPR, NeurIPS, ICLR, ICML, ACL, T-PAMI, TIP, TGRS, and IJCV. He also serves as a reviewer for several prestigious academic conferences and journals, including CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, and ACL. He led the development of 3DTrans, an open-source project for general scene representation that won the international Waymo Challenge competition and has accumulated over 3k stars. Furthermore, he is committed to exploring the fundamental nature of long-chain reasoning in large models and aims to develop innovator-level agents through reinforcement learning methods and reflection mechanisms.
🚀 Join Shanghai AI Lab's Elite Team!
We're recruiting PhDs (2025/2026 intake) & Researchers (March/June 2025/2026 start) to pioneer innovations in LLMs, Multi-Agent Optimization, and AutoGPT.
👉 Contact now with your CV + research vision: zhangbo@pjlab.org.cn & bo.zhangzx@gmail.com

🔥 Highlighted Projects

  • NovelSeek. (An end-to-end auto-research framework that has demonstrated its versatility across 12 scientific research tasks.) [Project][Technical report]

  • MinerU and PDF-Extract-Kit. (Popular open-source tools that convert PDFs into machine-readable formats (e.g., Markdown, JSON), allowing content to be easily extracted into whatever format you need.) [Project][Technical report]

  • InternVL 1.5 and InternVL 2. (Ranked 1st among open-source VLMs on MMMU, DocVQA, ChartQA, and MathVista.) [Project][Technical report]

  • 3DTrans (work during the Ph.D. period). (An open-source codebase for continuous learning towards autonomous driving tasks, including Unsupervised Domain Adaptation (UDA), Active Domain Adaptation (ADA), Semi-Supervised Domain Adaptation (SSDA), and Multi-dataset Domain Fusion (MDF).) [Project][Technical report]

🌎 News

2025:

  • 2025.05:   🔥🔥🎉🎉 When Agent Becomes the Scientist: your ultimate AI-powered scientist for finding, analyzing, and experimenting like never before! NovelSeek Page

  • 2025.05:   🎉🎉 SurveyForge and Dolphin are accepted by ACL-2025.

  • 2025.05:   MME-CoT is accepted by ICML-2025.

  • 2025.02:   🎉🎉 Three papers are accepted by CVPR-2025: JiSAM, OmniDocBench, CDM.

  • 2025.01:   One of our papers has been accepted for publication in TPAMI, and another has been accepted by TGRS.

  • 2025.01:   🎉🎉 Two papers accepted to ICLR 2025: GeoX, OmniCorpus

2024:

  • 2024.10:  🎉🎉 Grateful for the heartfelt recognition and thoughtful sharing of my research work by Fudan_CYL and Fudan_SIST.

  • 2024.10:  🎉🎉 The technical report for MinerU, an open-source solution for high-precision document content extraction with strong table extraction ability (StructEqTable-Deploy), has been published.

  • 2024.09: Three papers accepted to NeurIPS 2024: AdaptiveDiffusion, ZOPP, LeapAD

  • 2024.08: Bo Zhang was invited to serve as a PC member of AAAI 2025.

  • 2024.08: We collaborated with the OpenDataLab team to open-source the PDF-Extract-Kit. It can extract high-quality and structured content from PDFs and has gained 6K+ stars.

  • 2024.07: One paper (Reg-TTA3D) is accepted by ECCV 2024. We explore test-time adaptive 3D object detection for the first time.

  • 2024.05: Our paper entitled "Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm" is accepted for publication in ICML 2024.

  • 2024.05: One paper (Expert Pruning-Skipping) is accepted by ACL 2024.

  • 2024.02: One paper (Once for Both) is accepted by CVPR-2024.

  • 2024.01: One paper (ReSimAD) is accepted by ICLR 2024. We propose a zero-shot generalization framework by reconstructing mesh and simulating target point clouds.

  • 2024.01: Two papers (IPNet and MVNet) are accepted by TCSVT.

2023:

  • 2023.12: We have released the ChartX benchmark, covering 18 chart types, 7 chart tasks, and 22 disciplinary topics to evaluate the chart-related capabilities of existing MLLMs.

  • 2023.09: SPOT, a promising and scalable 3D pre-training method for autonomous driving, has been released.

  • 2023.09: One paper entitled “AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset” is accepted by NeurIPS-2023.

  • 2023.08: One paper (BFDA) on cross-domain background-focused alignment is accepted by TIP.

  • 2023.07:   One paper entitled "SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification" is accepted by ACM MM-2023.

  • 2023.04: One paper entitled "Performance-aware Approximation of Global Channel Pruning for Multitask CNNs" is accepted for publication in T-PAMI.

  • 2023.03:  🎉🎉 Three papers are accepted by CVPR-2023: Uni3D, Bi3D, GDP.

  • 2023.02: Bo Zhang started to explore how to improve the problem-solving and reasoning abilities of LLMs and VLMs for complicated modalities, including charts, tables, geometry, and scientific documents, by investigating foundation LLMs from the perspective of structured, knowledge-rich data.

📝 Selected Publications

ACL 2025

SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Lei Bai, Bo Zhang^(corr.) [Project][Benchmark][Paper]

  • We propose SurveyForge, a novel automated framework for generating high-quality academic survey papers.
  • We propose a heuristic outline generation method and a memory-driven scholar navigation agent.
  • To facilitate objective evaluation, we establish SurveyBench to assess outline, reference, and content quality.
ACL 2025

Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback

Jiakang Yuan, Xiangchao Yan, Shiyang Feng, Bo Zhang^(corr.), Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, Bowen Zhou [Project][Paper]

  • We propose task-attribute-guided paper ranking and an exception-traceback-guided debugging process to improve the quality of generated ideas and the success rate of code execution.
ICML 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanwei Li, Yu Qi, Xinyan Chen, Liuhui Wang, Jianhan Jin, Claire Guo, Shen Yan, Bo Zhang, Chaoyou Fu, Peng Gao, Hongsheng Li [Project][Paper]

  • We introduce MME-CoT, a specialized benchmark evaluating the CoT reasoning performance of LMMs
  • MME-CoT covers six domains: math, science, OCR, logic, space-time, and general scenes.
ICLR 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Renqiu Xia, Mingsheng Li, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang^(corr.)

[Project][Paper]

  • Our study reveals the large potential of formalized visual-language pre-training in enhancing geometric problem-solving abilities. To enable the formalized pre-training, we propose GeoX, aiming to build geometric generalist models by modeling geometric tasks into a unified formulation.
  • We propose a Generator-And-Sampler Transformer (GS-Former) to generate discriminative queries and eliminate uninformative representations from unevenly distributed geometric signals.
ICLR 2025

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai

[Project][Paper]

  • We filter and extract large-scale, high-quality documents, which contain 8.6 billion images and 1,696 billion text tokens.
  • Our dataset is 15 times larger than previous counterparts while maintaining high quality, features diverse sources (including English, non-English, and video-centric websites), and offers the flexibility to adapt from an image-text interleaved format to pure text or image-text pairs.
SCIS

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang

[Project][Paper]

  • Propose InternVL 1.5 and InternVL 2, which rank 1st among open-source VLMs on MMMU, DocVQA, ChartQA, and MathVista.
NeurIPS 2024

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Hancheng Ye, Jiakang Yuan, Renqiu Xia, Xiangchao Yan, Tao Chen, Junchi Yan, Botian Shi, Bo Zhang^(corr.)

[Project][Paper]

  • Propose AdaptiveDiffusion to adaptively reduce the number of noise prediction steps during the denoising process, guided by the third-order latent difference (see the sketch below).
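A rough sketch of this skip criterion is shown below. This is an illustrative reconstruction under assumptions, not the authors' implementation: the function names (`third_order_latent_diff`, `adaptive_denoise`, `noise_model`), the threshold value, and the toy latent update rule are all hypothetical.

```python
import torch

def third_order_latent_diff(latents):
    """Magnitude of the third-order finite difference over the last four latents."""
    x0, x1, x2, x3 = latents[-4:]
    return (x3 - 3 * x2 + 3 * x1 - x0).abs().mean()

def adaptive_denoise(noise_model, x, timesteps, threshold=1e-3):
    """Hypothetical sketch: when the latents change smoothly (small third-order
    difference), reuse the previous noise prediction instead of calling the model."""
    latent_history, prev_noise = [], None
    for t in timesteps:
        if (
            prev_noise is not None
            and len(latent_history) >= 4
            and third_order_latent_diff(latent_history) < threshold
        ):
            noise = prev_noise              # reuse: skip the expensive model call
        else:
            noise = noise_model(x, t)       # full noise prediction step
        x = x - 0.1 * noise                 # placeholder update; real samplers use scheduler-specific rules
        prev_noise = noise
        latent_history.append(x)
    return x

# Toy usage with a dummy "model" that predicts a scaled copy of the latent (hypothetical)
if __name__ == "__main__":
    dummy_model = lambda x, t: 0.05 * x
    out = adaptive_denoise(dummy_model, torch.randn(1, 4, 8, 8), timesteps=range(50))
```

In practice, the reuse-or-predict decision would be plugged into an existing diffusion sampler rather than the toy update shown here.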
ICML 2024

On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm

Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang^(corr.), Junchi Yan

[Project][Paper]

  • We discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed Cross-Task Linearity (CTL).
CVPR 2024

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

Hancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo Zhang^(corr.)

[Project][Paper]

  • We investigate how to integrate the evaluation of importance and sparsity scores into a single stage, searching for optimal subnets in an efficient manner.
  • We present Once for Both (OFB), a cost-efficient approach that simultaneously evaluates importance and sparsity scores for the Vision Transformer compression (VTC) task.
ACL 2024

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models

Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Hongsheng Li

[Project][Paper]

  • Different from previous weight pruning methods that rely on specifically designed hardware, this paper mainly aims to enhance the deployment efficiency of MoE LLMs by introducing plug-and-play expert-level sparsification techniques.
  • We present two post-training approaches for task-agnostic and task-specific expert pruning and skipping of MoE LLMs.
ICLR 2024

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao

[Project][Paper]

  • Provide a new perspective on and approach for alleviating domain shifts by proposing a Reconstruction-Simulation-Perception scheme.
CVPR 2023

Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection

Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao

[Project][Paper]

  • Present Uni3D, which tackles multi-dataset 3D object detection at both the data level and the semantic level.
NeurIPS 2023

AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset

Jiakang Yuan, Bo Zhang^(corr.), Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao

[Project][Paper]

  • Build a large-scale point-cloud pre-training dataset with a diverse data distribution, while learning generalizable representations from it.
CVPR 2023

Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection

Jiakang Yuan, Bo Zhang^(corr.), Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao

[Project][Paper]

  • Propose a bi-domain active learning approach that selects samples from both the source and target domains to solve the cross-domain 3D object detection task.
ACM'MM 2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification

Siyuan Huang, Bo Zhang^(corr.), Botian Shi, Peng Gao, Yikang Li, Hongsheng Li

[Project][Paper]

  • Propose a Single-dataset Unified Generalization (SUG) framework that leverages only a single source dataset to alleviate the unforeseen domain differences faced by a well-trained source model.
TPAMI

Performance-aware Approximation of Global Channel Pruning for Multitask CNNs

Hancheng Ye, Bo Zhang, Tao Chen, Jiayuan Fan, Bin Wang

[Project][Paper]

  • We propose a Performance-Aware Global Channel Pruning (PAGCP) framework. We first theoretically present the objective for achieving superior GCP by considering the joint saliency of filters both within and across layers.
TIP

Sample-Centric Feature Generation for Semi-Supervised Few-Shot Learning

Bo Zhang, Hancheng Ye, Gang Yu, Bin Wang, Yike Wu, Jiayuan Fan, Tao Chen

[Project][Paper]

  • Propose a sample-centric feature generation (SFG) approach for semi-supervised few-shot image classification.
ACM'MM 2022

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification

Bo Zhang, Jiakang Yuan, Baopu Li, Tao Chen, Jiayuan Fan, Botian Shi

[Project][Paper]

  • Propose a Transformer-based double-helix model to achieve cross-image object semantic relation mining in a bidirectional and symmetric manner.

💬 Invited Talks

  • 2024.07, Invited talk at the Multimodal Large Model Summit. [Video]
  • 2023.09, Invited talk on Efficient Pre-training for Autonomous Driving. [Video]
  • 2023.07, Invited talk on Towards 3D General Representation at Techbeat. [Video]
  • 2023.03, Invited talk on Transferable Perception for Autonomous Driving. [Video]

💻 Internships

🎓 Ph.D Thesis

During his Ph.D. studies, Bo Zhang dedicated himself to advancing domain-adaptive models across both 2D and 3D domains. With a strong foundation in both theoretical research and practical applications, he has gained extensive expertise in model adaptation and continuous learning.

[Ph.D Thesis]

📝 Collaborators