Bo Zhang received the Ph.D. degree in electronic engineering from the School of Information Science and Technology, Fudan University. He is currently a Research Scientist at Shanghai AI Laboratory. His work has received many awards, including Shanghai Rising Star (Grant No. 23QD1401000), awarded by the Shanghai Municipal Commission of Science and Technology, the National Scholarship 2020/2021 China Award, the 2019 Excellent Doctoral Scholarship of Fudan University Award, and various awards from VALSE China and the Shanghai Government. His research has been applied in industrial settings such as airport checkpoint security recognition, domain-adaptive face recognition, and localization of concealed dangerous objects.

He has published 40+ papers in top-tier international conferences and journals such as CVPR, NeurIPS, ICLR, ICML, ACL, T-PAMI, TIP, TGRS, and IJCV. He also serves as a reviewer for several prestigious academic conferences and journals, including CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, and ACL. He led the development of the 3DTrans general scene representation open-source project, which won the Waymo Challenge international competition and accumulated over 3k stars. Furthermore, he is committed to exploring the fundamental nature of long-chain reasoning in large models and aims to develop innovator-level agents through reinforcement learning methods and reflection mechanisms.

🚀 Join Shanghai AI Lab's Elite Team!
We're recruiting PhDs (2025/2026 intake) & Researchers (June/Sep 2025/2026 start) to pioneer AI-Scientist, LLM/VLM, and Multi-Agent Self-evolution.
👉 Contact now with your CV + research vision: zhangbo@pjlab.org.cn & bo.zhangzx@gmail.com

🔥 Highlighted Projects

NovelSeek. (End-to-end Auto-research Framework that has demonstrated its versatility across 12 scientific research tasks.) [Project][Technical report]
MinerU and PDF-Extract-Kit. (A popular open-source tool, converting PDFs into machine-readable formats (e.g., markdown, JSON), allowing for easy extraction into any format.) [Project][Technical report]
InternVL 1.5 and InternVL 2. (Ranked 1st among open-source VLMs on MMMU, DocVQA, ChartQA, and MathVista.) [Project][Technical report]
3DTrans (Work during the PhD period). (An Open-source Codebase for Continuous Learning towards Autonomous Driving Task, including Unsupervised Domain Adaptation (UDA), Active Domain Adaptation (ADA), Semi-Supervised Domain Adaptation (SSDA), and Multi-dataset Domain Fusion (MDF) tasks.) [Project][Technical report]

🌎 News

2025:

2025.06: SPOT has been accepted as a Regular Paper in Transactions on Pattern Analysis and Machine Intelligence.
2025.06: Three papers accepted to ICCV 2025: Chimera, Lumina-Image 2.0, TOP
2025.05: 🔥🔥🎉🎉 When Agent Becomes the Scientist: Your ultimate AI-powered Scientist for finding, analyzing, and experimenting like never before! NovelSeek Page
2025.05: 🎉🎉 SurveyForge and Dolphin are accepted by ACL-2025.
2025.05: MME-CoT is accepted by ICML-2025.
2025.02: 🎉🎉 Three papers are accepted by CVPR-2025: JiSAM, OmniDocBench, CDM.
2025.01: One of our papers has been accepted for publication in TPAMI, another has been accepted by TGRS.
2025.01: 🎉🎉 Two papers accepted to ICLR 2025: GeoX, OmniCorpus

2024:

2024.10: 🎉🎉 Grateful for the heartfelt recognition and thoughtful sharing of my research work by Fudan_CYL and Fudan_SIST.
2024.10: 🎉🎉 The technical report for MinerU with high table extraction ability (StructEqTable-Deploy), an open-source solution for high-precision document content extraction, has been published.
2024.09: Three papers accepted to NeurIPS 2024: AdaptiveDiffusion, ZOPP, LeapAD
2024.08: Bo Zhang was invited to serve as a PC member of AAAI 2025.
2024.08: We collaborated with the OpenDataLab team to open-source the PDF-Extract-Kit. It can extract high-quality and structured content from PDFs and has gained 6K+ stars.
2024.07: One paper (Reg-TTA3D) is accepted by ECCV 2024. We explore test-time adaptive 3D object detection for the first time.
2024.05: Our paper entitled "Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm" is accepted for publication in ICML 2024.
2024.05: One paper (Expert Pruning-Skipping) is accepted by ACL 2024.
2024.02: One paper (Once for Both) is accepted by CVPR-2024.
2024.01: One paper (ReSimAD) is accepted by ICLR 2024. We propose a zero-shot generalization framework by reconstructing mesh and simulating target point clouds.
2024.01: Two papers (IPNet and MVNet) are accepted by TCSVT.

2023:

2023.12: We have released the ChartX benchmark, covering 18 chart types, 7 chart tasks, and 22 disciplinary topics, to evaluate the chart-related capabilities of existing MLLMs.
2023.09: SPOT, showing a promising and scalable 3D pre-training on autonomous driving, has been released.
2023.09: One paper entitled “AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset” is accepted by NeurIPS-2023.
2023.08: One paper (BFDA) about cross-domain background-focused alignment is accepted by TIP.
2023.07: One paper entitled "SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification" is accepted by ACM MM-2023.
2023.04: One paper entitled "Performance-aware Approximation of Global Channel Pruning for Multitask CNNs" is accepted for publication in T-PAMI.
2023.03: 🎉🎉 Three papers are accepted by CVPR-2023: Uni3D, Bi3D, GDP.
2023.02: Bo Zhang started to work on improving the problem-solving and reasoning ability of LLMs and VLMs for complicated modalities, including Chart, Table, Geometry, and Scientific Document, by investigating foundation models from the perspective of structured knowledge-rich data.

📝 Selected Publications

ICCV 2025

Chimera: Improving Generalist Model with Domain-Specific Experts

Tianshuo Peng, Mingsheng Li, Jiakang Yuan, Hongbin Zhou, Renqiu Xia, Renrui Zhang, Lei Bai, Song Mao, Bin Wang, Aojun Zhou, Botian Shi, Tao Chen, Bo Zhang^(corr.), Xiangyu Yue [Project][Models][Paper]

We propose Chimera, a scalable pipeline that integrates specialist models into generalist LMMs, facilitating their adaptation to many specialized tasks.
Chimera achieves SOTA performance on MathVista and MathVerse. Furthermore, it achieves near-specialist-level results in visual structural extraction on benchmarks like ChartQA-SE, Table-SE, Doc-SE.

ACL 2025

SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Lei Bai, Bo Zhang^(corr.) [Project][Benchmark][Paper]

We propose SurveyForge, a novel automated framework for generating high-quality academic survey papers.
We propose a heuristic outline generation method and a memory-driven scholar navigation agent.
To facilitate objective evaluation, we establish SurveyBench to assess outline, reference, and content quality.

ACL 2025

Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback

Jiakang Yuan, Xiangchao Yan, Shiyang Feng, Bo Zhang^(corr.), Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, Bowen Zhou [Project][Paper]

We propose task-attribute-guided paper ranking and an exception-traceback-guided debugging process to improve the quality of generated ideas and the success rate of code execution.

ICML 2025

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanwei Li, Yu Qi, Xinyan Chen, Liuhui Wang, Jianhan Jin, Claire Guo, Shen Yan, Bo Zhang, Chaoyou Fu, Peng Gao, Hongsheng Li [Project][Paper]

We introduce MME-CoT, a specialized benchmark evaluating the CoT reasoning performance of LMMs.
MME-CoT covers six domains: math, science, OCR, logic, space-time, and general scenes.

ICLR 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Renqiu Xia, Mingsheng Li, Hancheng Ye, Wenjie Wu, Hongbin Zhou, Jiakang Yuan, Tianshuo Peng, Xinyu Cai, Xiangchao Yan, Bin Wang, Conghui He, Botian Shi, Tao Chen, Junchi Yan, Bo Zhang^(corr.)

Bo Zhang (张铂)
