Bo Zhang received the Ph.D. degree in electronic engineering from the School of Information Science and Technology, Fudan University. He is currently a Research Scientist at Shanghai AI Laboratory. His work has earned several awards, including the Shanghai Rising Star award (Grant No. 23QD1401000) from the Shanghai Municipal Commission of Science and Technology, the 2021 National Scholarship of China, the 2019 Excellent Doctoral Scholarship of Fudan University, and various awards from VALSE China and the Shanghai Government. His research has been applied to industrial problems such as airport checkpoint security recognition, domain-adaptive face recognition, and localization of concealed dangerous objects.

He has published 30+ papers in top-tier international conferences and journals such as CVPR, NeurIPS, ICML, ICLR, T-PAMI, TIP, T-MM, and IJCV. He also serves as a reviewer for several prestigious academic conferences and journals, including CVPR, ECCV, ICCV, ICLR, and ICML. He led the development of the 3DTrans general scene representation open-source project, which won the Waymo Challenge international competition and has accumulated over 1.5k stars. He is also committed to exploring the fundamental nature of long-chain reasoning in large models and aims to develop innovator-level agents through reinforcement learning methods and reflection mechanisms.

🔥 News

2024:

2024.12: 🎉🎉 Our research project, GeoX, has been officially open-sourced today. It is the first to explore formalized visual-language pre-training for enhancing geometric problem-solving abilities.
2024.10: 🎉🎉 Grateful for the heartfelt recognition and thoughtful sharing of my research work by Fudan_CYL and Fudan_SIST.
2024.10: 🎉🎉 The technical report for MinerU, an open-source solution for high-precision document content extraction, has been published.
2024.09: 🎉🎉 Three papers accepted to NeurIPS 2024: AdaptiveDiffusion, ZOPP, LeapAD
2024.09: 🎉🎉 Previous evaluation metrics for Formula and Table Recognition tasks, such as BLEU and Edit Distance, have limitations. Our CDM has been released to make evaluation more objective by computing an image-level rather than LaTeX-level metric score for Formula and Table Recognition.
2024.08: 🎉🎉 Bo Zhang was invited to serve as a PC member of AAAI 2025.
2024.08: 🎉🎉 We open-sourced StructTable: Table Structural Extraction Model and StructEqTable-Deploy. It is an open-source repository that supports structuring tasks for visual tables.
2024.08: 🎉🎉 We collaborated with the OpenDataLab team to open-source the PDF-Extract-Kit. It can extract high-quality and structured content from PDFs and has gained 6K+ stars.
2024.07: 🎉🎉 One paper (Reg-TTA3D) is accepted by ECCV 2024. We explore test-time adaptive 3D object detection for the first time.
2024.03: 🎉🎉 One paper is accepted by ACL 2024: Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models.
2024.02: 🎉🎉 One paper is accepted by CVPR 2024: Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression.
2024.01: 🎉🎉 One paper (ReSimAD) is accepted by ICLR 2024. We propose a zero-shot generalization framework by reconstructing mesh and simulating target point clouds.
2024.01: 🎉🎉 Two papers (IPNet and MVNet) are accepted by TCSVT.

2023:

2023.12: 🎉🎉 We have released the ChartX benchmark, covering 18 chart types, 7 chart tasks, and 22 disciplinary topics, to evaluate the chart-related capabilities of existing MLLMs.
2023.09: 🎉🎉 StructChart, our research on visual charts, has been released as an arXiv paper, and we will release the LLM-powered SimChart9K dataset. With the proposed SimChart9K, we observe that StructChart continuously improves chart perception performance as more simulated charts are used for pre-training.
2023.09: 🎉🎉 SPOT, showing a promising and scalable 3D pre-training on autonomous driving, has been released (See our paper for more details, arXiv paper).
2023.09: 🎉🎉 One paper entitled "AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset" is accepted by NeurIPS-2023.
2023.07: 🎉🎉 One paper about cross-domain background-focused alignment, "Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-Free One-Stage Detectors", is accepted by TIP.
2023.07: 🎉🎉 One paper entitled "SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification" is accepted by ACM MM-2023.
2023.04: 🎉🎉 One paper entitled "Performance-aware Approximation of Global Channel Pruning for Multitask CNNs" is accepted for publication in T-PAMI.
2023.03: 🎉🎉 Three papers are accepted by CVPR-2023: Uni3D, Bi3D, GDP.
2023.02: 🎉🎉 Bo Zhang started to explore how to improve the problem-solving and reasoning abilities of LLMs and VLMs on complicated modalities, including Chart, Table, Geometry, and Scientific Document, by investigating foundation LLMs from the perspective of structured, knowledge-rich data.

📝 Selected Publications

SCIS

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang

Bo Zhang (张铂)
