Bo Zhang

Researcher @ Shanghai Artificial Intelligence Laboratory.

Office: Level 12, L1, Longwen Road No. 129, Xuhui District, Shanghai, China

Email: zhangbo [at] pjlab.org.cn & bo.zhangzx [at] gmail.com

Academic activities: Reviewer at CVPR/ECCV/ICCV/ICML/ICLR/ACM-MM & T-PAMI/TIP/TGRS/T-CSVT/T-MM/T-NNLS/TKDE

Bo Zhang received the Ph.D. degree in electronic engineering from Fudan University in 2022. He is currently committed to promoting the rapid application of vision-language models in diverse scenarios, such as scientific document context extraction and understanding, automatic paper surveys, mathematical reasoning, and autonomous driving.

His work has received many awards, including Shanghai Rising Star (Grant No. 23QD1401000), awarded by the Shanghai Municipal Commission of Science and Technology, the National Scholarship 2021 China Award, the 2019 Excellent Doctoral Scholarship of Fudan University Award, and various awards from VALSE China and the Shanghai Government. His research has also influenced industrial applications such as airport checkpoint security, including the perceptual recognition and localization of concealed or dangerous objects.

News

2024:

  • Oct 06, 2024: Grateful for the heartfelt recognition and thoughtful sharing of my research work: Fudan_CYL and Fudan_SIST.
  • Oct 02, 2024: The technical report for MinerU, an open-source solution for high-precision document content extraction, has been published.
  • Sep 26, 2024: Three papers are accepted by NeurIPS-2024: AdaptiveDiffusion, ZOPP, LeapAD.
  • Sep 06, 2024: Previous evaluation metrics for Formula and Table Recognition tasks, such as BLEU and Edit Distance, exhibit limitations. CDM has been released to ensure evaluation objectivity by designing an image-level rather than LaTeX-level metric score for Formula and Table Recognition evaluation.
  • Aug 13, 2024: Bo Zhang was invited to serve as a PC member of AAAI 2025.
  • Aug 08, 2024: We open-sourced Models and StructEqTable-Deploy, an open-source repository that supports structuring tasks for visual tables.
  • Aug 01, 2024: We collaborated with the OpenDataLab team to open-source the PDF-Extract-Kit repository, which can extract high-quality and structured content from PDFs and has gained 4K+ stars.
  • Jul 01, 2024: RegTTA3D: Regression Makes Better Test-time Adaptive 3D Object Detection is accepted by ECCV 2024.
  • Jun 06, 2024: We have released the DocGenome benchmark, a structured scientific document dataset constructed by annotating 500K scientific documents from 153 disciplines in the arXiv open-access community.
  • May 16, 2024: Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models (arXiv paper) is accepted by ACL 2024.
  • May 02, 2024: Our paper entitled “Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm” (arXiv paper) is accepted for publication in ICML 2024.
  • Feb 28, 2024: Our paper entitled “Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression” (arXiv paper) is accepted for publication in CVPR 2024.
  • Jan 15, 2024: Our paper entitled “ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation” (arXiv paper) is accepted for publication in ICLR 2024.
  • Jan 02, 2024: Two papers are accepted by TCSVT: IPNet, MVNet.

2023:

  • Dec 30, 2023: We have released the ChartX benchmark (data downloading), covering 18 chart types, 7 chart tasks, and 22 disciplinary topics, to evaluate the chart-related capabilities of existing MLLMs.
  • Sep 24, 2023: StructChart, our research on visual charts, has been released (arXiv paper), and we will release the LLM-powered SimChart9K dataset. With the proposed SimChart9K, we observe that StructChart continuously improves chart perception performance as more simulated charts are used for pre-training.
  • Sep 29, 2023: SPOT, showing promising and scalable 3D pre-training for autonomous driving, has been released (see our paper for more details, arXiv paper).
  • Sep 22, 2023: One paper entitled “AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset” is accepted by NeurIPS-2023.
  • Aug 16, 2023: One paper about cross-domain background-focused alignment, “Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-Free One-Stage Detectors”, is accepted by TIP.
  • Jul 20, 2023: One paper entitled “SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification” is accepted by ACM MM-2023.
  • May 25, 2023: AD-PT, our research on 3D point-cloud pre-training, has been released (Code, arXiv paper).
  • Apr 10, 2023: One paper entitled “Performance-aware Approximation of Global Channel Pruning for Multitask CNNs” is accepted for publication in T-PAMI.
  • Mar 20, 2023: Bo Zhang started exploring how to improve the reasoning ability of LLMs and VLMs on complex modalities, such as Chart, Table, Geometry, and Scientific Document Understanding, by investigating foundation LLMs from the perspective of structured, knowledge-rich data.
  • Mar 08, 2023: Three papers are accepted by CVPR-2023: Uni3D, Bi3D, GDP.

Preprints

  • MinerU: An Open-Source Solution for Precise Document Content Extraction. Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, Fukai Shang, Bo Zhang, Liqun Wei, Zhihao Sui, Wei Li, Botian Shi, Yu Qiao, Dahua Lin, Conghui He. [arXiv, Open-source Project]

  • CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation. Bin Wang, Fan Wu, Linke Ouyang, Zhuangcheng Gu, Rui Zhang, Renqiu Xia, Bo Zhang, Conghui He. [arXiv, Benchmark, Metric]

  • ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning. Renqiu Xia, Bo Zhang, Hancheng Ye, Xiangchao Yan, Qi Liu, Hongbin Zhou, Zijun Chen, Min Dou, Botian Shi, Junchi Yan, Yu Qiao. [arXiv, Benchmark, Metric]

  • UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition. Bin Wang, Zhuangcheng Gu, Guang Liang, Chao Xu, Bo Zhang, Botian Shi, Conghui He. [arXiv, Code, Benchmark]

  • How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites. Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang. [arXiv, HomePage]

  • OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text. Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai. [arXiv, HomePage, Data Downloading]

  • DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models. Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang^(corr.), Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao. [arXiv, HomePage, Data Downloading]

Selected Publications

  • ^ refers to the corresponding author

Pre-training Models and Benchmarks:

  • ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving. Tao Ma, Hongbin Zhou, Qiusheng Huang, Xuemeng Yang, Jianfei Guo, Bo Zhang, Min Dou, Yu Qiao, Botian Shi, Hongsheng Li. Accepted by NeurIPS-2024. [arXiv, Code, CCF A]

  • Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm. Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang^(corr.), Junchi Yan. Published in ICML-2024. [arXiv, CCF A]

  • AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset. Jiakang Yuan, Bo Zhang^(corr.), Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao. Published in NeurIPS-2023. [arXiv, Code, CCF A]

  • Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection. Bo Zhang, Jiakang Yuan, Botian Shi, Tao Chen, Yikang Li, Yu Qiao. Published in CVPR-2023. [arXiv, Code, CCF A]

  • Generative Diffusion Prior for Unified Image Restoration and Enhancement. Ben Fei, Zhaoyang Lyu, Liang Pan, Junzhe Zhang, Weidong Yang, Tianyue Luo, Bo Zhang, Bo Dai. Published in CVPR-2023. [arXiv, Code, CCF A]

  • Sample-centric feature generation for semi-supervised few-shot learning. B Zhang, H Ye, G Yu, B Wang, Y Wu, J Fan, T Chen. Published in TIP. [IEEE, Code, CCF A]

Efficient AI Models:

  • Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy. Hancheng Ye, Jiakang Yuan, Renqiu Xia, Xiangchao Yan, Tao Chen, Junchi Yan, Botian Shi, Bo Zhang^(corr.). Accepted by NeurIPS-2024. [arXiv, Code, CCF A]

  • Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models. Xudong Lu, Qi Liu, Yuhui Xu, Aojun Zhou, Siyuan Huang, Bo Zhang, Junchi Yan, Hongsheng Li. Published in ACL-2024. [arXiv, Code, CCF A]

  • Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression. Hancheng Ye, Chong Yu, Peng Ye, Renqiu Xia, Yansong Tang, Jiwen Lu, Tao Chen, Bo Zhang^(corr.). Published in CVPR-2024. [arXiv, Code, CCF A]

  • Performance-aware Approximation of Global Channel Pruning for Multitask CNNs. Hancheng Ye, Bo Zhang, Tao Chen, Jiayuan Fan, and Bin Wang. Published in T-PAMI. [IEEE, Code, CCF A]

Domain-adaptive Models:

  • Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving. Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao. Accepted by NeurIPS-2024. [arXiv, Code, CCF A]

  • RegTTA3D: Regression Makes Better Test-time Adaptive 3D Object Detection. Jiakang Yuan, Bo Zhang, Kaixiong Gong, Xiangyu Yue, Botian Shi, Yu Qiao, Tao Chen. Published in ECCV-2024.

  • ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation. Bo Zhang, Xinyu Cai, Jiakang Yuan, Donglin Yang, Jianfei Guo, Xiangchao Yan, Renqiu Xia, Botian Shi, Min Dou, Tao Chen, Si Liu, Junchi Yan, Yu Qiao. Published in ICLR-2024. [arXiv, Code, CCF A]

  • Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-Free One-Stage Detectors. Yancheng Cai, Bo Zhang, Baopu Li, Tao Chen, Hongliang Yan, and Jiahao Xu. Published in TIP. [IEEE, Code, CCF A]

  • Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection. Jiakang Yuan, Bo Zhang^(corr.), Xiangchao Yan, Tao Chen, Botian Shi, Yikang Li, Yu Qiao. Published in CVPR-2023. [arXiv, Code, CCF A]

  • SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification. Siyuan Huang, Bo Zhang^(corr.), Botian Shi, Peng Gao, Yikang Li, and Hongsheng Li. Published in ACM-MM-2023. [arXiv, Code, CCF A]

  • A Closer Look at Few-Shot 3D Point Cloud Classification. C Ye, H Zhu, B Zhang, T Chen. Published in IJCV. [arXiv, CCF A]

  • Learning cross-image object semantic relation in transformer for few-shot fine-grained image classification. B Zhang, J Yuan, B Li, T Chen, J Fan, B Shi. Published in ACM-MM-2022. [arXiv, Code, CCF A]

  • Joint distribution alignment via adversarial learning for domain adaptive object detection. B Zhang, T Chen, B Wang, R Li. Published in TMM. [arXiv, Code, Tsinghua-A / CAAI-A, Accept without any changes during the first round]

Ph.D. Thesis

During his Ph.D. studies, Bo Zhang focused on domain-adaptive 2D object detection and semantic segmentation models, and he has extensive research and practical experience in model adaptation and transfer.

Ph.D. Thesis