About Me

Baoquan Zhao (赵宝全) is currently an associate professor at the School of Artificial Intelligence, Sun Yat-sen University, China. Prior to his current position, he was a Research Fellow under the supervision of Prof. Weisi Lin at the School of Computer Science and Engineering (SCSE), Nanyang Technological University, Singapore, from Sep. 2018 to Jan. 2022. He received his Ph.D. degree in computer science from Sun Yat-sen University, Guangzhou, China, in 2017. His research interests span AIGC, AI agents, computer graphics, computer vision, and multimedia systems and applications. In these areas, he has published over 60 papers in leading international journals and conferences and filed approximately 20 Chinese and PCT patents. He has led and participated in numerous national and provincial research programs, including projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology Key R&D Program, and the Guangdong Provincial Natural Science Foundation. He has long served as a reviewer for prestigious journals, including IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, Information Sciences, Pattern Recognition, and Neurocomputing, and as a Technical Program Committee member for top-tier conferences such as CVPR, ECCV, AAAI, ACM MM, and WWW.

  zhaobaoquan@mail.sysu.edu.cn

  Sun Yat-sen University (Zhuhai Campus), No. 2 Daxue Road, Xiangzhou District, Zhuhai, Guangdong, China, 519082

Selected Publications

Y. Yang, G. Yue, W. Zhou, X. Mao, R. Wang, and B. Zhao*, Expressive Human Volumetric Video Generation with Rich Text, IEEE Transactions on Circuits and Systems for Video Technology, 2026

F. Wang, Z. Zhang, J. Jiang, B. Zhao*, Z. Hao, and X. Luo, Sketch-Based Feature Fusion and Complement for Robust Sketch-to-Voxel Reconstruction, Information Fusion, 103522, 2026

Y. Liao, J. Liang, K. Cui, B. Zhao, H. Xie, W. Liu, Q. Li, and X. Mao, FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing, Computer Vision and Pattern Recognition Conference (CVPR), 2026

Z. Zheng, Y. Tan, Z. Su, F. Zhou, and B. Zhao, NEGS-Avatar: Normal Embedded Gaussians for 2D Avatar from Monocular Video, Computers & Graphics, 104538, 2026

J. Cen, X. Mao, G. Yue, W. Zhou, R. Wang, F. Zhou, and B. Zhao*, Depth-Guided Metric-Aware Temporal Consistency for Monocular Video Human Mesh Recovery, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

Y. Liu, G. Yue, S. Tian, C. Zhao, L. Dong, T. Wang, and B. Zhao, PGA: A Prompt-Guided Adapter for Enhancing Deeply Supervised Polyp Segmentation Models, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

J. Zhong, R. Wang, B. Zhao, and F. Zhou, Beyond Pixel Prophecy: Hierarchical Knowledge Structures for Training-Free Video Anomaly Prediction, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

H. Chen, F. Zhou, R. Wang, and B. Zhao, V-HOI: Velocity-Aware Human-Object Interaction Generation, International Conference on Multimedia Modeling (MMM), pp. 519–532, 2026

B. Zhao, X. Ma, Q. Pang, R. Wang, F. Zhou, and S. Lin, VisAug: Facilitating Speech-Rich Web Video Navigation and Engagement with Auto-Generated Visual Augmentations, Proceedings of the 33rd ACM International Conference on Multimedia, pp. 9168–9176, 2025

Y. Meng, J. Ye, W. Zhou, G. Yue, X. Mao, R. Wang, and B. Zhao*, VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering, ACM International Conference on Multimedia, pp. 1–10, 2025

X. Li, J. Zheng, Y. Chen, X. Mao, G. Yue, W. Zhou, C. Lv, R. Wang, F. Zhou, and B. Zhao*, DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition, ACM International Conference on Multimedia, pp. 1–9, 2025

M. Liu, F. Zhou, R. Wang, B. Zhao*, and F. Zhang, Semantic Distance-Aware Cross-Modal Attention Mechanism for Video Question Answering, IEEE Transactions on Instrumentation and Measurement, 74: 5033114, 2025

X. Li, Y. Yang, Y. Chen, G. Yue, W. Zhou, R. Wang, X. Mao, J. Zheng, F. Zhou, Z. Qiu, and B. Zhao*, MSPoint-Gait: Multi-Scale Point Cloud Analysis for 3D Gait Recognition via Cross-Modal Learning, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, 2025

Y. Yang, X. Li, Y. Chen, G. Yue, W. Zhou, Z. Su, R. Wang, F. Zhou, and B. Zhao*, MCSMoG: Multi-Conditional Diffusion for Stylized Motion Generation with Parametric Control, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, 2025

Y. Fu, Z. Cheng, Y. Liao, J. Wang, R. Wang, G. Yue, C. Lv, and B. Zhao*, GameMLD: A Game-Sourced Motion-Language Dataset for Stylized Motion Generation, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, 2025

X. Liang, R. Wang, B. Zhao#, and J. Feng, Dynamic Feature-Focusing with Cross-Modal Semantic Alignment for Video Moment Retrieval and Highlight Detection, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, 2025

M. Liu, F. Zhou, R. Wang, and B. Zhao#, Multi-Granularity Frequency Difference-Aware Attention for Video Question Answering, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, 2025

F. Wu, Y. Pang, J. Zhang, L. Pang, J. Yin, B. Zhao, Q. Li, and X. Mao, CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization, Proceedings of the AAAI Conference on Artificial Intelligence, 39(8): 8377–8385, 2025

Z. Su, Y. Tan, Z. Zheng, F. Zhou, and B. Zhao, Single-View Clothed Human Reconstruction with Multi-View Consistency Representation, IEEE Transactions on Visualization and Computer Graphics, 31(9): 6550, 2025

G. Yue, S. Wu, G. Li, C. Zhao, Y. Hao, T. Zhou, and B. Zhao#, Boundary-Guided Feature-Aligned Network for Colorectal Polyp Segmentation, IEEE Transactions on Circuits and Systems for Video Technology, 2025

T. Zhou, S. Tan, L. Li, B. Zhao, Q. Jiang, and G. Yue, Cross-Modality Interactive Attention Network for AI-Generated Image Quality Assessment, Pattern Recognition, 167: 111693, 2025

L. Pang, J. Yin, B. Zhao, F. Wu, F. L. Wang, Q. Li, and X. Mao, AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation, Advances in Neural Information Processing Systems (NeurIPS), 37: 39869–39900, 2024

Y. Huang, Y. Yuan, X. Zeng, L. Xie, Y. Fu, G. Yue, and B. Zhao*, Full-Reference Motion Quality Assessment Based on Efficient Monocular Parametric 3D Human Body Reconstruction, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, 2024

F. Wang, J. Sheng, K. Jiang, Z. Zhang, J. Zheng, and B. Zhao*, Single Free-Hand Sketch Guided Free-Form Deformation for 3D Shape Generation, IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6, 2024

T. Zhou, S. Tan, B. Zhao, and G. Yue, Multitask Deep Neural Network with Knowledge-Guided Attention for Blind Image Quality Assessment, IEEE Transactions on Circuits and Systems for Video Technology, 34(8): 7577, 2024

B. Zhao, Y. Fu, Z. Su, R. Wang, C. Lv, and X. Luo, A Survey on Multimodal Information Guided 3D Human Motion Generation, Journal of Image and Graphics, 29(9): 2541–2565, 2024

Y. Fu, B. Zhao*, C. Lv, G. Yue, R. Wang, and F. Zhou, Improved Text-Driven Human Motion Generation via Out-of-Distribution Detection and Rectification, International Conference on Computational Visual Media (CVM), LNCS 14592, pp. 218–231, 2024

Y. Chen, G. Yue, W. Liu, C. Lv, R. Wang, F. Zhou, and B. Zhao*, Predicting Plain Text Imageability for Faithful Prompt-Conditional Image Generation, Pacific Rim International Conference on Artificial Intelligence (PRICAI), pp. 89–95, 2024

H. Chen, B. Zhao*, G. Yue, W. Liu, C. Lv, R. Wang, and F. Zhou, CLIP-medfake: Synthetic Data Augmentation with AI-Generated Content for Improved Medical Image Classification, IEEE International Conference on Image Processing (ICIP), pp. 3854–3860, 2024

G. Yue, H. Xiao, H. Xie, T. Zhou, W. Zhou, W. Yan, B. Zhao, T. Wang, and Q. Jiang, Dual-Constraint Coarse-to-Fine Network for Camouflaged Object Detection, IEEE Transactions on Circuits and Systems for Video Technology, 34(5): 3286–3298, 2023

C. Lv, W. Lin, and B. Zhao, KSS-ICP: Point Cloud Registration Based on Kendall Shape Space, IEEE Transactions on Image Processing, 32: 1681–1693, 2023

J. Yu, S. Wang, L. Zheng, Q. Su, W. Liu, B. Zhao, and J. Yin, Generating Deep Questions with Commonsense Reasoning Ability from the Text by Disentangled Adversarial Inference, Findings of the Association for Computational Linguistics (ACL), pp. 470–486, 2023

F. Wang, K. Tang, H. Wu, B. Zhao, H. Cai, and T. Zhou, SketchBodyNet: A Sketch-Driven Multi-Faceted Decoder Network for 3D Human Reconstruction, Pacific Graphics, 2023

C. Lv, W. Lin, and B. Zhao, Intrinsic and Isotropic Resampling for 3D Point Clouds, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

C. Xu, W. Jia, R. Wang, Xi. He, B. Zhao, and Y. Zhang, Semantic Navigation of PowerPoint-Based Lecture Video for AutoNote Generation, IEEE Transactions on Learning Technologies, 2022

J. Hou, W. Lin, G. Yue, W. Liu, and B. Zhao, Interaction-Matrix Based Personalized Image Aesthetic Assessment, IEEE Transactions on Multimedia, 25: 5263–5278, 2022

[Bibtex]
coming soon...

B. Zhao, W. Lin, and C. Lv, Fine-Grained Patch Segmentation and Rasterization for 3D Point Cloud Attribute Compression, IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 12, pp. 4590–4602, 2021

A complexity-aware heat kernel signature (HKS)-based patch segmentation method is developed to effectively partition a given point cloud into fine-grained patches that are suitable for attribute image generation while preserving the inherent spatial correlation among points.
A new patch rasterization and rectification method is developed to balance assignment energy against intrinsic patch shape preservation.

[Bibtex]

@article{zhao2021fine,
title={Fine-grained patch segmentation and rasterization for 3-d point cloud attribute compression},
author={Zhao, Baoquan and Lin, Weisi and Lv, Chenlei},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
volume={31},
number={12},
pages={4590--4602},
year={2021},
publisher={IEEE}
}

F. Wang, S. Xu, D. Jiang, B. Zhao, X. Dong, T. Zhou, and X. Luo, Particle Hydrodynamic Simulation of Thrombus Formation Using Velocity Decay Factor, Computer Methods and Programs in Biomedicine, 2021

The proposed method for thrombus formation simulation consists of three steps. First, we formulate thrombus formation as a particle-based model and obtain the fibrin concentration of each particle from a discretized form of the convection-diffusion-reaction equation; next, we calculate a velocity decay factor from the obtained fibrin concentration; finally, thrombus formation is simulated by applying the velocity decay factor to the particles.
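The velocity-decay step above can be illustrated with a short, self-contained Python sketch. This is a hypothetical toy, not the paper's actual discretization: the linear decay rule and the coupling constant `k` are illustrative stand-ins.

```python
def apply_velocity_decay(velocities, fibrin_conc, k=0.8):
    """Damp particle velocities in proportion to local fibrin concentration.

    velocities  : list of [vx, vy, vz] per particle
    fibrin_conc : fibrin concentration per particle, assumed in [0, 1]
    k           : coupling constant (hypothetical value, not from the paper)
    """
    damped = []
    for v, c in zip(velocities, fibrin_conc):
        # Clamp so the factor remains a valid multiplier in [0, 1].
        decay = max(0.0, min(1.0, 1.0 - k * c))
        damped.append([decay * vi for vi in v])
    return damped

# A particle in clot-free fluid keeps its velocity; one at full fibrin
# concentration is slowed to (1 - k) of its original speed.
flow = apply_velocity_decay([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [0.0, 1.0])
```

In the full method this factor would be applied inside each hydrodynamic time step, after the concentration field is updated by the convection-diffusion-reaction solve.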

[Bibtex]
@article{wang2021particle,
title={Particle hydrodynamic simulation of thrombus formation using velocity decay factor},
author={Wang, Fei and Xu, Songhua and Jiang, Dazhi and Zhao, Baoquan and Dong, Xiaoqiang and Zhou, Teng and Luo, Xiaonan},
journal={Computer Methods and Programs in Biomedicine},
volume={207},
pages={106173},
year={2021},
publisher={Elsevier}
}

C. Lv, W. Lin, and B. Zhao, Approximate Intrinsic Voxel Structure for Point Cloud Simplification, IEEE Transactions on Image Processing, vol. 30, pp. 7241–7255, 2021

[Bibtex]

@article{lv2021approximate,
title={Approximate intrinsic voxel structure for point cloud simplification},
author={Lv, Chenlei and Lin, Weisi and Zhao, Baoquan},
journal={IEEE Transactions on Image Processing},
volume={30},
pages={7241--7255},
year={2021},
publisher={IEEE}
}

C. Lv, W. Lin, and B. Zhao, Voxel Structure-based Mesh Reconstruction from a 3D Point Cloud, IEEE Transactions on Multimedia, vol. 24, pp. 1815–1829, 2021

A novel voxel structure-based framework is introduced to reconstruct an isotropic mesh from a point cloud while preserving important geometric features such as external and internal edges.

[Project page] [Data] [Demo] [Bibtex]

@article{lv2021voxel,
title={Voxel structure-based mesh reconstruction from a 3D point cloud},
author={Lv, Chenlei and Lin, Weisi and Zhao, Baoquan},
journal={IEEE Transactions on Multimedia},
volume={24},
pages={1815--1829},
year={2021},
publisher={IEEE}
}

J. Hou, W. Lin, and B. Zhao, Content-Dependency Reduction with Multi-Task Learning in Blind Stitched Panoramic Image Quality Assessment, IEEE International Conference on Image Processing (ICIP), pp. 3463–3467, 2020

We propose a multi-task learning strategy that encourages the learned representation to be less dependent on image content. A siamese network with two weight-shared CNN branches is trained to simultaneously compare the quality of two images of the same scene and predict the quality score of each image.
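The two training objectives can be combined into a single loss, sketched below in plain Python. The hinge-style ranking term and the weights `margin` and `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
def multitask_loss(s1, s2, y1, y2, margin=0.1, alpha=0.5):
    """Joint loss for a siamese quality network with weight-shared branches.

    s1, s2 : predicted scores for two stitched images of the same scene
    y1, y2 : ground-truth quality scores
    The regression term fits absolute scores; the ranking term penalizes
    pairs whose predicted ordering disagrees with the ground-truth ordering.
    (margin and alpha are hypothetical hyper-parameters.)
    """
    regression = ((s1 - y1) ** 2 + (s2 - y2) ** 2) / 2.0
    sign = 1.0 if y1 >= y2 else -1.0               # ground-truth ordering
    ranking = max(0.0, margin - sign * (s1 - s2))  # hinge on that ordering
    return regression + alpha * ranking
```

A correctly ordered, accurately scored pair such as `multitask_loss(0.8, 0.3, 0.8, 0.3)` incurs zero loss, while swapping the two predictions is penalized by both terms.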

[Bibtex]

@inproceedings{hou2020content,
title={Content-dependency reduction with multi-task learning in blind stitched panoramic image quality assessment},
author={Hou, Jingwen and Lin, Weisi and Zhao, Baoquan},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
pages={3463--3467},
year={2020},
organization={IEEE}
}

J. U. Hou, B. Zhao, N. Ansari, and W. Lin, Range Image Based Point Cloud Colorization Using Conditional Generative Model, IEEE International Conference on Image Processing (ICIP), pp. 524–528, 2019

We introduce an automatic colorization scheme for 3D point clouds based on a deep generative network. The proposed approach renders range images of the point cloud geometry and trains a conditional generative adversarial network to predict the colors of those images.
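The range-image representation that the generative model consumes can be sketched with a simple spherical projection. The equiangular layout and the image resolution below are hypothetical choices for illustration, not the paper's actual projection.

```python
import math

def to_range_image(points, h=16, w=32):
    """Project 3D points onto an h x w spherical range image.

    Each pixel stores the range (distance) of the point that maps to it,
    so a 2D image network can operate on this grid instead of the raw
    unordered point set.
    """
    img = [[0.0] * w for _ in range(h)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)          # azimuth in [-pi, pi]
        pitch = math.asin(z / r)        # elevation in [-pi/2, pi/2]
        u = min(w - 1, int((yaw + math.pi) / (2.0 * math.pi) * w))
        v = min(h - 1, int((pitch + math.pi / 2.0) / math.pi * h))
        img[v][u] = max(img[v][u], r)   # keep the farthest point per pixel
    return img
```

A point at (1, 0, 0) lands near the center of the image with range 1; predicted colors can be mapped back to the points through the same pixel correspondence.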

[Bibtex]

@inproceedings{hou2019range,
title={Range Image Based Point Cloud Colorization Using Conditional Generative Model},
author={Hou, Jong-Uk and Zhao, Baoquan and Ansari, Naushad and Lin, Weisi},
booktitle={2019 IEEE International Conference on Image Processing (ICIP)},
pages={524--528},
year={2019},
organization={IEEE}
}

B. Zhao, S. Xu, S. Lin, R. Wang, and X. Luo, A New Visual Interface for Searching and Navigating Slide-Based Lecture Videos, 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 928–933, 2019

The interface comprehensively derives versatile semantic clues for video content indexing and visual aid generation from the visual elements, text, and mathematical expressions on lecture slides, the recorded speech, and the mouse and cursor pointing actions captured during a lecture.

[Demo (MP4, ~60MB)] [Bibtex]

@inproceedings{zhao2019new,
title={A new visual interface for searching and navigating slide-based lecture videos},
author={Zhao, Baoquan and Xu, Songhua and Lin, Shujin and Wang, Ruomei and Luo, Xiaonan},
booktitle={2019 IEEE International Conference on Multimedia and Expo (ICME)},
pages={928--933},
year={2019},
organization={IEEE}
}

C. Xu, R. Wang, S. Lin, X. Luo, B. Zhao, L. Shao, and M. Hu, Lecture2Note: Automatic Generation of Lecture Notes from Slide-Based Educational Videos, 2019 IEEE International Conference on Multimedia and Expo (ICME), pp. 898–903, 2019

Most educational/lecture videos on the Internet are lengthy and lack elaborate annotations. We introduce a novel method to generate note-like video summaries by establishing the semantic relationship between visual entities in a slide-based lecture video and their descriptive speech text.

[Bibtex]

@inproceedings{xu2019lecture2note,
title={Lecture2Note: Automatic Generation of Lecture Notes from Slide-Based Educational Videos},
author={Xu, Chengpei and Wang, Ruomei and Lin, Shujin and Luo, Xiaonan and Zhao, Baoquan and Shao, Lijie and Hu, Mengqiu},
booktitle={2019 IEEE International Conference on Multimedia and Expo (ICME)},
pages={898--903},
year={2019},
organization={IEEE}
}

F. Wang, S. Lin, R. Wang, Y. Li, B. Zhao, and X. Luo, Improving Incompressible SPH Simulation Efficiency by Integrating Density-Invariant and Divergence-Free Conditions, ACM SIGGRAPH (Posters), pp. 1–2, 2018

Our method shortens fluid simulation time by coupling the density-invariant and divergence-free conditions while achieving the same simulation quality as other methods. Furthermore, we treat particle displacement as the only basic variable of the continuity equation, which improves the stability of the fluid to a certain extent.

[Demo (MP4, ~70MB)] [Bibtex]

@incollection{wang2018improving,
title={Improving incompressible SPH simulation efficiency by integrating density-invariant and divergence-free conditions},
author={Wang, Fei and Lin, Shujin and Wang, Ruomei and Li, Yi and Zhao, Baoquan and Luo, Xiaonan},
booktitle={ACM SIGGRAPH 2018 Posters},
pages={1--2},
year={2018}
}

B. Zhao, S. Lin, X. Luo, S. Xu, and R. Wang, A Novel System for Visual Navigation of Educational Videos Using Multimodal Cues, ACM Multimedia, pp. 1680–1688, 2017

The system tightly integrates multimodal cues obtained from the visual, audio, and textual channels of educational videos and presents them through a series of interactive visualization components. With the help of this system, users can explore educational video content at multiple levels of detail and identify content of interest with ease.

[Demo (AVI, ~60MB)] [Bibtex]

@inproceedings{zhao2017novel,
title={A novel system for visual navigation of educational videos using multimodal cues},
author={Zhao, Baoquan and Lin, Shujin and Luo, Xiaonan and Xu, Songhua and Wang, Ruomei},
booktitle={Proceedings of the 25th ACM international conference on Multimedia},
pages={1680--1688},
year={2017}
}

B. Zhao, S. Lin, X. Qi, Z. Zhang, X. Luo, and R. Wang, Automatic Generation of Visual-Textual Web Video Thumbnail, ACM SIGGRAPH ASIA (Posters), pp. 1–2, 2017

We propose an automatic approach to generate magazine-cover-like thumbnails using salient visual and textual metadata extracted from the video. Compared with a traditional snapshot, the synthesized thumbnail is more informative and attractive, which helps users select online videos.

[Bibtex]

@incollection{zhao2017automatic,
title={Automatic generation of visual-textual web video thumbnail},
author={Zhao, Baoquan and Lin, Shujin and Qi, Xin and Zhang, Zhiquan and Luo, Xiaonan and Wang, Ruomei},
booktitle={SIGGRAPH Asia 2017 Posters},
pages={1--2},
year={2017}
}

B. Zhao, S. Xu, S. Lin, X. Luo, and L. Duan, A New Visual Navigation System for Exploring Biomedical Open Educational Resource (OER) Videos, Journal of the American Medical Informatics Association, vol. 23, no. e1, pp. e34–e41, 2016

Biomedical videos as open educational resources (OERs) are increasingly proliferating on the Internet. Unfortunately, seeking personally valuable content from among the vast corpus of quality yet diverse OER videos is nontrivial due to limitations of today's keyword- and content-based video retrieval techniques. To address this need, this study introduces a novel visual navigation system that facilitates users' information seeking from biomedical OER videos in mass quantity by interactively offering visual and textual navigational clues that are both semantically revealing and user-friendly.

[Demo 1: Architecture (MP4, ~18MB)] [Demo 2: System (MP4, ~75MB)] [Bibtex]

@article{zhao2016new,
title={A new visual navigation system for exploring biomedical Open Educational Resource (OER) videos},
author={Zhao, Baoquan and Xu, Songhua and Lin, Shujin and Luo, Xiaonan and Duan, Lian},
journal={Journal of the American Medical Informatics Association},
volume={23},
number={e1},
pages={e34--e41},
year={2016},
publisher={Oxford University Press}
}