Shancheng Fang - Research Institute for Future Media Computing, Shenzhen University

Shancheng Fang

Shancheng Fang, Assistant Professor, Research Institute for Future Media Computing
Expertise: Multimodal Large Models, Image/Video Generation, Multimodal Post-Training and Agents
Contact Details: fangsc@szu.edu.cn

Work and Research Experience

2025 – Present: Assistant Professor, Research Institute for Future Media Computing, Shenzhen University, China
2023 – 2025: Beijing Yuanshi Technology Co., Ltd., China
2022 – 2023: Beijing ByteDance Co., Ltd., China
2020 – 2022: Postdoc, University of Science and Technology of China, China
2015 – 2020: PhD, Institute of Information Engineering, Chinese Academy of Sciences, China

Representative Publications

[1] Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang. ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023.

[2] Shancheng Fang, Hongtao Xie, Yuxin Wang, Zhendong Mao, Yongdong Zhang. Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021.

[3] Yadong Qu, Shancheng Fang*, Yuxin Wang, Xiaorui Wang, Zhineng Chen, Hongtao Xie, Yongdong Zhang. IGD: Instructional Graphic Design with Multimodal Layer Generation. Proceedings of the IEEE/CVF International Conference on Computer Vision 2025.

[4] Tianhao Qi, Jianlong Yuan, Wanquan Feng, Shancheng Fang*, Jiawei Liu, SiYu Zhou, Qian He, Hongtao Xie, Yongdong Zhang. Mask2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025.

[5] Yaqi Cai, Shancheng Fang*, Yadong Qu, Xiaorui Wang, Meng Shao, Hongtao Xie. IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation. International Joint Conference on Artificial Intelligence 2025.

[6] Fengyi Fu, Shancheng Fang*, Weidong Chen, Yan Song, Zhendong Mao, Yongdong Zhang. Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting. ACM Transactions on Multimedia Computing, Communications, and Applications 2023.

[7] Jingjing Zhang, Shancheng Fang*, Zhendong Mao, Zhiwei Zhang, Yongdong Zhang. Fine-tuning with Multi-modal Entity Prompts for News Image Captioning. Proceedings of the ACM International Conference on Multimedia 2022.

[8] Yuxin Wang, Hongtao Xie, Shancheng Fang*, Jing Wang, Shenggao Zhu, Yongdong Zhang. From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network. Proceedings of the IEEE/CVF International Conference on Computer Vision 2021.

[9] Jianjun Chen, Shancheng Fang*, Hongtao Xie, Zheng-Jun Zha, Yue Hu, Jianlong Tan. End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation. Proceedings of the ACM International Conference on Multimedia 2021.

[10] Shancheng Fang, Hongtao Xie, Jianjun Chen, Jianlong Tan, Yongdong Zhang. Learning to Draw Text in Natural Images with Conditional Adversarial Networks. International Joint Conference on Artificial Intelligence 2019.

[11] Shancheng Fang, Hongtao Xie, Zheng-Jun Zha, Nannan Sun, Jianlong Tan, Yongdong Zhang. Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling. Proceedings of the ACM International Conference on Multimedia 2018.

[12] Hongtao Xie, Shancheng Fang*, Zheng-Jun Zha, Yating Yang, Yan Li, Yongdong Zhang. Convolutional Attention Networks for Scene Text Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 2019.

[13] Shancheng Fang, Hongtao Xie, Zhineng Chen, Yizhi Liu, Yan Li. Uyghur Text Matching in Graphic Images for Biomedical Semantic Analysis. Neuroinformatics 2018.

[14] Shancheng Fang, Hongtao Xie, Zhineng Chen, Shiai Zhu, Xiaoyan Gu, Xingyu Gao. Detecting Uyghur Text in Complex Background Images with Convolutional Neural Network. Multimedia Tools and Applications 2017.