
Shancheng Fang

  • Shancheng Fang, Assistant Professor, Research Institute for Future Media Computing

  • Expertise: Multimodal Large Models, Image/Video Generation, Multimodal Post-Training and Agents

  • Contact Details: fangsc@szu.edu.cn


Work and Research Experience

  • 2025 – present: Assistant Professor, Research Institute for Future Media Computing, Shenzhen University, China

  • 2023 – 2025: Beijing Yuanshi Technology Co., Ltd., China

  • 2022 – 2023: Beijing ByteDance Co., Ltd., China

  • 2020 – 2022: Postdoc, University of Science and Technology of China, China

  • 2015 – 2020: PhD, Institute of Information Engineering, Chinese Academy of Sciences, China



Representative Publications

[1] Shancheng Fang, Zhendong Mao, Hongtao Xie, Yuxin Wang, Chenggang Yan, Yongdong Zhang. ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023.

[2] Shancheng Fang, Hongtao Xie, Yuxin Wang, Zhendong Mao, Yongdong Zhang. Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021.

[3] Yadong Qu, Shancheng Fang*, Yuxin Wang, Xiaorui Wang, Zhineng Chen, Hongtao Xie, Yongdong Zhang. IGD: Instructional Graphic Design with Multimodal Layer Generation. Proceedings of the IEEE/CVF International Conference on Computer Vision 2025.

[4] Tianhao Qi, Jianlong Yuan, Wanquan Feng, Shancheng Fang*, Jiawei Liu, SiYu Zhou, Qian He, Hongtao Xie, Yongdong Zhang. Mask2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025.

[5] Yaqi Cai, Shancheng Fang*, Yadong Qu, Xiaorui Wang, Meng Shao, Hongtao Xie. IterMeme: Expert-Guided Multimodal LLM for Interactive Meme Creation with Layout-Aware Generation. International Joint Conference on Artificial Intelligence 2025.

[6] Fengyi Fu, Shancheng Fang*, Weidong Chen, Yan Song, Zhendong Mao, Yongdong Zhang. Sentiment-Oriented Transformer-Based Variational Autoencoder Network for Live Video Commenting. ACM Transactions on Multimedia Computing, Communications, and Applications 2023.

[7] Jingjing Zhang, Shancheng Fang*, Zhendong Mao, Zhiwei Zhang, Yongdong Zhang. Fine-tuning with Multi-modal Entity Prompts for News Image Captioning. Proceedings of the ACM International Conference on Multimedia 2022.

[8] Yuxin Wang, Hongtao Xie, Shancheng Fang*, Jing Wang, Shenggao Zhu, Yongdong Zhang. From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network. Proceedings of the IEEE/CVF International Conference on Computer Vision 2021.

[9] Jianjun Chen, Shancheng Fang*, Hongtao Xie, Zheng-Jun Zha, Yue Hu, Jianlong Tan. End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation. Proceedings of the ACM International Conference on Multimedia 2021.

[10] Shancheng Fang, Hongtao Xie, Jianjun Chen, Jianlong Tan, Yongdong Zhang. Learning to Draw Text in Natural Images with Conditional Adversarial Networks. International Joint Conference on Artificial Intelligence 2019.

[11] Shancheng Fang, Hongtao Xie, Zheng-Jun Zha, Nannan Sun, Jianlong Tan, Yongdong Zhang. Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling. Proceedings of the ACM International Conference on Multimedia 2018.

[12] Hongtao Xie, Shancheng Fang*, Zheng-Jun Zha, Yating Yang, Yan Li, Yongdong Zhang. Convolutional Attention Networks for Scene Text Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 2019.

[13] Shancheng Fang, Hongtao Xie, Zhineng Chen, Yizhi Liu, Yan Li. Uyghur Text Matching in Graphic Images for Biomedical Semantic Analysis. Neuroinformatics 2018.

[14] Shancheng Fang, Hongtao Xie, Zhineng Chen, Shiai Zhu, Xiaoyan Gu, Xingyu Gao. Detecting Uyghur Text in Complex Background Images with Convolutional Neural Network. Multimedia Tools and Applications 2017.
