Xiaohuan Pei (Terry) is a PhD student in Computer Science at the University of Sydney (USYD), supervised by Prof. Chang Xu. He collaborates with A/Prof. Tao Huang at SJTU, RA/Prof. Yanxi Li at NTU, A/Prof. Minjing Dong at CityU, Yuheng Shi at USYD, and researcher Pichao Wang at NVIDIA. He is a visiting graduate researcher at the University of California, Los Angeles (UCLA), hosted by Prof. Cho-Jui Hsieh.
He is currently working on large-scale pretraining of billion-parameter foundation models from scratch, with a particular focus on principled training recipes spanning Stage-1 (Alignment), Stage-2 (SFT), and a newly designed Stage-3 paradigm. In parallel, he investigates foundation models for autonomous driving, emphasizing scalable pretraining pipelines and efficiency-oriented inference to support real-world deployment.
He is also a nationally certified table tennis 🏓 athlete and registered professional coach.
Open to one research intern position working on efficiency for world models.
📝 Selected Work

Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation
Xiaohuan Pei*, Yuxing Chen*, Siyu Xu, Yunke Wang, Yuheng Shi, Chang Xu

Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Yuheng Shi, Xiaohuan Pei, Minjing Dong, Chang Xu

Light Future-aware Masking for Vision-Language Inference
Xiaohuan Pei, Tao Huang, Yanxiang Ma, Chang Xu

Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
Xiaohuan Pei, Tao Huang, Chang Xu

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba
Xiaohuan Pei, Tao Huang, Chang Xu
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025

Neural Architecture Retrieval
Xiaohuan Pei, Yanxi Li, Minjing Dong, Chang Xu
The International Conference on Learning Representations (ICLR), 2024

LocalMamba: Visual State Space Model with Windowed Selective Scan
Tao Huang, Xiaohuan Pei, Chang Xu
The European Conference on Computer Vision (ECCV), Workshop, 2024
🧑🏻‍💻 Preprints
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference.
Xiaohuan Pei, Tao Huang, Chang Xu.
arXiv preprint [arXiv:2412.04652].
GPT self-supervision for a better data annotator.
Xiaohuan Pei, Yanxi Li, Chang Xu.
arXiv preprint [arXiv:2306.04349] (2023).
Text-driven Neural Architecture Embeddings and Retrieval.
Xiaohuan Pei, Yanxi Li, Minjing Dong, Chang Xu.
🧑🏻‍💻 Academic Publications
Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation.
Xiaohuan Pei, Yuxing Chen, Siyu Xu, Yunke Wang, Yuheng Shi, Chang Xu.
The Fourteenth International Conference on Learning Representations (ICLR 2026). (Core Rank A*)
Catching the details: self-distilled ROI predictors for fine-grained Vision-Language-Model perception.
Yuheng Shi, Xiaohuan Pei, Minjing Dong, Chang Xu.
The Fourteenth International Conference on Learning Representations (ICLR 2026). (Core Rank A*)
Rethinking Causal Mask Attention for Vision-Language Inference.
Xiaohuan Pei, Tao Huang, Yanxiang Ma, Chang Xu.
The Fourteenth International Conference on Learning Representations (ICLR 2026). (Core Rank A*)
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba.
Xiaohuan Pei, Tao Huang, Chang Xu.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2025). (Core Rank A*)
LocalMamba: Visual State Space Model with Windowed Selective Scan.
Tao Huang, Xiaohuan Pei, Chang Xu.
European Conference on Computer Vision (ECCV 2024), Workshop. (Core Rank A*)
Neural Architecture Retrieval.
Xiaohuan Pei, Yanxi Li, Minjing Dong, Chang Xu.
The Twelfth International Conference on Learning Representations (ICLR 2024). (Core Rank A*)
Contrastive code-comment pre-training.
Xiaohuan Pei, Daochang Liu, Qian Luo, Chang Xu.
IEEE International Conference on Data Mining (ICDM 2022). (Core Rank A*)
Self-attention gated cognitive diagnosis for faster adaptive educational assessments.
Xiaohuan Pei, Shuo Yang, Jiajun Huang, Chang Xu.
IEEE International Conference on Data Mining (ICDM 2022). (Core Rank A*)
TCNAS: Transformer Architecture Evolving in Clone Detection.
Hongyan Xu, Xiaohuan Pei, Shan You, Chang Xu.
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024). (Core Rank A)
📖 Teaching
- 2024, Guest Lecture, Artificial Intelligence, The University of Sydney
- 2023, 2025, Tutor for COMP5329 Deep Learning, The University of Sydney
🎖 Awards
- ICDM Best Student Paper Award
- Two Full Scholarship Awards
- Outstanding Graduate
- National Second Prize in Mathematics Competition
- Provincial Prize in C++ Programming Competition
🌟 Funding Grants
- The National Computational Infrastructure (NCI) Adapter Scheme, Australia
Services
- Reviewer for TPAMI, ICML, NeurIPS, ICLR, CVPR, ICCV, KDD, ICDM.
Contact: xiaohuan.pei at sydney.edu.au, terrypei123 at gmail.com