Yanjiang Guo

Hi! I am a fourth-year CS PhD student at Tsinghua University, advised by Prof. Jianyu Chen (Founder of RobotEra). Previously, I received my bachelor's degree from the Dept. of EE, Tsinghua University, in 2022. I have also interned at Tencent, SenseTime, RobotEra, and Shanghai Qi Zhi Institute.

My research focuses on Embodied AI and Generative Models, with a particular emphasis on training robot foundation models capable of performing a wide range of tasks in the physical world. I prefer simple and scalable methods :)

Life Update: Currently, I am a visiting researcher at Stanford University, where I am fortunate to work with Prof. Chelsea Finn (Co-founder of Physical Intelligence).

Email  /  Scholar  /  Twitter  /  Github

Honors and Awards:
[2024.07] Best Paper Award Finalist at RSS 2024.
[2022.06] Outstanding Graduates Award (Top 10% Tsinghua undergraduate students).
[2017.11] Silver Medal in the 34th National Physics Olympiad (CPhO).

Selected Research (* indicates equal contribution)

Generative World Models:

Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Yanjiang Guo*, Lucy Xiaoyang Shi*, Jianyu Chen, Chelsea Finn
Coming Soon
project page

We train a controllable generative world model that can be used to evaluate and improve state-of-the-art generalist robot policies.

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Yucheng Hu*, Yanjiang Guo*, Pengchao Wang, Xiaoyu Chen, Yen-Jen Wang, Jianke Zhang, Koushil Sreenath, Chaochao Lu, Jianyu Chen
ICML, 2025   (Spotlight, 2.6%)
project page / code / arXiv / twitter / 机器之心 / 量子位

We finetune a general-purpose video diffusion model into a manipulation-focused video prediction model to guide policy learning.

Prediction with Action: Visual Policy Learning via Joint Denoising Process
Yanjiang Guo*, Yucheng Hu*, Jianke Zhang, Yen-Jen Wang, Xiaoyu Chen, Chaochao Lu#, Jianyu Chen#
NeurIPS, 2024
project page / code / arXiv

We jointly predict future images and robot actions in a unified DiT network, transferring physical knowledge from internet video data to robots.

Vision-Language-Action Models:

UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent
Jianke Zhang*, Yanjiang Guo*, Yucheng Hu*, Xiaoyu Chen, Jianyu Chen
ICML, 2025
arXiv / code

We incorporate both multi-modal understanding (MMU) and future prediction into a VLA model, enhancing both high-level semantic knowledge and low-level visual dynamics.

Improving Vision-Language-Action Model with Online Reinforcement Learning
Yanjiang Guo*, Jianke Zhang*, Xiaoyu Chen*, Xiang Ji, Yen-Jen Wang, Yucheng Hu, Jianyu Chen
ICRA, 2025
arXiv / twitter1 / twitter2

We take some first steps toward leveraging online RL to improve VLA models! We find that online RL for VLA models can be extremely unstable, so we adopt an iterative approach.

HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers
Jianke Zhang*, Yanjiang Guo*, Xiaoyu Chen, Yen-Jen Wang, Yucheng Hu, Chengming Shi, Jianyu Chen
CoRL, 2024
arXiv / twitter / 机器之心

We finetune a pretrained VLM into a VLA model with hierarchical transformers, retaining its generalization ability while achieving a much higher control frequency.

DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment
Yanjiang Guo*, Yen-Jen Wang*, Lihan Zha*, Jianyu Chen
IROS, 2024
project page / arXiv

We leverage an LLM to perform both planning and monitoring, with a fine-tuned VLM as the detector.

Reinforcement Learning:

Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning
Xinyang Gu*, Yen-Jen Wang*, Xiang Zhu*, Chengming Shi*, Yanjiang Guo, Yichen Liu, Jianyu Chen
RSS, 2024   (Best Paper Award Finalist)
project page / code / arXiv / 机器之心

We train a humanoid robot to master challenging terrains such as stairs, slopes, and snowy ground with zero-shot sim2real transfer.

Other Publications

Decentralized Motor Skill Learning for Complex Robotic Systems
Yanjiang Guo*, Zheyuan Jiang*, Yen-Jen Wang, Jingyue Gao, Jianyu Chen
RA-L, 2023 (with ICRA 2024)

Zero-shot policy transfer with disentangled task representation of meta-reinforcement learning
Zheng Wu, Yichen Xie, Wenzhao Lian, Changhao Wang, Yanjiang Guo, Jianyu Chen, Stefan Schaal, Masayoshi Tomizuka
ICRA, 2023

Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward
Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen
CoRL, 2022


Source code from Jon Barron.