Challenge Background

Evaluating embodied intelligence remains a fundamental challenge in the field. Physical-world testing often fails to control all variables, resulting in unreproducible outcomes, high replication costs, and difficulties in conducting large-scale assessments. While simulation-based evaluation addresses reproducibility issues, current virtual platforms suffer from significant reality gaps and lack proper benchmarking against real-world tests, compromising result credibility.

As a core part of the Workshop on Multimodal Robot Learning in Physical Worlds at IROS 2025, this competition addresses the above challenges and features two common embodied tasks across dual phases: a simulated round hosted on InternUtopia, a high-fidelity simulation platform, followed by real-world testing. This dual-phase design aims to drive innovation in both model architectures and training methodologies within the field.

Participants are required to utilize interactively collected data from either real-world or simulated environments to accurately interpret complex scenarios and make contextually appropriate decisions. The challenge explores key issues in transferring skills from simulation to reality, including domain gaps, multimodal information fusion (vision, language, and action), and scalable sim-to-real transfer techniques.

We hope this event will bring together researchers and practitioners from around the world to explore cutting-edge topics in multimodal robot learning, laying a solid foundation for the future development of intelligent robotics.

Track Introduction

Vision-Language Manipulation in Open Tabletop Environments

This track focuses on developing multimodal robotic manipulation systems capable of understanding and executing task instructions. Participants are required to design end-to-end control policy models that integrate visual perception, instruction following, and action prediction. The robot must operate within a simulated physics-based environment, using platforms such as robotic arms or mobile dual-arm systems, to carry out a variety of manipulation tasks. The challenges are rooted in open tabletop scenarios, diverse task instructions, and multiple manipulation skills.

Key Challenges Include:

  • Effectively fusing visual and linguistic information to drive a unified perception-decision-control pipeline;
  • Robustly interpreting natural language instructions and executing multi-skill manipulation behaviors on physically simulated robotic arms or mobile manipulators;
  • Achieving generalization at both the task and object levels to support diverse and long-horizon manipulation tasks in open tabletop environments.

Vision-and-Language Navigation in Physical Environments

This track focuses on developing multimodal mobile robot navigation systems with language understanding capabilities. Participants are required to devise a navigation agent capable of egocentric visual perception, natural language instruction comprehension, trajectory history modeling, and navigation action prediction. The agent will be evaluated in a realistic physics-based simulation environment, operating a legged robot (e.g., the humanoid Unitree H1) to perform indoor navigation tasks guided by language instructions. The system should be capable of handling challenges such as camera shake, height variation, and local obstacle avoidance, ultimately achieving robust and safe vision-and-language navigation.

Key Challenges Include:

  • Integrating visual and language inputs to drive a unified perception-decision-control pipeline;
  • Ensuring robust performance on a humanoid robot platform within a physics engine, especially under camera shake, dynamic height changes, and local obstacle interactions during walking;
  • Producing human-like navigation behavior to complete instruction-following tasks in complex indoor environments.

Process & Timeline

07/25

Competition Start & Materials Release

01

Registration

Go register >>

02

Download Materials

A baseline model, training datasets, and tutorials are provided to support model development.

Download the baseline model >>
TRACK1 \ TRACK2
Download the datasets >>
TRACK1 \ TRACK2
Download the Tutorial >>
TRACK1 \ TRACK2

03

Model Training

Train your model using the provided data and baselines, or develop your own pipeline.

07/30

Test Server Open

09/30

Test Server Close

10/10

Results Announcement

Final rankings will be announced on October 10th.

04

Model Submission

Submit your solution to the online test server.

Submit the model >>
TRACK1 \ TRACK2

10/18

Onsite Challenge

05

Onsite Challenge

In each track, we will select up to 8 teams for the onsite challenge.

10/20

Results Announcement

06

Workshop Day Champions Announcement

On October 20th, the winners will present their work at our workshop in Hangzhou, China.

Awards

1st

$10,000

2nd

$5,000

3rd

$3,000

Travel Grant: $1,500

For each team selected for the onsite challenge, in both tracks

Additional prizes and official certificates will also be awarded

Top performers can receive:

  • Internship opportunity at Shanghai AI Lab

  • Direct access to the campus recruitment interview, entering the JOB TALK stage directly

General Rules

Eligibility

Each participant must join a team and may not be a member of multiple teams.

Participants can form teams of up to 10 members.

A team is limited to one submission account.

A team can participate in multiple tracks.

An entity can have multiple teams.

Attempting to hack the test set or engaging in similar behaviors will result in disqualification.

Technical

All publicly available datasets and pretrained weights are allowed.

Unauthorized access to test sets is strictly prohibited.

Award & Voucher

All participants who make a valid submission and submit their team name and related information before the leaderboard opens will receive an electronic certificate of participation.

Teams must make their results public on the leaderboard before the submission deadline.

The code or a Docker image must be open-sourced.

Organizers reserve the right to update the rules or disqualify teams for violations. Winners will be awarded the following prizes (per track):

  • 1st Place: $10,000 cash prize, $1,500 travel subsidy, additional prizes, and a certificate
  • 2nd Place: $5,000 cash prize, $1,500 travel subsidy, additional prizes, and a certificate
  • 3rd Place: $3,000 cash prize, $1,500 travel subsidy, additional prizes, and a certificate
  • 4th–10th Place: Prizes and a certificate

Organizer: Shanghai AI Lab
Co-organizer: ManyCore Tech, University of Adelaide
Sponsors (order not indicative of ranking): ByteDance, HUAWEI, ENGINEAI, HONOR, ModelScope, Alibaba Cloud, AGILEX, DOBOT