ICLR 2026

Action-aware Dynamic Pruning for
Efficient Vision-Language-Action Manipulation

University of Sydney
Motivation: token redundancy varies across manipulation stages

Motivation. Visual token redundancy varies significantly across robot manipulation stages. VLA-ADP exploits end-effector motion as a dynamic gating signal to identify and prune redundant tokens at each timestep, reducing computation without sacrificing task success.

Real-world ALOHA demonstrations — VLA-ADP applied to OpenVLA-OFT (1.5× speed)

Task 1: Put the white mug on the plate and put the chocolate on the plate
Task 2: Open the top drawer and put the bowl inside
Task 3: Pick up the tomato sauce and place it in the basket
Task 4: Pick up the black bowl next to the plate and place it on the rack
Key results:

- 1.49× real-world latency speedup (76.9 ms → 51.8 ms)
- 88.3% real-world success rate, up from the 85.8% baseline
- 1.35× LLM speedup on LIBERO at a 30–40% token keep ratio
- ≤0.9% success-rate drop on LIBERO at a 50–70% keep ratio

Abstract


We propose Action-aware Dynamic Pruning (ADP), a training-free, plug-and-play method that adaptively prunes redundant visual tokens across manipulation stages by combining text-driven token relevance with an action-aware gating signal derived from end-effector motion.

Method Overview


ADP method overview

ADP Architecture. ADP maintains an observation window of past states and uses end-effector velocity/acceleration to produce a dynamic gating decision. The gate selects between sparse and dense token retention ratios, and text-driven cross-attention scores rank tokens by relevance before pruning.
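To make the gating idea concrete, here is a minimal sketch of how an end-effector-motion gate could select between sparse and dense keep ratios. The function name `motion_gate`, the thresholds, and the specific keep ratios are illustrative assumptions, not the paper's actual values or implementation.

```python
import numpy as np

def motion_gate(ee_positions, dt=0.05, vel_thresh=0.05, acc_thresh=0.5):
    """Hypothetical gating signal: choose a sparse or dense token keep
    ratio from end-effector motion over a short observation window.

    ee_positions: (T, 3) array of recent end-effector positions.
    dt, vel_thresh, acc_thresh: illustrative values, not the paper's.
    """
    vel = np.diff(ee_positions, axis=0) / dt   # (T-1, 3) finite-difference velocities
    acc = np.diff(vel, axis=0) / dt            # (T-2, 3) finite-difference accelerations
    fast = (np.linalg.norm(vel[-1]) > vel_thresh
            or np.linalg.norm(acc[-1]) > acc_thresh)
    # Fast transport motion -> coarse perception suffices -> sparse keep ratio.
    # Slow, precise motion (e.g. grasping) -> retain more visual tokens.
    return 0.3 if fast else 0.7
```

The gate is recomputed at each timestep from the observation window, so the keep ratio adapts as the manipulation moves between transport and fine-grained phases.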

Token pruning visualization

Token Pruning. Spatially redundant background tokens (those with low attention scores) are removed while task-relevant tokens are preserved, maintaining action prediction fidelity.
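The relevance-ranked pruning step can be sketched as a top-k selection over per-token cross-attention scores. This is a simplified illustration under assumed shapes; `prune_tokens` and its arguments are hypothetical names, not the released code's API.

```python
import numpy as np

def prune_tokens(tokens, attn_scores, keep_ratio):
    """Illustrative top-k pruning: retain the most text-relevant visual tokens.

    tokens: (N, D) visual token embeddings.
    attn_scores: (N,) text-to-vision cross-attention relevance per token.
    keep_ratio: fraction of tokens to keep, e.g. supplied by the motion gate.
    """
    n_keep = max(1, int(round(len(tokens) * keep_ratio)))
    # Take the n_keep highest-scoring tokens, then restore original order
    # so positional structure is preserved for the downstream policy.
    keep_idx = np.sort(np.argsort(attn_scores)[-n_keep:])
    return tokens[keep_idx], keep_idx
```

Because pruning happens before the LLM backbone, the compute saved scales roughly with the fraction of tokens dropped, which is where the reported LLM speedup comes from.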

Experimental Results


Simulation Results (LIBERO Benchmark)

LIBERO simulation results table

Comparison against OpenVLA, SparseVLM, FastVLM, and other VLA methods across four LIBERO task suites (Spatial, Object, Goal, Long). VLA-ADP achieves 94.4–99.0% SR with 1.13–1.35× LLM speedup.

LIBERO task suite visualization

LIBERO benchmark task suites used for simulation evaluation: Spatial, Object, Goal, and Long.

Real-World Results (ALOHA Robot)

Real-world results table

VLA-ADP improves SR from 85.8% to 88.3% while reducing latency by 33% (76.9 ms → 51.8 ms), achieving a 1.49× speedup on real hardware.

Real-world robot setup

Real-world experimental setup: bimanual ALOHA robot performing tabletop manipulation tasks.

Citation


@article{pei2025action,
  title={Action-aware dynamic pruning for efficient vision-language-action manipulation},
  author={Pei, Xiaohuan and Chen, Yuxing and Xu, Siyu and Wang, Yunke and Shi, Yuheng and Xu, Chang},
  journal={arXiv preprint arXiv:2509.22093},
  year={2025}
}

Acknowledgements


We thank the authors of OpenVLA-OFT, OpenVLA, and Hugging Face Transformers for making their code publicly available. This project page was inspired by the Nerfies template.