Motivation. Visual token redundancy varies significantly across robot manipulation stages. VLA-ADP exploits end-effector motion as a dynamic gating signal to identify and prune redundant tokens at each timestep, reducing computation without sacrificing task success.
Real-world ALOHA demonstrations — VLA-ADP applied to OpenVLA-OFT (1.5× speedup)
We propose Action-aware Dynamic Pruning (ADP), a training-free, plug-and-play method that adaptively prunes redundant visual tokens across manipulation stages by combining text-driven token relevance with an action-aware gating signal derived from end-effector motion.
ADP Architecture. ADP maintains an observation window of past states and uses end-effector velocity/acceleration to produce a dynamic gating decision. The gate selects between sparse and dense token retention ratios, and text-driven cross-attention scores rank tokens by relevance before pruning.
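The gating step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window length, the motion threshold, and the two retention ratios are placeholder values, and the motion score (last-step speed plus acceleration magnitude) is one plausible choice of signal.

```python
import numpy as np

def dynamic_gate(ee_positions, dt=0.1, motion_threshold=0.05,
                 sparse_ratio=0.25, dense_ratio=0.75):
    """Choose a token retention ratio from recent end-effector motion.

    ee_positions: (T, 3) observation window of past end-effector positions.
    All parameter values here are illustrative, not the paper's settings.
    """
    vel = np.diff(ee_positions, axis=0) / dt   # (T-1, 3) finite-difference velocity
    acc = np.diff(vel, axis=0) / dt            # (T-2, 3) finite-difference acceleration
    # Combine current speed and acceleration magnitude into one motion score.
    motion = np.linalg.norm(vel[-1]) + np.linalg.norm(acc[-1])
    # Fast, transit-like motion -> aggressive (sparse) pruning;
    # slow, fine manipulation -> conservative (dense) retention.
    return sparse_ratio if motion > motion_threshold else dense_ratio
```

The intuition: during large free-space motions the policy needs little visual detail, so most tokens can be dropped; during contact-rich fine manipulation the gate falls back to the dense ratio.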
Token Pruning. Spatially redundant background tokens (low attention score) are removed while task-relevant tokens are preserved, maintaining action prediction fidelity.
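A minimal sketch of this selection step, assuming per-token relevance scores are already available (e.g. from text-to-image cross-attention); the paper's exact scoring and selection details may differ:

```python
import numpy as np

def prune_tokens(visual_tokens, attn_scores, keep_ratio):
    """Keep the top keep_ratio fraction of visual tokens by relevance score.

    visual_tokens: (N, D) token embeddings.
    attn_scores:   (N,) text-driven relevance per token (illustrative input).
    Returns the retained tokens and their original indices.
    """
    n_keep = max(1, int(round(len(attn_scores) * keep_ratio)))
    # Take the highest-relevance tokens, then restore original order
    # so the tokens' positional structure is preserved.
    keep_idx = np.sort(np.argsort(attn_scores)[-n_keep:])
    return visual_tokens[keep_idx], keep_idx
```

With `keep_ratio` supplied by the dynamic gate, low-scoring background tokens are dropped while task-relevant tokens pass through to action prediction unchanged.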
Comparison against OpenVLA, SparseVLM, FastVLM, and other VLA methods across four LIBERO task suites (Spatial, Object, Goal, Long). VLA-ADP achieves 94.4–99.0% success rate (SR) with a 1.13–1.35× LLM speedup.
LIBERO benchmark task suites used for simulation evaluation: Spatial, Object, Goal, and Long.
VLA-ADP improves SR from 85.8% to 88.3% while reducing latency by 33% (76.9 → 51.8 ms), achieving a 1.49× speedup on real hardware.
Real-world experimental setup: bimanual ALOHA robot performing tabletop manipulation tasks.
@article{pei2025action,
  title={Action-aware Dynamic Pruning for Efficient Vision-Language-Action Manipulation},
  author={Pei, Xiaohuan and Chen, Yuxing and Xu, Siyu and Wang, Yunke and Shi, Yuheng and Xu, Chang},
  journal={arXiv preprint arXiv:2509.22093},
  year={2025}
}
We thank the authors of OpenVLA-OFT, OpenVLA, and Hugging Face Transformers for making their code publicly available. This project page was inspired by the Nerfies template.