
SparseSSM:
Efficient Selective Structured State Space Models Can Be Pruned in One-Shot

¹Westlake University ²Tongji University

^*Corresponding author: wanghuan@westlake.edu.cn

Abstract

State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet they still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework to extend the classic optimal brain surgeon (OBS) approach to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) extends readily to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, establishing a new state of the art among pruning algorithms for Mamba-based LLMs.

Detailed Algorithm of Our Proposed Method

We introduce SparseSSM, which adapts the classic optimal brain surgeon (OBS) framework to the selective SSM module in Mamba. Our method computes an approximate second-order importance score for the time-shared SSM parameters, enabling principled one-shot pruning of the SSM layers. This is the first application of OBS-based pruning to Mamba’s architecture, and it addresses the challenges posed by its discretized, diagonalized design.
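To make the idea of a second-order importance score concrete, the following is a minimal sketch, not the exact SparseSSM score, which this page does not spell out. It assumes a diagonal Hessian approximation, under which the classic OBS loss increase for removing weight w_q reduces to 0.5 * h_qq * w_q^2, and it assumes that per-time-step contributions are simply summed for a parameter shared across time. The function names and the toy Hessian values are hypothetical.

```python
from typing import List


def obs_saliency(weights: List[float],
                 hess_diag_per_step: List[List[float]]) -> List[float]:
    """Diagonal-Hessian OBS-style saliency, summed over time steps.

    Assumption: the Hessian is diagonal, so removing weight w_q raises
    the loss by roughly 0.5 * h_qq * w_q**2 at each time step.
    """
    scores = [0.0] * len(weights)
    for hess_diag in hess_diag_per_step:      # one Hessian diagonal per time step
        for q, (w, h) in enumerate(zip(weights, hess_diag)):
            scores[q] += 0.5 * h * w * w
    return scores


def prune_lowest(weights: List[float], scores: List[float],
                 sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the lowest saliency."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda q: scores[q])
    drop = set(order[:k])
    return [0.0 if q in drop else w for q, w in enumerate(weights)]


# Toy example: four shared weights observed over two time steps (made-up values).
weights = [0.5, -2.0, 0.1, 1.0]
hess_diag_per_step = [[1.0, 1.0, 1.0, 1.0],
                      [2.0, 0.5, 4.0, 1.0]]
scores = obs_saliency(weights, hess_diag_per_step)
print(prune_lowest(weights, scores, sparsity=0.5))  # [0.0, -2.0, 0.0, 1.0]
```

In this toy case the two weights with the smallest accumulated score are removed in a single pass, with no retraining step.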
We further improve SparseSSM with two complementary techniques. First, we propose a mask aggregation method to handle the time-shared nature of the SSM module. Second, we analyze the components of Mamba in depth and compare their pruning tolerance. This analysis informs the FFN pruning strategy by indicating which linear projections should be pruned more conservatively, and it sheds light on where redundancy resides in Mamba.
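One plausible reading of mask aggregation can be sketched as a voting scheme. This is an illustrative assumption rather than the authors' stated procedure: each time step nominates its k lowest-saliency weights, and the k weights nominated most often are pruned in the single mask shared across all time steps. The function names, the tie-breaking rule, and the toy scores are hypothetical.

```python
from collections import Counter
from typing import List


def per_step_candidates(scores: List[float], k: int) -> List[int]:
    """Indices of the k lowest-saliency weights at one time step."""
    return sorted(range(len(scores)), key=lambda q: scores[q])[:k]


def aggregate_mask(scores_per_step: List[List[float]],
                   sparsity: float) -> List[bool]:
    """Return one keep-mask shared by all time steps (True = keep)."""
    n = len(scores_per_step[0])
    k = int(n * sparsity)
    votes: Counter = Counter()
    for scores in scores_per_step:
        votes.update(per_step_candidates(scores, k))
    # Prune the k most-voted weights; ties are broken by the lower index.
    ranked = sorted(range(n), key=lambda q: (-votes[q], q))
    pruned = set(ranked[:k])
    return [q not in pruned for q in range(n)]


# Toy example: saliency of four shared weights at three time steps (made-up values).
scores_per_step = [[0.10, 0.90, 0.20, 0.80],
                   [0.30, 0.10, 0.20, 0.90],
                   [0.05, 0.70, 0.60, 0.50]]
print(aggregate_mask(scores_per_step, sparsity=0.5))  # [False, True, False, True]
```

Voting yields one mask even when individual time steps disagree about which weights matter least, which is the situation a time-shared parameter creates.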

Experiment Results

Performance of one-shot unstructured pruning of the SSM modules in Mamba models (130M to 1.4B) at 50% sparsity. ↓ indicates that lower is better, and ↑ indicates that higher is better.

Performance of one-shot unstructured pruning of the whole Mamba model (130M to 1.4B) at 50% sparsity. ↓ indicates that lower is better, and ↑ indicates that higher is better.

Mamba pruning across multiple sparsities

Performance of the full Mamba architecture at multiple sparsity levels, measured by zero-shot task accuracy and WikiText perplexity.

Performance analysis for one-shot structured pruning of the SSM module in Mamba-370M.

@article{tuo2025sparsessm,
  title={SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot},
  author={Kaiwen Tuo and Huan Wang},
  journal={arXiv preprint arXiv:2506.09613},
  year={2025}
}