SparseSSM:
Efficient Selective Structured State Space Models Can Be Pruned in One-Shot

¹Westlake University   ²Tongji University
*Corresponding author: wanghuan@westlake.edu.cn


We introduce SparseSSM, an OBS-based pruning method tailored to SSM-based architectures such as Mamba, reducing redundancy while preserving model performance. We prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art in pruning for Mamba-based LLMs.

Abstract

State-space language models such as Mamba match Transformer quality while permitting linear-complexity inference, yet they still comprise billions of parameters that hinder deployment. Existing one-shot pruning methods are tailored to attention blocks and fail to account for the time-shared and discretized state-transition matrix at the heart of the selective state-space module (SSM). In this paper, we introduce SparseSSM, the first training-free pruning framework that extends the classic optimal brain surgeon (OBS) approach to state-space architectures. Our layer-wise algorithm (i) derives an approximate second-order saliency score that aggregates Hessian-trace information across time steps, (ii) incorporates a component sensitivity analysis to guide feed-forward network (FFN) pruning, which also sheds light on where redundancy resides in the Mamba architecture, and (iii) extends easily to semi-structured and structured sparsity. Empirically, we prune 50% of SSM weights without fine-tuning and observe no zero-shot accuracy loss, setting the current state of the art in pruning for Mamba-based LLMs.
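To make point (i) concrete, the minimal sketch below computes an OBS-style second-order saliency for a weight matrix that is shared across all time steps of the SSM scan, assuming a SparseGPT-style damped inverse-Hessian built from per-time-step input activations. The function name, the damping scheme, and the exact accumulation rule are our illustration, not the paper's implementation.

import torch

def obs_saliency_timeshared(W, X_t, damp=1e-2):
    """
    Approximate OBS saliency for a time-shared weight matrix (illustrative sketch).

    W   : (out_features, in_features) weight matrix shared across the SSM scan
    X_t : list of per-time-step input activations, each of shape (batch, in_features)
    Returns a tensor shaped like W; lower saliency = safer to prune.
    """
    d = W.shape[1]
    H = torch.zeros(d, d, dtype=W.dtype, device=W.device)
    # Accumulate a layer-wise Hessian proxy H = sum_t X_t^T X_t over time steps,
    # so curvature information from the whole scan enters one shared score.
    for x in X_t:
        H += x.T @ x
    H += damp * torch.eye(d, dtype=W.dtype, device=W.device)  # damping for invertibility
    H_inv_diag = torch.linalg.inv(H).diagonal()                # [H^-1]_qq per input dimension
    # Classic OBS saliency: w_q^2 / [H^-1]_qq, broadcast over every row of W.
    return W.pow(2) / H_inv_diag.unsqueeze(0)

Zeroing the half of the weights with the smallest saliency would then yield the 50% unstructured sparsity pattern discussed above.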

Detailed Algorithm of Our Proposed Method

  • We introduce SparseSSM, which adapts the classic optimal brain surgeon framework to the selective SSM module in Mamba. Our method computes approximate second-order weight importance for the time-shared SSM parameters, enabling principled one-shot pruning of the SSM layers. This is the first application of OBS-based pruning to Mamba's architecture, addressing the challenges of its discretized, diagonal design.
  • We further improve SparseSSM with two complementary techniques. First, we propose a mask aggregation method to address the time-sharing nature of the SSM module (a minimal sketch follows this list). Second, we provide an in-depth analysis of Mamba's components and compare their pruning tolerance; this informs the FFN pruning strategy, indicating which linear projections should be pruned more conservatively, and sheds light on where redundancy resides in Mamba.
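The aggregation rule itself is not spelled out on this page; the sketch below assumes one plausible instantiation, a keep-frequency vote over per-time-step binary masks followed by trimming to the target sparsity. All names are ours, and ties at the threshold may keep slightly more weights than the target.

import torch

def aggregate_timestep_masks(masks, sparsity=0.5):
    """
    Combine per-time-step pruning masks for a time-shared SSM weight matrix
    into a single mask (illustrative sketch, not necessarily the paper's rule).

    masks    : (T, out_features, in_features) binary masks, 1 = keep
    sparsity : fraction of weights to remove in the final mask
    """
    votes = masks.float().mean(dim=0)                      # keep-frequency of each weight across time
    k = int((1.0 - sparsity) * votes.numel())              # number of weights to keep
    threshold = votes.flatten().kthvalue(votes.numel() - k + 1).values
    return (votes >= threshold).to(masks.dtype)            # retain the most consistently kept weights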

SparseSSM Algorithm
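The algorithm figure is not reproduced here. As a complement to point (iii) of the abstract, the following hypothetical sketch turns a per-weight saliency matrix into a 2:4 semi-structured mask by keeping the two highest-scoring weights in every group of four along the input dimension; it illustrates the extension, not the paper's exact procedure.

import torch

def semi_structured_mask(scores, n=2, m=4):
    """
    Build an n:m semi-structured mask from a saliency score matrix
    (sketch of extending a per-weight score to 2:4 sparsity).

    scores : (out_features, in_features) saliency scores, higher = more important
    """
    out_f, in_f = scores.shape
    assert in_f % m == 0, "input dimension must be divisible by the group size m"
    groups = scores.view(out_f, in_f // m, m)
    # Indices of the n highest-scoring weights inside each group of m.
    topk = groups.topk(n, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, topk, 1.0)
    return mask.view(out_f, in_f)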

Experiment Results

BibTeX


@article{tuo2025sparsessm,
  title={SparseSSM: Efficient Selective Structured State Space Models Can Be Pruned in One-Shot},
  author={Tuo, Kaiwen and Wang, Huan},
  journal={arXiv preprint arXiv:2506.09613},
  year={2025}
}
