Abstract
Truly dynamic sparse training (T-DST), which unlocks the potential of dynamic sparse training to achieve comparable or even higher accuracy at a lower resource cost ("less yields more"), has become a renewed research topic towards green AI. However, certain aspects of T-DST, such as its sensitivity to the dataset, the network architecture, and the sparsity strategy, are still not well understood. In this paper, we first implement truly sparse training for the Rigged Lottery (RigL) algorithm and then evaluate its "less yields more" hypothesis, demonstrating on CIFAR100 that 95% fewer parameters and FLOPs can yield up to a 33% test accuracy improvement. We further provide broader insights into how the dataset size, the activation function, and the weight distribution affect the performance of neural networks trained with T-DST. Based on this empirical study, we summarize a guideline for exploiting "less yields more" in T-DST, hoping to catalyze research progress on the topic. Our code will be available online.
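For readers unfamiliar with the underlying sparsity strategy, the sketch below illustrates the standard drop-and-grow connectivity update of RigL (Evci et al., 2020) for a single layer, simulated with a dense binary mask. It is a minimal illustration under assumed names (`weight`, `grad`, `mask`, `drop_fraction`), not the truly sparse implementation evaluated in the paper.

```python
import numpy as np

def rigl_update(weight, grad, mask, drop_fraction=0.3):
    """One RigL drop-and-grow step for a single layer (illustrative sketch).

    Drops the smallest-magnitude active weights, then regrows the same
    number of connections with the largest gradient magnitude among the
    currently inactive positions, as in Evci et al. (2020). A dense mask
    is used only for clarity; truly sparse training avoids dense tensors.
    """
    n_update = int(drop_fraction * mask.sum())
    if n_update == 0:
        return weight, mask

    # Drop: deactivate the n_update active weights with the smallest magnitude.
    drop_scores = np.where(mask > 0, np.abs(weight), np.inf)
    drop_idx = np.argsort(drop_scores, axis=None)[:n_update]
    mask.flat[drop_idx] = 0
    weight.flat[drop_idx] = 0.0

    # Grow: activate the n_update inactive positions with the largest
    # gradient magnitude; newly grown connections start at zero.
    grow_scores = np.where(mask > 0, -np.inf, np.abs(grad))
    grow_idx = np.argsort(grow_scores, axis=None)[-n_update:]
    mask.flat[grow_idx] = 1
    weight.flat[grow_idx] = 0.0

    return weight, mask
```

In the full algorithm this update is applied only every fixed number of steps with a decaying drop fraction, so the dense gradient is materialized solely at mask-update time.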
Original language | English |
---|---|
Number of pages | 7 |
Publication status | Published - 1 May 2023 |
Event | ICLR 2023 Workshop on Sparsity in Neural Networks, Kigali, Rwanda, 5 May 2023 → 5 May 2023, https://www.sparseneural.net/ |
Workshop
Workshop | ICLR 2023 Workshop on Sparsity in Neural Networks |
---|---|
Country/Territory | Rwanda |
City | Kigali |
Period | 5/05/23 → 5/05/23 |
Internet address | https://www.sparseneural.net/ |