Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling

Research output: Working paper / Preprint / Academic


Abstract

Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative to gradient descent, as it can be computed without a backward pass. For the linear model with $d$ parameters, previous work has found that the prediction error of FGD is, however, slower by a factor $d$ than the prediction error of stochastic gradient descent (SGD). In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$, and thus the suboptimality of the rate disappears if $\ell \gtrsim d$. We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.
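The following is a minimal sketch of the scheme described in the abstract, for the linear model with squared loss: each training sample is reused for $\ell$ FGD updates, and each update moves along a fresh random direction scaled by the directional derivative of the loss, so no backward pass is needed. The function name, step size, and initialisation below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fgd_repeated_sampling(X, y, ell, lr=0.01, rng=None):
    """Sketch: forward gradient descent with repeated sampling for the
    linear model with squared loss 0.5 * (y_i - x_i^T theta)^2.
    Hypothetical helper; hyperparameters are illustrative choices."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    theta = np.zeros(d)
    for i in range(n):
        x_i, y_i = X[i], y[i]
        # take ell FGD steps on the same training sample
        for _ in range(ell):
            # random direction v ~ N(0, I_d)
            v = rng.standard_normal(d)
            # directional derivative of the loss along v, computable in a
            # single forward pass: d/dt 0.5*(y_i - (theta + t v)^T x_i)^2 at t=0
            dir_deriv = -(y_i - x_i @ theta) * (x_i @ v)
            # forward gradient dir_deriv * v is an unbiased gradient estimate
            theta = theta - lr * dir_deriv * v
    return theta
```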
Original language: English
Publisher: ArXiv.org
DOIs
Publication status: Published - 26 Nov 2024

Keywords

  • math.ST
  • cs.LG
  • cs.NE
  • stat.TH
  • 62L20, 62J05

