Correction to “Nonparametric regression using deep neural networks with ReLU activation function”

Research output: Contribution to journal › Comment/Letter to the editor › Academic › peer-review


Abstract

Correction: Condition (ii) of Theorem 1 in [1] should be changed to (Equation presented). Moreover, the constants C, C' in Theorem 1 also depend on the implicit constants that appear in conditions (ii)-(iv). There are large regimes where the new condition (ii') is weaker than (ii).

Explanation: Rather than choosing m and N in the proof of Theorem 1 globally, one should instead apply Theorem 5 individually to each i, with (Equation presented) and (Equation presented), where 0 < c ≤ 1/2 is a sufficiently small constant. As mentioned at the beginning of the proof of Theorem 1, it is sufficient to prove the result for sufficiently large n. We can therefore assume that m_i ≥ 1 for all i = 0, ..., q and that (Equation presented). The latter implies (Equation presented). If we now define L'_i := 8 + (m_i + 5)(1 + ⌈log_2(t_i ∨ β_i)⌉), then there exists a network (Equation presented) in F(L'_i, (t_i, 6(t_i + ⌈β_i⌉)N_i, ..., 6(t_i + ⌈β_i⌉)N_i, 1), s_i) with s_i ≤ 141(t_i + β_i + 1)^(3 + t_i) N_i(m_i + 6), such that (Equation presented), where Q_i is any upper bound on the Hölder norms of h_{ij}, j = 1, ..., d_{i+1}. We can now argue as in the original proof to show that the composite network f is in the class F(E, (d, 6r_i max_i N_i, ..., 6r_i max_i N_i, 1), (Equation presented)), with (Equation presented). Using the definition of L'_i above, it can be shown as in the original proof that L'_i ≤ (log_2(4) + log_2(t_i ∨ β_i)) log_2(n) for all sufficiently large n. All remaining steps are the same as in the original proof of Theorem 1. The constant c in the definition of N_i will also depend on the implicit constants in the conditions L ≲ nφ_n, nφ_n ≲ min_{i=1,...,L} p_i and s ≍ nφ_n log n.

Further comments:
- Lemma 1 requires that the constant K is large enough for Theorem 3 to be applicable.
- First display on page 1886: the value t_2 is N, not Nd.
- Equation (18) also requires that the inputs are nonnegative.
- In Lemma 3, the L∞-norms should be replaced by the supremum, that is, ‖f‖_{L∞(A)} should be changed to sup_{x ∈ A} |f(x)|.
- In the proof of Theorem 1, r_i does not depend on i and should be named r. Three lines after equation (26), C should be replaced by C'.
- In the proof of Theorem 3, β in the first line on page 1893 should be β∗∗. It is sufficient to check that the Hölder constant of φ_w is bounded by (β + 1)t∗(t + 1), as all later arguments of the proof carry over. Moreover, g_i(x) = (x_1, ..., x_{d_i}) should be g_i(x) = (x_1, ..., x_{d_{i+1}}) if d_i ≥ d_{i+1}, and g_i(x) = (x_1, ..., x_{d_i}, 0, ..., 0) if d_i < d_{i+1}. Finally, ‖ψ_u‖_2^2 should be replaced by ‖ψ_{uB}‖_2^2.
- In the proof of Lemma 2, one can simply take the constant function h_{j,α} = K if μ_0 ≠ 0. This immediately gives d_{j,k} = K μ_0^d 2^(−jd/2) for all wavelet coefficients d_{j,k}. For the case μ_0 = 0, one should replace the binomial coefficient (Equation presented) by the multinomial coefficient (Equation presented) = (dr)!/(r!)^d.
- To verify the last inequality in (B.8) of the Supplementary Material, one can replace ≤ 1/M by < 1/M.
- In the proof of Lemma 4, some F are missing. In particular, it should be (Equation presented). Also, on page 12 of the Supplementary Material it should be P(T ≥ t) ≤ 1 ∧ 2N_n max_j (Equation presented). Since r_j ≥ F n^(−1) log N_n, we can argue as before and obtain P(T ≥ t) ≤ 2N_n exp(−3t √(log N_n)/(16 √n)) for all t ≥ 6 √n log N_n. The conclusion of (I) is still valid.
- In the proof of Lemma 5, we always work with the | · |-norm for vectors. The grid size of an individual parameter should be taken as δ/((L + 1)V).
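As a quick way to check the per-level bookkeeping in the explanation above, the following Python sketch (not part of the published correction) simply evaluates the three architecture quantities stated there: the depth L'_i, the hidden-layer width 6(t_i + ⌈β_i⌉)N_i, and the sparsity bound 141(t_i + β_i + 1)^(3+t_i) N_i(m_i + 6). The actual choices of m_i and N_i are given by the displays elided above, so the numerical inputs and the helper name per_level_architecture are hypothetical placeholders.

import math

def per_level_architecture(t_i: int, beta_i: float, m_i: int, N_i: int):
    """Evaluate the per-level network parameters stated in the correction.

    t_i and beta_i come from the composition structure of f0; m_i and N_i
    stand in for the per-level choices whose exact definitions are elided
    in the record above (with 0 < c <= 1/2 a small constant).
    """
    # Depth: L'_i := 8 + (m_i + 5) * (1 + ceil(log2(t_i v beta_i)))
    depth = 8 + (m_i + 5) * (1 + math.ceil(math.log2(max(t_i, beta_i))))
    # Width of each hidden layer: 6 * (t_i + ceil(beta_i)) * N_i
    width = 6 * (t_i + math.ceil(beta_i)) * N_i
    # Sparsity bound: s_i <= 141 * (t_i + beta_i + 1)^(3 + t_i) * N_i * (m_i + 6)
    sparsity_bound = 141 * (t_i + beta_i + 1) ** (3 + t_i) * N_i * (m_i + 6)
    return depth, width, sparsity_bound

# Example with hypothetical values, for illustration only.
if __name__ == "__main__":
    L_i, width_i, s_i = per_level_architecture(t_i=3, beta_i=2.0, m_i=10, N_i=50)
    print(f"L'_i = {L_i}, hidden width = {width_i}, sparsity bound <= {s_i:.0f}")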

Original language: English
Pages (from-to): 413-414
Number of pages: 2
Journal: Annals of Statistics
Volume: 52
Issue number: 1
DOIs
Publication status: Published - Feb 2024

