(Parametrized) First Order Transport Equations: Realization of Optimally Stable Petrov-Galerkin Methods

We consider ultraweak variational formulations for (parametrized) linear first order transport equations in time and/or space. Computationally feasible pairs of optimally stable trial and test spaces are presented, starting with a suitable test space and defining an optimal trial space by the application of the adjoint operator. As a result, the inf-sup constant is one in the continuous as well as in the discrete case and the computational realization is therefore easy. In particular, regarding the latter, we avoid a stabilization loop within the greedy algorithm when constructing reduced models within the framework of reduced basis methods. Several numerical experiments demonstrate the good performance of the new method.


Introduction
Transport phenomena are omnipresent in several areas of science and technology such as cell movement [19], for instance in brain tumors [13]. Even though there is a huge literature on the corresponding partial differential equations (PDEs), there is still significant need for research, in particular concerning efficient and robust numerical solvers. Even fewer results are known for parametrized PDEs, when one wishes to compute the solution either for many different parameters (many-query) or in real-time. Of course, in the simplest case of first order linear transport problems with constant coefficients, there are even closed formulas for the solution using the method of characteristics. This, however, already changes when allowing for variable advection and/or reaction coefficients, which we encounter for instance in mesoscopic formulations of Glioma spreading models [13]. Such a model contains patient-specific data as parameters. To use this problem for individual cancer treatment planning, it has to be solved numerically reasonably fast for given parameter values. This is the background why in this paper, as a first step, we are concerned with the simplified model problem of (parametrized) time-dependent linear first-order transport: find u µ such that

∂ t u µ (t, x) + ⃗ b µ (t, x) ⋅ ∇ x u µ (t, x) + c µ (t, x) u µ (t, x) = f µ (t, x), (1.1)

for all parameters µ in a compact set P ⊂ R p , for all times t ∈ (0, T ) (T > 0 being some final time) and all x ∈ D ⊂ R d , accompanied with appropriate initial and boundary conditions.
It is well-known that the above point-wise formulation of (1.1) does not make sense for several realistic cases of coefficients, initial and/or boundary conditions, the geometry of D, etc. In fact, it is often known that continuous solutions of (1.1) do not exist; appropriate variational formulations are a possible way out.
Recalling d'Alembert's solution formula for the linear transport equation [9], it is well-known that the solution "inherits" the (lack of) regularity of the initial condition, which means for instance that the solution stays in L 2 (D) if the initial condition is only in L 2 (D) (and not more). This motivates us to consider an ultraweak space-time formulation with X = L 2 (I; L 2 (D)) ≅ L 2 (I × D) as trial space. It then remains to determine a test space Y such that the arising variational problem is well-posed. Besides existence and uniqueness of the solution, the stability is of particular interest for numerical purposes. In the optimal case the stability constant is unity, which means that error and residual coincide and, at the same time, the approximation is the best one from the chosen trial space. This is highly relevant for error estimation in adaptive methods and reduced order models.
Such optimally stable ultraweak variational formulations for first order transport equations have been proposed for instance in [6,7,10,11]. The optimal relation between trial and test spaces is thus known. For numerical purposes, however, this relation is not easy to deal with. In fact, given a finite-dimensional approximation trial space X δ ⊂ X , where δ is some discretization parameter such as the mesh size, the numerical construction of the optimal test space Y δ amounts to solving the PDE dim(X δ ) times, which is in general infeasible. Therefore, in [4,6,10,11,30] a discontinuous Petrov-Galerkin (DPG) approximation with a (possibly suboptimal) broken test space is suggested, so that the approximation of the optimal test basis functions reduces to the solution of local problems. In [7], a global approximate test space Y δ ⊊ Z δ ⊂ Y is constructed by using an appropriate so-called test search space Z δ , similar to [11]. Finally, the authors of [12] employ a discontinuous Galerkin approximation in space and a conforming Petrov-Galerkin approximation in time, resulting in a suboptimal inf-sup constant, in particular w.r.t. time [12, Lemmata 1 and 3].
In this article we propose to first choose an appropriate test space Y δ and subsequently compute the corresponding trial space X δ . Doing so, the optimal trial space X δ arises from the application of the differential operator to the basis functions of Y δ rather than from approximately solving (local) PDEs. If the test space is chosen, for instance, as a standard finite element (FE) space, the application of the differential operator is straightforward and by far more efficient than computing approximate test functions. Moreover, the approach is (very) easy to implement. In contrast to the approaches mentioned above, we obtain an optimally stable scheme, meaning inf-sup and continuity constants of unity, also for variable coefficients. In particular, the inf-sup constant does not depend on δ. We also prove convergence of our scheme as δ → 0, but do not derive convergence rates in δ. Instead, we investigate the achieved rates numerically, obtaining convergence rates similar to the ansatz proposed in [7]. We also believe that our approach might be generalized relatively easily to more complex problems. Finally, we note that choosing the test space first and subsequently constructing the associated optimal trial space was already suggested in [6, Theorem 2.10] for the DPG method, but not further pursued in the remainder of the respective article. Moreover, the same approach is investigated in parallel for the wave equation in [15].
Generalizing and applying our proposed approach to parametrized PDEs offers additional advantages. To realize the generalization we make use of the reduced basis (RB) method (see for instance [16,18,22] and the references therein), which is nowadays a well-known and accepted efficient numerical method for solving parametrized PDEs in a many-query and/or real-time context. In particular, we employ a greedy algorithm for the construction of reduced test and trial spaces. By applying the now parameter-dependent operator to the reduced test space in order to construct the then also parameter-dependent reduced trial space, we obtain an optimally stable reduced scheme. In contrast, the approaches proposed in [8,29] yield a suboptimal inf-sup constant. Moreover, we avoid an additional stabilization loop during the greedy algorithm as proposed in [8], as well as the construction of a parameter-dependent preconditioner for the approximation of the optimal test space as suggested in [29]. Not least because of this, our proposed ansatz allows (especially in the parametric context) for a (very) easy implementation. However, in contrast to [8,29], we have so far not been able to prove the convergence of the greedy algorithm proposed in this paper. Note also that the RB approximation is no longer a linear combination of snapshots, but a linear combination of parameter-dependent applications of operators. Nevertheless, the reproduction of snapshots is maintained. We finally note that our approach does not aim at obtaining an approximation that converges faster than the Kolmogorov n-width.
The remainder of this paper is organized as follows: In Section 2 we present an optimally stable ultraweak variational formulation of first order linear transport equations, covering both time-independent and time-dependent operators. Section 3 is devoted to the finite-dimensional, discrete case, where we introduce an optimally stable Petrov-Galerkin method. Parametrized transport problems are considered in Section 4 within the framework of the RB method. We describe the fairly easy computational realization of the new approach in Section 5 and report on several numerical experiments in Section 6. Finally, we end with some conclusions in Section 7.

An optimally stable ultraweak (space-time) formulation
In this section we present an ideally conditioned variational framework for linear first order transport equations using results from [7] and [1,2]. To that end, let Ω ⊂ R n , n ≥ 1, be a bounded polyhedral domain with Lipschitz boundary, where we note that Ω may also be a space-time domain, as will be shown in Example 2.6 at the end of this section. Moreover, ⃗ n shall denote the outward normal of Γ ∶= ∂Ω. Next, we introduce the advection field ⃗ b(⋅) ∈ C 1 ( Ω) n and the reaction coefficient c(⋅) ∈ C 0 ( Ω), noting that for some statements the regularity assumption on ⃗ b(⋅) may be relaxed. We assume throughout this paper that c(z) − 1 2 ∇ ⋅ ⃗ b(z) ≥ 0 for z ∈ Ω almost everywhere. Then, we consider the first order transport equation

⃗ b(z) ⋅ ∇u(z) + c(z) u(z) = f ○ (z) in Ω, u(z) = g(z) on Γ − . (2.1)

For functions v, w ∈ C 0 ( Ω) ∩ C 1 (Ω) we obtain by integration by parts

( ⃗ b ⋅ ∇v + c v, w) L2(Ω) = (v, − ⃗ b ⋅ ∇w + (c − ∇ ⋅ ⃗ b) w) L2(Ω) + ∫ Γ ( ⃗ b ⋅ ⃗ n) v w ds.

To account for the non-homogeneous boundary conditions we introduce, as in [7], suitable boundary spaces. Thus, we may define the domain dom(B * ○ ). For the derivation of a stable variational formulation we require, as in [7], the following two assumptions:

Assumption 2.1. We assume that conditions (B1) and (B2) from [7] hold; in particular, (B1) requires the existence of a dense subspace of dom(B * ○ ).

We now give examples for conditions on the coefficient functions ⃗ b and c such that Assumption 2.1 holds true.

Proposition 2.2. Let one of the following two conditions hold: (i) The flow associated with ⃗ b is Ω-filling, meaning that its trajectories starting from the inflow boundary fill Ω, except perhaps for a set of measure zero, in a finite bounded time. Then, the operator B * ○ satisfies (B1) and (B2). Moreover, we have the curved Poincaré inequality (2.2).

The following proposition gives a sufficient condition for ⃗ b to have an Ω-filling flow: If ⃗ b ∈ C 1 (V ) n is bounded, as well as its gradient, in a neighborhood V of Ω, if there are a unit vector ⃗ k and a number α > 0 such that ⃗ b(z) ⋅ ⃗ k ≥ α in V , and if Ω is bounded in the ⃗ k direction, then the flow is Ω-filling.
We may now define, as in [7], the adjoint operator. Considering (2.1) with g(z) ≡ 0 and thus homogeneous Dirichlet boundary conditions, we define the formal adjoint B * ○ w ∶= − ⃗ b ⋅ ∇w + (c − ∇ ⋅ ⃗ b) w. We reuse condition (ii) of the corresponding Remark 2.2 in [7]. However, as can be seen from the counterexamples in Appendix B, Remark 2.2(i) in [7] is in general not sufficient for well-posedness. Therefore, we develop an alternative condition based on [1,2].
(For a more formal definition see Definition A.1.)
and note that, due to (B1), ⋅ * is a norm on dom(B * ○ ). With this framework at hand, we can define, as in [7], the test space Y as the closure of dom(B * ○ ) with respect to ⋅ * . Then, we can define B ∶ L 2 (Ω) → Y ′ again by duality, i.e., B ∶= (B * ) * . The variational formulation of (2.1) may then be based upon the bilinear form b(u, v) ∶= (u, B * v) L2(Ω) . To incorporate the boundary conditions, we introduce as in [7] the weighted boundary space on Γ − . Assume that one of the two conditions of Proposition 2.2 holds.
Then, there exists a linear continuous trace mapping on Y. Proof. Integration by parts yields (see also (A.4))

By using the general assumption
where we have used (2.2) in the last estimate. The assertion for v ∈ Y follows by density.
Next, we define for any v ∈ Y the right-hand side functional f (v), see (2.7). Then, we obtain the well-posedness of the variational formulation (cf. [7, Thm. 2.4]; note that, due to a wrong estimate, the constant given in the corresponding result [7, Prop. 2.3] is in general not correct; we therefore give a modified proof using (2.2) for the estimate in question): Assume that one of the two conditions in Proposition 2.2 is valid and that b and f are defined as in (2.4) and (2.7), respectively. Then, there exists a unique u ∈ L 2 (Ω) such that
and the stability estimate u L2(Ω) ≤ f Y ′ holds. Moreover, inf-sup and continuity constants are unity. Proof. The proof follows the lines of the proof of [7, Thm. 2.4].

Example 2.6 (Time-dependent linear transport equations). The setting described in the beginning of this section includes both time-independent and time-dependent linear first order transport problems: As remarked in [7], we can consider time as an additional transport direction in the space-time domain, i.e., z = (t, x) ∈ Ω ∶= (0, T ) × D = I × D, n = 1 + d, where D ⊂ R d denotes the spatial domain. Next, we define the space-time transport direction ⃗ b ∶= (1, ⃗ b x ) T , where ⃗ b x denotes the spatial advective field. Moreover, we introduce the space-time gradient operator as ∇ ∶= (∂ ∂t , ∇ x ) T , where ∇ x is the gradient on the spatial domain D. Accordingly, ⃗ n denotes again the outward normal of Γ. Then, we obtain exactly the form (1.1). We emphasize that non-homogeneous initial values are thus also prescribed in an essential manner. Note that for the time-dependent case condition (i) of Proposition 2.2 is always fulfilled, which can be seen by taking ⃗ k as the unit vector in time direction (see [1]). As an alternative to this realization of a space-time formulation, one could also treat spatial and temporal variables separately. Such a strong-in-time variational formulation, however, results in a suboptimal inf-sup constant; for details see Appendix C.

An optimally stable Petrov-Galerkin method
In this section we introduce computationally feasible and optimally stable (conforming) finite dimensional trial and test spaces X δ ⊂ X = L 2 (Ω) and Y δ ⊂ Y for the approximation of the solution of (2.8). Here, we denote by δ a discretization parameter, where δ equals the mesh size h for spatial problems and δ = (∆t, h) for time-dependent problems in space and time with a time step ∆t. (If we use a tensor product discretization in space, δ may also take the form δ = (h 1 , . . ., h d ) or δ = (∆t, h 1 , . . ., h d ), respectively.) Then, the discrete counterpart of (2.8) reads
This latter equation admits a unique solution u δ ∈ X δ provided that the discrete inf-sup condition holds, where we additionally require the existence of uniform bounds β and γ with β ≤ β δ and γ δ ≤ γ for all δ > 0.
These constants for continuity, γ δ , and stability (or inf-sup), β δ , also play a key role for the relation between the error e δ ∶= u − u δ and the residual r δ ∈ Y ′ , defined as r δ (v) ∶= f (v) − b(u δ , v), as can be seen by the standard lines. In the optimal case, i.e., β = γ = 1, error and residual coincide, i.e., e δ L2(Ω) = r δ Y ′ . Moreover, we have the following Céa-type lemma [28], so that in the optimal case β = γ = 1 the error equals the best approximation error inf w δ ∈X δ u − w δ L2(Ω) , i.e., the numerical approximation is the best approximation.
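The "standard lines" referred to above are the usual sandwich between error and residual via the stability and continuity constants; using b(e δ , v) = b(u, v) − b(u δ , v) = f (v) − b(u δ , v) = r δ (v), they can be written as

```latex
\beta \,\| e_\delta \|_{L_2(\Omega)}
  \;\le\; \sup_{0 \neq v \in Y} \frac{b(e_\delta, v)}{\| v \|_Y}
  \;=\; \sup_{0 \neq v \in Y} \frac{f(v) - b(u_\delta, v)}{\| v \|_Y}
  \;=\; \| r_\delta \|_{Y'}
  \;\le\; \gamma \,\| e_\delta \|_{L_2(\Omega)},
```

so that for β = γ = 1 both inequalities become equalities and error and residual coincide.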
3.1. An optimally conditioned Petrov-Galerkin method. To realize an optimally conditioned and thus optimally stable Petrov-Galerkin method which is also computationally feasible, we suggest in this paper to first choose a conforming finite-dimensional test space Y δ ⊂ Y and then to set X δ ∶= B * Y δ , see (3.4). For this pair of trial and test spaces we then obtain (3.5) for every w δ ∈ X δ . Here, we have exploited the fact that for all w δ ∈ X δ the supremizer s δ w δ ∈ Y δ , defined as the solution of (s δ w δ , v δ ) Y = b(w δ , v δ ) for all v δ ∈ Y δ , satisfies s δ w δ = (B * ) −1 w δ , as B * is boundedly invertible. From (3.5) we may thus conclude that indeed (3.6) holds and the proposed method is optimally stable. We note that the same approach is investigated in parallel for the wave equation in [15]. Moreover, we emphasize that the suggested approach is computationally feasible, since B * is a differential operator which can easily be applied, as long as the test space is formed by 'easy' functions such as splines, as in the case of finite elements (FE). Additionally, for our choice of test and trial space we may reformulate the discrete problem (3.1) as follows: Thanks to the definition of the trial space X δ in (3.4), there exists for all v δ ∈ X δ a unique w δ ∈ Y δ such that v δ = B * w δ . Therefore, problem (3.1) is equivalent to the problem (3.7), which obviously is a symmetric and coercive problem: the normal equations, or a least-squares problem. Thus, problem (3.7) is well-posed and we identify the solution of (3.1) as u δ ∶= B * w δ . This reformulation will also be used for the implementation of the framework. From (3.7) we see that for the setup of the linear system for w δ the precise knowledge of the basis of X δ = B * Y δ is not needed; it is required only for the pointwise evaluation of u δ when e.g. visualizing the solution. For further details on the computational realization we refer to Section 5.
Thanks to (3.6), we are moreover in the optimal case described in the beginning of this section, and the numerical approximation u δ ∈ X δ is thus the best approximation of u ∈ L 2 (Ω) for our suggested choice of trial and test spaces. Hence, we obtain (3.8). By (3.4) we have that for any w δ ∈ X δ there exists a unique v δ ∈ Y δ with B * v δ = w δ . In view of (B1) in Assumption 2.1, there also exists a unique v ∈ Y such that B * v = u. We may thus infer from (3.8) the (strong) convergence of the approximation u δ to u in L 2 (Ω), provided that inf v δ ∈Y δ v − v δ Y converges to 0 as δ → 0. Note that the latter can be ensured by choosing an appropriate test space Y δ , such as a standard FE space.
We finally remark that in standard FE methods the error analysis is usually done in two steps: (1) relating the error to the best approximation by a Céa-type lemma; (2) proving an asymptotic rate of convergence, e.g. by using a Clément-type interpolation operator. As seen above, (1) also holds for our new trial spaces, albeit in a non-standard norm. Regarding the second step (2), it might be possible to derive convergence rates via the term inf v δ ∈Y δ v − v δ Y (cf. (3.8)) and mapping properties of the operator B. This is, however, beyond the scope of the present paper and the subject of future work. Here, we will hence investigate the rate of convergence in numerical experiments in Section 6.
Example 3.1 (Illustration of the trial space). We illustrate the trial space X δ as defined in (3.4) for a very simple, one-dimensional problem. In detail, we consider Ω ∶= (0, 1), a constant transport term b > 0, and a variable reaction coefficient c(x). According to our proposed approach, we start by defining a test space Y h . To this end, let n h ∈ N, h ∶= 1/n h , and let η i , i = 1, . . ., n h , denote the piecewise linear hat functions on the resulting uniform mesh; define Y h ∶= span{η 1 , . . ., η n h }. Then, we construct the optimal trial space in the above sense by X h ∶= span{ξ 1 , . . ., ξ n h }, where we set ξ i ∶= B * η i for i = 1, . . ., n h . Note that for the special case of a constant reaction coefficient c(x) ≡ c, the functions ξ i are piecewise linear and discontinuous, see Figure 1.
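As a concrete illustration of Example 3.1, the following sketch (assuming constant coefficients b and c and hat functions on a uniform mesh; the numerical values are purely illustrative, not from the paper) evaluates the trial functions ξ i = B * η i = −b η i ′ + c η i and exhibits their jump of size 2b/h across the interior nodes:

```python
import numpy as np

b, c = 1.0, 2.0           # constant transport and reaction coefficients (illustrative)
n = 4                      # number of cells
h = 1.0 / n
nodes = np.linspace(0.0, 1.0, n + 1)

def eta(i, x):
    """Piecewise linear hat function centered at node i."""
    return np.maximum(0.0, 1.0 - np.abs(x - nodes[i]) / h)

def eta_prime(i, x):
    """Derivative of the hat (one-sided values at the nodes)."""
    return np.where((x > nodes[i] - h) & (x <= nodes[i]), 1.0 / h,
                    np.where((x > nodes[i]) & (x < nodes[i] + h), -1.0 / h, 0.0))

def xi(i, x):
    """Optimal trial function xi_i = B* eta_i = -b * eta_i' + c * eta_i."""
    return -b * eta_prime(i, x) + c * eta(i, x)
```

Inside each cell ξ i is linear (the constant −b η i ′ plus the linear c η i ), while the sign change of η i ′ produces a discontinuity of height 2b/h at the node, matching the piecewise linear, discontinuous shape described above.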

3.2. Nonphysical restrictions at the boundary. From a computational perspective it is appealing to use discrete spaces that are tensor products of one-dimensional spaces; for details see Section 5. However, this choice may result in nonphysical restrictions of functions in the trial space on certain parts of the outflow boundary.
To illustrate this, consider Ω = (0, 1) 2 and let the transport direction ⃗ b be such that we have for the inflow boundary Γ − = ({0} × (0, 1)) ∪ ((0, 1) × {0}) and thus for the outflow boundary Γ + = ({1} × (0, 1)) ∪ ((0, 1) × {1}). Next, we define the discrete test space on Ω = (0, 1) 2 as the tensor product of one-dimensional FE spaces. Then, the optimal trial functions are given for i, j = 1, . . ., n h by ψ i,j ∶= B * (η i ⊗ η j ). However, this simple tensor product ansatz results in ψ i,j (1, 1) = 0 for all i and j, i.e., any numerical approximation would vanish at the upper right corner (1, 1) ∈ Ω. Needless to say, this is a nonphysical restriction at the boundary, even though point values do not matter for an L 2 -approximation. It is obvious that the 2D case is only the simplest one in which this effect appears. In fact, in a general dD situation (d ≥ 2), we would obtain that optimal trial functions constructed as the B * -image of tensor products vanish on (d − 2)-dimensional sets along the boundary of Ω, leading to nonphysical boundary values. To reduce the impact of this effect, we suggest to consider an additional "layer" around the computational domain by defining a tube of width α > 0 around Γ + by (3.9). Then, we solve the original transport problem on the extended domain Ω(α) using the associated pair of optimal trial and test spaces. As a result the trial functions vanish on the exterior boundary of Ω + (α), but not on ∂Ω. From a numerical perspective, choosing α = mh for a (small) m ∈ N and mesh size h adds m layers of grid cells and thus O(n h d−1 ) degrees of freedom. On the larger domain Ω + (α), the numerical solution remains a best approximation in the enlarged trial space. Due to the larger dimension, this is no longer true w.r.t. the original domain Ω. Note, however, that the additional unknowns are only (d − 1)-dimensional. We will numerically investigate this effect in Section 6.
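The corner restriction can be checked directly. The sketch below (assuming, purely for illustration, the constant data ⃗ b = (1, 1) T , c = 1 and tensor products of hat functions vanishing on Γ + ; these concrete choices are ours, not from the paper) evaluates ψ i,j = B * (η i ⊗ η j ) and confirms that every trial function vanishes at (1, 1):

```python
# Trial functions psi_{i,j} = B*(eta_i ⊗ eta_j) on the unit square with
# b = (1, 1), c = 1; test hats vanish on the outflow sides x = 1 and y = 1.
bx, by, c = 1.0, 1.0, 1.0
n = 8
h = 1.0 / n

def hat(center, x):
    return max(0.0, 1.0 - abs(x - center) / h)

def hat_prime(center, x):
    if center - h < x <= center:
        return 1.0 / h
    if center < x < center + h:
        return -1.0 / h
    return 0.0

def psi(i, j, x, y):
    """psi_{i,j}(x, y) = -bx*eta_i'(x)*eta_j(y) - by*eta_i(x)*eta_j'(y)
                         + c*eta_i(x)*eta_j(y)."""
    ci, cj = i * h, j * h   # hat centers; i, j <= n-1, so hats vanish on x=1, y=1
    return (-bx * hat_prime(ci, x) * hat(cj, y)
            - by * hat(ci, x) * hat_prime(cj, y)
            + c * hat(ci, x) * hat(cj, y))

# Every trial function vanishes at the outflow corner (1, 1):
vals = [abs(psi(i, j, 1.0, 1.0)) for i in range(1, n) for j in range(1, n)]
```

Since every factor η i (1) and η j (1) is zero, each product term in ψ i,j vanishes at the corner; approaching (1, 1) along the diagonal, ψ decays to zero as well, so no trial function can resolve a nonzero value there.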
3.3. Post-processing. As already mentioned, we are particularly interested in using our framework for problems with non-regular solutions u ∈ L 2 (Ω), which especially includes jump discontinuities that are transported through the domain. However, it is well known that (piecewise) polynomial L 2 -approximations of such discontinuities result, especially for higher polynomial orders, in overshoots, the so-called Gibbs phenomenon. There are many works concerning post-processing techniques to mitigate such effects, see for instance [23] and the references therein.
Within the scope of this paper, we restrict ourselves to a rather simple post-processing procedure aimed at limiting the solution near jump discontinuities. Let Y δ ⊂ Y be a conforming FE test space on Ω ⊂ R n of polynomial order p, corresponding to a partition T δ . The derivative terms ∂ xi w δ , i = 1, . . ., n, contain discontinuities across the cell boundaries, such that limiting these terms has the potential to mitigate overshoot effects. For all K ∈ T δ , we have ∂ xi w δ K ∈ P p (K). Based upon this, we define the limited terms via the projection P P p−1 (K) , the L 2 -orthogonal projection onto the polynomials of order at most p − 1 on K, and we then define the post-processed solution to (3.1) accordingly. As a first attempt, one may perform the element-wise L 2 -projection on all grid cells. However, for many problems it might be better (or even necessary) to choose a set of grid cells T jump δ ⊂ T δ that contains all cells where overshoots due to the jumps indeed occur, and to perform the post-processing only for the cells K ∈ T jump δ . For methods that are able to detect such cells we refer to [21].
Since the post-processed solution is constructed independently of the trial space X δ , it is not clear whether it exhibits the same convergence rate as the standard solution. We will investigate the convergence behavior in numerical examples in Section 6, where we test this approach for piecewise constant solutions u with jump discontinuities. For more complex problems, other, more sophisticated methods from the literature may have to be used.
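On a reference interval, the element-wise L 2 -orthogonal projection onto polynomials of one degree less can be realized by truncating a Legendre expansion, since the Legendre polynomials are L 2 -orthogonal on [−1, 1]. A minimal sketch (the helper name is ours, for illustration only):

```python
import numpy as np
from numpy.polynomial import legendre as leg

def project_lower_order(coeffs):
    """L2-orthogonal projection of a polynomial (power-basis coeffs on [-1, 1])
    onto polynomials of one degree less. Legendre polynomials are L2-orthogonal
    on [-1, 1], so dropping the highest Legendre coefficient IS the projection."""
    lc = leg.poly2leg(coeffs)       # power basis -> Legendre basis
    return leg.leg2poly(lc[:-1])    # truncate, convert back to power basis

# Example: project x^2 onto P_1 on [-1, 1].
# x^2 = (1/3) P_0 + (2/3) P_2, so the projection is the constant 1/3.
p = project_lower_order([0.0, 0.0, 1.0])
```

Per cell, one would map K to the reference interval, apply the truncation to the derivative terms, and map back; applying it only on the cells in T jump δ gives the localized variant described above.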

The reduced basis method for parametrized transport problems
In this section we generalize the above setting to problems depending on a parameter and apply the reduced basis method for that purpose, [16,18,22].

4.1. Parametrized transport problem. We consider a parametrized problem based upon a compact set of parameters P ⊂ R p . In analogy to the above framework we define the domain Ω and the now possibly parameter-dependent quantities ⃗ b µ and c µ . For all µ ∈ P we define f µ;○ ∈ C 0 ( Ω) and g µ ∈ C 0 ( Γ− ). Then we consider the parametric problem of finding u µ .

Assumption 4.1. We assume that Ω, P and ⃗ b µ are chosen such that the in- and outflow boundaries are parameter-independent.

Remark 4.2. As we shall see below, Assumption 4.1 is a direct consequence of a necessary density assumption to be formulated below. However, as stated in [8], for parameter-dependent Γ ± (µ) and a polyhedral domain Ω, it is always possible to decompose P into a finite number of subsets P m , m = 1, . . ., M, with fixed parameter-independent corresponding in- and outflow boundaries. Hence, one considers M sub-problems on P m , m = 1, . . ., M , with separate reduced models. Moreover, one could also consider parameter-dependent Ω µ , Γ ±,µ that can be transformed onto a parameter-independent reference domain Ω with fixed in- and outflow boundaries by varying the data.
Next, we require Assumption 2.1 for the formal adjoint B * µ;○ for all µ ∈ P, such that we can apply the above framework separately for all µ ∈ P in order to define the test space Y µ with parameter-dependent norm v Yµ ∶= B * µ v L2(Ω) as well as the extended operators B µ and B * µ . Hence, we aim at determining solutions u µ for all µ ∈ P. We mention that the norms ⋅ Yµ cannot be expected to be pairwise equivalent for different µ ∈ P, which means that even the sets underlying two test spaces Y µ1 , Y µ2 , µ 1 ≠ µ 2 , can differ. Therefore, we define as in [8] the parameter-independent test space Ȳ, where we assume that Ȳ is dense in Y µ for all µ ∈ P. Thanks to the compactness of P we may equip Ȳ with the norm v Ȳ ∶= sup µ∈P v Yµ . The above theory of optimal trial and test spaces as well as well-posedness immediately extends to the parameter-dependent case in an obvious manner.
As usual, we assume that B * µ and f µ are affine w.r.t. the parameter. In detail, we assume that there exist functions θ q b ∈ C 0 ( P) for q = 1, . . ., Q b and θ q f ∈ C 0 ( P). (The density assumption above, which is required for instance for Lemma 4.3, automatically implies that Γ ± are parameter-independent (Assumption 4.1), since a homogeneous Dirichlet boundary condition on Γ + is included in the test spaces.)
Proof. (Sketch) The main idea of the proof is to exploit the continuity of the mappings µ ↦ B * µ and µ ↦ f µ to show that ũ satisfies (4.1) for some µ for all v ∈ Ȳ, and subsequently to use a density argument; see Appendix D for details.

4.2. Discretization. For the discretization of the parametric problem, we introduce a parameter-independent discrete space Y δ ⊂ Ȳ. Next, for fixed µ ∈ P we define the discrete test space Y δ µ and the corresponding trial space as X δ µ ∶= B * µ Y δ . Note that for different µ ∈ P, the spaces X δ µ differ as sets but share the common norm ⋅ L2(Ω) , whereas the spaces Y δ µ consist of the common set Y δ equipped with different norms ⋅ Yµ . With the same reasoning as in the non-parametric case (see (3.5)), we have an optimal discrete inf-sup constant β δ µ = 1 for all µ ∈ P. As in §3.1 we observe that problem (4.4) is equivalent to the problem (4.5), and we may thus solve (4.5) and identify the solution of (4.4) as u δ µ ∶= B * µ w δ µ .
Remark 4.4. Since for all µ ∈ P the operator B * µ is affine in the parameter, the trial spaces for all µ ∈ P are contained in a common discrete space of finite dimension.

Corollary 4.5. Under the above assumptions the discrete solution set M δ ∶= {u δ µ solves (4.4), µ ∈ P} ⊂ X δ is a compact subset of X δ . Proof. The proof can be done completely analogously to the continuous setting, exploiting that X δ is a Hilbert space equipped with the L 2 -inner product.

4.3. Reduced scheme. We assume that we have at our disposal a reduced test space Y N ⊂ Y δ with dimension N ∈ N, constructed for instance via a greedy algorithm (see §4.4). (In order to have a clear distinction between high- and low-dimensional spaces, we use calligraphic letters for the high-dimensional and normal symbols for the reduced spaces.) Then, for each µ ∈ P we introduce the reduced discretization with test space Y N .
As in the high-dimensional case discussed in §4.2, these pairs of spaces yield optimal inf-sup constants. Hence, regardless of the choice of the 'initial' reduced test space Y N , we get a perfectly stable numerical scheme without the need to stabilize. Note that this is a major difference to the related work [8], where, due to a different strategy for finding discrete spaces, a stabilization procedure is necessary. Using the least-squares-type reformulation (4.5), we can (similarly to (3.7)) first compute w N µ , and then set u N µ ∶= B * µ w N µ as the solution of (4.6).
Offline-/Online-Decomposition. By employing the assumed affine parameter dependence of B * µ and f µ , the computation of u N µ can be decomposed efficiently into an offline and an online stage: Let {v N i ∶ i = 1, . . ., N } be a basis of the parameter-independent test space Y N . In the offline stage, we precompute and store the corresponding parameter-independent quantities. In the online stage, given a new parameter µ ∈ P, we assemble the reduced system for all i, j = 1, . . ., N and solve the linear system A N µ w N µ = f N µ of size N , where w N µ ∶= (w i (µ)) i=1,...,N ∈ R N . The reduced basis approximation is then determined as u N µ ∶= B * µ w N µ .

4.4. Basis generation. While in the standard RB method a reduced trial space is generated from snapshots of the parametrized problem, the reduced discretization of our method is based upon one common reduced test space, while the reduced trial spaces are parameter-dependent. However, although we have to find a good basis of the reduced test space Y N ⊂ Y δ , we still want to build the reduced model from snapshots of the problem. To that end, we again use the formulation (4.5): Given μ ∈ P, let w δ μ ∈ Y δ μ be the solution of (4.5) and u δ μ ∶= B * μ w δ μ . Note, however, that due to the parameter dependence of the trial spaces, u δ μ is in general only contained in the reduced trial space for µ = μ. Building Y N from snapshots of (4.5) is thus analogous to the standard RB strategy of building the reduced trial space from snapshots of the problem of interest: Although a single trial space X N µ is not solely spanned by snapshots, the model error u N µ − u δ µ L2(Ω) is zero for all parameter values µ whose (4.5)-snapshot is included in Y N .
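The offline/online split of the reduced system can be sketched as follows. For illustration we assume Q b = 2 affine terms B * µ = θ 1 (µ) B 1 + θ 2 (µ) B 2 , represent the discrete operators by random stand-in matrices, and take the identity as mass matrix; all names and sizes here are our own choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, Q = 40, 3, 2                 # full dimension, reduced dimension, affine terms (toy)
Bq = [rng.standard_normal((n, n)) for _ in range(Q)]   # stand-ins for the components of B*_mu
theta = [lambda mu: 1.0, lambda mu: mu]                # affine coefficients theta_q(mu)
V = np.linalg.qr(rng.standard_normal((n, N)))[0]       # reduced test basis, orthonormal columns

# Offline (parameter-independent): blocks A^{q,p} = V^T B_q^T B_p V.
A_blocks = [[V.T @ Bq[q].T @ Bq[p] @ V for p in range(Q)] for q in range(Q)]

def assemble_online(mu):
    """Online: sum Q^2 precomputed N-by-N blocks; no n-dimensional work."""
    A = np.zeros((N, N))
    for q in range(Q):
        for p in range(Q):
            A += theta[q](mu) * theta[p](mu) * A_blocks[q][p]
    return A

# Consistency check against direct assembly of V^T B(mu)^T B(mu) V.
mu = 0.7
B_mu = theta[0](mu) * Bq[0] + theta[1](mu) * Bq[1]
A_direct = V.T @ B_mu.T @ B_mu @ V
```

The right-hand side is decomposed in the same way with the θ q f , so the online cost is independent of the full dimension n.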
Algorithm 1 describes an analogue of the standard RB strong greedy algorithm for our setting: Iteratively, we first evaluate the model errors of the reduced solutions for all parameters µ in a train sample Ξ ⊂ P. Then, we extend Y N by the (4.5)-snapshot w δ µ * ∈ Y δ corresponding to the worst-approximated parameter µ * . This automatically extends X N µ * by the (4.4)-snapshot u δ µ * ∈ X δ µ * , such that from then on the model error for µ * is zero.
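A toy version of this strong greedy loop can be sketched for an assumed one-dimensional model problem B * µ w = −w ′ + µ w on (0, 1), discretized by simple cell-midpoint finite differences as a crude stand-in for the FE discretization; all concrete choices below (grid, data f = 1, training set) are illustrative, not from the paper:

```python
import numpy as np

# Toy full-order model: B*_mu w = -w' + mu*w on (0,1), test functions vanish
# at the outflow x = 1, f = 1, homogeneous inflow data.
n = 60
h = 1.0 / n
D = (np.eye(n) - np.eye(n, k=1)) / h        # -(w_{i+1} - w_i)/h on cell i (w_n = 0)
Mid = (np.eye(n) + np.eye(n, k=1)) / 2.0    # midpoint value of w on cell i

def B_op(mu):
    """Discrete adjoint operator B*_mu, mapping nodal test vectors to cell values."""
    return D + mu * Mid

F = h * Mid.T @ np.ones(n)                  # f(v) ~ h * sum over cell midpoints of v

def solve_full(mu):
    """Normal equations (4.5): returns (w_delta, u_delta = B*_mu w_delta)."""
    B = B_op(mu)
    w = np.linalg.solve(h * B.T @ B, F)
    return w, B @ w

def solve_reduced(V, mu):
    """Galerkin projection of (4.5) onto span(V); returns u_N = B*_mu (V w_N)."""
    B = B_op(mu)
    c = np.linalg.solve(V.T @ (h * B.T @ B) @ V, V.T @ F)
    return B @ (V @ c)

def l2(u):
    return np.sqrt(h) * np.linalg.norm(u)

# Strong greedy: pick the worst-approximated parameter, add its (4.5)-snapshot.
train = np.linspace(0.0, 5.0, 21)
full = {mu: solve_full(mu) for mu in train}
V = np.zeros((n, 0))
picked = []
for _ in range(6):
    errs = {mu: (l2(full[mu][1] - solve_reduced(V, mu)) if V.shape[1]
                 else l2(full[mu][1])) for mu in train}
    mu_star = max(errs, key=errs.get)
    picked.append(mu_star)
    V = np.linalg.qr(np.column_stack([V, full[mu_star][0]]))[0]
```

By construction, the reduced scheme reproduces the selected snapshots exactly, illustrating the snapshot-reproduction property stated above, and by optimal stability each reduced solution is the L 2 -best approximation from its parameter-dependent trial space.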
Of course, this algorithm is computationally expensive, since we have to compute u δ µ for all µ ∈ Ξ, which may not be feasible for very complex problems and a finely resolved Ξ ⊂ P. It is hence desirable to use some kind of surrogate, ideally a reliable and efficient error estimator, instead of the true model error in the greedy algorithm. However, as will be seen in the next subsection, the standard error estimator is not offline-online decomposable in our setting, a problem already encountered in [8]. Therefore, we have to resort to error indicators whenever using the full model error is computationally infeasible. We note that so far we are not able to prove convergence of the greedy algorithm due to the parameter-dependent trial spaces.
Alternatively, to obtain a computationally more feasible offline stage, one might let the strong greedy run on a small test set with a relatively high tolerance and use a hierarchical a posteriori error estimator on the large(r) training set, which was proposed in a slightly different context in [24]. Another idea might be to keep a second test training set during the greedy algorithm. In order to estimate the dual norm of the residual more cheaply, one could then compute Riesz representations on the span of test training snapshots instead of the full discrete space.

4.5. Error analysis for the reduced basis approximation. In the online stage, for a given (new) parameter µ ∈ P we are interested in efficiently estimating the model error u δ µ − u N µ L2(Ω) to assess the quality of the reduced solution. As already mentioned above, due to the choice of the reduced spaces, the reduced inf-sup and continuity constants are unity. This means that the error, the residual, and the error of best approximation coincide also in the reduced setting (cf. (3.3)). To be more precise, we define for some v ∈ L 2 (Ω) the discrete residual r δ µ (v). In principle, r δ µ (v) ∈ (Y δ µ ) ′ can be computed. However, due to the special choice of the parameter-dependent norm of Y δ µ , i.e., w Y δ µ = B * µ w L2(Ω) , the computation of the dual norm involves applying the inverse operator (B * µ ) −1 and is thus as computationally expensive as solving the discrete problem (4.4). Therefore, the computation of r δ µ (u N µ ) (Y δ µ ) ′ is not offline-online decomposable, so that the residual cannot be evaluated in an online-efficient manner.
As an alternative for the error estimation, mainly in the online stage, we consider an online-efficient, but non-rigorous hierarchical error estimator similar to the one proposed in [3]. Let Y N ⊂ Y M ⊂ Y δ be nested reduced spaces with dimensions N and M , N < M , and denote for some µ ∈ P by u N µ and u M µ the corresponding solutions of (4.6). Then, we can approximate the model error of u N µ by the computable quantity ∥u M µ − u N µ ∥ L2(Ω) , which is efficient also in the online stage. In practice, Y N and Y M can be generated by the strong greedy algorithm with different tolerances ε N and ε M ≪ ε N . Of course, this approximation to the model error is in general not reliable, since it depends on the quality of Y M . Reliable and rigorous variants of such an error estimator can be derived based on an appropriate saturation assumption, see [17]. There, a strategy for the use of hierarchical estimators in terms of the spaces Y M for the construction of a reduced model in the offline phase has also been discussed. We do not go into details here. Numerical investigations of the quality of the error estimator will be given in §6.2.
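A minimal sketch of such a hierarchical estimate follows, assuming both reduced solutions are given as coefficient vectors with respect to the same full discrete basis (the function name and calling convention are illustrative, not the paper's code):

```python
import numpy as np

def hierarchical_error_estimate(u_N, u_M, M=None):
    """Surrogate for the model error ||u_delta - u_N||: the computable
    difference ||u_M - u_N|| of two reduced solutions with N < M.
    Non-rigorous: its quality depends on how well the richer space Y_M
    resolves u_delta (a saturation assumption).  M is an optional L2
    mass matrix; without it, the Euclidean norm of the coefficient
    difference is used."""
    d = u_M - u_N
    if M is None:
        return float(np.linalg.norm(d))
    return float(np.sqrt(d @ (M @ d)))
```

The estimator requires one additional (still cheap) reduced solve in Y M per parameter, but no full-order quantities, which is what makes it online-efficient.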

Computational realization
In this section, we specify the implementation of the solution procedure developed in Section 3. This is also used for the methods for parameter-dependent problems developed in Section 4. In fact, due to our assumption of affine dependence on the parameter (4.3), the computational realization in the parametric setting is very similar to the standard setting and can be done following the offline-online decomposition described at the end of §4.3, which is why we do not address it further in this section.
To solve the discrete problem (3.1) we use the equivalent formulation (3.7), i.e., we first find w δ ∈ Y δ such that (B * w δ , B * v δ ) L2(Ω) = f (v δ ) for all v δ ∈ Y δ , and then set u δ ∶= B * w δ ∈ X δ . The solution procedure thus consists of, first, assembling and solving the problem for w δ in Y δ , and second, computing u δ . The implementation depends in particular on the exact form of the adjoint operator B * . First, we address the case of constant data, which is easier to implement and slightly more computationally efficient than the general case discussed subsequently.

5.1. Implementation for constant data. We first consider constant data functions in the adjoint operator, which thus has the form B * w ∶= − ⃗ b ⋅ ∇w + cw for 0 ≠ ⃗ b ∈ R n , c ∈ R. We have already seen in Example 3.1 that in the one-dimensional case, choosing a standard linear continuous FE space for the test space Y δ yields a trial space X δ with piecewise linear and discontinuous functions. This can be generalized to conforming FE test spaces with arbitrary dimension, grid, and polynomial order: If v δ ∈ Y δ is globally continuous and polynomial on each grid cell, all terms of B * v δ are, due to the constant data functions, still polynomials of the same or lower order on the cells, while the gradient terms yield discontinuities across the cell boundaries. Denoting thus by Y δ ⊂ Y a conforming FE space of polynomial order p on a partition ⋃ i K i of Ω, and by X δ ⊂ L 2 (Ω) the corresponding discontinuous FE space defined in (5.2), we have B * Y δ ⊂ X δ and can determine the solution u δ = B * w δ in terms of the standard nodal basis of X δ .
Let B * ∈ R nx×ny be the matrix representation of B * ∶ Y δ → X δ in the nodal bases (φ 1 , . . ., φ ny ) of Y δ and (ψ 1 , . . ., ψ nx ) of X δ , meaning that the i-th column of B * contains the coefficients of B * φ i in the basis (ψ 1 , . . ., ψ nx ), i.e., B * φ i = ∑ nx j=1 [B * ] j,i ψ j . Due to the form of the operator and the chosen spaces, the matrix B * can be computed rather easily, see the example in §5.2. Then, the coefficient vector u = (u 1 , . . ., u nx ) T of u δ = ∑ nx i=1 u i ψ i ∈ X δ can simply be computed from the coefficient vector w = (w 1 , . . ., w ny ) T of w δ = ∑ ny i=1 w i φ i ∈ Y δ by u = B * w. To solve (3.7), we have to assemble the matrix corresponding to the bilinear form a ∶ Y δ × Y δ → R, a(w δ , v δ ) = (B * w δ , B * v δ ) L2(Ω) = (w δ , v δ ) Y , i.e., the Y-inner product matrix Y of Y δ . One possibility for the assembly is to use the matrix B * : Denoting by M ∈ R nx×nx the L 2 -mass matrix of X δ , we obtain Y = (B * ) T M B * . The solution procedure thus consists of the following steps: (1) Assemble B * and Y. (2) Assemble the load vector f. (3) Solve Y w = f. (4) Compute u = B * w.

5.2. Assembling the matrices for spaces on rectangular grids. As a concrete example of how to assemble the matrices B * and Y, we consider Ω = (0, 1) n and use a rectangular grid. We start with the one-dimensional case as already seen in Example 3.1. Let thus Ω = (0, 1) and b > 0. Moreover, let T h = {[(i − 1)h, ih)} i=1,...,n h be the uniform one-dimensional grid with mesh size h = 1/n h , fix a polynomial order p ≥ 1, and define Y h,p 1D , X h,p 1D as in (5.1), (5.2). Let (φ 1 , . . ., φ ny ) and (ψ 1 , . . ., ψ nx ) be the respective nodal bases of Y h,p 1D and X h,p 1D .
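In matrix form, and with dense linear algebra for brevity (a real implementation would of course use sparse matrices and a sparse solver), the procedure of assembling the Y-inner product matrix from B * and the mass matrix, solving for w, and recovering u can be sketched as:

```python
import numpy as np

def solve_ultraweak(Bstar, M, f):
    """Sketch of the solution procedure:
    1) Y-inner product matrix Y = Bstar^T M Bstar, with M the L2 mass
       matrix of X_delta and Bstar the matrix of B*: Y_delta -> X_delta,
    2) solve Y w = f for the auxiliary variable w_delta in Y_delta,
    3) recover the coefficients of u_delta = B* w_delta by u = Bstar w."""
    Y = Bstar.T @ M @ Bstar
    w = np.linalg.solve(Y, f)   # coefficients of w_delta
    return Bstar @ w, w         # coefficients of u_delta, and w
```

Note that by construction the returned u satisfies (B *) T M u = f, which is the matrix form of the variational problem (3.7).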
Moreover, let I 1D ∈ R nx×ny be the matrix representation of the embedding Id ∶ Y h,p 1D → X h,p 1D in the respective nodal bases, i.e., the i-th column of I 1D contains the coefficients of φ i ∈ Y h,p 1D ⊂ X h,p 1D in the basis (ψ 1 , . . ., ψ nx ), such that for u = I 1D w it holds that ∑ nx i=1 u i ψ i = ∑ ny i=1 w i φ i . Similarly, let A 1D ∈ R nx×ny be the matrix representation of the differentiation d/dx ∶ Y h,p 1D → X h,p 1D , w h ↦ (w h ) ′ . Additionally, as above, we define M 1D ∈ R nx×nx , [M 1D ] i,j = (ψ i , ψ j ) L2((0,1)) , as the L 2 -mass matrix of X h,p 1D . For p = 1, i.e., linear FE, and a standard choice of the nodal bases, the matrices I 1D , A 1D , and M 1D take a simple explicit form.
With these three matrices we can then compose the matrices B * 1D = −b A 1D + c I 1D and Y 1D = (B * 1D ) T M 1D B * 1D . Next, we consider a rectangular domain of higher dimension, e.g., Ω = (0, 1) n , n ≥ 2. We choose in each dimension one-dimensional FE spaces Y i , X i , i = 1, . . ., n as in (5.1), (5.2) separately, and use the tensor products of these spaces as FE spaces on the rectangular grid formed by the tensor product of all one-dimensional grids. The system matrices can then be assembled from Kronecker products of the one-dimensional matrices corresponding to the spaces Y i , X i , i = 1, . . ., n: We first assemble for i = 1, . . ., n the matrices I i 1D and A i 1D corresponding to the pair of spaces Y i , X i . Then, the matrix corresponding to the adjoint operator can be assembled from Kronecker products of these matrices; e.g., for n = 2 we have B * = −b 1 (A 1 1D ⊗ I 2 1D ) − b 2 (I 1 1D ⊗ A 2 1D ) + c (I 1 1D ⊗ I 2 1D ). Similarly, the mass matrix M of X δ can be computed from the one-dimensional mass matrices via Kronecker products; the matrix Y can then be assembled as Y = (B * ) T M B * or directly using these matrices.
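The Kronecker-product assembly for n = 2 can be sketched as follows; the ordering of the factors (first factor acting in the x 1 -direction) is one possible convention and may differ from the actual implementation:

```python
import numpy as np

def assemble_adjoint_2d(I1, A1, I2, A2, b, c):
    """Matrix of B* w = -b . grad w + c w on a tensor-product grid,
    built from the 1D embedding matrices I1, I2 and differentiation
    matrices A1, A2 via Kronecker products."""
    return (-b[0] * np.kron(A1, I2)
            - b[1] * np.kron(I1, A2)
            + c * np.kron(I1, I2))

def assemble_mass_2d(M1, M2):
    """L2 mass matrix of the tensor-product space from the 1D masses."""
    return np.kron(M1, M2)
```

By the mixed-product property of the Kronecker product, applying the assembled matrix to a separable coefficient vector w 1 ⊗ w 2 acts with A or I factor-wise, which is exactly the tensor-product structure of the discrete spaces.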

5.3. Implementation for non-constant data.
If the data functions ⃗ b and c are not constant, we do not automatically get a standard FE space X δ in which the solution u δ can be described, thus the implementation has to be adapted. A way to retain the implementation for constant data functions is to approximate the data by piecewise constants on each grid cell. Then, there again holds u δ ∈ X δ and we only have to slightly modify the implementation presented in §5.1: Every nodal basis function ψ i ∈ X δ , i = 1, . . ., n x , has, due to the discontinuous FE space, a support of only one grid cell. Denoting by c i the value of c on the grid cell of ψ i , we define the diagonal matrix c ∈ R nx×nx , [c] i,i ∶= c i , and, similarly, the matrices b j ∈ R nx×nx corresponding to b j , j = 1, . . ., n. We then simply change the scalars b j and c in (5.3) to the matrices b j and c, j = 1, . . ., n. However, a piecewise constant approximation of the functions ⃗ b ∈ C 1 (Ω) n , c ∈ C 0 (Ω) may not lead to a sufficient accuracy of the solution. For general ⃗ b ∈ C 1 (Ω) n , c ∈ C 0 (Ω), we thus first assemble the Y-inner product matrix Y ∈ R ny×ny of Y δ and the load vector f ∈ R ny corresponding to the right-hand side as in standard FE implementations for elliptic equations, by using e.g. Gauss quadrature for the approximation of the integrals. We can then solve (3.7) as above. To compute the solution u δ ∈ X δ , we use the fact that we still have w δ ∈ X δ and ∂w δ /∂x i ∈ X δ , i = 1, . . ., n, and store the corresponding X δ -coefficients of w δ and its derivatives separately, as well as the data functions. We can then evaluate u δ = B * w δ for arbitrary x ∈ Ω by evaluating all w δ -dependent functions and all data functions in x and using the definition of B * .
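For the piecewise-constant approximation, the modification amounts to replacing the scalars by diagonal matrices of cell values; a sketch (matrix and argument names are illustrative):

```python
import numpy as np

def assemble_adjoint_pwc(A_list, I_emb, b_cells, c_cells):
    """Adjoint matrix for data approximated by cell-wise constants:
    since each DG basis function of X_delta is supported on one cell,
    the scalars b_j and c become diagonal matrices holding the per-cell
    (i.e., per-basis-function) data values.
    A_list: matrices of d/dx_j, I_emb: embedding matrix,
    b_cells: per-direction arrays of cell values, c_cells: array."""
    B = np.diag(c_cells) @ I_emb
    for A_j, b_j in zip(A_list, b_cells):
        B = B - np.diag(b_j) @ A_j
    return B
```

With constant cell values this reduces exactly to the constant-data formula of §5.1, which is a convenient sanity check for an implementation.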

Numerical experiments
In this section, we report on results of our numerical experiments. We consider the parametric and the non-parametric case, starting with the latter. We are particularly interested in quantitative results concerning the rate of approximation as the discretization parameter δ (see above) approaches zero, in quantitative comparisons of the inf-sup constant with existing methods from the literature, and in the greedy convergence in the parametric case. We report on time-dependent and time-independent test cases. The source code to reproduce all results is provided in [5].

6.1.1. Convergence rates for problems with different smoothness. As indicated in §3.1, we can show convergence of the proposed approximation for appropriate test spaces Y δ , but we did not derive theoretical rates of convergence in this paper. Therefore, in this subsection we investigate the rate of convergence in numerical experiments. In all test cases we use as test space Y δ a continuous FE space on a uniform hexahedral grid. Since we want to investigate the best possible convergence rates here, we choose test cases where the trial space restrictions due to tensor product spaces described in §3.2 do not lead to additional errors. These cases will afterwards be compared to cases where the restrictions do lead to additional errors in §6.1.2.
Table 2. L 2 -errors and convergence rates for the two-dimensional problem with boundary values (6.1), (6.2), and (6.3).

We start with the one-dimensional problem introduced in Example 3.1 and set Ω = (0, 1), b(x) ≡ 1, c(x) ≡ 2 with boundary value u(0) = 1. We compute approximate solutions for linear FE spaces Y h (recall Figure 1 for the corresponding basis functions and see Figure 2 for an illustration of the solution), as well as for quadratic FE spaces. We observe an (optimal) convergence rate of 1 in the linear and 2 in the quadratic case (see Table 1).
Next, we consider Ω = (0, 1) 2 , choose ⃗ b ≡ (cos 30°, sin 30°) T , c ≡ 0, f ≡ 0, and compare boundary values of different smoothness. In detail, we solve for boundary values g 1 , g 2 , g 3 of decreasing smoothness, where the differentiable value is given by g 1 (y) = 31.25y 3 − 18.75y 2 + 1 for y ≤ 0.4 and g 1 (y) = 0 for y > 0.4 (6.1), g 2 is continuous (6.2), and g 3 is discontinuous (6.3). We use second order FE on a uniform rectangular mesh with n h = h −1 cells in both dimensions, i.e., δ = (h, h). As already mentioned above, the data is chosen such that for all boundary conditions it holds that u(1, 1) = 0 for the exact solution, so that we do not observe problems from the nonphysical restriction of the trial space. We observe a convergence order of about 1.65 for the differentiable case g = g 1 , an order of 1 for the continuous case g = g 2 , and an order of about 1/3 for the discontinuous boundary g = g 3 (see Table 2).
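The convergence orders reported in these tables are experimental orders of convergence computed from errors on successive meshes; a small helper to reproduce such rates (an illustration, not the paper's code):

```python
import numpy as np

def eoc(hs, errors):
    """Experimental order of convergence between successive mesh sizes:
    rate_k = log(e_k / e_{k+1}) / log(h_k / h_{k+1})."""
    h = np.asarray(hs, dtype=float)
    e = np.asarray(errors, dtype=float)
    return np.log(e[:-1] / e[1:]) / np.log(h[:-1] / h[1:])
```

For errors behaving like C h^r, the computed rates approach r as h decreases.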
To assess the effect of a non-constant transport direction on the convergence rate, we use ⃗ b(x, y) = (1 − y, x) T , which yields an Ω-filling flow with T = π/2, together with c ≡ 0, f ≡ 0, and a boundary value g 4 ∈ C 1 (Γ − ). We observe a convergence behavior even slightly better than in the case of constant ⃗ b with a C 1 -boundary function, see Table 3; the curved transport is resolved without artifacts, see Figure 3.

6.1.2. Influence of restrictions due to tensor product spaces. So far we investigated the convergence of discrete solutions for cases where the nonphysical boundary restrictions described in §3.2 do not lead to problems. Here we want to compare these results to similar test cases where the restriction indeed is unphysical, i.e., for the exact solution we have u ≠ 0 at the relevant outflow boundary part. We again choose Ω = (0, 1) 2 , ⃗ b ≡ (cos 30°, sin 30°) T , c ≡ 0, and f ≡ 0. We first consider a constant boundary value g ≡ 1, leading to u ≡ 1, where the impact of the unphysical restriction can be observed best, since the shifted version g ≡ 0 leading to u ≡ 0 would of course have no discretization error at all. We compare this to shifted versions of the boundary values considered in §6.1.1, i.e., g̃ i = g i − 1, i = 1, 2, 3, for g 1 , g 2 , and g 3 defined in (6.1)-(6.3).
In the constant case g ≡ 1 we obtain convergence of order ≈ 1 (see Table 4). Comparing Tables 2 and 4, we see that the restriction indeed leads to an additional error that converges with order 1: While the problem for the C 1 -boundary value g 1 converges with an order of about 1.65, the shifted problem for g̃ 1 = g 1 − 1 converges only with an order of ≈ 1. For the less smooth boundaries g̃ 2 ∈ C 0 (Γ − ) and g̃ 3 ∈ L 2 (Γ − ) we see that the convergence order stays the same, so the full error is not dominated by the restriction artifacts. All in all, for the present test cases, the restriction due to the tensor product structure limits the convergence rate to 1, but does not deteriorate the smaller convergence orders of less smooth problems, such that for these problems the additional error is negligible. Recall that we are primarily interested in such non-smooth solutions in L 2 (Ω).
Next, we investigate the approach proposed in §3.2 of using an additional layer for the computational domain. In detail, we extend the data functions onto the larger domain Ω(α) defined in (3.9), solve the problem for a discrete solution u δ Ω(α) ∈ L 2 (Ω(α)) on this extended domain, and then define the restriction u δ Ω(α) | Ω ∈ L 2 (Ω) as the discrete solution to the original problem. We consider the constant boundary value g ≡ 1.
For each discrete space Y δ , δ = (h, h), we compare values of α = mh, m = 1, . . ., 5, i.e., we extend the domain by 1 to 5 layers of grid cells of the original size. The L 2 - and L ∞ -errors of these solutions and of the respective solutions computed on the original domain Ω are shown in Figure 4. We see that using extended domains for the computation reduces the L 2 -errors: A first layer of grid cells has the most significant effect, but larger extensions further reduce the errors. Since the difference is larger for coarser meshes, the L 2 -rates are slightly lower than for the original solution, which improves, however, for finer mesh sizes. We obtained similar results for the boundary value g = g 1 − 1. Moreover, the extended domain approach has a positive impact on the L ∞ -error of the solution and thus on the "optical quality": While for the computations on Ω we automatically have an L ∞ -error of 1 for all mesh sizes, the error is reduced to values between about 0.16 for α = h and 0.05 for α = 5h; also, the L ∞ -error on the extended domains seems to be relatively independent of the mesh size (see Figure 4). A comparison of the solutions computed on Ω and Ω(h) is provided in Figure 5.
We conclude from these experiments that for the current test case the use of an extended domain slightly reduces the L 2 -error while maintaining comparable convergence rates, and considerably reduces the L ∞ -error at the boundary. Hence, at the expense of (moderate) additional computational cost, a better approximation of the solution on the outflow boundary can be achieved.

6.1.3. Assessment of post-processing procedure. We next compare the approximation of discontinuities by a standard solution u δ ∈ X δ to the post-processed solution ũ δ described in §3.3. To this end, we again consider the example in §6.1.1 with the boundary value g 3 ∈ L 2 (Γ − ) that is piecewise constant with a discontinuity. Note that the choice of a constant advection ⃗ b and no reaction simplifies the post-processing procedure, such that the post-processed solution ũ δ directly is the L 2 -orthogonal projection of u δ onto the discontinuous first order FE space. Comparing the errors of u δ and ũ δ (see Tables 2 and 5), we see that the errors for the post-processed solutions are about 8% smaller than for the standard solutions, while the order of convergence stays the same. Figure 6 shows that the post-processing removes the severe overshoots of the standard solution at the jump discontinuity. We also note that the post-processing is computationally inexpensive, since it is only based upon local multiplications with an element projection matrix for each grid cell. A comparison of the computational costs will be given in §6.1.4.
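Such a cell-local L2 projection can be sketched as one small linear solve per grid cell; the function below is a generic illustration (the local matrices are hypothetical inputs, not the paper's data structures):

```python
import numpy as np

def local_l2_projection(M_loc, C_loc, u_cell):
    """One post-processing step on a single cell: project the local
    solution coefficients u_cell onto a lower-order DG basis.
    M_loc: mass matrix of the target basis, C_loc: cross mass matrix
    (target basis tested against the source basis on the same cell);
    the projected coefficients solve M_loc p = C_loc u_cell,
    independently on each grid cell."""
    return np.linalg.solve(M_loc, C_loc @ u_cell)
```

For example, on the reference cell [0, 1] with monomial bases (1, x, x² for the source and 1, x for the target), projecting x² onto span{1, x} yields x − 1/6, and since the solve is local, the cost grows only linearly with the number of cells.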
6.1.4. Comparison to the approach proposed in [7]. Next, we compare the results of our method, which we call the Optimal Trial method, with the related approach in [7], which we call the Optimal Test method. We use the same test case as in [7], i.e., we set Ω = (0, 1) 2 , ⃗ b ≡ (cos 22.5°, sin 22.5°) T , c ≡ 0, and f ≡ 0. For the boundary condition we again use the discontinuous boundary value g = g 3 defined in (6.3). While our Optimal Trial approach consists of choosing a test space Y δ ⊂ Y, which automatically determines the trial space X δ = B * Y δ ⊂ L 2 (Ω) and the corresponding linear system in Y δ , for the Optimal Test method in [7] one chooses a trial space X δ ⊂ L 2 (Ω) and a larger test search space Z δ ⊂ Y, i.e., Y δ ⊂ Z δ . The optimally stable problem in X δ × (B * ) −1 X δ is then substituted by the problem in X δ × P Z δ ((B * ) −1 X δ ), which in turn is solved approximately by an Uzawa algorithm. Within this algorithm, one iteratively solves problems in the test space Z δ , which are in fact based upon the same bilinear form as (3.7) to be solved in Y δ in the Optimal Trial method. We therefore choose the spaces such that Y δ = Z δ , which means that the same matrix has to be assembled for both methods. More precisely, we choose for the Optimal Trial method the same spaces as in the experiments above, i.e., Y δ is the space of continuous FE of second order on a rectangular grid with mesh size δ = (h, h). Fitting to that, we choose, as proposed in [7], for X δ the space of discontinuous bilinear FE on a rectangular grid with mesh size (2h, 2h), and Z δ = Y δ , such that the grid for the test search space results from one uniform refinement of the grid of the trial space.
We first compare the relation of L 2 -errors and CPU times for both methods. For the solution of the linear systems, we always use a sparse LU factorization with subsequent forward and back substitution as implemented in UMFPACK. Figure 7 shows the respective CPU-error plots for the Optimal Test method using 1 iteration and 5 iterations of the Uzawa algorithm (as proposed in [7]), and for the standard as well as the post-processed solution (described in §3.3) of the Optimal Trial method. We observe similar decay rates of the errors w.r.t. the CPU times for both methods. For the chosen linear solver, the Optimal Test method with 5 iterations performs best, which is mainly due to the fact that the assembly of the matrices and the LU factorization dominate the computational costs. Therefore, the costs for 5 Uzawa iterations are only slightly higher than for, e.g., only 1 Uzawa iteration, while the errors are reduced significantly. If we use iterative solvers instead, e.g. the CG method, the results depend on the preconditioner used: If the computation of the preconditioner dominates, the results are similar to those using the LU decomposition. In contrast, if the iterative solver takes as much or more time as the preconditioner, then the Optimal Test solutions using 5 Uzawa iterations would take considerably more time than the other solutions, and we expect that the post-processed Optimal Trial solution might perform comparably to the Optimal Test solutions. However, a comparison of different preconditioners is beyond the scope of this paper.
Finally, we compare the inf-sup constants of both methods. While for the Optimal Trial method we automatically have an inf-sup constant of 1, this is not the case for the Optimal Test method. Since there not the truly optimal test space (B * ) −1 X δ , but its projection onto the test search space, P Z δ ((B * ) −1 X δ ), is used as the discrete test space, the inf-sup constant for the discrete problem as well as for the corresponding saddle-point problem on which the Uzawa iteration is based is suboptimal. Table 6 and Table 7 show the inf-sup constants for the considered two-dimensional problem, i.e., Ω = (0, 1) 2 , ⃗ b = (cos 22.5°, sin 22.5°) T , c ≡ 0, and for the corresponding time-dependent problem, i.e., a three-dimensional problem with Ω = (0, 1) 3 and ⃗ b = (1, cos 22.5°, sin 22.5°) T , respectively. We clearly see that the inf-sup constants decrease with smaller mesh sizes; in both cases they decay roughly with an order of h 1/3 .

6.2. Parametric cases: the reduced basis method.
To examine our method in the parametric setting, we consider three different test cases. For all cases, we choose Ω = (0, 1) 2 and a parametrized constant transport direction ⃗ b µ ∈ R 2 , µ ∈ P, such that Γ − = ({0} × (0, 1)) ∪ ((0, 1) × {0}) for all µ ∈ P, as well as parameter-independent reaction, source, and boundary data, see Table 8. Again, we want to solve the transport problem for all µ ∈ P. For all test cases, we choose a training set of 500 equidistant parameter values distributed over P and set ε = 10 −4 . We then generate reduced models with Algorithm 1 for different mesh sizes. The maximum model errors ∥u N (µ) − u δ (µ)∥ L2(Ω) on an additional test set of 500 uniformly distributed random parameter values are shown in Figure 8.
Since we did not derive theoretical convergence results for the greedy algorithm, we investigate the convergence behavior numerically. To that end, we first consider a test case where the best-possible convergence rate of linear approximations is known: In [20], it is shown that the Kolmogorov N -width of the solution set of Test Case 1 decays with an order of N −1/2 . In the corresponding results of our greedy algorithm, we indeed observe the same (and thus optimal) convergence behavior, see Figure 8.
In Test Case 2 we choose constant reaction and source terms that lead to more regular solutions. Here, the greedy algorithm shows a faster convergence of order about N −3/2 . With discontinuous source and boundary data in Test Case 3 we finally observe an order of roughly N −1 .
Similar experiments were also performed in [8], where reduced models are built by the so-called Double Greedy algorithm, which chooses reduced trial spaces and uses additional loops to find stabilized reduced test spaces (of larger dimension). To realize a fair comparison with our approach, we also implemented a "strong" Double Greedy algorithm using the model error instead of a surrogate in [8, Algorithm 4]. For the full solutions we use the discretization of the Optimal Test method described in §6.1.4. We then run the "strong" variant of the Double Greedy algorithm [8, Algorithm 5] for Test Case 3 on a training set of 500 equidistant parameter values distributed over P and with tolerance ε = 0.01, comparing different thresholds β min for the inf-sup stability of the reduced spaces h .
The resulting maximum model errors for 500 test parameter values are shown in Figure 9. For the smaller stability thresholds of 0.3 and 0.6 we observe slight instabilities, while for a threshold of 0.7 the maximum model errors decrease for increasing model orders. Comparing the approximation properties of the trial spaces of the Double Greedy and the Optimal Trial Greedy method, we see that for model orders up to 32 the Double Greedy trial spaces lead to smaller errors than Optimal Trial spaces of the same dimension, while for larger model orders the Optimal Trial reduced spaces perform better.
Since for the Double Greedy method, unlike for the new method, the test spaces are significantly larger than the trial spaces (for Test Case 3 and β N ≥ 0.7 approximately by a factor of 3), the test space dimensions are essential for the online complexity of the reduced saddle point problems. In Figure 10 online computation times for both methods are shown, where we use for the Double Greedy solutions a reformulation of the saddle point problem in which the inversion of a test-space-sized matrix dominates the costs i . We clearly see that the Optimal Trial reduced models outperform the Double Greedy models both when comparing the same trial space dimensions and the same model errors j .
h In [8] it is proposed to use β min ∶= ζβ δ , where 0 < β δ ≤ 1 is a lower bound for the discrete inf-sup constants of the full discretizations for all µ ∈ P and 0 < ζ < 1, such that the desired threshold is guaranteed to be achievable for all reduced spaces. Here, we simply compare different values of β min < 1 without computing β δ .
i Directly solving the larger linear system of size (trial space dim.) + (test space dim.) corresponding to the saddle point formulation leads to comparable results.
j Note, however, that as usual the online computation times contain only the computation of the coefficients of the reduced solutions in the respective reduced basis. If an assembly of the full-dimensional solution vector is needed, this dominates the costs and is clearly faster for the Double Greedy models, since for the Optimal Trial method the separate parts of the affine decomposition of the trial space have to be assembled, and the trial space vector is usually larger.
These results show that for the rather challenging Test Case 3 the Optimal Trial method leads to comparable, and for larger model orders even better, approximation properties for the same dimension of the trial spaces, and to faster online computation times than the Double Greedy method. We note that for smoother cases, e.g. Test Case 2, the Optimal Trial models show the same, but not better, convergence order than the Double Greedy models.
Finally, to test the hierarchical error estimator described in §4.5, we use Test Case 2 with mesh size δ = (h, h), h −1 = 512. For the reduced space Y N , we choose a greedy basis with tolerance ε = 10 −2 , which here corresponds to N = 13. For the error estimator reference space Y M ⊃ Y N , we compare spaces with tolerances ε = 10 −2.5 , 10 −3 , 10 −3.5 , and 10 −4 , leading to M = 31, 62, 91, and 127, respectively. The results in Figure 11 show the good quantitative performance. Note that the values of M are significantly larger than those reported for the hierarchical error estimator in [17], which is due to the fact that M is determined differently there and transport problems are not considered.

Conclusions
In this work, we presented a Petrov-Galerkin method for (parametrized) transport equations leading to a computationally feasible optimally stable numerical scheme that is easy to implement.
Numerical experiments show convergence of order about 1/3 for non-smooth L 2 -solutions. Despite the L 2 -framework, higher convergence orders between 1 and 2 can be observed for smooth solutions, even though tensor product discrete spaces may limit the convergence order to 1 due to unphysical restrictions of the trial space at the outflow boundary. The proposed method shows ratios of errors to computational costs similar to [7], where fixed trial spaces are used. We thus conclude that our non-standard problem-dependent trial spaces have satisfactory approximation properties for the considered test cases.
Moreover, the framework allows for an efficient realization and implementation of reduced basis methods for parametrized transport equations while ensuring optimal stability for full and reduced spaces. The suggested (strong) greedy algorithm realizes the convergence order of the Kolmogorov N -width for a non-smooth transport problem. A comparison with the algorithm in [8], which uses fixed trial spaces and therefore needs additional stabilization techniques, shows comparable or even better convergence rates and significantly lower online costs for the new framework. The results suggest that the new framework might be especially beneficial for problems where stabilization is rather challenging.
Similar to [1, Lem. 7] we show the following lemma, where we have no boundary integral from the integration by parts since the traces of v on Γ + and of ρ on Γ − vanish. Further we obtain, using ∥ρv∥ L2(Ω) ≤ ∥ρ∥ L∞(Ω) ∥v∥ L2(Ω) ≤ 2T ∥v∥ L2(Ω) , the desired estimate. For condition (ii), i.e., c − 1/2 ∇ ⋅ ⃗ b ≥ κ > 0, we obtain the claim by integration by parts (see [27, Lem. 3.1.1]).
k ρ is in general not continuous: Consider e.g. a non-convex domain Ω where a characteristic curve is tangential to the boundary at some (isolated) x ∈ Γ 0 , but not in a neighborhood of x.

Appendix C. A (strong in time) space-time variational formulation
As an alternative to our approach of an ultraweak variational form in space and time described in Section 2, one could also take the point of view of using an ultraweak variational formulation in space only and keep the first order derivative in time (i.e., not using integration by parts in time). Integrating over time then results in a space-time framework requiring more regularity in time (let us call it "strong"). In order to fix notation, we first interpret the time-independent problem (2.1) as an operator equation A ○ u ∶= ⃗ b ⋅ ∇u + c u = f ○ in some function space V ′ , where V ↪ L 2 (D) ↪ V ′ is a Gelfand triple (and V is the L 2 -dual of V ′ ), i.e., A ○ ∶ H → V ′ . Accordingly, B ○ u = u̇ + A ○ u = f ○ is seen as an equation pointwise in V ′ for t ∈ I, which means we can multiply (1.1) with some smooth test function v ∈ C 0 ( Ī; V ) and integrate over time, which yields a bilinear form b(w, v). We get that b ∶ X × Y → R with the trial space X ∶= {v ∈ L 2 (I; H) ∶ v̇ ∈ L 2 (I; V ′ ), v(0) = 0} = L 2 (I; H) ∩ H 1 (0) (I; V ′ ) and the test space Y ∶= L 2 (I; V ). Here, H 1 (0) (I; V ′ ) ∶= {v ∈ H 1 (I; V ′ ) ∶ v(0) = 0} is equipped with the standard graph norm ∥v∥ X ∶= ( ∥v∥ 2 L2(I;H) + ∥v̇∥ 2 L2(I;V ′ ) ) 1/2 for v ∈ X . Finally, the norm in Y is ∥ ⋅ ∥ Y ≡ ∥ ⋅ ∥ L2(I;V ) . This means that g can be chosen in Y ′ ≅ L 2 (I; V ′ ). This results in the variational formulation: find u ∈ X such that b(u, v) = g(v) for all v ∈ Y. Since we do not perform integration by parts w.r.t. time here, we require H 1 -regularity in time, which is the reason why we call this formulation strong in time.
In order to determine the inf-sup constant of b(⋅, ⋅) w.r.t. the above pair X , Y, we consider the supremizer s u ∈ Y for some given 0 ≠ u ∈ X , which is the solution of the problem (s u , v) Y = b(u, v) for all v ∈ Y. The first two terms of the resulting expansion can be estimated from above and from below by ∥u∥ 2 X , which is exactly what we need. For parabolic problems, the operator A is symmetric, and this was used in [25, 26] to express the mixed term ( u̇, Au) Y ′ in terms of the norm of the final time contribution u(T ). This is the key to deriving optimal inf-sup and continuity constants for parabolic problems.
For transport problems, however, A is not symmetric. Defining the symmetric and anti-symmetric parts of A as usual, i.e., A sym ∶= 1/2 (A + A * ), A asy ∶= 1/2 (A − A * ), we get A = A sym + A asy with A * sym = A sym , A * asy = −A asy . For the symmetric part, we obtain by u(0) = 0 and the fundamental theorem of calculus that 2 ( u̇, A sym u) Y ′ reduces to a final time term involving u(T ), which is nonnegative, so that this contribution is no problem. The second part, however, may very well be negative and can only be estimated in terms of β a , the inf-sup constant of the spatial operator A. Obviously, this estimate is only meaningful for small final times T .

Figure 3. Approximate solution for ⃗ b = (1 − y, x) T , g = g 4 and h = 1/32.

Table 4. L 2 -errors and convergence rates for the two-dimensional problem with different boundary conditions and unphysical restrictions of the trial space.

Figure 7. L 2 -errors versus CPU times for the Optimal Test method with 1 iteration (Opt. Test 1) and 5 iterations (Opt. Test 5) of the Uzawa algorithm, and for the Optimal Trial method in standard (Opt. Trial) and post-processed (Opt. Trial post-proc.) form.

Figure 10. Test Case 3, h −1 = 512. Comparison of online computation times (median of 5000 runs) for reduced models from the Optimal Trial Greedy algorithm and the strong Double Greedy algorithm with β N ≥ 0.7. Left: computation time versus trial space dimension; right: maximum model error versus computation time.

Table 1. 1D: L 2 -errors and convergence rates as h → 0 for linear and quadratic FE spaces.

Table 5. L 2 -errors and convergence rates for the post-processed solution ũ δ for boundary value g 3 .

Table 6. Inf-sup constants for the Optimal Test method and the 2D problem.

Table 7. Inf-sup constants for the Optimal Test method and the 3D problem.

Table 8. Data for the parametric test cases.