Accurate simulation of runup and overtopping is essential for predicting storm impacts on sandy coasts, with or without structures. Recent investigations (Stockdon et al., 2014; Palmsten and Splinter, 2016) show that, although the XBeach model (Roelvink et al., 2009) accurately predicts the morphodynamics of dissipative sandy beaches, it does not correctly simulate the individual contributions of setup, infragravity swash, and incident-band swash to wave runup. In this paper we describe an improved numerical scheme and a different way of simulating the propagation of directionally spread short-wave groups in XBeach, both intended to better predict the groupiness of the short waves and the resulting infragravity waves. The new approach is tested against field measurements from the DELILAH campaign at Duck, NC, and against video-derived runup measurements at Praia de Faro, a relatively steep sandy beach. Compared to the empirical fit by Vousdoukas et al. (2012), the XBeach model performs much better for more extreme wave conditions, under which existing empirical formulations severely underestimate runup. For relatively steep beaches, incident-band swash cannot be neglected and a wave-resolving simulation mode is required. We therefore also test the non-hydrostatic, wave-resolving model within XBeach for runup and overtopping against three datasets. Results for a high-quality flume test show that non-hydrostatic XBeach predicts the runup height with good accuracy (maximum deviation 15%). A case with a very shallow foreshore, typical of the Belgian coast at Wenduine, was compared against detailed measurements; overall, the model shows correct behavior for this case. Finally, the model is tested against a large number (551) of physical model tests of overtopping from the CLASH database. For relatively high overtopping discharges, non-hydrostatic XBeach performs quite well, with accuracy increasing as overtopping rates increase.
However, for relatively low overtopping rates, below 10–20 l/m/s, the model systematically underestimates the measured rates.