Classical non-learned algorithms for photoacoustic tomography (PAT) reconstructions are mathematically proven to converge, but they can be very slow and inadequate with respect to model and data assumptions. Recently, learned neural networks have shown to surpass the reconstruction quality of non-learned algorithms, but since analysis is challenging, convergence and stability are not guaranteed. To bridge this gap, we investigate the stability of algorithms in which we combine the structure of model-based algorithms with the efficiency of data-driven neural networks. In the last decade, primal-dual algorithms have become popular due to their ability to employ non-smooth regularisation, which is used to overcome the limited sampling problem in photoacoustic tomography. The algorithm performs updates in both the image domain (primal) and the data domain (dual). These are connected by the photoacoustic operator, which modelling is based on the laws of physics and system settings. In our approach, we replace the updates with shallow neural networks, while maintaining the primal-dual structure and the information from the photoacoustic operator. This greatly improves reconstruction quality, especially in cases of strong noise and limited sampling. This has the additional benefit that a hand-crafted regularisation does not have to be chosen, but is learned in a data-driven manner. We show its robustness in simulation and experiment against uncertainty and changes in PAT system settings. This includes the number, placement and calibration of detectors, but also changes in the tissue type that is imaged. The method is stable, computationally efficient and applicable to a generic photoacoustic system with universal applications.