Most popular deep learning models make predictions based on the model parameters and a single test input. As such, they do not consider the dependencies between data points when generating predictions, even though these relationships may carry valuable information. In contrast, Kossen et al. propose Non-Parametric Transformers, which generate predictions by taking the entire dataset (training and test) as input and explicitly learning, through a multi-head self-attention mechanism, the connections between data points (e.g. those that arise from the causal mechanism generating the data). They demonstrate the potential value of this approach on both tabular and image datasets.
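
To make the core idea concrete, here is a minimal sketch (not the authors' implementation; layer sizes, names, and the omission of target masking are illustrative assumptions) of self-attention applied *across data points*: the whole table of rows is treated as one sequence, so each row's prediction can depend on every other row, including test rows.

```python
import torch
import torch.nn as nn

class DatapointAttentionSketch(nn.Module):
    """Illustrative sketch: self-attention across rows of the dataset,
    so each row's representation can depend on every other row.
    Not the authors' architecture; dimensions are made up."""

    def __init__(self, num_features: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(num_features, embed_dim)  # embed each row
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, 1)              # per-row prediction

    def forward(self, dataset: torch.Tensor) -> torch.Tensor:
        # dataset: (num_rows, num_features) -- training AND test rows together
        h = self.embed(dataset).unsqueeze(0)  # (1, num_rows, embed_dim); rows form the "sequence"
        h, _ = self.attn(h, h, h)             # every row attends to every other row
        return self.head(h).squeeze(0)        # (num_rows, 1) predictions

# Usage: pass the full table (train + test rows) in one forward call.
# The paper's masking of held-out targets is omitted in this sketch.
model = DatapointAttentionSketch(num_features=10)
full_table = torch.randn(200, 10)             # 200 rows, 10 features
preds = model(full_table)                     # predictions depend on all rows
```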