Data scientists often have access to a wide range of potentially related features associated with a single set of observations (e.g., profile information and clickstream data for a user for whom we will predict a set of recommended items). To combine these “data views,” researchers can use early fusion approaches (for instance, applying autoencoders to project both views into a lower-dimensional space) or late fusion approaches, in which the predictions of separate models trained on each view are combined. Ding et al. propose a third approach called cooperative learning, which combines the usual squared-error loss on predictions with an agreement penalty that encourages the predictions from the different data views to align. This approach is particularly valuable when the views are related in a meaningful way but each view may contain noise.
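As a rough illustration, the sketch below fits the cooperative learning objective — squared-error loss plus an agreement penalty weighted by a hyperparameter `rho` — with plain alternating least squares on synthetic data. The data-generating process, the linear models for each view, and the solver are all illustrative choices, not the authors' implementation; only the objective and the transformed-target update follow Ding et al.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: two noisy views X and Z driven by a shared latent signal.
n, px, pz = 200, 10, 10
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, px)) + 0.5 * rng.normal(size=(n, px))
Z = latent @ rng.normal(size=(2, pz)) + 0.5 * rng.normal(size=(n, pz))
y = latent @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=n)

def ls_fit(A, t):
    """Least-squares coefficients beta minimizing ||A @ beta - t||^2."""
    return np.linalg.lstsq(A, t, rcond=None)[0]

def cooperative_fit(X, Z, y, rho, n_iter=100):
    """Minimize (1/2)||y - Xb - Zc||^2 + (rho/2)||Xb - Zc||^2
    by block coordinate descent: with one view's fit held fixed,
    the other view is refit to the transformed target
    y* = (y - (1 - rho) * f_other) / (1 + rho)."""
    fz = np.zeros(len(y))
    for _ in range(n_iter):
        b = ls_fit(X, (y - (1 - rho) * fz) / (1 + rho))
        fx = X @ b
        c = ls_fit(Z, (y - (1 - rho) * fx) / (1 + rho))
        fz = Z @ c
    return b, c

b, c = cooperative_fit(X, Z, y, rho=0.5)
pred = X @ b + Z @ c
print("train MSE:", np.mean((y - pred) ** 2))
```

Setting `rho = 0` recovers ordinary least squares on the concatenated views (early fusion), while larger values push the two views' predictions toward agreement.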