Although most data scientists use Python and Pandas dataframes for modeling and analysis, these tools do not scale to large analytical queries the way databases do. However, recent trends (e.g., the shift from polystores to polyengines, the standardization of Python APIs, and convergence on Apache Arrow as a common data format) and new technologies (e.g., systems that map dataframe operations to relational algebra) could enable scalable and efficient data science on cloud backends. To address this problem and seize this opportunity, Jindal et al. present Magpie, which automatically selects the best-suited database engine (a SQL data warehouse, Spark, or SCOPE) as users wrangle data in a data lake through the Pandas API.
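The two ideas in this summary, mapping dataframe operations to relational algebra and choosing an engine per workload, can be illustrated with a toy sketch. It is not Magpie's design or API, and it is not the Pandas API. `LazyFrame`, `where`, `groupby_sum`, and `execute` are hypothetical names. The frame records a selection and a grouped sum as a plan instead of running them eagerly. The plan can either be evaluated in-process or translated into one SQL query. SQLite stands in for a SQL data warehouse, and the engine choice is a deliberately naive row-count threshold, swapped in because the summary does not say how Magpie decides.

```python
import operator
import sqlite3
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

# Comparison operators the toy translator understands, mapped to Python callables.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical lazy frame: records operations as a relational plan instead of running them."""
    table: str
    predicates: Tuple[Tuple[str, str, object], ...] = ()
    group_key: Optional[str] = None
    agg_col: Optional[str] = None

    def where(self, col: str, op: str, value: object) -> "LazyFrame":
        # Selection (sigma): add one predicate, keeping the rest of the plan.
        return replace(self, predicates=self.predicates + ((col, op, value),))

    def groupby_sum(self, key: str, col: str) -> "LazyFrame":
        # Grouped aggregation (gamma): SUM(col) for each distinct key.
        return replace(self, group_key=key, agg_col=col)

    def to_sql(self) -> Tuple[str, List[object]]:
        """Translate the recorded plan into a single parameterized SQL query."""
        assert self.group_key is not None and self.agg_col is not None
        # Identifiers are interpolated directly, so they are assumed to be trusted.
        sql = f"SELECT {self.group_key}, SUM({self.agg_col}) FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"{c} {op} ?" for c, op, _ in self.predicates)
        sql += f" GROUP BY {self.group_key} ORDER BY {self.group_key}"
        return sql, [v for _, _, v in self.predicates]

    def run_local(self, rows: List[Dict[str, object]]) -> List[Tuple[object, object]]:
        """Evaluate the same plan in-process over a list of dict rows."""
        totals: Dict[object, object] = {}
        for r in rows:
            if all(OPS[op](r[c], v) for c, op, v in self.predicates):
                k = r[self.group_key]
                totals[k] = totals.get(k, 0) + r[self.agg_col]
        return sorted(totals.items())


def execute(frame: LazyFrame, rows, conn: sqlite3.Connection, threshold: int = 1000):
    """Toy engine choice: small inputs stay local, larger ones are pushed to the database."""
    if len(rows) <= threshold:
        return "local", frame.run_local(rows)
    sql, params = frame.to_sql()
    return "sqlite", [tuple(r) for r in conn.execute(sql, params).fetchall()]


rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": 1},
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (:region, :amount)", rows)

plan = LazyFrame("sales").where("amount", ">", 2).groupby_sum("region", "amount")
print(plan.to_sql())
print(execute(plan, rows, conn, threshold=10))  # 4 rows <= 10: evaluated locally
print(execute(plan, rows, conn, threshold=2))   # 4 rows > 2: pushed to SQLite
```

Both calls return the same groups (east = 17, west = 5). The user-facing expression does not change; only the engine that evaluates it does. A real system would replace the threshold with a cost- or model-based decision and target engines such as Spark or SCOPE rather than SQLite.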