In recent years, both datasets and models have grown larger and more complex. Data practitioners are also increasingly responsible for developing models that will be operationalized under specific performance constraints. As a result, scaling Python for big data processing has become an important goal. However, most solutions for parallel big data processing still invoke the Python interpreter. In contrast, Tuplex, recently open-sourced by researchers at Brown University, generates optimized LLVM bytecode for a given pipeline and input dataset using data-driven compilation and dual-mode processing. With these techniques, Tuplex can execute Python pipelines at native-code speed.
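To give a rough feel for what data-driven compilation and dual-mode processing mean, here is a minimal pure-Python sketch. It is not Tuplex's API or implementation: the names `infer_normal_type` and `run_dual_mode` are hypothetical, and the "fast path" is an ordinary Python call standing in for compiled native code. The sketch assumes the idea is to sample the input to find the common-case type, process conforming rows on a specialized path, and send everything else to a slower general path.

```python
from collections import Counter


def infer_normal_type(sample):
    """Pick the most frequent type in a sample of the input (the 'normal case')."""
    return Counter(type(value) for value in sample).most_common(1)[0][0]


def run_dual_mode(rows, udf, sample_size=100):
    """Apply udf to rows using a fast path for common-case rows and a
    slow path for the rest. Returns (results, failed_rows)."""
    if not rows:
        return [], []

    normal_type = infer_normal_type(rows[:sample_size])
    indexed_results = []
    deferred = []

    # Fast path: rows that match the sampled type. In a real compiler this
    # is where type-specialized native code would run; here it is a stand-in.
    for index, row in enumerate(rows):
        if type(row) is normal_type:
            indexed_results.append((index, udf(row)))
        else:
            deferred.append((index, row))

    # Slow path: rows that did not match are retried in the general
    # interpreter, and rows that still fail are reported instead of crashing.
    failed = []
    for index, row in deferred:
        try:
            indexed_results.append((index, udf(row)))
        except (TypeError, ValueError):
            failed.append((index, row))

    # Restore the original input order before returning.
    indexed_results.sort(key=lambda pair: pair[0])
    return [value for _, value in indexed_results], failed


if __name__ == "__main__":
    data = [1, 2, "3", 4, None]
    results, failed = run_dual_mode(data, lambda x: int(x) * 2)
    print(results)
    print(failed)
```

In this toy version most rows are integers, so the string `"3"` and the `None` value are deferred to the slow path; the string is still processed there, while the `None` row is reported as failed rather than aborting the whole job.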