Apache Arrow, which is used by projects like Spark, Parquet, and Pandas, has become an indispensable component of the open-source data analytics stack. Unlike file formats such as Apache Parquet, Arrow can read and operate directly on serialized data without first deserializing it fully into memory; as a result, datasets too large to fit in memory can be memory-mapped and analyzed in place. Because modern CPUs can perform the same operation on contiguous data segments in parallel, Arrow's columnar storage format delivers impressive speedups for column-based computations. Finally, Arrow also provides efficient inter-process communication and a standardized RPC framework (Arrow Flight) so that data can be exchanged efficiently without local copies. Its community is working on capabilities that will allow users to query data stored in the Arrow format directly.