Many teams are converging on a similar architecture for search and recommendation engines: keyword search retrieves a set of candidates, which are then ranked by a Transformer-based model. However, running this retrieve-rerank architecture in the cloud often incurs high management overhead and costs. Anand et al. posit that the serverless paradigm can make scaling retrieve-rerank pipelines easier and cheaper. They present a prototype that uses Lucene and BM25 for retrieval and a monoBERT model (implemented with HuggingFace Transformers) for reranking. They show that their serverless implementation can match the effectiveness and cost of a server-based deployment for low-load applications while achieving much lower latency by exploiting massive parallelism.
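To make the retrieval stage of this pipeline concrete, the sketch below implements BM25 scoring in pure Python over a toy corpus. This is an illustrative stand-in, not the authors' system: their prototype uses Lucene's BM25 implementation, and the documents, query, and parameter defaults (`k1=1.5`, `b=0.75`, common BM25 settings) here are assumptions for the example.

```python
import math
from collections import Counter

def bm25_retrieve(query, docs, k=3, k1=1.5, b=0.75):
    """Score documents against a query with BM25 and return the top-k
    candidates, mirroring the first (retrieval) stage of a
    retrieve-rerank pipeline."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1

    def idf(term):
        # BM25 idf with +1 inside the log so scores stay non-negative.
        return math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))

    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Standard BM25 term weight with length normalization.
            s += idf(term) * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append((s, i))
    scores.sort(reverse=True)
    return [docs[i] for _, i in scores[:k]]

# Toy corpus (assumed for illustration). In the full pipeline, the
# candidates returned here would be passed to a Transformer reranker
# such as monoBERT, which scores each (query, candidate) pair.
docs = [
    "serverless computing for search",
    "bm25 keyword retrieval with lucene",
    "transformer models for reranking",
    "cooking recipes for pasta",
]
print(bm25_retrieve("serverless search retrieval", docs, k=2))
```

In the two-stage design, the cheap lexical scorer above narrows the corpus to a handful of candidates so that the expensive Transformer reranker only needs to run on a few (query, document) pairs.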