While Transformer models can achieve SOTA performance on information retrieval (IR) tasks, architectures that feed each query-document pair through a large neural network to compute a relevance score are computationally expensive. To make Transformer-based IR architectures more efficient and scalable, researchers from the Stanford Future Data Lab have proposed and open-sourced ColBERT. ColBERT independently encodes each document and query using BERT and then models their fine-grained similarity. With ColBERT, document embeddings can be computed offline, and scalable vector-similarity operators can be applied to retrieve relevant passages.
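The fine-grained similarity ColBERT computes is its MaxSim late-interaction score: for each query token embedding, take the maximum cosine similarity against all document token embeddings, then sum over the query tokens. A minimal sketch of that scoring step is below; the random matrices stand in for real BERT token embeddings (the document embeddings would be precomputed offline), and the shapes and names are illustrative assumptions rather than ColBERT's actual API.

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """MaxSim late interaction: sum over query tokens of the best-matching
    document token's cosine similarity.

    query_emb: (num_query_tokens, dim), doc_emb: (num_doc_tokens, dim)
    """
    # Normalize rows so the dot product below is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                 # (num_query_tokens, num_doc_tokens)
    return sim.max(axis=1).sum()  # best document token per query token

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))  # stand-in for BERT query token embeddings
docs = [rng.normal(size=(n, 128)) for n in (30, 50)]  # "offline" doc embeddings
scores = [maxsim_score(query, d) for d in docs]  # rank documents by score
```

Because documents are encoded independently of the query, the per-token document vectors can be indexed ahead of time and searched with off-the-shelf approximate nearest-neighbor machinery, which is what makes the approach scale.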