Data practitioners often need to evaluate datasets before building models or preparing analyses, including for data security and privacy reasons. Data Profiler, a Python library recently open-sourced by Capital One, automatically identifies the schema, statistics, and entities within dataframes. Specifically, it includes pre-trained deep learning models that can detect sensitive data, including personally identifiable information (PII) and non-public personal information (NPI). Users can extend Data Profiler by adding new entities to the pre-trained model or by developing new pipelines for entity recognition.
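
To make "schema, statistics, and entities" concrete, here is a minimal sketch of what profiling a small table might involve, using only the Python standard library. It does not use Data Profiler's API or its deep learning models. Simple regular expressions stand in for the entity recognizer, and the `profile_table` function, the `ENTITY_PATTERNS` table, and the sample rows are all hypothetical names and data chosen for illustration.

```python
import re
import statistics

# Hypothetical regex patterns standing in for a trained entity model.
# A real sensitive-data detector would be far more robust than this.
ENTITY_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def infer_type(values):
    """Guess a column type by trying int, then float, else string."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def detect_entity(values):
    """Return the first label whose pattern fully matches every value."""
    for label, pattern in ENTITY_PATTERNS.items():
        if values and all(pattern.fullmatch(v) for v in values):
            return label
    return None


def profile_table(rows):
    """Profile a list of dict rows whose values are all strings."""
    report = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        col_type = infer_type(values)
        entry = {
            "type": col_type,                  # schema
            "entity": detect_entity(values),   # sensitive-data label, if any
            "unique": len(set(values)),        # basic statistic
        }
        if col_type in ("int", "float"):
            nums = [float(v) for v in values]
            entry["min"] = min(nums)
            entry["max"] = max(nums)
            entry["mean"] = statistics.mean(nums)
        report[col] = entry
    return report


# Made-up sample data for illustration only.
rows = [
    {"name": "Ada", "age": "36", "email": "ada@example.com", "ssn": "123-45-6789"},
    {"name": "Grace", "age": "45", "email": "grace@example.org", "ssn": "987-65-4321"},
]
report = profile_table(rows)
for column, summary in report.items():
    print(column, summary)
```

In this sketch, adding a new entity amounts to adding another entry to `ENTITY_PATTERNS`, which loosely mirrors the idea of extending a recognizer with new labels. Rule-based matching like this is brittle on messy real-world data, which is one motivation for the learned models the library ships with.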