While several tools for finding and fixing data quality issues exist, they are difficult to use together. To address this, researchers from NYU’s Visualization and Data Analytics Research Center have open-sourced openclean, a unified framework for composing and executing data cleaning pipelines. Openclean has an extensible profiler to detect data quality issues, along with operators for data cleaning and wrangling tasks (e.g., fixing functional dependency violations and correcting spelling mistakes). It also integrates with Socrata and the Reference Data Repository for data enrichment and has a “mini-version control engine” to manage versions of datasets throughout the data preparation process.
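
To make "functional dependency violation" concrete, here is a minimal sketch in plain Python. It does not use openclean's API; the function name `fd_violations`, the column names, and the sample rows are all hypothetical and only illustrate the idea. A dependency such as zip → city is violated whenever the same zip code appears with more than one city value.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the lhs keys that map to more than one rhs value.

    Hypothetical helper for illustration; not part of openclean.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in lhs)
        seen[key].add(tuple(row[col] for col in rhs))
    # A key violates lhs -> rhs if it is paired with several distinct rhs values.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Made-up sample data: the second row contains a misspelled city name.
rows = [
    {"zip": "10003", "city": "New York"},
    {"zip": "10003", "city": "New Yrok"},
    {"zip": "11201", "city": "Brooklyn"},
]

violations = fd_violations(rows, lhs=["zip"], rhs=["city"])
print(violations)
```

In this made-up data, only the zip code 10003 is flagged, because it is paired with two different spellings of the city. A cleaning operator would then need a repair strategy, for example keeping the most frequent value in each conflicting group.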