Princeton University researchers are releasing a large-scale, longitudinal, curated dataset of over 1M privacy policies from over 100K popular websites. The dataset was collected from the Internet Archive’s Wayback Machine using a custom crawler that detects and downloads privacy policies from archived web pages. The team also processed the downloaded policies to extract only the relevant text. They are making this data available to facilitate research on the automated analysis of privacy policies, including automated policy summarization, question answering, and compliance detection. Their analysis of the data also reveals how privacy policies have changed over time, including in response to regulations such as GDPR and CCPA, and surfaces other outstanding issues (e.g., the underreporting of tracking).
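
To give a rough sense of what locating an archived privacy policy might involve, the following is a minimal sketch and not the researchers' crawler. It assumes the Wayback Machine's public availability endpoint and its JSON response shape; the helper names and the keyword heuristic for spotting policy links are hypothetical and chosen purely for illustration.

```python
import re
from urllib.parse import urlencode

# Public Wayback Machine availability endpoint (assumed here for illustration).
WAYBACK_AVAILABLE = "https://archive.org/wayback/available"

# Hypothetical heuristic: a link whose href or anchor text mentions "privacy".
PRIVACY_HINT = re.compile(r"privacy", re.IGNORECASE)


def availability_url(site, timestamp):
    """Build a query for the snapshot of `site` closest to `timestamp` (YYYYMMDD)."""
    return WAYBACK_AVAILABLE + "?" + urlencode({"url": site, "timestamp": timestamp})


def closest_snapshot(response):
    """Return the closest snapshot URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def candidate_policy_links(links):
    """Keep hrefs from (href, anchor_text) pairs that hint at a privacy policy."""
    return [
        href
        for href, text in links
        if PRIVACY_HINT.search(href) or PRIVACY_HINT.search(text)
    ]
```

In practice the URL from `availability_url` would be fetched with an HTTP client, the decoded JSON passed to `closest_snapshot`, and the links on the archived page filtered with `candidate_policy_links`. A real crawler would need far more robust detection than a single keyword match.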