GitHub recently released a new homepage that includes a digital globe visualizing pull requests opened and merged around the world. In this post, Tal Safran discusses how GitHub collects and uses the data underlying the globe. The GitHub team uses Presto to query events in its data warehouse; the events are defined in protobuf format, represent merged pull requests, and arrive via Kafka. The globe highlights merged pull requests from repositories identified as healthy by a ranking algorithm that uses 30+ features. The team also uses Mapbox’s forward geocoding API and Ruby SDK to geocode user-provided locations, and schedules these workflows with Airflow.
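
To make the geocoding step concrete, here is a minimal sketch of a forward geocoding request against Mapbox's public Geocoding v5 endpoint. It is written in Python rather than with the Ruby SDK mentioned in the post, and it is not GitHub's code: the function names `build_forward_geocode_url` and `first_coordinates` are hypothetical, the access token is a placeholder, and the only response field relied on is `center`, which Mapbox returns as `[longitude, latitude]`.

```python
from urllib.parse import quote, urlencode

# Public Mapbox Geocoding v5 endpoint for place search.
MAPBOX_GEOCODING_BASE = "https://api.mapbox.com/geocoding/v5/mapbox.places"


def build_forward_geocode_url(location: str, access_token: str, limit: int = 1) -> str:
    """Build the request URL for a free-text, user-provided location (hypothetical helper)."""
    # Percent-encode the whole search text, including commas and spaces.
    query = quote(location.strip(), safe="")
    params = urlencode({"access_token": access_token, "limit": limit})
    return f"{MAPBOX_GEOCODING_BASE}/{query}.json?{params}"


def first_coordinates(response: dict):
    """Return (lat, lon) from the top-ranked feature, or None if nothing matched."""
    features = response.get("features") or []
    if not features:
        return None
    # Mapbox orders `center` as [longitude, latitude].
    lon, lat = features[0]["center"]
    return lat, lon
```

Fetching the URL with any HTTP client and passing the decoded JSON to `first_coordinates` yields a point that could be plotted on a globe. Free-text profile locations often fail to match, which is why the helper returns `None` instead of raising.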