Several applications including NLP tasks require developers to first extract a complete hierarchical structure from a rendered document (e.g. PDF) or set of documents. While some existing approaches solve simpler problems, like identifying tables or cell locations; these methods do not infer the full hierarchical document composition needed to enable content analysis. In contrast, Rausch et al. propose DocParser, an end-to-end system for parsing complete documents into hierarchical structures. DocParser applies weak supervision to generate noisy labels using the reverse rendering process of LaTex (as such, it can be applied to use cases where annotated documents are not readily available). In addition, the authors release arXivdocs, a dataset based on 127,472 arXiv articles that includes all entities and hierarchical relations in these documents.