GitHub repositories

The PDF Association hosts several public and private repositories on GitHub to facilitate a common understanding of PDF and to develop new ideas around PDF technologies. Private repositories are restricted to PDF Association members. Public repositories welcome contributions and comments from anyone; however, all contributors must first complete a form acknowledging their acceptance of the PDF Association's Intellectual Property Rights (IPR) policy.

PDF 2.0 issues

This public repository gives developers a means of openly reporting issues with any of the latest published PDF 2.0-based ISO standards, for review and resolution by industry experts. All issues in PDF specifications are important, from minor typos and formatting problems to larger ambiguous, unclear, or apparently contradictory statements. Reaching industry-wide consensus on resolutions improves PDF interoperability and implementation reliability. This repo supports issues logged against all published PDF 2.0-based ISO standards.

Arlington PDF model

The Arlington PDF Model is a specification-derived, machine-readable definition of the full PDF document object model (DOM) as defined by the PDF 2.0 specification, ISO 32000-2:2020, and its related resolved errata. It provides an easy-to-process, structured definition of all formally defined PDF objects (dictionaries, arrays and map objects) and their relationships, beginning with the file trailer, using a simple text-based syntax and a small set of declarative functions. The Arlington PDF Model is applicable to both PDF readers and PDF writers.

PDF 2.0 examples

This is a collection of example PDF 2.0 files that comply with ISO 32000-2:2020. The files in this collection are intended for educational purposes and are intentionally kept relatively simple. Each example illustrates the usage of a new PDF 2.0 feature.

Index of PDF corpora

This index references a number of the more significant public corpora (data sets) that may contain both valid and invalid, real and synthetic PDF files, reflecting the realities of processing PDF files 'from the wild'. In addition, targeted test suites for specific PDF features, ISO subsets of PDF and some of the nested formats used inside PDF files are also listed. It is not intended to be a list of every website where PDFs may be obtained.

Deriving HTML from PDF

This repository is for PDF Association members of the Responsive-PDF and PDF Reuse technical working groups (TWGs) to track work on updates to the document describing the algorithm for deriving HTML from well-tagged PDF 2.0.


SafeDocs

Artifacts from the DARPA-funded SafeDocs research program.

PDF 2.0 RichMedia annotations

This repository is for PDF Association members of the RichMedia TWG, which is examining support for all forms of rich media, including 3D, audio, and video, in PDF 2.0.