The only digital document format

What is a “document”?

A document is a record of some (typically written) content – a publication, a contract, a statement, a painting – at a moment in time. Until the advent of computers (and scanners), the media typically considered usable for such records included papyrus and vellum, which is basically prepared animal skin. For a thousand years, more or less, paper has been the medium of choice.

Margaret Hamilton standing with a stack of paper.
Margaret Hamilton led a team credited with developing the software for NASA’s Apollo and Skylab. Her team was responsible for developing in-flight software, which included algorithms designed by various senior scientists for the Apollo command module and lunar lander. The image shows Hamilton in 1969, standing next to the navigation software that she and her MIT team produced for the Apollo project. Credit: NASA / Wikimedia Commons / Public Domain

That began to change in the 1980s.

Digital documents

PDF became the document format of choice for business, government and the general public because it delivers the key qualities of paper in a digital format. PDF is fixed, self-contained, readily shareable and relatively hard to change. It’s not just PDF’s innate characteristics that make it successful, but the fact that PDF pages interoperate smoothly with paper documents. “PDF it, send it, print it, sign it and return it” workflows introduced new efficiencies when the format surfaced into public consciousness in the mid-to-late 1990s. Even then, such workflows utilized only the most basic of PDF’s capabilities, but it was enough to dramatically accelerate the transition to digital documents. Within a few years, PDF files and email decimated document courier services.

Before long, users were scanning the signature page and adding it to (or replacing) the original page in the PDF; the cycle back to a digital document was complete. This new workflow, of course, was an extremely crude approach to facilitating document approvals, but the fact that end-users could do this very easily made PDF very tolerant of variations in workflow and records-keeping practices in a way that’s hard to imagine for databases and HTML.

PDF continues to evolve far beyond a simulacrum of paper. There’s a broad suite of features – tagging, XML-based metadata, attachments, 3D support, digital signatures and more – that support advanced document-handling and consuming workflows. PDF is so capable and so reliable that some wonder why anyone should bother with an archival subset at all.

Documents are for keeping

Not every PDF is designed with reliability in mind. For all its well-deserved reputation for reliably conveying the author’s intent to any viewer, PDF allows developers to make files that rely on external resources, or use encryption; both capabilities are non-starters for the preservation community. If the world preserves PDF files as documents – and it does – then preservationists need PDF/A.

Introduced in 2005 as ISO 19005, PDF/A is now required or best-practice in workflows that generate valuable documents. Filing cabinets and storage boxes are disappearing as ECM systems, cloud storage and local capacity swallow the documents that used to exist only on paper. When new documents are shared, the common ground is PDF. When finalized for records-retention purposes, ideally, they are PDF/A.

Some think HTML will “beat” PDF because it’s more flexible and less static, but this misconstrues both formats’ respective purposes and fails to appreciate that browser developers are (slowly) augmenting their support for PDF. PDF continues to gain in mind-share: Google’s Trends data shows clearly that the number of searches for PDF documents relative to all other searches continues going up.

PDF’s purpose is to serve in the role of “document”, with all that implies (see above). But that’s not the purpose of HTML. HTML isn’t a document, it’s an experience. PDF is how you keep it, and PDF/A is how you keep it forever.

Preserving the file’s actual bytes, of course, is up to you.
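One common way to watch over a file’s actual bytes is to record a checksum when the document is archived and re-check it later. The following is a minimal sketch of that idea, not a prescribed preservation method; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """Return True if the file's current digest matches the recorded one."""
    return sha256_of_file(path) == recorded_digest
```

In use, the digest would be stored alongside the archived PDF/A file when it is filed, and `is_unchanged` run periodically; a mismatch means the bytes have changed since the digest was recorded.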

Documents of the future

This is not only the present, it’s also the future. PDF, an open, standardized, broadly capable digital document technology, has proven equal to the transition from paper to the electronic world. PDF’s advanced metadata, authentication, semantic tagging, attachments, 3D and other features provide a proven framework for future development of digital documents. PDF has no competitors. Even in the world of SharePoint, OpenText, Office 365 and Google Docs, PDF and PDF/A represent the only sufficiently flexible and capable technology for archiving the gamut of digital document content.

(This piece was adapted from a recent blog post)

Categories: Archives & Libraries, PDF/A
About the contributor

Duff Johnson

A veteran of the electronic document space, Duff Johnson is an independent consultant, Executive Director of the PDF Association and ISO Project co-Leader (and US TAG chair) for ISO 32000 and ISO 14289.