The only digital document format

What is a “document”?

A document is a record of some (typically written) content – a publication, a contract, a statement, a painting – at a moment in time. Until the advent of computers (and scanners), the media typically considered usable for such records included papyrus and vellum, which is basically leather. For a thousand years, more or less, paper has been the medium of choice.

Margaret Hamilton standing with a stack of paper.
Margaret Hamilton led a team credited with developing the software for NASA’s Apollo and Skylab programs. Her team was responsible for developing in-flight software, which included algorithms designed by various senior scientists for the Apollo command module and lunar lander. The image shows Hamilton in 1969, standing next to the navigation software that she and her MIT team produced for the Apollo project. Credit: NASA / Wikimedia Commons / Public Domain

That began to change in the 1980s.

Digital documents

PDF became the document format of choice for business, government and the general public because it delivers the key qualities of paper in a digital format. PDF is fixed, self-contained, readily shareable and relatively hard to change. It’s not just PDF’s innate characteristics that make it successful, but the fact that PDF pages interoperate smoothly with paper documents. “PDF it, send it, print it, sign it and return it” workflows introduced new efficiencies when the format surfaced into public consciousness in the mid-to-late 1990s. Even then, such workflows utilized only the most basic of PDF’s capabilities, but it was enough to dramatically accelerate the transition to digital documents. Within a few years, PDF files and email decimated document courier services.

Before long, users were scanning the signature page and adding it to (or replacing) the original page in the PDF; the cycle back to a digital document was complete. This new workflow, of course, was an extremely crude approach to facilitating document approvals, but the fact that end-users could do this very easily made PDF very tolerant of variations in workflow and records-keeping practices in a way that’s hard to imagine for databases and HTML.

PDF continues to evolve far beyond a simulacrum of paper. There’s a broad suite of features – tagging, XML-based metadata, attachments, 3D support, digital signatures and more – that support advanced document-handling and consuming workflows. PDF is so capable and so reliable that some wonder why anyone should bother with an archival subset at all.

Documents are for keeping

Not every PDF is designed with reliability in mind. For all its well-deserved reputation for reliably conveying the author’s intent to any viewer, PDF allows developers to make files that rely on external resources, or use encryption; both capabilities are non-starters for the preservation community. If the world preserves PDF files as documents – and it does – then preservationists need PDF/A.
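
As a rough illustration of why encryption is easy to flag before a file enters an archive: an encrypted PDF declares an /Encrypt entry in its trailer (or cross-reference stream) dictionary. The Python sketch below simply searches the raw bytes for that key. The function name is hypothetical, the check is a heuristic that can produce false positives, it does not detect reliance on external resources, and it is in no way a substitute for a real PDF/A validator.

```python
import re

# Matches the /Encrypt name, but not longer names such as /EncryptMetadata.
_ENCRYPT_KEY = re.compile(rb"/Encrypt(?![A-Za-z0-9])")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw bytes contain an /Encrypt key.

    Assumption: the key appears literally in the file, as it normally does
    in a trailer or cross-reference stream dictionary. The same byte
    sequence inside an uncompressed stream would give a false positive.
    """
    return _ENCRYPT_KEY.search(pdf_bytes) is not None
```

A file flagged this way deserves a closer look with a proper validation tool; a file that passes has only cleared one of many PDF/A requirements.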

Introduced in 2005 as ISO 19005, PDF/A is now required or best-practice in workflows that generate valuable documents. Filing cabinets and storage boxes are disappearing as ECM systems, cloud storage and local capacity swallow the documents that used to exist only on paper. When new documents are shared, the common ground is PDF. When finalized for records-retention purposes, ideally, they are PDF/A.

Some think HTML will “beat” PDF because it’s more flexible and less static, but this misconstrues both formats’ respective purposes and fails to appreciate that browser developers are (slowly) augmenting their support for PDF. PDF continues to gain in mind-share: Google’s Trends data shows clearly that the number of searches for PDF documents relative to all other searches continues going up.

PDF’s purpose is to serve in the role of “document”, with all that implies (see above). But that’s not the purpose of HTML. HTML isn’t a document, it’s an experience. PDF is how you keep it, and PDF/A is how you keep it forever.

Preserving the file’s actual bytes, of course, is up to you.
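
One common way to watch over those bytes is a fixity check: record a cryptographic digest when the file is archived, then recompute and compare it later. The Python sketch below assumes SHA-256 and uses hypothetical function names; it illustrates the idea and is not a prescribed archival procedure.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until read() returns an empty bytes object at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_digest):
    """True if the file's current digest matches the one recorded earlier."""
    return sha256_of_file(path) == recorded_digest
```

Storing the digest separately from the file means a later mismatch signals that the bytes have changed, whatever the cause.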

Documents of the future

This is not only the present, it’s also the future. PDF, an open, standardized, broadly capable digital document technology, has proven equal to the transition from paper to the electronic world. PDF’s advanced metadata, authentication, semantic tagging, attachments, 3D and other features provide a proven framework for future development of digital documents. PDF has no competitors. Even in the world of SharePoint, OpenText, Office 365 and Google Docs, PDF and PDF/A represent the only sufficiently flexible and capable technology for archiving the gamut of digital document content.

(This piece was adapted from a recent blog post)

Categories: Archives & Libraries, PDF/A

About the contributor

Duff Johnson

A veteran of the electronic document space, Duff Johnson is an independent consultant. He is Executive Director of the PDF Association and ISO Project co-Leader (and US TAG chair) for ISO 32000 and ISO 14289.