A document is a record of some (typically written) content – a publication, a contract, a statement, a painting – at a moment in time. Until the advent of computers (and scanners), the media typically considered usable for such records included papyrus and vellum, which is basically leather. For a thousand years, more or less, paper has been the medium of choice.
That began to change in the 1980s.
PDF became the document format of choice for business, government and the general public because it delivers the key qualities of paper in a digital format. PDF is fixed, self-contained, readily shareable and relatively hard to change. It’s not just PDF’s innate characteristics that make it successful, but also the fact that PDF pages interoperate smoothly with paper documents. “PDF it, send it, print it, sign it and return it” workflows introduced new efficiencies when the format entered public consciousness in the mid-to-late 1990s. Even then, such workflows used only the most basic of PDF’s capabilities, but that was enough to dramatically accelerate the transition to digital documents. Within a few years, PDF files and email had decimated document courier services.
Before long, users were scanning the signed page and adding it to the PDF (or replacing the original page with it); the cycle back to a digital document was complete. This new workflow was, of course, an extremely crude approach to document approval, but because end users could carry it out so easily, PDF proved tolerant of variations in workflow and record-keeping practices in a way that’s hard to imagine for databases or HTML.
PDF continues to evolve far beyond a simulacrum of paper. A broad suite of features – tagging, XML-based metadata, attachments, 3D support, digital signatures and more – supports advanced workflows for handling and consuming documents. PDF is so capable and so reliable that some wonder why anyone would bother with an archival subset at all.
Not every PDF is designed with reliability in mind. For all its well-deserved reputation for reliably conveying the author’s intent to any viewer, PDF allows developers to make files that rely on external resources or use encryption. Both capabilities are non-starters for the preservation community, because a future reader may no longer have access to the external resource or the decryption key. If the world preserves PDF files as documents – and it does – then preservationists need PDF/A.
Introduced in 2005 as ISO 19005, PDF/A is now required, or considered best practice, in workflows that generate valuable documents. Filing cabinets and storage boxes are disappearing as ECM systems, cloud storage and local capacity swallow the documents that used to exist only on paper. When new documents are shared, the common ground is PDF. When they are finalized for records-retention purposes, they should ideally be PDF/A.
Some think HTML will “beat” PDF because it’s more flexible and less static, but this misconstrues the two formats’ respective purposes and fails to appreciate that browser developers are (slowly) improving their support for PDF. PDF continues to gain mindshare: Google Trends data clearly shows that searches for PDF documents, relative to all other searches, continue to rise.
PDF’s purpose is to serve as a “document”, with all that implies (see above). That’s not the purpose of HTML. HTML isn’t a document; it’s an experience. PDF is how you keep it, and PDF/A is how you keep it forever.
Preserving the file’s actual bytes, of course, is up to you.
This is not only the present; it’s also the future. PDF, an open, standardized, broadly capable digital document technology, has proven equal to the transition from paper to the electronic world. PDF’s advanced metadata, authentication, semantic tagging, attachments, 3D and other features provide a proven framework for the future development of digital documents. PDF has no competitors. Even in the world of SharePoint, OpenText, Office 365 and Google Docs, PDF and PDF/A represent the only sufficiently flexible and capable technology for archiving the full gamut of digital document content.
(This piece was adapted from a recent blog post.)