PDF/A-2 – Technical Overview

4th International PDF/A Conference • Proceedings • PDF/A Forever • Long-Term Archiving with PDF

The Portable Document Format – developed by Adobe Systems and first published in 1993 – offers a series of beneficial characteristics that were previously unavailable. In addition to platform independence, these include reliable rendering of page-oriented content. All components of a document, such as fonts, text, images and graphics, are contained in the PDF file itself. These advantages made PDF a candidate for standardization in selected areas of use and industries. What all of these standards have in common is that, as subsets of PDF, they restrict its extensive feature set and thereby optimise it for specific applications.

PDF/X and PDF/VT: Standards for the Printing Industry

In the printing industry, it is extremely important that digital print material is reproduced reliably. The ISO technical committee TC 130 developed the PDF/X standard, whose first part was published in 2001 as the first international PDF standard. Since then, additional parts have been developed within the ISO 15930 series of PDF/X standards. The most recent additions, PDF/X-4 and PDF/X-5, have just been revised as of July 2010. The PDF/VT standard (“V” for variable data printing and “T” for transactional printing) adds a purely PDF-based alternative to widely used high-volume printing formats such as AFP and PPML. It extends PDF for variable data printing and was published in August 2010.

PDF/A for Long-Term Archiving

PDF/A-1 was published by the ISO committee ISO TC 171 in autumn 2005 as the international standard ISO 19005-1 for long-term archiving. PDF/A is beneficial above all in public administration, archives, libraries, publishing, banking and insurance, but also in manufacturing, since any digital document – from invoices, books and manuals to technical drawings – can be stored permanently using this standard. Work on the second part of the standard, PDF/A-2, was completed from a technical point of view in Paris in summer 2010, and publication is expected in early 2011. Having just completed PDF/A-2, the PDF/A committee within ISO TC 171 has started work on a further part, PDF/A-3.

PDF/UA: Standard for Accessible PDF 

The accessibility of document content is becoming increasingly important, especially for public authorities and administrations in the United States and Europe. The PDF/UA standard (UA stands for “Universal Accessibility”) is being developed as ISO 14289-1 within ISO TC 171. It aims to ensure that the content of PDF files is properly structured and therefore accessible enough that assistive technologies such as screen readers can extract and present it correctly.

PDF/E: The PDF for Technical Documents

ISO 24517-1 (PDF/E-1) was published in 2008. This standard supports the reliable and secure exchange of engineering-related content.

Last but not Least: PDF 1.7 is an ISO Standard 

Since summer 2008, the PDF file format has also been standardized as ISO 32000-1, which is based on PDF 1.7. New developments are being incorporated into the second part of the standard, ISO 32000-2, also known as “PDF 2.0”, which is currently under development and scheduled for completion in the second half of 2011.

Advantages of the PDF/A Standard

As a standard for long-term archiving, PDF/A guarantees the reliable reproduction of documents over many years, regardless of technological developments in hardware and software. PDF/A enables a homogeneous archive in which both born-digital and scanned documents can be stored. As an international ISO standard, PDF/A applies throughout the world. Environmental and climate considerations also work in PDF/A’s favour: paper documents can increasingly be replaced by PDF/A documents, since PDF/A can satisfy all requirements regarding longevity and legally binding character (through signatures).

PDF/A-2: Central Innovations

Whereas PDF/A-1 is based on PDF 1.4, PDF/A-2 takes advantage of features that only became available in later versions of PDF, up to and including PDF 1.7. PDF/A-2 is no longer based on a specification published by Adobe, but on the internationally approved ISO standard 32000-1. The following overview shows the core innovations of PDF/A-2 which all users should be aware of.

JPEG2000 Image Compression

The powerful JPEG2000 compression method (ISO/IEC 15444) was not supported in PDF/A-1, because it was only introduced into the PDF specification with PDF 1.5. JPEG2000 is interesting for scanned documents, among other things, because it achieves higher compression ratios at better quality than the older JPEG format. Furthermore, JPEG2000 offers a highly efficient lossless compression mode.

With JPEG2000, libraries and archives can, for example, digitize historic maps, books or documents at the highest possible quality and still create size-optimised PDF/A-2 files from them. The relevant metadata for each object can be stored directly in the PDF/A-2 document.
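At file level, JPEG2000 image data in PDF is identified by the JPXDecode filter on an image XObject. The following Python sketch models an image dictionary as a plain dict (a simplifying assumption; `filter_names` and `jpeg2000_permitted` are hypothetical helpers, not a library API) and flags JPXDecode only when a file targets PDF/A-1. The further conditions PDF/A-2 places on the JPEG2000 data itself are deliberately not checked.

```python
from __future__ import annotations


def filter_names(image: dict) -> list[str]:
    """Return the Filter entry of an image dictionary as a list of names.

    In PDF, Filter may be a single name or an array of names.
    """
    value = image.get("Filter")
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def jpeg2000_permitted(image: dict, pdfa_part: int) -> bool:
    """Hypothetical check: JPXDecode (PDF 1.5) lies outside PDF/A-1 (PDF 1.4).

    Only the filter name is examined; requirements on the JPEG2000
    codestream itself are not modelled in this sketch.
    """
    uses_jpx = "JPXDecode" in filter_names(image)
    return not (uses_jpx and pdfa_part == 1)


# Usage: a scanned page compressed with JPEG2000
scan = {"Subtype": "Image", "Width": 2480, "Height": 3508, "Filter": "JPXDecode"}
print(jpeg2000_permitted(scan, 1))  # False
print(jpeg2000_permitted(scan, 2))  # True
```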

Embedded PDF/A Files via Collections

Using collections – also called “portfolios” in Acrobat – users can combine several files into one “container PDF”. With PDF/A-2, PDF/A collections can now be compiled from several PDF/A files. File formats other than PDF/A are intentionally not allowed inside PDF/A collections. One possible use of PDF/A collections is archiving e-mails: attachments (such as Word files) can be converted to PDF/A and archived alongside the e-mail body, but as separate entities.

PDF/A collections can also be advantageous in the social security sector, as signatures can be applied to single scanned pages. The PDF/A collection then combines the signed single pages of the whole document. One or more pages can be subsequently removed without affecting the validity of the signatures for the remaining pages.
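Because only PDF/A files may be embedded, an archiving workflow needs a gate that rejects other members before the collection is built. The sketch below is a rough heuristic, not a validator: it assumes each member's bytes have already been decoded from its embedded file stream, and that the member's XMP packet is stored uncompressed so that its `pdfaid:part` property can be found by a byte search. The helper names are hypothetical.

```python
from __future__ import annotations

import re

# pdfaid:part written either as an attribute (pdfaid:part="2") or as an
# element (<pdfaid:part>2</pdfaid:part>) inside an uncompressed XMP packet.
PDFAID_PART = re.compile(rb"""pdfaid:part(?:=["']|>)\s*(\d)""")


def pdfa_part(data: bytes) -> int | None:
    """Heuristically read the PDF/A part number from a file's bytes."""
    if not data.startswith(b"%PDF-"):
        return None
    match = PDFAID_PART.search(data)
    return int(match.group(1)) if match else None


def non_pdfa_members(embedded: dict[str, bytes]) -> list[str]:
    """Names of would-be collection members that do not identify as PDF/A."""
    return sorted(name for name, data in embedded.items() if pdfa_part(data) is None)


# Usage: an e-mail body already converted to PDF/A and an unconverted attachment
mail = {
    "body.pdf": b"%PDF-1.7\n... <pdfaid:part>2</pdfaid:part> ...",
    "attachment.docx": b"PK\x03\x04...",
}
print(non_pdfa_members(mail))  # ['attachment.docx']
```

A real workflow would convert the flagged attachment to PDF/A and then run a full PDF/A validator on every member; the byte search only catches obvious mistakes early.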

Transparency

Although transparency is part of PDF 1.4, it was not yet sufficiently well supported when PDF/A-1 was standardized and was therefore excluded from PDF/A-1. In the meantime, the technology has matured substantially, and transparency has become a common characteristic of numerous PDF files.

Transparency is often used as a design element in the form of drop shadows, crossfades or vignettes. It may also appear unintentionally in a PDF file, for example when the original file is a PowerPoint presentation or when text has been highlighted. The use of transparency is now permitted in PDF/A-2.
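In a PDF file, much of this transparency is requested through an ExtGState dictionary, via a soft mask (SMask), a blend mode (BM) or constant alpha values (CA for stroking, ca for non-stroking). The sketch below models such a dictionary as a Python dict (an assumption for illustration; the function names are hypothetical) and mirrors the graphics-state conditions PDF/A-1 imposes. Other sources of transparency, such as image soft masks or transparency groups, and PDF/A-2's own requirements on transparency groups are not covered.

```python
def extgstate_uses_transparency(gs: dict) -> bool:
    """Heuristic test of an ExtGState dictionary modelled as a dict.

    PDF/A-1 requires SMask to be None, BM to be Normal or Compatible,
    and CA and ca to be 1.0; anything else counts as transparency here.
    """
    if gs.get("SMask", "None") != "None":
        return True
    blend = gs.get("BM", "Normal")
    if isinstance(blend, list):  # BM may also be an array of names
        blend = blend[0] if blend else "Normal"  # simplification: first entry only
    if blend not in ("Normal", "Compatible"):
        return True
    return gs.get("CA", 1.0) != 1.0 or gs.get("ca", 1.0) != 1.0


def extgstate_permitted(gs: dict, pdfa_part: int) -> bool:
    """PDF/A-2 permits transparency; PDF/A-1 does not."""
    return pdfa_part >= 2 or not extgstate_uses_transparency(gs)


# Usage: a drop shadow exported from a presentation
drop_shadow = {"Type": "ExtGState", "ca": 0.4, "BM": "Multiply"}
print(extgstate_permitted(drop_shadow, 1))  # False
print(extgstate_permitted(drop_shadow, 2))  # True
```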

PDF “Optional Content” or Layers

PDF/A-2 supports optional content, also often referred to as PDF layers. Optional content is helpful for technical construction drawings and plans, among other things, as the contents can be shown or hidden according to topic, such as the electrics or water supply for a building. Layers can also be used to display multilingual contents – such as an international catalogue – in a single PDF file. With the layers function, users can switch between English, Japanese and German text, for example, while the graphics remain the same.
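Technically, each layer is an optional content group (OCG), and a configuration dictionary states which groups are visible when the document is opened, using a BaseState plus ON and OFF arrays. The Python sketch below models that logic for the catalogue example; the class, the helper `language_config` and the layer names are hypothetical, and only the base states ON and OFF are modelled. Viewers can additionally enforce “one language at a time” behaviour through radio-button groups (RBGroups), which the sketch imitates by hiding every language layer except the chosen one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OCConfig:
    """Simplified optional content configuration.

    base_state: "ON" or "OFF" (other values are not modelled here)
    on / off:   names of groups switched on / off on top of base_state
    """
    base_state: str = "ON"
    on: set[str] = field(default_factory=set)
    off: set[str] = field(default_factory=set)


def visible_layers(layers: list[str], cfg: OCConfig) -> set[str]:
    """Return the layers that are visible under a configuration."""
    start = set(layers) if cfg.base_state == "ON" else set()
    return (start | cfg.on) - cfg.off


def language_config(language_layers: list[str], chosen: str) -> OCConfig:
    """Hypothetical helper: show one language layer, hide the others."""
    return OCConfig(off={name for name in language_layers if name != chosen})


# Usage: shared graphics plus one text layer per language
layers = ["Graphics", "English", "Japanese", "German"]
cfg = language_config(["English", "Japanese", "German"], "German")
print(sorted(visible_layers(layers, cfg)))  # ['German', 'Graphics']
```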

OpenType Fonts

The cross-platform OpenType font format is itself standardized as ISO/IEC 14496-22. OpenType fonts provide extensive Unicode support. They exist in a TrueType variant (typically with the suffix “.ttf”) and a PostScript variant (typically with the suffix “.otf”).

In PDF/A-2, these fonts can now be directly embedded without first having to convert them – as was necessary with PDF/A-1 – into the older formats PostScript Type 1 or TrueType.
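Since PDF 1.6, an OpenType font program can be embedded directly in a FontFile3 stream with the Subtype OpenType, which is what makes direct embedding possible on the basis of ISO 32000-1. Which outline flavour a font contains is determined by the 4-byte sfnt version tag at the start of the file, not by its suffix. The sketch below reads that tag; the function name is hypothetical.

```python
def sfnt_outline_flavour(font: bytes) -> str:
    """Classify an OpenType/TrueType file by its 4-byte sfnt version tag.

    The file suffix is only a convention; the tag at offset 0 tells
    which outline format the font actually contains.
    """
    tag = font[:4]
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "truetype"  # glyf-based (TrueType) outlines
    if tag == b"OTTO":
        return "cff"  # PostScript (CFF) outlines
    return "unknown"


# Usage with the first bytes of two font files
print(sfnt_outline_flavour(b"OTTO" + bytes(8)))              # cff
print(sfnt_outline_flavour(b"\x00\x01\x00\x00" + bytes(8)))  # truetype
```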

New Conformance Level PDF/A-2u – “u” for Unicode

Conformance level “b” stands for “basic”. PDF/A-1b and PDF/A-2b focus on visual integrity. PDF/A-1a and PDF/A-2a (“a” for “accessible”) add further requirements: these documents also include structural information (such as information about paragraphs, headings or columns) as well as semantic information through the use of Unicode and alternate text. This semantic information is important to ensure that copy and paste from PDF/A files works without problems and that text is indexed correctly. New in PDF/A-2 is the conformance level “PDF/A-2u” (“u” for “Unicode”). As a slimmed-down version of level “a”, it offers the advantages of Unicode (searching and copying text) without the complex structural requirements imposed by conformance level “a”. PDF/A-2u is suitable both for born-digital PDF files and for scanned documents that have undergone text recognition (OCR).
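A PDF/A file declares its part and conformance level in its XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties from the namespace http://www.aiim.org/pdfa/ns/id/. The sketch below reads both with Python's standard XML parser and rejects combinations that a part does not define, such as a “u” level for PDF/A-1. It is a minimal illustration; the function names are hypothetical and it does not validate the file itself.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID = "http://www.aiim.org/pdfa/ns/id/"
# Conformance levels per part: PDF/A-1 defines a and b; PDF/A-2 adds u.
LEVELS = {1: {"A", "B"}, 2: {"A", "B", "U"}}


def _prop(desc: ET.Element, name: str) -> str | None:
    """Read a pdfaid property written either as an attribute or as an element."""
    key = f"{{{PDFAID}}}{name}"
    return desc.get(key) or desc.findtext(key)


def pdfa_identification(xmp: str) -> str:
    """Return a label such as 'PDF/A-2u' from an XMP packet."""
    root = ET.fromstring(xmp)
    for desc in root.iter(f"{{{RDF}}}Description"):
        part = _prop(desc, "part")
        if part is None:
            continue
        part_no = int(part.strip())
        level = (_prop(desc, "conformance") or "").strip().upper()
        if level not in LEVELS.get(part_no, set()):
            raise ValueError(f"conformance {level!r} not defined for part {part_no}")
        return f"PDF/A-{part_no}{level.lower()}"
    raise ValueError("no PDF/A identification found")


XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""
print(pdfa_identification(XMP))  # PDF/A-2u
```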

Object Level XMP Metadata

For metadata, PDF/A-2 extends the requirements on custom XMP metadata fields to content objects; in PDF/A-1, these provisions applied only to document-level metadata. User-defined fields at the level of content objects must now also be defined in an extension schema if they are to be PDF/A-compliant.
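In practice this means that every namespace used in an object's metadata stream must either be predefined or be declared in an extension schema (in a real file, via the `pdfaSchema:namespaceURI` entries of the `pdfaExtension:schemas` property). The sketch below checks this at namespace level only: the predefined set is a small, deliberately incomplete subset, individual property definitions are not verified, and the archive namespace and helper name are hypothetical.

```python
from __future__ import annotations

# A small, deliberately incomplete subset of namespaces predefined by XMP.
PREDEFINED_NAMESPACES = {
    "http://purl.org/dc/elements/1.1/",  # dc
    "http://ns.adobe.com/xap/1.0/",      # xmp
    "http://ns.adobe.com/pdf/1.3/",      # pdf
    "http://www.aiim.org/pdfa/ns/id/",   # pdfaid
}


def undeclared_properties(
    used: list[tuple[str, str]], extension_namespaces: set[str]
) -> list[tuple[str, str]]:
    """Return (namespace URI, property) pairs lacking an extension schema.

    `used` lists the properties found in one metadata stream, whether it
    belongs to the document or to a content object such as an image.
    """
    known = PREDEFINED_NAMESPACES | extension_namespaces
    return sorted((ns, prop) for ns, prop in used if ns not in known)


# Hypothetical custom namespace for an archive's own image metadata
ARCHIVE_NS = "http://example.com/ns/archive/1.0/"
image_props = [("http://purl.org/dc/elements/1.1/", "title"), (ARCHIVE_NS, "shelfmark")]
print(undeclared_properties(image_props, set()))
# [('http://example.com/ns/archive/1.0/', 'shelfmark')]
print(undeclared_properties(image_props, {ARCHIVE_NS}))  # []
```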

New Comment Types and Annotations

PDF/A-2 revises the provisions for comments (annotations) in PDF files. Some new annotation types, such as 3D annotations, were added to the list of prohibited annotation types, while other annotation types introduced after PDF 1.4 (such as text editing comments) are now allowed.
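The rule combines two conditions: an annotation subtype must be defined in the underlying PDF specification (PDF 1.4 for PDF/A-1, ISO 32000-1 for PDF/A-2), and it must not be explicitly prohibited. The sketch below encodes the explicitly prohibited subtypes for both parts; the sets of “defined” subtypes are illustrative, incomplete subsets chosen for the example, and the function name is hypothetical.

```python
# Explicitly prohibited subtypes per part (in addition, any subtype not
# defined in the underlying PDF specification is excluded).
PROHIBITED = {
    1: {"FileAttachment", "Sound", "Movie"},
    2: {"3D", "Sound", "Screen", "Movie"},
}

# Illustrative, incomplete subsets of the subtypes each base specification defines.
DEFINED_PDF_1_4 = {"Text", "Link", "FreeText", "Highlight", "Ink", "Popup",
                   "Widget", "FileAttachment", "Sound", "Movie"}
DEFINED_ISO_32000_1 = DEFINED_PDF_1_4 | {"Polygon", "PolyLine", "Caret", "Screen", "3D"}
DEFINED = {1: DEFINED_PDF_1_4, 2: DEFINED_ISO_32000_1}


def annotation_permitted(subtype: str, pdfa_part: int) -> bool:
    """Defined in the base specification and not explicitly prohibited."""
    return subtype in DEFINED[pdfa_part] and subtype not in PROHIBITED[pdfa_part]


# Usage: Caret is a text editing comment, Polygon a freeform comment
for subtype in ("Caret", "Polygon", "3D", "FileAttachment"):
    print(subtype, annotation_permitted(subtype, 1), annotation_permitted(subtype, 2))
# Caret False True
# Polygon False True
# 3D False False
# FileAttachment False True
```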

Digital Signatures

From the beginning, PDF/A intentionally allowed the use of electronic signatures. PDF/A-2 adds more specific guidance on how digital signatures should be applied to guarantee interoperability, carrying over provisions from the ETSI PAdES standard. PAdES (PDF Advanced Electronic Signatures) is a set of restrictions and enhancements to PDF as defined in ISO 32000-1, intended to improve the integration and use of advanced electronic signatures. ETSI has published PAdES as technical specification TS 102 778.
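One interoperability rule associated with PAdES-style signatures concerns the ByteRange array of the signature dictionary, [start1 length1 start2 length2]: the two ranges should cover the signed file revision completely, leaving out only the gap that holds the signature value (the Contents entry). The sketch below checks just the shape of such an array against the length of the signed revision; it is a simplified illustration with a hypothetical function name and performs no cryptographic verification.

```python
def byte_range_ok(byte_range: list[int], signed_length: int) -> bool:
    """Check the shape of a signature's ByteRange [start1, len1, start2, len2].

    The ranges should start at byte 0, leave a gap for the signature value,
    and end exactly at the end of the signed revision.
    """
    if len(byte_range) != 4 or any(v < 0 for v in byte_range):
        return False
    start1, len1, start2, len2 = byte_range
    gap = start2 - (start1 + len1)
    return start1 == 0 and gap > 0 and start2 + len2 == signed_length


# Usage: an 18658-byte file whose signature value occupies bytes 840..17225
print(byte_range_ok([0, 840, 17226, 1432], 18658))  # True
print(byte_range_ok([0, 840, 17226, 1000], 18658))  # False: trailing bytes not covered
```

For a file with a single signature, `signed_length` is simply the file size; with later incremental updates, each signature is checked against the revision it was applied to.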

Switching to PDF/A-2: Considerations and Strategies 

The most important fact first of all: PDF/A-2 will not replace or supersede PDF/A-1 in any way. PDF/A-1 compliant documents that were already created will remain valid PDF files for long-term archiving. Archived PDF/A-1 files can remain unchanged in the data archive; an “update” to PDF/A-2 is not necessary here and does not usually make sense, since a PDF/A-1 document is always also a valid PDF/A-2 document.

Creating an Individual Requirements Profile

When reviewing the new PDF/A-2 features, users who find functions already on their wish list are more likely to benefit from upgrading than those who are completely satisfied with PDF/A-1. A functional, successful archive system based on PDF/A-1 can continue to rely on PDF/A-1. When the PDF/A-2 standard is published in early 2011, only a few tools will initially implement the full extent of the new ISO standard. This may affect the schedule of organisations intending to move to PDF/A-2.

Anyone dealing with PDF/A for the first time has to decide whether to adopt PDF/A-2 immediately. Here, too, the availability of suitable software tools must be checked. In principle, there is nothing wrong with using PDF/A-1 for long-term archiving now or in the future: software is and will remain available, and the know-how that beginners acquire with PDF/A-1 largely carries over to PDF/A-2.

The same applies to ongoing projects: if PDF/A-1 satisfies all requirements, the workflow should remain unchanged. If, however, PDF/A-2 offers crucial features that cannot be implemented with PDF/A-1, the upgrade should be started at an appropriate point in time.

PDF/A-2 Innovations for Developers

The following new or improved technical functions of PDF/A-2 should be interesting for developers and programmers in particular. 


Feature | New in PDF/A-2
Conformance level “A” | Extended requirements for conformance level “A”
File header | Only file headers from %PDF-1.0 to %PDF-1.7 are allowed
Structure and tags | Mapping of user-defined tags to standard tags in a role map
Compressed object streams | Compressed object streams, introduced with PDF 1.5, are supported
Implementation limits | Restrictions were revised; among other things, the limit of 8191 elements per array was lifted
Linearized PDF | No longer regulated by PDF/A-2
Appearance of comments | An annotation appearance is no longer required if the annotation area is empty, or if the annotation is a link or popup annotation
History entry in XMP | If the history entry exists, certain rules apply
ICC profiles | ICC profiles of the latest version (v4) are supported
Default CMYK | Improved provisions
Prepress: overprinting, CMYK | Provisions for the use of overprint mode and ICC-based CMYK
Spot colours | Spot colours with the same name must be used consistently with regard to their alternate colour space
Name objects in valid UTF-8 | Certain name objects, such as those for spot colours or structure types, have to be encoded as UTF-8
Subset fonts | Provisions for font subsets were revised with regard to CharSet and CIDSet
TrueType | Encoding requirements for TrueType fonts were revised (differences array; Adobe Glyph List)
.notdef glyphs | References to the .notdef glyph (the placeholder used when a glyph needed for rendering is not present in the font) are no longer allowed
Namespace prefixes | Fewer stipulations regarding namespace prefixes
Requirements key | The Requirements key in the document catalog is not allowed
XFA | The XML-based XFA (XML Forms Architecture) is now partially allowed
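The file header row lends itself to a quick pre-check. The sketch below assumes the header rules as the author of this sketch understands them: the file starts at byte 0 with %PDF-1.0 to %PDF-1.7, followed by an end-of-line marker and a comment line made of a % character and at least four bytes with values above 127 (the same binary comment rule PDF/A-1 already applies). It examines only the first bytes of a file; the names are hypothetical.

```python
import re

# %PDF-1.0 ... %PDF-1.7 at byte offset 0, an end-of-line marker, then a
# comment line starting with '%' and at least four bytes with values > 127.
PDFA2_HEADER = re.compile(rb"%PDF-1\.[0-7](?:\r\n|\r|\n)%[\x80-\xff]{4}")


def pdfa2_header_ok(data: bytes) -> bool:
    """True if the first bytes of a file match the sketched PDF/A-2 header rule."""
    return PDFA2_HEADER.match(data) is not None


# Usage
print(pdfa2_header_ok(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj"))  # True
print(pdfa2_header_ok(b"%PDF-2.0\n%\xe2\xe3\xcf\xd3\n"))         # False
```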

Also New in PDF/A-2

Below is an overview in table form of further changes and improvements to PDF/A-2, mostly related to specific technical details.

Feature | New in PDF/A-2
Links | Now also possible as multi-rectangle link annotations (a link made of several related rectangles, e.g. across a line break)
Links with PDF collections | Links can be set to, from, or between embedded PDF/A files
Freeform comments | Freeform annotations, such as polygons, are allowed
User unit | Pages with a side length of up to 381 km at 1:1 scale are possible (previously 5.08 m)
Units of measure | Support for measurement properties; important for technical documents
Structured PDF | Extended options for tagged PDF
Encryption | Extended options for encryption
Electronic signatures | Extended options
Colours: DeviceN | Maximum number of colourants in DeviceN raised to 32 (PDF/A-1: 8)
Colours: NChannel | NChannel supported
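The two page-size figures in the user unit row follow from simple arithmetic. ISO 32000-1 limits a page dimension to 14,400 default user space units, and the UserUnit entry (PDF 1.6) scales one unit from its default of 1/72 inch. The sketch below reproduces both numbers; the maximum UserUnit of 75,000 is an assumption here, chosen because it is the value implied by the 381 km figure.

```python
MAX_PAGE_UNITS = 14400     # largest page dimension, in default user space units
UNITS_PER_INCH = 72        # one default unit is 1/72 inch when UserUnit = 1
METRES_PER_INCH = 0.0254


def max_page_side_m(user_unit: float = 1.0) -> float:
    """Largest page side length in metres for a given UserUnit value."""
    return MAX_PAGE_UNITS * user_unit / UNITS_PER_INCH * METRES_PER_INCH


print(round(max_page_side_m(1.0), 2))            # 5.08  (no UserUnit, as in PDF/A-1)
print(round(max_page_side_m(75_000) / 1000, 1))  # 381.0 (km, assumed maximum UserUnit)
```

In other words, 14,400 / 72 = 200 inches = 5.08 m, and multiplying by 75,000 gives 15,000,000 inches, or 381 km.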


PDF/A Standard: Literature

  • ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1), www.iso.org. (2005).
  • ISO/DIS 19005-2, Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2), www.iso.org. (2010).
  • PDF/A Competence Center, www.pdfa.org.

PDF Standard: 

  • ISO 32000-1, Document management – Portable document format – PDF 1.7, www.iso.org. (2008).

PDF Reference: 

  • PDF Reference, Third Edition, Adobe Portable Document Format Version 1.4, www.adobe.com. (2001).
  • PDF Reference, Sixth Edition, Adobe Portable Document Format Version 1.7, www.adobe.com. (2007).

JPEG 2000: 

  • ISO/IEC 15444-1:2004, Information technology – JPEG 2000 image coding system – Part 1: Core coding system, www.iso.org. (2004).
  • ISO/IEC 15444-2:2004, Information technology – JPEG 2000 image coding system – Part 2: Extensions, www.iso.org. (2004).

Signatures:

  • ETSI TS 102 778-2, Electronic Signatures and Infrastructures (ESI); PDF Advanced Electronic Signature Profiles; Part 2: “PAdES Basic – Profile based on ISO 32000-1”; ETSI, www.etsi.org. (2009).
  • PDF Advanced Electronic Signature (PAdES), FAQ: www.padesfaq.net

About Olaf Drümmer

Olaf Drümmer is founder and managing director of callas software, a company based in Berlin, Germany, specializing in PDF analysis and processing. First involved in the development of PDF-related standards in 1999, he has since actively participated in the development of the PDF/X (ISO 15930), PDF/A (ISO 19005), PDF/E (ISO 24517), PDF/UA (ISO 14289) and PPML/VDX and PDF/VT (ISO 16612) standards series, as well as the PDF standard itself (ISO 32000). callas software was among the first vendors to offer support for PDF/X and PDF/A. In addition, Olaf Drümmer is founder and managing director of axaio software, a company based in Berlin, Germany, specializing in software extensions for Adobe InDesign, Adobe InCopy, Adobe Illustrator, QuarkXPress, QuarkCopyDesk, Quark Publishing System (QPS), WoodWing Enterprise and vjoon K4. Furthermore, he is Chairman of the European Color Initiative (ECI) (www.eci.org).
