The Portable Document Format – developed by Adobe Systems and first published in 1993 – offers a series of beneficial and previously unavailable characteristics. In addition to platform independence, these include reliable rendering of page-oriented content. All components of a file, such as fonts, text, images and graphics, are included in the PDF file itself. These advantages made PDF a candidate for standardization in selected application areas and industries. What all of these standards have in common is that, as subsets, they restrict the extensive feature set of PDF and in this way optimise it for specific applications.
PDF/X and PDF/VT: Standards for the Printing Industry
In the printing industry, it is extremely important that digital print material is reproduced reliably. The ISO technical committee TC 130 developed the PDF/X standard, whose first part was published in 2001 as the first international PDF standard. Since then, additional parts have been developed as part of the ISO 15930 series of PDF/X standards. The most recent additions – PDF/X-4 and PDF/X-5 – were most recently revised in July 2010. The PDF/VT standard (“V” for variable data printing and “T” for transactional printing) adds a purely PDF-based option alongside widely used high-volume printing formats such as AFP and PPML. It extends PDF for the purpose of variable data printing and was published in August 2010.
PDF/A for Long-Term Archiving
PDF/A-1 was published by the ISO committee ISO TC 171 in autumn 2005 as the international standard ISO 19005-1 for long-term archiving. PDF/A is beneficial above all in the administration, archiving, library and publishing sectors, as well as in banking and insurance, but also in manufacturing industries, since any digital document – from invoices, books and manuals to technical drawings – can be stored permanently using this standard. Technical work on the second part of the standard, PDF/A-2, was completed in Paris in summer 2010. It is expected to be published in early 2011. Having just completed PDF/A-2, the PDF/A committee within ISO TC 171 started work on an additional part of the standard, PDF/A-3.
PDF/UA: Standard for Accessible PDF
The accessibility of document content is becoming increasingly important, especially for public authorities and administrations in the United States and Europe. The PDF/UA standard (UA stands for “Universal Accessibility”) is being developed as ISO 14289-1 within ISO TC 171. It aims to ensure that the contents of PDF files are properly structured and therefore sufficiently accessible for assistive technologies such as screen readers to extract and present them correctly.
PDF/E: The PDF for Technical Documents
ISO standard 24517-1 was published in 2008. This standard supports the reliable and secure exchange of engineering related content.
Last but not Least: PDF 1.7 is an ISO Standard
Since summer 2008, the PDF file format has also been standardized as ISO standard 32000-1. It is based on PDF 1.7. New developments are being incorporated into the second part of the standard – ISO 32000-2, also known as “PDF 2.0” – which is currently under development and scheduled to be finalized in the second half of 2011.
Advantages of the PDF/A Standard
As a standard for long-term archiving, PDF/A guarantees the reliable reproduction of documents over many years, regardless of technological developments in hardware and software. PDF/A ensures a homogeneous archive in which both digitally created and scanned documents can be stored. As an international ISO standard, PDF/A is valid throughout the world. The environmental and climate aspect is also in PDF/A’s favour: paper documents can increasingly be replaced by PDF/A documents, since PDF/A is able to satisfy all requirements regarding longevity and binding nature (signatures).
PDF/A-2: Central Innovations
Whereas PDF/A-1 is based on PDF 1.4, PDF/A-2 takes advantage of features that only became available in later versions of PDF, up to and including PDF 1.7. PDF/A-2 is no longer based on a specification published by Adobe, but on the internationally approved ISO standard 32000-1. The following overview shows the core innovations of PDF/A-2 which all users should be aware of.
JPEG2000 Image Compression
The powerful compression method JPEG2000 (ISO/IEC 15444) was not supported in PDF/A-1 because it was only introduced into the PDF specification with PDF 1.5. The new JPEG2000 options are interesting for scanned documents, among other things, because higher compression rates can be achieved at better quality than with the older JPEG format. Furthermore, JPEG2000 offers a highly efficient lossless compression mode.
With JPEG2000, libraries and archives can digitize historic maps, books or documents, for example, at the best quality possible, and create size-optimised PDF/A-2 files with JPEG2000 from them. The relevant metadata for the object can be stored directly in the PDF/A-2 document.
Embedded PDF/A Files via Collections
By making use of collections – also called “portfolios” in Acrobat – users can combine several files into one “container PDF”. PDF/A-2 can now be used to compile PDF/A collections from several PDF/A files. File formats other than PDF/A are intentionally not allowed inside PDF/A collections. One possible use of PDF/A collections is the archival of e-mails: e-mail attachments (such as Word files) can be converted to PDF/A and archived alongside the e-mail body, but as separate entities.
PDF/A collections can also be advantageous in the social security sector, as signatures can be applied to single scanned pages. The PDF/A collection then combines the signed single pages of the whole document. One or more pages can be subsequently removed without affecting the validity of the signatures for the remaining pages.
Transparency
Although transparency is part of PDF 1.4, it was not yet well enough supported at the time of the PDF/A-1 standardization and was therefore excluded from standard-conforming PDF/A-1 files. In the meantime, the technology has matured substantially, and transparency has become a common characteristic of numerous PDF files.
Transparency is often used as a design element in the form of drop shadows, crossfades or vignettes. It may also appear unintentionally in a PDF file, for example if the original file is a PowerPoint presentation, or where text is marked up with highlighting. The use of transparency is now permitted in PDF/A-2.
PDF “Optional Content” or Layers
PDF/A-2 supports optional content, also often referred to as PDF layers. Optional content is helpful for technical construction drawings and plans, among other things, as the contents can be shown or hidden according to topic, such as the electrics or water supply for a building. Layers can also be used to display multilingual contents – such as an international catalogue – in a single PDF file. With the layers function, users can switch between English, Japanese and German text, for example, while the graphics remain the same.
OpenType Fonts
The cross-platform OpenType fonts themselves are standardized as ISO/IEC 14496-22. These fonts provide extensive support for Unicode. OpenType fonts exist as TrueType (suffix “.ttf”) and PostScript variants (suffix “.otf”).
In PDF/A-2, these fonts can now be directly embedded without first having to convert them – as was necessary with PDF/A-1 – into the older formats PostScript Type 1 or TrueType.
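The two OpenType variants can be told apart by the four-byte version tag at the start of the font file rather than by the file suffix. The following is a minimal sketch, not part of the PDF/A-2 standard; the function name `sfnt_flavour` and the returned labels are chosen here purely for illustration.

```python
def sfnt_flavour(data: bytes) -> str:
    """Classify a font file by the 4-byte tag at its start.

    Illustrative helper only: it inspects the leading tag and does not
    validate the rest of the font.
    """
    tag = data[:4]
    if tag in (b"\x00\x01\x00\x00", b"true"):
        # TrueType outlines (typically ".ttf")
        return "TrueType outlines"
    if tag == b"OTTO":
        # PostScript-flavoured outlines stored in a CFF table (typically ".otf")
        return "PostScript (CFF) outlines"
    if tag == b"ttcf":
        # A collection bundling several fonts in one file
        return "font collection"
    return "unknown"


if __name__ == "__main__":
    print(sfnt_flavour(b"OTTO" + b"\x00" * 8))
    print(sfnt_flavour(b"\x00\x01\x00\x00" + b"\x00" * 8))
```

A check like this can be useful before embedding, because a file suffix does not reliably indicate which outline format a font actually contains.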
New Conformance Level PDF/A-2u – “u” for Unicode
Conformance level “b” stands for “basic”. PDF/A-1b and PDF/A-2b focus on visual integrity. PDF/A-1a and PDF/A-2a (“a” for “accessible”) contain additional features. These PDF/A documents also include structural information (such as information about paragraphs, headings or columns) as well as semantic information through the use of Unicode and alternate text. The latter is important to ensure that copying and pasting from PDF/A files works without problems, and to ensure correct indexing of text. New to PDF/A-2 is the conformance level “PDF/A-2u” (“u” for “Unicode”). As a slimmed-down version of level “a”, it offers the advantages of Unicode (text searching and copying text) without having to adhere to the complex structural requirements imposed by the “a” conformance level. PDF/A-2u is feasible both for digitally created PDF files and for scanned documents with subsequent text recognition.
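A PDF/A file declares its part and conformance level in its XMP metadata through the PDF/A identification schema (properties `pdfaid:part` and `pdfaid:conformance`). The sketch below reads that declaration from an XMP packet that has already been extracted from a file. It is an illustration under assumptions: the function name and the sample packet are invented here, and reading the declaration says nothing about whether the file actually conforms.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample packet declaring PDF/A-2u
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def pdfa_identification(xmp_packet: str):
    """Return a label such as 'PDF/A-2u', or None if nothing is declared."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements
        part = desc.get(f"{{{PDFAID_NS}}}part")
        if part is None:
            part = desc.findtext(f"{{{PDFAID_NS}}}part")
        level = desc.get(f"{{{PDFAID_NS}}}conformance")
        if level is None:
            level = desc.findtext(f"{{{PDFAID_NS}}}conformance")
        if part and level:
            return f"PDF/A-{part.strip()}{level.strip().lower()}"
    return None


if __name__ == "__main__":
    print(pdfa_identification(SAMPLE_XMP))
```

An archive ingest step might use such a lookup to sort incoming files by declared level before handing them to a full validator.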
Object Level XMP Metadata
In the metadata domain, PDF/A-2 specifies requirements for custom XMP metadata fields attached to content objects. This goes beyond PDF/A-1, where only document-level metadata were subject to these provisions. User-defined fields at the level of content objects must now also be declared in an extension schema if they are to be PDF/A-compliant.
New Comment Types and Annotations
PDF/A-2 establishes revised provisions around comments in PDF files. Some new annotation types – like 3D annotations – were added to the list of prohibited annotation types, while other new annotations introduced after PDF 1.4 (like text editing comments) are now allowed.
Digital Signatures
From the beginning, PDF/A intentionally allowed the use of electronic signatures. In PDF/A-2, more specific guidance has been added on how digital signatures should be applied to guarantee interoperability. PDF/A-2 carries over provisions from the ETSI PAdES standard. PAdES (PDF Advanced Electronic Signatures) is a set of restrictions and enhancements to the PDF standard in accordance with ISO 32000-1, intended to improve the integration and use of advanced electronic signatures. ETSI has published PAdES as TS 102 778.
Switching to PDF/A-2: Considerations and Strategies
The most important fact first: PDF/A-2 will not replace or supersede PDF/A-1 in any way. PDF/A-1 compliant documents that have already been created will remain valid PDF files for long-term archiving. Archived PDF/A-1 files can remain unchanged in the data archive; an “update” to PDF/A-2 is not necessary here and does not usually make sense, since PDF/A-1 remains a valid archiving standard alongside PDF/A-2.
Creating an Individual Requirements Profile
When reviewing the new PDF/A-2 features, users who discover functions already on their personal wish-list are more likely to benefit from upgrading than those who are completely satisfied with the features of PDF/A-1. A functional, successful archive system relying on PDF/A-1 can continue to be based on PDF/A-1. Once the PDF/A-2 standard is published in early 2011, only a few tools that implement the full extent of the new ISO standard will be available in the initial phase. This may affect the individual schedule of organisations intending to move to PDF/A-2.
Anyone dealing with the topic of PDF/A for the first time has to choose whether or not to use PDF/A-2 immediately. Here, too, the availability of the relevant software tools must be verified. In principle, there is nothing objectionable regarding the use of PDF/A-1 for long-term archiving now or in the future, since software is and will remain available, and the know-how that beginners acquire with PDF/A-1 can also be used to a large extent with PDF/A-2.
Even with ongoing projects, if PDF/A-1 satisfies all requirements, then the workflow should remain unchanged. If, however, PDF/A-2 offers crucial features that cannot be implemented based on PDF/A-1, the upgrade should be started at an appropriate point in time.
PDF/A-2 Innovations for Developers
The following new or improved technical functions of PDF/A-2 should be interesting for developers and programmers in particular.
|Feature||New in PDF/A-2|
|Conformance level “A”||Extended requirements for conformance level “A”|
|File header||Only file headers from %PDF-1.0 … %PDF-1.7 are allowed|
|Structure and tags||Mapping of user-defined tags and standard tags in a role map|
|Compressed object streams||PDF/A-2 supports compressed object streams, which were introduced with PDF version 1.5|
|Revision of the restrictions for the implementation||Among other things, the limit to 8191 array objects was lifted|
|Linearized PDF||No longer regulated by PDF/A-2|
|Appearance of comments||Annotation appearance is no longer required if the area is empty, or if the annotation is a link or popup annotation|
|History entry in XMP||If the history entry exists, certain rules apply|
|ICC profiles||Latest version ICC v4 is supported|
|Default CMYK||Improved provision|
|Prepress: overprinting, CMYK||Provisions for the use of overprinting mode and ICC-based CMYK|
|Spot colours||Spot colours must be consistent with regard to the alternative colour space|
|Name objects in valid UTF-8||Certain name objects such as for spot colours or structure types have to be encoded as UTF-8|
|Subset fonts||Provisions for subset of fonts were revised with regard to CharSet and CIDSet|
|TrueType||Encoding requirements for TrueType were revised (differences array; Adobe Glyph List)|
|.notdef glyphs||The use of .notdef glyphs (placeholder for glyphs in a font that are needed for rendering but are not present in the font) is no longer allowed|
|Namespace prefixes||Fewer stipulations regarding prefixes of namespaces|
|Document requirements key||Not allowed|
|XFA||The XML-based standard XFA (XML Forms Architecture) is now partially allowed|
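Two rows of the table above, the file header and the UTF-8 name objects, lend themselves to simple byte-level checks. The following is a minimal sketch under assumptions: the function names are invented for illustration, the header is assumed to be followed directly by an end-of-line byte, and names are passed without their leading slash. It is not a substitute for a PDF/A validator.

```python
import re

# "%PDF-1." plus one digit 0-7 at the very start, followed by an end-of-line byte
_HEADER = re.compile(rb"%PDF-1\.([0-7])(?=[\r\n])")

# "#xx" escape sequences inside a PDF name
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def header_minor_version(data: bytes):
    """Return the minor version (0-7) for an allowed header, else None."""
    match = _HEADER.match(data)
    return int(match.group(1)) if match else None


def name_is_valid_utf8(raw_name: bytes) -> bool:
    """Resolve #xx escapes in a PDF name and test whether the result is UTF-8."""
    resolved = _HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1).decode("ascii")), raw_name
    )
    try:
        resolved.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(header_minor_version(b"%PDF-1.7\n"))   # allowed header
    print(header_minor_version(b"%PDF-2.0\n"))   # outside the allowed range
    print(name_is_valid_utf8(b"Gr#C3#BCn"))      # "Grün" encoded as UTF-8
    print(name_is_valid_utf8(b"Gr#FCn"))         # Latin-1 byte, not valid UTF-8
```

The name check matters for spot colours in particular, since colourant names containing non-ASCII characters are often written in a legacy single-byte encoding.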
Also New in PDF/A-2
Below is an overview in table form of further changes and improvements to PDF/A-2, mostly related to specific technical details.
|Feature||New in PDF/A-2|
|Links||Now also possible in the form of multi-rectangle link annotations (link in the form of several related rectangles at a line break)|
|Links with PDF collections||Links can be set to, from, or between embedded PDF/A files|
|Freeform comments||Freeform annotations, such as polygons, are allowed|
|User unit||Page sizes on a 1:1 scale of up to 381 km edge length (previously 5.08 m)|
|Units of measure||Support of measurement properties; important for technical documents|
|Structured PDF||Extended options for tagged PDF|
|Encryption||Extended options for encryption|
|Electronic signatures||Extended options|
|Colours: DeviceN||Maximum number of colourants in DeviceN|
|Colours: NChannel||NChannel supported|
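The page-size figures in the “User unit” row can be reproduced with a short calculation. The sketch below assumes a maximum page edge of 14,400 default user-space units (1/72 inch each) and a maximum user-unit multiplier of 75,000. Both limits are assumptions introduced here because they yield the figures quoted in the table; they are not stated in the text above.

```python
POINTS_PER_INCH = 72          # default user-space units per inch
METRES_PER_INCH = 0.0254
MAX_PAGE_UNITS = 14_400       # assumed limit on the page edge, in user-space units
MAX_USER_UNIT = 75_000        # assumed maximum user-unit multiplier


def max_edge_metres(user_unit: float = 1.0) -> float:
    """Largest page edge in metres for a given user-unit multiplier."""
    inches = MAX_PAGE_UNITS * user_unit / POINTS_PER_INCH
    return inches * METRES_PER_INCH


if __name__ == "__main__":
    # Without a user unit: 14,400 / 72 = 200 inches
    print(f"{max_edge_metres():.2f} m")
    # With the assumed maximum user unit: 15,000,000 inches
    print(f"{max_edge_metres(MAX_USER_UNIT) / 1000:.0f} km")
```

Under these assumptions the first value corresponds to the previous 5.08 m limit and the second to the 381 km figure, which is why large-format plans and maps can be stored at a 1:1 scale.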
PDF/A Standard: Literature
- ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1), www.iso.org. (2005).
- ISO/DIS 19005-2, Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2), www.iso.org. (2010).
- PDF/A Competence Center, www.pdfa.org.
- ISO 32000-1, Document management – Portable document format – PDF 1.7, www.iso.org. (2008).
- PDF Reference, Third Edition, Adobe Portable Document Format Version 1.4, www.adobe.com. (2001).
- PDF Reference, Sixth Edition, Adobe Portable Document Format Version 1.7, www.adobe.com. (2007).
- ISO/IEC 15444-1:2004, Information technology – JPEG 2000 image coding system – Part 1: Core coding system, www.iso.org. (2004).
- ISO/IEC 15444-2:2004, Information technology – JPEG 2000 image coding system: Extensions, www.iso.org. (2004).
- ETSI TS 102 778-2, Electronic Signatures and Infrastructures (ESI); PDF Advanced Electronic Signature Profiles; Part 2: “PAdES Basic – Profile based on ISO 32000-1”; ETSI, www.etsi.org. (2009).
- PDF Advanced Electronic Signature (PAdES), FAQ: www.padesfaq.net