FAQs: You ask, the PDF/A Competence Center answers
What is PDF/A, what does it have to offer, and how can I best use it? Answers to these and other reader questions dealing with long-term archiving and PDF/A are provided on this page. Do you have a question that is not yet addressed here? Just send us an email using the red “Ask Us About…” button in the right-hand column, and you’ll receive an answer from the experts at the PDF/A Competence Center.
If I have PDF/A, do I no longer need paper copies?
This is a primary goal of PDF/A. Digital documents should remain in electronic format, giving the user a wide range of additional features, such as full-text searching instead of manually looking through paper dossiers or file cards.
Up to now we’ve used “normal” PDF for archiving digital documents. Is this not sufficient? Why is PDF/A better?
PDF/A is a subset of PDF that eliminates certain risks threatening the one-to-one future reproducibility of the content. PDF/A forbids dynamic content to ensure that the user sees the exact same content both today and for years to come. Everything that is required to render the document the exact same way, every time, is contained in the PDF/A file: fonts, colour profiles, images etc. PDF/A is also an ISO standard, guaranteeing that future software generations will know how to open and render PDF/A files.
How does PDF/A compare to ODF (Open Document Format)?
Open Document Format is an XML-based file format for office use. ODF shares some of the good properties of PDF/A: the specification is publicly available and it is an international standard (ISO/IEC 26300:2006). However, ODF is not self-contained and is (currently) nowhere near as widely distributed as PDF.
How does PDF/A fare with respect to corruption?
PDF/A fares just as well as any other file format. More commonly, the data medium itself (CD-ROM, hard disk, etc.) is the reason that files cannot be read, and not the file format.
Are PowerPoint files good candidates for converting to PDF/A?
Yes. You might, however, have to take certain preparatory steps, for example ensuring that annotations are also carried over into the PDF/A file.
Can PDF/A files contain copyright information, like TIFF can?
Yes. PDF/A gives you the possibility to save various different metadata (for example the copyright) in the document. Extensible Metadata Platform (XMP), a technology that unifies different metadata methods, is used for the metadata in a PDF/A file.
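As a rough illustration of what such metadata looks like, the following Python sketch builds a minimal XMP fragment that carries a copyright notice in the Dublin Core `dc:rights` property. The function name `build_rights_xmp` is made up for this example. The sketch leaves out the xpacket wrapper and the other properties a complete PDF/A metadata stream needs, and it does not embed anything into a PDF file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP packets.
X_NS = "adobe:ns:meta/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_rights_xmp(copyright_notice: str) -> str:
    """Return a minimal XMP fragment holding a dc:rights entry (illustrative only)."""
    ET.register_namespace("x", X_NS)
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)

    xmpmeta = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})

    # dc:rights is a language alternative: one text per language,
    # with "x-default" as the fallback entry.
    rights = ET.SubElement(desc, f"{{{DC_NS}}}rights")
    alt = ET.SubElement(rights, f"{{{RDF_NS}}}Alt")
    li = ET.SubElement(alt, f"{{{RDF_NS}}}li", {f"{{{XML_NS}}}lang": "x-default"})
    li.text = copyright_notice

    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    print(build_rights_xmp("Copyright 2010 Example Archive"))
```

In practice a PDF/A creation tool writes this packet for you; the sketch is only meant to show where the copyright text ends up.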
How are PDF/A and PDF/X related to each other? Is PDF/A a subset of PDF/X, or do the two standards just overlap? If I have a PDF/A file, what is missing to make it PDF/X?
The PDF/A and PDF/X standards were created by the ISO such that they are to a great extent compatible with each other, i.e. a PDF file can conform to both PDF/A and PDF/X.
The most important differences are:
PDF/A does not require the following aspects that are mandatory in PDF/X, or at least are very prevalent:
- Information about page geometry
- Overfill settings
- Trapping annotations
PDF/A has additional requirements for metadata not found in PDF/X-1a and PDF/X-3:
- if certain metadata is contained in the “Document Information”, it must also be included in the XMP metadata in an equivalent manner
PDF/X-1a and PDF/X-3, on the other hand, forbid certain elements that are permitted in PDF/A:
- Comments and form fields in the printable area of a page
- JBIG2 (Bitmap compression method)
In addition, PDF/A does not require that fonts for “invisible” text be embedded (commonly used with scanned pages that have an invisible text layer from OCR placed over them). PDF/X requires all fonts to be embedded, even for invisible text.
There is another important difference with respect to the so-called OutputIntent:
- An OutputIntent indicates for which output purpose a PDF file was created (e.g. a specific print process like sheet offset on coated paper)
- PDF/X requires that an OutputIntent always be present
- With PDF/X, an OutputIntent must refer to output conditions for printing
- An OutputIntent is not mandatory with PDF/A; it is only required if device-dependent process colour spaces are used, for example DeviceCMYK or DeviceRGB. The OutputIntent then provides a colour characterization of the device-dependent colour information.
- Important: PDF/X and PDF/A can have separate OutputIntents. However, PDF/A requires that if two (or more) OutputIntents are present, the destination output profiles must be identical.
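The PDF/A side of these OutputIntent rules can be expressed as a small decision function. The Python sketch below is only an illustration of the two conditions stated in the list. The function name and its inputs (a flag for device-dependent colour use and the raw bytes of each destination profile) are assumptions made for this example, and extracting them from a real file is left to a PDF library or validator.

```python
from typing import Sequence


def output_intents_acceptable(uses_device_colour: bool,
                              dest_profiles: Sequence[bytes]) -> bool:
    """Check the two PDF/A OutputIntent conditions described above.

    uses_device_colour: True if the file uses DeviceRGB, DeviceCMYK or
        another device-dependent process colour space.
    dest_profiles: the destination output profile of every OutputIntent
        found in the file, as raw bytes (hypothetical input format).
    """
    # Device-dependent colour needs an OutputIntent to characterize it.
    if uses_device_colour and not dest_profiles:
        return False
    # If several OutputIntents are present, their profiles must be identical.
    return len(set(dest_profiles)) <= 1


if __name__ == "__main__":
    print(output_intents_acceptable(True, []))
    print(output_intents_acceptable(True, [b"profile-a", b"profile-a"]))
```

A full validator checks more than this, so the function should be read as a summary of the list, not as a conformance test.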
If you have a PDF/A file and want to make it PDF/X (without losing the PDF/A characteristics), it could be difficult in the following cases:
- an OutputIntent for a “Monitor” is present
- Annotations or form fields are present
- there are scanned pages that have invisible OCR text incorporated
In all other cases it should work quite easily.
The other way around is usually simple. As a rule, every PDF/X-1a or PDF/X-3 file can be saved as PDF/A-1b.
This can be tested with Acrobat Professional 8. The Preflight function contained therein offers different possibilities, e.g. save a PDF/A as PDF/X, or save a PDF file as PDF/A and PDF/X in one step. The repair feature in Preflight can also help you perform some corrections to the files.
Will future developments to the PDF/A standard make current PDF/A versions obsolete?
The ISO standard requires that future PDF viewing applications must be backward compatible, so that they are capable of correctly displaying older versions of PDF/A.
How can you best make PDF/A files text searchable?
If a PDF/A file is created from a digital text document, the text will automatically be recognized. Normally you don’t have to worry about a PDF/A file being text searchable, unless the file was created from a scanned paper document or image. But even then, OCR can be used to make it searchable. In this case, however, only PDF/A-1b is possible and not the more stringent PDF/A-1a.
Since PDF/A-1a places especially strict requirements on the fonts used and on the file structure, this level of conformance is recommended over PDF/A-1b for exact text searchability, text extraction, and the reuse of content. PDF/A-1b supports text search, but may not find all instances of a text if there are anomalies in the character encoding.
How are (hypertext) links dealt with in PDF/A?
It should be ensured that the links in a document are still legible, even if they don’t point to an active destination.
What about “mixed” objects in PDF, like audio and video. Can these be used in PDF/A?
PDF files with dynamic objects like audio and video cannot be converted to PDF/A. PDF/A must guarantee an exact reproducibility, which is not possible with embedded objects like sound or movies. These types of objects usually require an external player (and quite often in a specific version). There is no guarantee that the player application will be available in the future.
How does PDF/A affect storage space? Are PDF/A files much larger than normal PDFs?
A PDF/A file might be marginally larger than the original PDF file it was created from (provided they don’t use different image resolutions or compression methods). Fonts are embedded in a PDF/A file (which is often also the case in “normal” PDF files) and more information is stored in the metadata. Some colour profiles could, in certain cases, lead to a much larger file size, but this is rare and is highly dependent on the particular case.
Can PDF/A be used with CAD?
CAD drawings (Computer Aided Design) are electronic technical drawings that can be produced by several different software applications, each using its own proprietary file format. The long-term archiving of CAD documents can cause difficulties due to the native electronic formats. PDF/A is therefore very suitable for archiving CAD drawings, unless the drawings contain 3-D objects (which are not permitted in PDF/A). In this case, a possible solution is the PDF/E standard (“E” for “Engineering”).
Is compression allowed in PDF/A?
Yes. ZIP file compression is permitted, and images can be compressed using JPEG compression. LZW compression is, however, forbidden.
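Inside a PDF file, compression methods show up as filter names: ZIP compression as `/FlateDecode`, JPEG as `/DCTDecode`, and LZW as `/LZWDecode` (abbreviated `/LZW` in inline images). The Python sketch below uses that to flag files that mention the LZW filter. The function name `mentions_lzw` is invented for this example, and the byte scan is a crude heuristic that can produce false positives. It does not replace a PDF/A validator.

```python
import re

# LZW filter names as they appear in PDF syntax: the full name used in
# stream dictionaries and the short form allowed in inline images.
_LZW_FILTER = re.compile(rb"/LZWDecode\b|/LZW\b")


def mentions_lzw(pdf_bytes: bytes) -> bool:
    """Return True if the raw file bytes contain an LZW filter name."""
    return _LZW_FILTER.search(pdf_bytes) is not None


if __name__ == "__main__":
    sample = b"<< /Length 42 /Filter /LZWDecode >>"
    print(mentions_lzw(sample))
```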
Can PDF/A files be encrypted?
No. Encryption is not permitted in PDF/A files. If, for example, a file requires a password to open it, then either a person who knows the password or a digital key must be available. The content of a PDF/A file must however always be accessible, which is why encryption is not allowed. A possible way to protect sensitive data in a PDF/A file is to put access constraints on the storage medium where the file is located (e.g. password access to the folder).
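An encrypted PDF announces itself through an `/Encrypt` entry in the file trailer. As a quick pre-check before attempting a conversion, a script can look for that entry. The Python sketch below does this with a simple pattern match. The function name `looks_encrypted` is made up for this example, and the check is a heuristic on the raw bytes, not a full parse of the file.

```python
import re

# The /Encrypt entry normally points to the encryption dictionary via an
# indirect reference ("12 0 R"), but it may also be written inline ("<<").
_ENCRYPT_ENTRY = re.compile(rb"/Encrypt\s*(?:\d+\s+\d+\s+R|<<)")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Return True if the raw file bytes contain an /Encrypt entry."""
    return _ENCRYPT_ENTRY.search(pdf_bytes) is not None


if __name__ == "__main__":
    trailer = b"trailer\n<< /Size 10 /Root 1 0 R /Encrypt 9 0 R >>"
    print(looks_encrypted(trailer))
```

Files flagged this way have to be decrypted by someone with the password before they can be converted to PDF/A.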
Do PDF files from Acrobat 8/9/X, that were saved as 1.7 files, lose any properties when they’re converted to PDF/A?
This is quite possible. PDF/A is based on PDF 1.4. If the PDF file is using features that were first introduced with PDF 1.5, 1.6 or 1.7, these could be lost with the conversion to PDF/A.
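To find candidates for such losses in a larger collection, you can read the version that each file declares in its header. The Python sketch below does this. Both function names are invented for this example. Note that the header only states the declared version (which a `/Version` entry in the document catalog can override), and it says nothing about which features the file actually uses, so the result is a hint for closer inspection and nothing more.

```python
import re
from typing import Optional, Tuple

_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(pdf_bytes: bytes) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) version from the file header, if present."""
    match = _HEADER.search(pdf_bytes[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def declares_newer_than_1_4(pdf_bytes: bytes) -> bool:
    """True if the header declares a version later than PDF 1.4."""
    version = pdf_header_version(pdf_bytes)
    return version is not None and version > (1, 4)


if __name__ == "__main__":
    print(pdf_header_version(b"%PDF-1.7\n1 0 obj"))
    print(declares_newer_than_1_4(b"%PDF-1.4\n1 0 obj"))
```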
Can bookmarks in a PDF file cause problems in PDF/A?
No. Bookmarks are permitted in PDF/A.
Can PDF/A documents be made “accessible”? We want to create documents that can be read by a screen reader.
Accessibility and PDF/A go hand-in-hand. Since both PDF/A-1a and “accessible PDF” files have special requirements for the file structure and on the fonts used, it is relatively unproblematic to create accessible PDF files that also conform to the PDF/A standard.
Can annotations and notes be used in PDF/A files?
Most forms of comments (annotations and notes) are permitted. There are however certain requirements, for example the comments must be visible, and they cannot be of the type “sound” or an attached file. Some types of comments that were introduced after PDF 1.4 (the basis for PDF/A) are not permitted, for example those created with the Polygon tool.
How do I properly specify multiple document authors in the XMP metadata of PDF/A-1b files?
We use a dc:author container bag, but Acrobat 8’s Preflight reports an error: “The Author field in the document’s Info dictionary does not match the Author entry in the document’s XMP Metadata.” Using a single author does not generate the error.
The issue you are running into is due to a clash between the “old” document info metadata approach in PDF and the more modern XMP-based approach. Both can be used in a PDF, but if they are used at the same time in a PDF/A file, they must meet certain requirements of the PDF/A standard.
For the author entry:
- document info field “Author” only has one string to keep the author name or names
- the XMP dc:creator entry has a more structured way of handling the author name(s), as each author has their own entry in the bag
Now while the PDF/A standard intended to focus on XMP
- it did not wish to completely disallow old style document info metadata (mostly because it is so widely used)
- but insisted that if both are present they must agree on (in this example) what the author name(s) is/are.
How can you do that with a one-dimensional string field on one side and an unsorted list of entries on the other? You can only achieve something reasonable by “downgrading” the list to something that can be mapped one-to-one to the simple string: only allow one entry in the list. That’s what the PDF/A standard does.
That is why you are seeing this error.
The solution is simple (though admittedly not always easy to achieve): once you do not have the author entry in the document info anymore (but only the entry/entries in the XMP metadata in dc:creator) you are free to use the entry as a bag with one or several author name entries.
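The rule described in this answer can be summarized in a few lines of Python. This is only a sketch of the logic as explained here, not the code of any validator. The function name and the way the two values are passed in (the Info dictionary author as a string or `None`, the `dc:creator` entries as a list) are assumptions made for this example.

```python
from typing import List, Optional


def author_metadata_consistent(info_author: Optional[str],
                               xmp_creators: List[str]) -> bool:
    """Apply the author rule described above.

    info_author: the "Author" string from the document info, or None
        if that entry has been removed.
    xmp_creators: the entries listed under dc:creator in the XMP metadata.
    """
    if info_author is None:
        # No document info entry: one or several XMP authors are fine.
        return True
    # Both present: the single string can only match a single list entry.
    return len(xmp_creators) == 1 and xmp_creators[0] == info_author


if __name__ == "__main__":
    print(author_metadata_consistent("A. Author", ["A. Author", "B. Writer"]))
    print(author_metadata_consistent(None, ["A. Author", "B. Writer"]))
```

The first call reproduces the reported error case (two XMP authors alongside an Info dictionary author), and the second shows the suggested fix of dropping the Info dictionary entry.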
Paper Documents to PDF/A
Which text recognition software works well together with Acrobat 8/9/X?
Acrobat 8/9/X Professional comes with its own OCR software that can be used to convert scanned pages into searchable text.
Viewing PDF/A Files
Can the Adobe Reader display PDF/A files?
Adobe Reader 8 (as well as 9 and X) has a special mode for displaying PDF/A files in a compliant manner. There are also a number of third-party products that support PDF/A compliant viewing. The user has a choice of products.
Do I have to purchase a program for displaying PDF/A files?
No, there are some free PDF viewers available for different operating systems.
Creating and Converting PDF/A
Can “normal” PDF files be converted to PDF/A?
Yes, PDF files can be converted to PDF/A. It might be, depending on the original PDF file, that not all features can be transferred to PDF/A. For example: PDF/A is based on PDF 1.4. There are features in newer PDF versions (like transparency and layers) that were not (fully) introduced with PDF 1.4 and are therefore not supported by PDF/A. In a case such as the one mentioned, the transparency has to be removed and the layers flattened in order to create a PDF/A-1 document. The next version of PDF/A – PDF/A-2 – is based on the PDF specification 1.7 and will allow a lot of the newer features.
Is batch processing possible, for example to create PDF/A files from Microsoft Word? I know about batch processing from products like Adobe Photoshop.
Both Acrobat and other software programs can create PDF/A files with batch processing. For example, several files or an entire folder can be processed. There are different solutions available on the market that support high volume, automated processing, suitable for businesses and agencies.
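Processing an entire folder usually comes down to a loop around whatever conversion tool you have chosen. The Python sketch below shows that loop only. The `convert` argument is a placeholder for your actual converter (a call into a product's API or command line) and is purely hypothetical, as are the function name and the default `*.docx` pattern.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def batch_convert(src_dir, dst_dir,
                  convert: Callable[[Path, Path], None],
                  pattern: str = "*.docx") -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Run `convert(source, target)` for every matching file in src_dir.

    Returns the list of written targets and a list of (source, error) pairs,
    so that one failing document does not stop the whole batch.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    converted, failed = [], []
    for source in sorted(src_dir.glob(pattern)):
        target = dst_dir / (source.stem + ".pdf")
        try:
            convert(source, target)  # placeholder for the real PDF/A converter
            converted.append(target)
        except Exception as exc:
            failed.append((source, str(exc)))
    return converted, failed
```

Collecting failures separately is a design choice that suits high-volume runs, where the problem files are reviewed afterwards and each result is still checked with a PDF/A validator.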
How can I tell if a font is protected by copyright and can’t be copied into a PDF/A file?
Usually a program that creates PDF/A files will give a warning if a font file cannot be embedded. The problem is not very prevalent with standard western fonts, since most font developers allow their fonts to be embedded in files. With special fonts (for example Japanese fonts) there could be difficulties, since a lot of the fonts are copyrighted and cannot be copied.
Which programs can create PDF/A-1a files?
There are a number of products that support PDF/A-1a, including Adobe Acrobat 8/9/X Professional as well as products from callas software, Compart AG, PDFlib and PDF Tools AG. The list of applications that can create PDF/A-1a is constantly growing.
Can I create PDF/A with Acrobat 6?
No. It is not possible to create PDF/A with Adobe Acrobat 6. The Adobe Acrobat 8 Professional version is the first version of Acrobat that fully supports PDF/A.
Microsoft Office 2007 can create PDF. Is this as good as PDF/A?
PDF is not the same as PDF/A. You can however also create PDF/A with Microsoft Office 2007. Look closely in the properties. PDF/A is recommended over “normal” PDF files. By the way, the PDF conversion is only possible if you download a separately available plug-in (Save-As-PDF) from the Microsoft website.
We currently have an extensive digital archive in TIFF-G4. Can these files be converted to PDF/A with a reasonable effort?
There are solutions available on the market that are geared towards converting large volumes of files (from different formats) into PDF/A. In order to estimate the amount of time you’ll need, you have to take into account the original file format, homogeneity, how the files are stored, as well as several other factors.
What are the different ways to create a PDF/A file?
There are a number of possible ways to create PDF/A files, depending on where the original information is coming from:
- Print to PDF/A on a client computer
- Print to PDF/A using a print stream on a server
- Scan to PDF/A (paper to PDF/A)
- Convert existing image files to PDF/A
- Convert existing PDF files to PDF/A
- Export a document to PDF/A format
- Create PDF/A “on-the-fly” from data or a database
According to the PDF specification, OpenType fonts can only be embedded beginning with PDF version 1.6; PDF/A-1 requires PDF 1.4. This would mean that PDF files with embedded OpenType fonts cannot be converted to PDF/A. Despite this, the conversion seems to have worked.
It is correct that OpenType fonts cannot be embedded in PDF/A files. However, OpenType fonts are often converted to another type (TrueType or Type1) when they are embedded, so creating a PDF/A file should usually not be a problem. If Acrobat 8 recognized the file that you converted to PDF/A as being compliant, then a conversion of the font type probably took place. You can verify the font type of embedded fonts using the Preflight verification report (Check box: “Show detailed information about document”, and then look under “Fonts”).
How can I find out if a font is embedded?
When a PDF/A file is created, the program ensures that the fonts are embedded. If the fonts are not embedded, you don’t have a valid PDF/A file. You can verify if fonts (and which ones) are embedded in any PDF file by checking under “Properties” in Acrobat and the Adobe Reader. In addition, PDF/A validation tools will inform you if fonts have or have not been embedded, and whether the files are therefore PDF/A conforming or not.
Are there programs that check and confirm the validity of a PDF/A file?
There are a few tools on the market that will do this. In addition to Adobe Acrobat Professional 8, there are the pdfaPilot and pdfInspektor4 from callas software, the 3-Heights PDF Validator from PDF Tools, the LuraDocument PDF Validator from LuraTech, PDF/A Live! from intarsys, the PDF/A Longlife Suite from Seal Systems and the PDF Appraiser from Apago.
Can PDF/A files contain an electronic signature?
Yes. It is permitted to digitally sign PDF/A files. There are a number of tools, strategies and software solutions available on the market for signing PDF/A files electronically. Even Acrobat Professional can be used to digitally sign PDF/A files.
The ISO Technical Committee 171, SC2 has also prepared and published a document containing Frequently Asked Questions (FAQs) about PDF/A.
This FAQ may be freely distributed and/or translated in its entirety. The current authoritative version of this FAQ is maintained at both NPES (www.npes.org) and AIIM (www.aiim.org – see especially the section managed by the AIIM PDF/Archive committee).