Validating PDF/AThis article deals with validation software for PDF/A and discusses the PDF/A Competence Center’s plans for developing a test suite. Thomas Merz, author and PDF expert, is president of PDFlib GmbH. PDFlib is located in Munich and has focused on PDF development since 1997. PDFlib’s product family provides components for dynamically creating PDF, including PDF/A-1a (tagged PDF) and PDF/A-1b. PDFlib GmbH is a founding member of the PDF/A Competence Center. In addition, Thomas Merz is involved in the PDF/A Competence Center’s Technical Working Group (TWG), which establishes a consolidated technical position on PDF/A issues. The TWG has written several articles, including the PDF/A Technical Notes (published on pdfa.org) that deal with topics like „PDF/A-1 and Namespaces”, „Colors in PDF/A-1”, „Metadata (XMP/Docinfo) in PDF/A-1” and „Digital Signatures and PDF/A-1”. The TWG also cooperates with the ISO committee, for example in drafting the PDF/A-2 standard and in the development of a PDF/A test suite.
Basics BasicsThe PDF/A-1 standard was published in October 2005 as ISO 19005-1. A Corrigendum was published in the second quarter of 2007 to amend the standard and better explain certain issues. The ISO standard itself comprises only 36 pages, but includes references to several additional, more complex documents (for example the PDF Reference 1.4 and specifications for XMP, font formats, ICC and much more). Conformance to PDF/A involves not only the requirements in the ISO standard itself, but also in the secondary specifications. Currently there exists no implementation that can be used as a reference (documented or software). The PDF/A-1a standard distinguishes itself through more detailed requirements than PDF/A-1b. PDF/A-1a requires amongst other things „tagged” PDF and Unicode. PDF/A conformance requires a significant effort from the producers of PDF software. Aspects of PDF/A ConformancePDF/A conformance encompasses several aspects that can apply to a document at different stages of its lifetime:
PDF/A-Validation: ProductsThere are several applications available on the market for validating PDF/A documents, including the following validators from PDF/A Competence Center members:
Additional products are also available from other vendors. Example: Acrobat 8 PreflightThe first example for PDF/A validation is the Acrobat „Preflight” function, developed by Adobe and callas software. The PDF/A functionality found in Acrobat 7 Preflight is based only on a draft version of the standard, the approved PDF/A standard is implemented in Acrobat 8. Preflight is capable of validating PDF files on their conformance to PDF/A. The above example shows the results of a validation, where the PDF file does not conform to PDF/A. Example: Cabaret Stage 3The Cabaret Stage 3 product (marketed through Intarsys) is a software program for displaying, filling, editing, saving, printing and validating PDF documents. Validation with Cabaret Stage 3: The inspected file does not conform to PDF/A. PDF/A-Validation: AspectsSeveral aspect must be taken into account when verifying the PDF/A conformance of documents. Both the breadth and the depth of the verification are important factors when validating. Breadth of the Verification:Validation must take into consideration all of the requirements in the standard. This means that all of the rules, requirements and restrictions in the standard must be verified. Some of the requirements apply to the entire document, for example XMP and color spaces. Other requirements only apply to special data structures, and are not necessarily present in all PDF documents. Font subsets and annotations are two such examples. Depth of the Verification:When considering the depth of the verification, it must be decided on how exact the data structure will be examined and to what extent the individual rules will be verified. The following secondary specifications must also be considered:
The combination of different properties can also be relevant:
The area of PDF/A-1a and tagged PDF:
Further AspectsAdditional factors play a role in the actual method of validation. Is an interactive or a more batch-oriented environment practical for the verification? The level of detail for reporting errors will differ depending on the intended use. An automatic correction will also often be desirable upon completion of the validation. PDF/A-Validation: Special ConsiderationsA PDF/A Validator must take the Corrigendum to the standard into account. This is the case with Acrobat 8, but not with Acrobat 7. An insufficient breadth or depth in the validation will deliver incomplete and unreliable results. Different types of failures are possible with validation. A validator could report an error, even though the document is actually PDF/A conforming. The opposite is also possible, when the validator does not report an error even though one is present. A further problem that could appear during validation is when a valid input document is rejected before the check is even performed. A report that does not provide enough detail can also be problematical. Test suite for PDF/AThe PDF/A Competence Center and the TWG are preparing a test suite with PDF/A documents. The goal is to create standardized documents that deal with different aspects of PDF/A. The strategy is to then purposely introduce breaches to the specification. The „synthetic test suite” will be created in stages, and the breadth and depth of coverage will be documented in the process. An important final step is the comparison with the different validators that are available on the market. Starting Point:In focus are (presumed or actual) creation programs, as well as different methods for the manual creation of PDF/A. How will the TWG confirm the accuracy of the test suite? There will be a check using different validation tools, and a manual inspection using special analysis tools will also be performed. The combined expert knowledge of the TWG members will be applied for creating the test suite, and an iterative approach will ensure accuracy. Validating the ValidatorThe purpose here will be to check different validation products through use of the test suite. The accuracy of the check will increase as the test suite is expanded. The criteria for assessing the validation tools will be based on the following questions:
Next StepsOnce the test suite is completed, the test procedure will be formalized. The tests should then be conducted by an independent test center. In the end, it will be possible to use the test suite to officially certify validation products. Practical Recommendations for ValidationA reduction in performance can be expected when validating high volumes of documents. For this reason it is recommended to validate products and workflows, instead of each PDF/A document separately. It will however probably be necessary to validate all externally supplied PDF/A documents individually, since the creation process is usually not known. Don’t forget: non-conforming documents cannot always be converted to PDF/A! For example, in order to embed a missing font, the font must of course be present and available.
Thomas Merz, PDFlib GmbH, aoe Appendix: PDFlib GmbH
|
Download
You can also download this article as a PDF Validating PDF/A |
|
|

