PDF/A was the topic of the day at the PDF/A seminar in Oslo on April 17, 2012. Organized by Per Arne Flatberg from Palografen, in cooperation with the PDF Association, and with support from the Riksarkivet, it attracted over 80 attendees from all over Norway.
The seminar venue could not have been chosen any better – the seminar was held at the main building of the Norwegian National Archives in Oslo, built to preserve all important documents of Norway for eternity.
The seminar agenda
The agenda covered all important aspects of PDF/A and its use in the real world:
- PDF/A in law and in the archives
Anthony Lærdahl, Digital Format Manager at the National Archives of Norway
- Introduction to PDF/A (a video recording of the presentation is available)
David van Driessche, Chief Technical Officer, Four Pees
- From paper to PDF/A
Thomas Zellmann, Managing Director, PDF Association
- Born Digital: PDF/A as the storage format for digital files
Olaf Drümmer, Chairman of the PDF Association and CEO of Callas Software
- Get it to work. Principles of PDF/A workflows
Panel discussion with Olaf Drümmer, David van Driessche and Thomas Zellmann
Attendees were keen to learn about PDF/A and asked numerous questions after each presentation. Of special interest were how PDF copes with «bit rot», how well PDF/A works together with digital signatures, how to archive formulas and macros in spreadsheets, when to validate PDF/A files for compliance, and what options exist in the field of engineering to accommodate not only two-dimensional views of CAD drawings but also three-dimensional models for interactive inspection:
- Some attendees appreciated that TIFF files are not very vulnerable to «bit rot»: in the rare case where a single bit gets corrupted, a TIFF file will usually remain usable, as probably only one pixel will be affected. It was pointed out, though, that PDF is an object-oriented format: if one object is damaged by a flipped bit, it is usually still possible to reconstruct everything else in the PDF file. No conclusion was reached as to whether PDF is as robust as TIFF when it comes to «bit rot».
- While PDF/A fully supports digital signatures, it became obvious that in order to archive digitally signed PDFs as PDF/A it is important to first ensure that the file to be signed already conforms to PDF/A. If it does not – which unfortunately is often the case in the real world – it is by definition not feasible to convert the digitally signed PDF into PDF/A without breaking the signature. One fallback approach could be to take advantage of PDF/A-3 and embed the digitally signed PDF inside a PDF/A-conforming rendition of the same file.
- Archiving spreadsheets from programs such as Microsoft Excel or OpenOffice Calc can be a challenge in itself: by default each table ends up as a separate PDF on export, and it can be quite cumbersome to account for the different sizes of the required print areas between files and even between tables inside the same spreadsheet file. It is even less straightforward to archive information that would not normally appear on a printout but is as important for archival as the table content itself: formulas and macros. Three options were discussed. One is to turn all formulas and macros into text and associate them with the respective parts of the tables as annotations, where the contents field of each annotation contains the formula or macro. Another is to create a virtual printout of formulas and macros and include it as additional pages in the PDF/A file. Last but not least, it might be useful in some organisations to create a PDF/A-3 file from the spreadsheet and to embed the original spreadsheet file, with all formulas and macros intact, as an associated file. Associated files are a mechanism specific to part 3 of PDF/A which makes it possible to include non-PDF/A files inside a PDF/A file. It has to be pointed out, though, that the archival quality of arbitrary associated files is undefined: in this example, nobody knows how well a Microsoft Excel 2010 file can be processed 10 or 50 years from now.
- An important topic in all PDF/A discussions is the question of validation. Wherever an archive has no control over how incoming files are prepared, it is mandatory that each and every incoming file is validated against the PDF/A standard. Several vendors in the PDF Association offer validation tools, and these vendors have worked together over recent years to achieve a high degree of agreement between their tools. In other words, in almost all cases these validation tools arrive at the same result when validating PDF/A files. It is important, though, to use the most recent version of these tools. Where organisations have highly standardized processes for creating PDF/A files, as is typical for scanning paper documents to PDF/A, it is usually sufficient to carry out process validation on an ongoing basis (that is, each time anything in the process is changed or a component in the scanning solution is updated) together with validation of samples at regular intervals.
- Archiving engineering documents can be covered quite well by PDF/A, as long as no 3D models are to be archived. For archiving 3D models, and not just selected 2D views of a model, users will have to wait for the upcoming PDF/E-2 standard, currently being developed by ISO TC 171 SC 2. Expected for release in late 2013 or early 2014, it will enable engineering users to also archive 3D models encoded as either U3D or PRC. U3D is an ECMA standard that is supported by PDF 1.7. PRC is about to be approved as an ISO standard in summer 2012, and will be supported by the upcoming PDF 2.0 standard.
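The validation rule of thumb from the discussion above (validate every file when the archive does not control how files are prepared; for standardized processes, validate after each process change and sample at regular intervals) could be sketched roughly as follows. This is only an illustrative sketch: the `Intake` class, the `should_validate` function and the sampling interval of 100 are hypothetical names and values, not something presented at the seminar, and the actual validation would be done by a PDF/A validation tool.

```python
from dataclasses import dataclass


@dataclass
class Intake:
    """Hypothetical description of one incoming PDF/A file."""
    controlled_process: bool       # True if the archive controls how files are produced
    process_changed: bool = False  # True if the workflow or one of its components changed
    sequence_number: int = 0       # position of the file in the incoming stream


def should_validate(intake: Intake, sample_every: int = 100) -> bool:
    """Decide whether a file should be passed to a PDF/A validator."""
    # No control over how the file was prepared: validate every file.
    if not intake.controlled_process:
        return True
    # Standardized process: re-validate whenever anything in the process changed.
    if intake.process_changed:
        return True
    # Otherwise validate samples at regular intervals (assumed: every Nth file).
    return intake.sequence_number % sample_every == 0
```

With these assumptions, a file from an external source is always validated, while a file from an unchanged in-house scanning workflow is only validated when it falls on a sampling position.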
About the National Archives (Riksarkivet)
The National Archives of Norway – the Riksarkivet – is responsible for keeping records from government agencies, making material available for use, supervising the work of the archives at the state, county and municipal level, and making sure that contributions to private archives are preserved. The National Archives safeguards Norway’s collective memory and continuously receives large amounts of data.
Palografen is a Bergen-based knowledge center that works extensively with PDF formats and PDF-based workflows. Palografen is the first Norwegian organization to have become a member of the PDF Association.