Table of Contents
- Current News:
- PDF/A Conference with Preview of PDF/A-2
- Main Articles:
- Hurdles when Migrating Archives
- Documents must remain printable
- Help, I can’t print my old PDF files anymore!
- Does PDF/A ensure reproducibility?
- Special case: transparency
- Every archive has an end; the documents contained in it don’t
- PDF/A Competence Center Members Introduce Themselves:
- Docucom AG
- New Members
The technological developments of recent years have given us previously unimaginable access to a gigantic flood of information. The digitization of documents has made, and continues to make, an important contribution to the advancement of the information society. Storing documents electronically is taken for granted even in our private lives. This statement will hardly surprise you. The consequences of keeping documents available over a long period of time, on the other hand, have more than just surprised a lot of people.
For businesses, it’s a well-known fact that the availability of documents can only be achieved through consistent adherence to standards, both in the document format and in storage. If we look at the developments of the past years, however, we come to curious conclusions. What has happened? When microfiches were introduced, we were forced to accept manufacturer standards because of technological restrictions. The same effect could be witnessed when the first electronic archives were implemented. The technical focus was on the optimal use of storage space, which was an enormous cost factor at the time. The limited number of available viewers automatically led to a reduction in document formats. And, not least, index information (i.e. metadata) was kept rigid because the administration options were limited and laborious.
This situation brought highly proprietary solutions to the market. Old systems that had to be migrated, whether because of higher demands or for other reasons, posed many difficult challenges. Once we got over the various and constantly changing hurdles, we could transfer the documents to the new archive with relative ease.
Where do we stand today? Modern and flexible ECM architectures enable the saving of virtually any object. The structures through which we access objects can be quickly and flexibly established or adapted. It is also unavoidable that we will soon have to migrate this system once again to an even better and more flexible one. With that migration we will be confronted not only with larger volumes, but also with less manageable data models.
An archive migration is nothing other than a second archiving of objects in fast motion. It is foreseeable that current developments will unavoidably lead to considerably higher costs and effort for retaining information and documents. The implementation of PDF/A as an archiving format, and not merely a good PDF in version XY, has therefore become a decisive factor for long-term investment protection.
The PDF/A Competence Center has played a vital role in bringing this train of thought to suppliers and customers. The next international PDF/A conference is just around the corner and will serve to provide information about the PDF/A standard and to spread it further. I would like to take this opportunity to extend my heartfelt invitation to you.
President Docucom AG
PDF/A Competence Center Member
PDF/A Conference with preview of PDF/A-2
Just prior to the preparation of this newsletter, the ISO committee met in Hamburg at the end of April 2009. All active countries were represented by their respective national standardization organizations, such as DIN in Germany. The PDF/A Competence Center has the status of category A liaison, which, in addition to attendance at the ISO meetings, permits us to submit advisory contributions and comments. We were represented at the meeting by board members and technical employees from member companies.
Following the conclusion of the Hamburg ISO meeting, a CD (committee draft) of the PDF/A-2 standard will be distributed for comments. At the next ISO meeting in October 2009 in Orlando, as the next step in the approval process, a DIS (draft international standard) will be prepared and subsequently distributed for a five-month voting period. The technical work on the specification is concluded with the preparation of the DIS. The national members vote on the standard, and the formal steps to publication then follow, which, however, take some time. As things stand today, publication of PDF/A-2 can be expected around the end of 2010. To give you a preview, PDF/A-2 will be based on the independent ISO standard 32000-1, which was derived from the PDF 1.7 specification. As such, all of the functionality in PDF 1.7 that is meaningful and can be realized for long-term archiving will be embraced. This includes, for example, JPEG2000, layers, transparency and collections.
The PDF/A conference offers an outstanding opportunity to hear experts speak about PDF/A-2 and to meet them in person. PDF/A-2 is, however, still something for the future, and quite naturally the focal point is the current version of the standard, PDF/A-1, and its uses. The text in our ad gets right to the point: “Users report, experts inform, exhibitors present”. With over ten user presentations from different sectors, there will be an abundance of reports on practical experience of how and where PDF/A is being used in businesses today. There will also be presentations on all areas of application for PDF/A, and newer topics such as metadata and e-mail archiving will be dealt with intensively. If you are currently planning to migrate your archive, PDF/A could be a very important element of your project, as described in the following article.
Hurdles when Migrating Archives
When procuring a new DMS, our customers generally go through a rather interesting process. They learn about characteristics of the data and data storage in their archive that they were previously unaware of. In many cases these characteristics do not match the requirements of the new system environment. Because the old archive lacks standards for data storage, data media and interfaces, situations and surprises occur that were unknown at the beginning of the project. Over the course of the project, these take a massive effort to straighten out.
The origins of the problems can be summarized as follows:
- proprietary data storage
- missing interfaces
- missing or incomplete documentation
- outdated data media
- high maintenance costs
In this article we will look more closely at some of the hurdles of a migration, for example heterogeneous data storage, cleaning up data and consolidating data formats.
Documents must remain printable
Here is an example from a medium-sized bank. The customer stores documents from different applications and in different formats in its archive. This archive contains approximately 11 million objects in the formats TIFF, AFPDS, PDF and text (lists). Over time, and trusting in PDF, the customer stored documents from four different applications as PDF files in the archive. As part of an IT optimization, the archived documents were to be migrated to a new central DMS. The new DMS offers expanded workflow possibilities, including a hold-mail function (hold-mail documents are not sent to customers, but rather are retained at the bank until the customer retrieves them), which was also to be made available for the migrated documents. The hold-mail function allows all of a customer’s records to be delivered on a specific day. These customer records are transferred to an output management system, which collates the individual records and prepares and codes them for the print and mail streams.
During the migration it was discovered that the PDF documents migrated from the old archive could not be printed on the central PostScript printer and instead caused PostScript errors. The cause of the problem was that, before loading the exported PDF documents, the customer had only conducted a visual test on a small sample of files using Adobe Reader. These tests were successful; however, direct printing was not taken into consideration. Merging and coding the documents for the hold-mail application mentioned above proved to be another issue. This process could not be carried out because the archived PDF documents from the different PDF applications could not be read by the output management software. In summary: different PDF applications produce PDFs of different quality.
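The weakness of a small visual sample can be illustrated with a short sketch that instead runs an automated check over every exported file and groups the failures by the application that produced them. This is only an illustration under stated assumptions: the `(document_id, source_application, content)` tuple layout and the function names are hypothetical, and the built-in check (PDF header at the start, `%%EOF` marker near the end) catches only gross structural damage. It says nothing about printability; in practice a full PDF/A validator would be plugged in as `check`.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def looks_like_pdf(data: bytes) -> bool:
    """Very coarse structural check: header at the start, %%EOF near the end.

    Passing this check does not mean the file is printable or valid PDF/A.
    """
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def check_export(
    documents: Iterable[tuple[str, str, bytes]],
    check: Callable[[bytes], bool] = looks_like_pdf,
) -> dict[str, list[str]]:
    """Run `check` on every exported document, not just a sample.

    Each document is a (document_id, source_application, content) tuple;
    this layout is an assumption made for the sketch.
    Returns the failing document ids, grouped by source application.
    """
    failures: dict[str, list[str]] = defaultdict(list)
    for doc_id, source_app, content in documents:
        if not check(content):
            failures[source_app].append(doc_id)
    return dict(failures)
```

Grouping by source application reflects the finding above: if quality differs between PDF generators, failures will tend to cluster by generator, which narrows down where clean-up is needed.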
Help, I can’t print my old PDF files anymore!
The printing results varied: certain PDFs came out of the printer as black pages, others as clouds of dots, and some brought the printer to a complete stop with a PostScript error. The failure analysis of the PDF documents identified problems in several different areas, which can be grouped into three main causes of error.
- Different PDF generation tools create PDFs of varying quality.
- Fonts were not embedded in the PDF documents.
- Some documents contained an inconsistent PDF format.
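The second cause, missing font embedding, can be detected mechanically. The sketch below assumes that a PDF parser (not specified here) has already turned each font dictionary into a plain Python `dict` with all indirect references resolved; the function names are hypothetical. It relies on the PDF convention that an embedded font program is referenced from the font descriptor under `/FontFile`, `/FontFile2` or `/FontFile3`, that Type 3 fonts carry their glyph descriptions in the font dictionary itself, and that composite (Type 0) fonts delegate to their descendant fonts.

```python
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def is_embedded(font: dict) -> bool:
    """Return True if the font dictionary carries or references its glyph data."""
    subtype = font.get("/Subtype")
    if subtype == "/Type3":
        # Type 3 glyphs are defined inside the font dictionary itself.
        return True
    if subtype == "/Type0":
        # Composite font: the descriptor lives on the descendant CIDFont.
        descendants = font.get("/DescendantFonts", [])
        return bool(descendants) and all(is_embedded(d) for d in descendants)
    descriptor = font.get("/FontDescriptor", {})
    return any(key in descriptor for key in FONT_FILE_KEYS)


def unembedded_fonts(fonts: list) -> list:
    """List the /BaseFont names of all fonts that are not embedded."""
    return [f.get("/BaseFont", "<unnamed>") for f in fonts if not is_embedded(f)]
```

Running such a check across the whole archive before migration would show which source applications need font repair during conversion.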
We constantly experience such surprises in archive migrations. Caused by a lack of knowledge or by improper handling, they lead to expensive and time-consuming analysis and file cleansing, and to complex conversions. Our experience to date demonstrates that these are not isolated cases. Businesses are often operating with a false sense of security.
Does PDF/A ensure reproducibility?
The ISO standard 19005-1 details the conditions necessary to ensure a reliable reproduction of colors, text, images and graphic elements in a PDF document, when reproducing it on a monitor or on a printer.
Private banks rely heavily on color in customer documents (records, portfolios). These documents are created following a strict application of internal CI/CD (corporate identity / corporate design) regulations. The guiding principle here is:
Welcome to the world of color and individual color perception.
These “color” requirements are extremely diverse for PDF/A and depend on the environment and purpose. One user concentrates strictly on PDF/A files that have a high resolution, so that the documents can be optimally reproduced on a printer. Another user places a high value on the lowest possible resolution, so that the PDF files that have to be downloaded from the extranet remain as small as possible. What remains constant in both applications are the colors dictated by the CI/CD.
First, some information about resolution. The resolution of a document is not part of the PDF/A standard, since there is no “correct” image resolution. The decision on which resolution is best for which document lies with the company itself. Whether on screen or as a printout, the colors should always look the same. With respect to color, PDF/A draws on a pool of color profiles which are attached to images, graphics or the PDF files. For example, the ISO Coated color profile is very suitable for a CMYK color printer. Experience has shown that central output management platforms increasingly have to generate different PDF/A files when creating or converting documents. The example above demonstrates that PDF/A files should be prepared specifically for the requirements they are intended to meet.
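The trade-off between the two users described above (high resolution for print, low resolution for download) can be made concrete with a little arithmetic. The sketch below computes the uncompressed size of a full-page raster image at a given resolution. The page size (A4), the two resolutions and the 3 bytes per pixel (8-bit RGB) are illustrative assumptions, not values from the standard, and real PDF files compress images, so actual file sizes are far smaller. What the calculation shows is the scaling: pixel count, and with it raw data volume, grows with the square of the resolution.

```python
MM_PER_INCH = 25.4


def raster_size_bytes(width_mm, height_mm, dpi, bytes_per_pixel=3):
    """Uncompressed size of a page-sized raster image at the given resolution."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


A4 = (210, 297)  # page size in millimetres (assumption for this example)

for dpi in (96, 300):  # a screen-oriented and a print-oriented resolution
    size = raster_size_bytes(*A4, dpi)
    print(f"{dpi} dpi: {size / 1_000_000:.1f} MB uncompressed")
```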
Special case: transparency
Up to now, transparent objects have been forbidden in PDF/A. When PDF/A-1 was officially published, the algorithms for calculating transparency had not been explicitly formulated, and transparency is still not permitted today. A simple example: many banks place watermarks on the documents in their archive. When these documents are converted to PDF/A without proper validation, the watermarks can block out important content in the reprints. The future definition of PDF/A-2 will deal with this circumstance.
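Why a watermark can suddenly hide content is easiest to see in the simplest case of alpha compositing. The sketch below is not the full PDF transparency model; it assumes an opaque backdrop, a single gray channel (0.0 = black, 1.0 = white) and the plain "normal" blend. The function name and sample values are hypothetical. If a conversion simply discards the transparency information, the watermark is effectively painted with alpha 1.0 and replaces whatever lies underneath.

```python
def composite(backdrop: float, source: float, alpha: float) -> float:
    """Blend a source gray value over an opaque backdrop with the given alpha."""
    return alpha * source + (1.0 - alpha) * backdrop


text_pixel = 0.0   # black text under the watermark
watermark = 0.5    # mid-gray watermark

# Semi-transparent watermark: the text stays dark and legible.
with_transparency = composite(text_pixel, watermark, alpha=0.2)

# Transparency lost in conversion: the pixel takes the watermark's gray,
# so the text underneath is no longer distinguishable.
without_transparency = composite(text_pixel, watermark, alpha=1.0)

print(with_transparency, without_transparency)
```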
Every archive has an end; the documents contained in it don’t
We are convinced that the PDF/A format will establish itself for long-term archiving. Many new solutions are based on PDF/A and no longer create TIFF files. Our years of experience have shown that archive migration is not, and never will be, easy. Retrieving documents in bulk, for example, is not provided for in most archive applications, even though it is technically possible. In migration projects, a lot of time is also lost on document conversion. Conversions are laborious, not only because of the time they take, but also because individual objects have to be converted, for example single-page TIFF to multi-page TIFF, or PDF to PDF/A by group (missing fonts etc.). Standardizing on PDF/A is and will remain one of the best forms of investment protection in archiving.
The consistent use of PDF/A by all delivering applications, be it an output management platform or a scanning process, ensures the document’s reproducibility for years and is therefore the most important factor for sustainable long-term archiving.
PDF/A COMPETENCE CENTER MEMBERS PRESENT THEMSELVES
Robert Reichmuth, President
Docucom, founded in 1997 and located in Rapperswil-Jona, Switzerland, has positioned itself as the first point of contact in the IT world for system solutions dealing with output management and archive migration. As the leading Swiss supplier for the migration of document archives, we have continually developed and improved our products to the point that we now have, amongst other things, one of the fastest products for unloading archives whose data is stored on optical media.
We have customers in various market segments, including banks, insurance, IT, logistics, telecommunications, services and the public sector. We generally concentrate on medium- to large-sized businesses.
Using our migration platform we have, in numerous projects, developed solutions and methods for migrating archives that allow for a safe and logged unloading of the old archive. The migration platform consists of hardware and software components that are customized for the customer’s specific archive migration and guarantee a data export that is independent of the archive manufacturer’s tools. This procedure ensures the uninterrupted operation of the old archive during the migration process.
In addition to archive migration we offer complete solutions in the area of output management. We specialize in receiving documents and data from all types of applications and centrally processing them in a uniform manner according to customer-specific standards.
As the Swiss market leader in consulting on and conducting archive migrations, we support our customers in implementing PDF/A when realizing company-wide strategies. The knowledge and expertise that we have gained in the area of archive management qualify us particularly well to inform our customers about the importance and advantages of PDF/A in long-term archiving.
More information can be found at: www.docucom.ch
NEW MEMBERS IN THE PDF/A COMPETENCE CENTER
We welcome the following companies as members in the PDF/A Competence Center:
- Foxit Software Company, USA
- PDFtron Systems, Canada
- RealObjects GmbH, Germany
- Schätzl Text & Bild GmbH & Co.KG, Germany
- Strategy Partners, UK