The digitalization of paper documents (letters, files, invoices, photographs, and many more) is part of everyday life in many companies and institutions. Several processes are in common use, and the choice depends on what the documents are intended for.
For documents that exist only in black and white, such as invoices, TIFF G4 has often been used and is still in use today. Its compression scheme (CCITT Group 4) was developed for fax transmission. For color originals, the JPEG image format is a popular choice. Other, less common formats include PNG and BMP. In certain cases, special formats such as JPEG in TIFF are preferred, for example to reduce the file size or to create multi-page files.
These older methods have a series of disadvantages compared with the digitalization of documents using the PDF/A format. Users who still work with these older formats today are confronted with problems such as the following:
PDF is a modern, standardized alternative. Digitalization via conversion to PDF is already a popular choice for users who wish to standardize document formats (Image2PDF) or enable full-text searching. PDF also permits the use of newer, more powerful compression formats, such as JPEG 2000.
Many users have switched to PDF in order to achieve metadata uniformity. Using PDF eliminates all the disadvantages of the older formats, but even so the traditional PDF format is not the best solution for every single usage area.
If generating PDF, it makes sense to create PDF/A straight away!
If you decide to use PDF as your archive format, it makes sense to use the PDF/A variant, since it is the only PDF variant that was developed as an ISO standard (ISO 19005) for long-term archiving.
PDF enables text searches at file level. This improves the usability of the documents concerned in many areas, such as the following:
An increasing number of customers who process black-and-white documents recognize the advantages provided by PDF/A. For black-and-white documents, the JBIG2 compression format (standardized in ISO/IEC 14492) is particularly effective and is positioned as an alternative to TIFF G4. JBIG2 allows users to choose between lossy and lossless compression. The technology is not yet widely known, but it is permitted in PDF/A-1 and supported by Adobe Reader.
For color documents
Color is an important bearer of information. It can have both content-related and semantic significance. The processing of color documents increases the productivity of employees and thereby helps companies to reduce costs.
A study initiated by Kodak found that employees work better with color documents, which bring the following advantages:
If all documents are scanned in color rather than being separated into color documents and black-and-white documents, the pre-sorting effort (which accounts for around 75% of the costs) is drastically reduced. This method also means that there is no need to change scanner settings or to rescan individual documents.
In the case of color documents, powerful compression of the image data can reduce file sizes significantly. MRC (Mixed Raster Content) compression, the layered model that also underlies the JPM file format (JPEG 2000 Part 6), can drastically reduce file sizes without causing a visible decrease in display quality.
LuraTech uses a procedure that efficiently solves the problem of file size reduction in its Scan-to-PDF/A solutions. The division of each document into three layers that are converted entirely separately from each other enables the separate compression of text, colors, and images.
The three-layer technique applies modern MRC procedures: it splits the scanned original into text, image, and color layers and compresses each layer separately, which preserves display quality at small file sizes.
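The layer separation described above can be illustrated with a minimal sketch. It is not LuraTech's implementation; it only shows the general MRC idea under simple assumptions: a fixed brightness threshold decides which pixels count as text, and the function names (`split_layers`, `recompose`) and the threshold value are hypothetical. A real MRC encoder would go on to compress the mask with a bitonal codec such as JBIG2 and the two color layers with a continuous-tone codec, typically at reduced resolution.

```python
from statistics import mean


def luminance(pixel):
    """Approximate perceived brightness of an (r, g, b) pixel, 0..255."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_color(pixels):
    """Average color of a list of pixels; black if the list is empty."""
    if not pixels:
        return (0, 0, 0)
    return tuple(round(mean(channel)) for channel in zip(*pixels))


def split_layers(image, threshold=96):
    """Split an image (rows of (r, g, b) tuples) into three layers.

    mask:       True where a pixel is dark enough to count as text (assumed rule)
    foreground: the colors of the text pixels, padded with their average color
    background: everything else, with text pixels replaced by the average
                non-text color so the layer stays smooth and compresses well
    """
    mask = [[luminance(p) < threshold for p in row] for row in image]
    text = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if m]
    rest = [p for row, mrow in zip(image, mask) for p, m in zip(row, mrow) if not m]
    fg_fill, bg_fill = mean_color(text), mean_color(rest)
    foreground = [[p if m else fg_fill for p, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[bg_fill if m else p for p, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return mask, foreground, background


def recompose(mask, foreground, background):
    """Rebuild the page: take the foreground where the mask is set, else the background."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


# A tiny 2x2 "page": white paper, one dark text pixel, one dark red pixel, off-white paper.
page = [[(255, 255, 255), (20, 20, 20)],
        [(200, 30, 30), (250, 250, 240)]]
mask, fg, bg = split_layers(page)
restored = recompose(mask, fg, bg)
```

Because nothing is compressed in this sketch, `restored` is identical to `page`. The savings in practice come from the fact that each layer on its own is far more uniform than the mixed original, so each can be handed to the codec that suits it best.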
The three case studies below show the advantages of the digitalization of documents via conversion to PDF/A for personnel records, knowledge bases, and credit files.
The first case concerns a services company with a global turnover of 7.1 billion euros, of which 420 million euros are generated in Germany alone. A total of 220,000 employees work for the company worldwide, 14,500 of them in Germany.
The task was defined as follows: 14,000 personnel records of around 150 pages each needed to be digitalized, which corresponds to a total processing volume of roughly 2 million pages. The documents had to be available to 200 authorized employees with access from 70 locations. The paper documents existed in black and white, grayscale, and color. The solution was to convert the original documents into PDF/A, the future-proof, ISO-standardized variant of the PDF format, combined with effective compression to reduce file sizes as much as possible. An OCR (Optical Character Recognition) process prepared the scanned text for full-text searching.
The uniform conversion of the document set into PDF/A enabled all personnel files to be safely retained in digital form. The ISO PDF/A standard is designed to keep data suitable for long-term archiving. The data is also significantly easier to use, since employees now have access to documents that support full-text searching. The electronic search function replaces visual searching, which yields accurate hits and saves time. Choosing the PDF/A format also results in files that are up to 60% smaller than with TIFF or JPEG. Lastly, the smaller file sizes significantly lower the network load and permit direct access to data.
The advantages at a glance:
The DAK's INFO service needed to be digitalized in a uniform way. The DAK (Deutsche Angestellten-Krankenkasse) is the second largest health insurance company in Germany, with 6.2 million members and 12,000 employees working in 750 branches.
The internal information archive, which contains around 300,000 pages of text, took the form of image files before the migration. Most of the text was stored in TIFF format, with more recent additions in PDF. The information originally held on microfilm had already been partly digitalized, but in a mix of formats. TIFF, for example, neither saves space nor provides full-text search options. The archive is growing constantly, with around 3,000 new documents each year, and each file can have 50 or more pages. The aim of the project was to create a uniform archive with as low a file volume as possible while enabling digital retrieval of the data.
In order to make the most of the info service and to make it future-proof, the DAK decided to archive the knowledge base in PDF/A format. The DAK used LuraTech's PDF/A solutions to carry out the migration. During this initial project, the DAK was able to gain early experience of the new PDF/A format that will be of use in later projects.
The employees of the DAK's INFO service can now enjoy the advantages of easy and quick full-text searching. The smaller file sizes allow information to be accessed more quickly. Naturally, a program for displaying the data must be installed on employees' PCs. The DAK uses Adobe Reader, which can be downloaded from the Internet free of charge. Thanks to PDF/A, the DAK's data is now suitable for long-term archiving in accordance with the ISO standard. Lastly, the DAK has gained practical experience from this reference project that it can apply to further data archiving using PDF/A.
The advantages at a glance:
At the Tennessee headquarters of the American finance company Check Into Cash, procedure documents were digitalized and stored in a data archive in PDF/A format. The financial service provider has 1,200 payday advance centers in 30 US states.
The service provider required the decentralized scanning of credit files. Documents were to be processed in color throughout. Lastly, the switch to the new system was to improve the transmission of data to headquarters.
The centers now benefit from quick data transmission thanks to the implementation of the LuraDocument PDF Compressor, which creates PDF/A documents via scanning and data conversion procedures. All documents can be processed in color. This means that the centers do not need to sort documents into color documents and black-and-white documents before digitalizing them. This has resulted in a considerable decrease in processing time.
The headquarters, where the PDF/A documents are stored, have benefited from a reduction in the required disk space, since the modern data compression procedure used yields significantly smaller file sizes. Smaller file volumes also bring a noticeable reduction in administration costs. Last but not least, the company's headquarters benefit from the long-term readability of data and safe archiving in accordance with the ISO standard.
The advantages at a glance:
PDF/A is the optimum format for scanned documents. It can be implemented in any company or institution without major technological problems. Anyone considering digitalizing paper documents today should choose the modern, standardized PDF/A solution straight away. Where other formats have been used to archive scanned paper documents up until now, clearly defined, well-organized projects provide an opportunity to experience the advantages of PDF/A and to gain practical experience of this new format.