About the contributor
Carsten Luedtge

Present on All Channels

Throughout the world, the volume of physical documents is shrinking. Current surveys by the Universal Postal Union (UPU) cite a double-digit percentage drop. Many companies already send invoices, account statements and the like as e-mail attachments or put them on web portals for download.

And the trend is growing. With the elimination of printing and mailing costs, digital distribution holds enormous savings potential. Savings for transaction documents in particular run into six figures, which is why digitalization is making the fastest strides in this area. In all probability, well over half of all documents will be sent electronically within a few years. Even insurance policies, contract cancellations and other written documents that must still be on paper for legal reasons will surely one day be sent in digital form. Printing and physical distribution will be reserved for materials such as promotional mailings, premium product catalogs and image brochures. Paper is thus increasingly becoming a premium product.

HTML5 for every occasion

The ratio of physical to digital mailings is changing, and this is influencing how documents are created. The challenge: to prepare every document, no matter what type, and equip it with the structural data it needs to be output on any channel. That means output management systems need to detach themselves from the “letter size” metaphor and supply content for electronic output devices that adjusts to the size of the display or screen. Consider the tablet PC, the smartphone and other devices that increasingly serve as mobile offices for day-to-day business operations. In other words, documents originally designed only for print must now become multi-channel capable. To this end, they are “enriched” with content that is not relevant to a paper-based channel: information such as metadata, hyperlinks and instructions for structuring the text now becomes a requirement.

Against this background, the HTML5 format will play a decisive role in the structuring and semantic description of documents (see glossary). The text-based markup language is already setting the tone with mobile platforms such as the iPhone, iPad and Android devices. And it’s no wonder: HTML5 content can be easily processed for any electronic output channel, be it a smartphone or a web site. An HTML5 document can be printed or otherwise output in physical form if necessary. HTML5 documents can also be converted to PDF files of any page size.

Documents that are multi-channel capable are also intelligent

HTML5 is currently the most intelligent format for creating and displaying documents, independent of their size or the output channel. It allows reformatting, e.g., from an 8.5” × 11” page to a smartphone display, or conversion from page formats to text-oriented formats. Individual data can be extracted (invoice items, for example), and tables of contents and indexes can be built. And there’s more. With HTML5, even audiovisual elements, web links and charts can be embedded. This creates not only multi-channel capable documents, but intelligent documents that offer users added value beyond the mere display of text.
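To make the extraction claim above concrete, here is a minimal Python sketch that uses only the standard library’s html.parser. It assumes a well-formed HTML5 invoice in which line items sit in a table marked with a hypothetical class="invoice-items" and every row and cell is explicitly closed; the class name, the InvoiceExtractor helper and the sample invoice are illustrations, not part of any product or standard. It collects the headings (enough to build a table of contents) and the cell text of each item row.

```python
from html.parser import HTMLParser


class InvoiceExtractor(HTMLParser):
    """Collects headings (for a table of contents) and the cell text of each
    row in a table carrying the hypothetical class "invoice-items"."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.toc = []           # list of (heading level, heading text)
        self.items = []         # list of rows, each a list of cell strings
        self._heading = None    # level of the heading currently open
        self._text = []         # text fragments of the open heading
        self._in_items = False  # inside the invoice-items table?
        self._row = None        # cells of the row currently open
        self._cell = None       # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self._heading, self._text = int(tag[1]), []
        elif tag == "table" and "invoice-items" in (attrs.get("class") or "").split():
            self._in_items = True
        elif self._in_items and tag == "tr":
            self._row = []
        elif self._in_items and tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._heading is not None:
            self.toc.append((self._heading, "".join(self._text).strip()))
            self._heading = None
        elif tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th> cells and stay empty
                self.items.append(self._row)
            self._row = None
        elif tag == "table" and self._in_items:
            self._in_items = False

    def handle_data(self, data):
        if self._heading is not None:
            self._text.append(data)
        if self._cell is not None:
            self._cell.append(data)


page = """
<h1>Invoice 4711</h1>
<h2>Line items</h2>
<table class="invoice-items">
  <tr><th>Item</th><th>Qty</th><th>Amount</th></tr>
  <tr><td>Paper, A4</td><td>10</td><td>42.50</td></tr>
  <tr><td>Toner</td><td>2</td><td>120.00</td></tr>
</table>
<h2>Payment terms</h2>
"""
parser = InvoiceExtractor()
parser.feed(page)
print(parser.toc)    # [(1, 'Invoice 4711'), (2, 'Line items'), (2, 'Payment terms')]
print(parser.items)  # [['Paper, A4', '10', '42.50'], ['Toner', '2', '120.00']]
```

The same kind of lookup is not possible on a scanned page image, which carries no headings or table cells to hook into.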

What could be more logical than setting up an output management system (OMS) to output all documents in HTML5, or at least in PDF, considering that this format is also quite advanced with respect to integrated structural data? But many companies are still holding back. They have usually invested a lot of time and money in their systems and are understandably less than enthusiastic about new formatting tools. The question they face is how to capture at least some of the structural information from existing (legacy) applications for later processing. OMS suppliers like Compart specialize in “reading” the basic data from traditional page-based documents in legacy systems and then preparing them as HTML5 or XML files that can, in turn, be enriched with the required metadata, color, video and audio files.
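The article does not describe how Compart or any other OMS vendor actually reads legacy data. As a purely hypothetical illustration, the Python sketch below assumes a legacy print stream in which every invoice line follows a fixed column layout (item description in columns 0 to 29, quantity in 30 to 35, amount in 36 to 47). It cuts each line into named fields and wraps them in an HTML5 document enriched with a viewport declaration (so the page adapts to small screens) and arbitrary metadata. The layout, the function names and the sample values are invented for the example; real print streams need a proper parser.

```python
from html import escape

# Hypothetical fixed-width layout of one invoice line in a legacy print stream:
# (field name, start column, end column)
LAYOUT = [("item", 0, 30), ("qty", 30, 36), ("amount", 36, 48)]


def parse_print_line(line):
    """Cut one fixed-width print line into named, trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def to_html5(title, lines, metadata):
    """Wrap parsed invoice lines in a minimal HTML5 document.

    `metadata` becomes <meta name=... content=...> elements: data a printed
    page never carried, but which other output channels can use."""
    rows = [parse_print_line(line) for line in lines if line.strip()]
    meta = "\n".join(
        f'<meta name="{escape(k)}" content="{escape(v)}">' for k, v in metadata.items()
    )
    body = "\n".join(
        "<tr>" + "".join(f"<td>{escape(row[name])}</td>" for name, _, _ in LAYOUT) + "</tr>"
        for row in rows
    )
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
{meta}
<title>{escape(title)}</title>
</head>
<body>
<h1>{escape(title)}</h1>
<table class="invoice-items">
<tr><th>Item</th><th>Qty</th><th>Amount</th></tr>
{body}
</table>
</body>
</html>"""


# Two lines as a legacy application might have laid them out for the printer.
legacy = [
    "Paper, A4".ljust(30) + "10".rjust(6) + "42.50".rjust(12),
    "Toner".ljust(30) + "2".rjust(6) + "120.00".rjust(12),
]
doc = to_html5("Invoice 4711", legacy, {"customer-id": "C-0815"})
print(doc)
```

Keeping the layout in one table (LAYOUT) means that adapting the sketch to another legacy format only requires new column positions, not new code.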

Back to the source

The fact is that data is often lost on the way to page-based output, no matter the channel – a situation no longer in keeping with the times.

To convert legacy output to HTML5, the first step is to identify the textual sources within the document. Where does the data originate? Do the sources allow extraction of sufficient structural information for reuse? Such questions must be addressed first when looking to make documents “intelligent”. Digital documents that could be read and processed by machine are often first converted into analog form, i.e. print, and then scanned into TIFF or JPEG images upon their return. Content becomes “pixel clouds”: it is initially encoded as raster images and only rendered “readable” again through optical character recognition (OCR). Not only is this cumbersome, it also loses the semantic structural data needed for later reuse. This is not the case when documents are created in a format that carries all the data necessary for output, allowing the content to be displayed on any channel, whether a web page, a mobile device or even print.

Glossary

HTML5 is a text-based markup language for the structuring and semantic description of documents. It is already widely used, especially on mobile devices, and will soon be adopted as the official standard by the World Wide Web Consortium (W3C). What’s special about the new format is that it offers extensive functionality for graphics (2D/3D) and multimedia (audio/video) that is not directly supported by earlier standards such as HTML 4.01 and XHTML. HTML5 is also useful because web fonts can be embedded. This allows even corporate fonts to be downloaded from a server by the browser. If a browser does not support web fonts, a standard fallback font such as Arial or Verdana is used instead.

Other features of HTML5 at a glance:

Extension of layout-related elements

  • Stronger separation of semantics and layout (CSS)
  • Stringent markup of selected sections of a Web site
  • Additional elements for frequently used page areas, such as <header>, <nav>, <article>, <section>, <aside> and <footer>


  • Scalable Vector Graphics (SVG) is a specification recommended by W3C for creating complex two-dimensional vector graphics in documents. Because SVG is an XML-based format, the content of SVG files is easily accessible for computer-assisted translation and other downstream processes.  They can also be edited directly in a text editor.


  • Mathematical Markup Language (MathML) is a format for depicting mathematical formulas on the Internet.


  • The <canvas> element lets programmers generate precise pixel-based graphics in the browser window. Combined with JavaScript, Canvas can generate complex animations, games and dynamic business graphics that previously required the Adobe Flash plug-in.


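  • With the new <audio> and <video> elements, sound and video files can be embedded directly in a web page without a plug-in.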


  • The new “Geolocation” JavaScript API lets a web site determine the location of a user accessing it from a mobile device. This allows location-specific services to be offered, such as showing nearby businesses or the user’s own location on a map.

Offline Web applications

  • Web sites that are also usable offline can be developed using HTML5. The web server just has to let the user’s browser know what data needs to be downloaded. The data is synchronized automatically as soon as the user is back online.


  • Microdata gives web sites additional semantic information and can, for example, be used to convert contact information into a vCard.
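Turning microdata into a vCard takes a small amount of code that reads the marked-up properties and writes the card. The Python sketch below shows the idea under simplifying assumptions: one flat item, every value taken from the element’s text (so no <a href> or <meta content> values), no nested element with the same tag inside a property, and a hypothetical mapping of the property names fn, org, tel and email to vCard 4.0 (RFC 6350) properties. It is a simplified illustration, not a specification-conformant converter.

```python
from html.parser import HTMLParser

# Hypothetical mapping from microdata property names to vCard 4.0 properties.
VCARD_PROPS = {"fn": "FN", "org": "ORG", "tel": "TEL", "email": "EMAIL"}


class MicrodataContact(HTMLParser):
    """Collects the text content of elements that carry an itemprop attribute.

    Simplifications: one flat item only, and a property element must not
    contain a nested element with the same tag name."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._open = None  # (property name, tag) of the element being read
        self._buf = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        if prop and self._open is None:
            self._open = (prop, tag)
            self._buf = []

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            self.props[self._open[0]] = "".join(self._buf).strip()
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._buf.append(data)


def vcard_escape(value):
    """Escape backslashes, commas, semicolons and newlines (RFC 6350, section 3.4)."""
    return (value.replace("\\", "\\\\").replace(",", "\\,")
            .replace(";", "\\;").replace("\n", "\\n"))


def to_vcard(props):
    lines = ["BEGIN:VCARD", "VERSION:4.0"]
    for name, vcard_name in VCARD_PROPS.items():
        if name in props:
            lines.append(f"{vcard_name}:{vcard_escape(props[name])}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"  # vCard lines end with CRLF


page = """
<div itemscope>
  <span itemprop="fn">Jane Doe</span>
  <span itemprop="org">Example Corp, Inc.</span>
  <span itemprop="tel">+1 555 0100</span>
  <span itemprop="email">jane.doe@example.com</span>
</div>
"""
parser = MicrodataContact()
parser.feed(page)
card = to_vcard(parser.props)
print(card)
```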

Most browsers already support many HTML5 features, including the latest versions of Safari, Google Chrome, Mozilla Firefox and Opera, as well as the web browsers installed on iPhones, iPads and Android devices.
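The SVG entry above notes that, because SVG is XML, its text is easy to hand to translation tools and other downstream processes. A minimal Python sketch with the standard library’s xml.etree.ElementTree illustrates this: it lists the text of every SVG <text> element and writes translated strings back by id. It assumes that every translatable <text> element has an id; the sample chart, its ids and the German string are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
TEXT_TAG = f"{{{SVG_NS}}}text"  # ElementTree's qualified name for svg:text


def translatable_strings(svg_source):
    """Return (id, text) pairs for every <text> element in an SVG document."""
    root = ET.fromstring(svg_source)
    result = []
    for node in root.iter(TEXT_TAG):
        # itertext() also collects text nested in <tspan> children
        text = "".join(node.itertext()).strip()
        if text:
            result.append((node.get("id"), text))
    return result


def replace_strings(svg_source, translations):
    """Replace the content of <text> elements whose id is in `translations`."""
    ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix
    root = ET.fromstring(svg_source)
    for node in root.iter(TEXT_TAG):
        new_text = translations.get(node.get("id"))
        if new_text is not None:
            for child in list(node):  # drop any <tspan> children
                node.remove(child)
            node.text = new_text
    return ET.tostring(root, encoding="unicode")


chart = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="50">
  <rect width="120" height="20" fill="steelblue"/>
  <text id="label-revenue" x="125" y="15">Revenue</text>
  <text id="label-title" x="0" y="45">Sales <tspan font-weight="bold">2012</tspan></text>
</svg>"""

print(translatable_strings(chart))  # [('label-revenue', 'Revenue'), ('label-title', 'Sales 2012')]
translated = replace_strings(chart, {"label-revenue": "Umsatz"})
print(translated)
```

Because the strings are addressed by id, the graphic itself (shapes, positions, colors) stays untouched while only the text is swapped.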

Tags: Digital Mailing, E-Presentment, HTML5, Multi Channel
Categories: PDF