PDF/raster: An Overview

Earlier in 2017, the PDF Association and TWAIN Working Group published the first public draft of the PDF/raster specification. In this article, I introduce PDF/raster, explain its relationship to PDF, and discuss its use cases. I suggest you grab a copy of version 1.0 of the specification to follow along.

PDF/raster: purpose

The PDF/raster format aims to provide the scanning and document industry with a standardized format for creating and exchanging sets of page images. Typically, these images would be created by a device such as a scanner. Each page of scanned input is represented either by one image, or by a series of image strips that can be put together to form a page image. These images may be compressed, and may be color, gray, or black and white depending on the capture device. A document is a collection of images of scanned input pages.
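
To make the strip idea concrete, here is a minimal sketch of assembling a page from strips. It is not taken from the specification: it assumes each strip is a horizontal band spanning the full page width, that strips are stored top to bottom, and that the pixel data is already decompressed. The `Row` type and the `assemble_page` function are hypothetical names used only for illustration.

```python
from typing import List

Row = bytes  # one scanline of pixel data, assumed already decompressed


def assemble_page(strips: List[List[Row]]) -> List[Row]:
    """Stack strips top to bottom to form one page image (assumed layout)."""
    rows = [row for strip in strips for row in strip]
    if not rows:
        raise ValueError("a page needs at least one scanline")
    width = len(rows[0])
    for number, row in enumerate(rows):
        # Every scanline must be as wide as the first one.
        if len(row) != width:
            raise ValueError(
                f"scanline {number} is {len(row)} bytes wide, expected {width}"
            )
    return rows


if __name__ == "__main__":
    top = [b"\x00\x00\x00", b"\x11\x11\x11"]  # first strip: two scanlines
    bottom = [b"\x22\x22\x22"]                # second strip: one scanline
    page = assemble_page([top, bottom])
    print(len(page), "scanlines")
```

A scanner with little memory could emit each strip as soon as it is captured, leaving this kind of reassembly to the reader of the file.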

PDF/raster improves upon the most popular existing formats – TIFF and JPEG – in the following key ways:

  • As PDF files, PDF/raster files follow an open ISO standard. TIFF is based on a vendor standard and a quarter-century of both “generally understood” and proprietary tags (extensions) to the format.
  • PDF/raster files can contain multiple pages and can represent each page as multiple strips. A JPEG file can contain only one raster image.
  • PDF/raster files may mix color, gray, and black and white page images – and mix multiple color spaces for page images – for maximum compression.
  • PDF/raster files support data encryption in the file format itself, for better security of data in transit and at rest.

The PDF format already supports all of these features. PDF/raster uses only the PDF syntax that is required to support its use cases. This syntax is simple enough that a full-fledged PDF parser is not required to create or read PDF/raster files. The PDF/raster format is therefore well suited for creation and consumption in CPU-constrained environments, such as scanners and MFP (multifunction peripheral) output preview stations.

 

A PDF/raster file is a PDF file

PDF/raster is a substantially restricted subset of PDF syntax. Because it does not rely on any features outside the PDF specification, all PDF/raster files are valid PDF files. This means that any conforming PDF reader can open PDF/raster files as well. Unencrypted PDF/raster files can be viewed in any PDF 1.7 (ISO 32000-1) compatible viewer. Encrypted PDF/raster files, which are based on PDF 2.0 (ISO 32000-2), can be opened and viewed by any viewer that can handle PDF 2.0 files.

Creating PDF/raster files requires explicit support from a PDF creator or processor, for two reasons. First, a PDF/raster file is identified by a specific PDF comment, placed just before the startxref keyword, that indicates conformance. General PDF processors are not required to write any specific PDF comments, or to preserve them at all (other than the beginning-of-file and end-of-file comments). To write a PDF/raster file, a processor must therefore know to write this comment in its specific location. Second, because PDF/raster allows only very specific page content sequences and commands, PDF creators need to understand and write very specific syntax to create PDF/raster files.
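
As a rough illustration of the first point, the sketch below looks for a conformance comment on the line immediately before the last startxref keyword. It assumes the comment takes the form `%PDF-raster-1.0` (major and minor version after the prefix); check the specification for the exact form before relying on it. The function name `raster_version` is hypothetical, and the sketch does not validate anything else about the file.

```python
import re
from typing import Optional, Tuple

# Assumed form of the conformance comment; verify against the specification.
MARKER = re.compile(rb"^%PDF-raster-(\d+)\.(\d+)\s*$")


def raster_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) if the line before the last startxref is the marker."""
    pos = data.rfind(b"startxref")
    if pos == -1:
        return None  # no cross-reference pointer found
    lines = data[:pos].splitlines()
    if not lines:
        return None
    match = MARKER.match(lines[-1])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = (
        b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
        b"trailer\n<<>>\n%PDF-raster-1.0\nstartxref\n9\n%%EOF\n"
    )
    print(raster_version(sample))
```

A general-purpose PDF tool that rewrites the file may drop this comment, in which case the result is still a PDF file but is no longer recognizable as PDF/raster by a check like this one.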

 

PDF/raster restrictions

There are many important PDF concepts and constructs that are not permitted in PDF/raster files. Some of the most important restrictions include:

  • Page contents may only include images in CCITT G4 (FAX), DCT (JPEG), and uncompressed (RAW) formats. These images are restricted to RGB color, grayscale, and black and white (monochrome) format.
  • Page contents may not include anything other than raster images: no text, line art, forms, or other graphical elements.
  • No annotations, AcroForms, or XFA are permitted.
  • Transparency and layers are not permitted.
  • No compression of non-image data is permitted. Compressed object streams are disallowed.
  • Only page content streams and document metadata are allowed. Other elements such as interactive actions, bookmarks, search indexes, and marked content are explicitly not permitted.
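
The image restrictions in the first bullet can be pictured as a simple allow-list check. This is a sketch, not a conformance checker. The filter names `CCITTFaxDecode` and `DCTDecode` are the standard PDF names for the G4 and JPEG encodings. The color labels `"rgb"`, `"gray"`, and `"bw"` are simplified placeholders, not PDF color space names, and `image_problems` is a hypothetical helper.

```python
from typing import List, Optional

# None stands for an uncompressed image (no filter on the stream).
ALLOWED_FILTERS = {None, "CCITTFaxDecode", "DCTDecode"}
# Simplified labels for illustration only; real files use PDF color spaces.
ALLOWED_COLORS = {"rgb", "gray", "bw"}


def image_problems(filter_name: Optional[str], color: str) -> List[str]:
    """List the ways an image falls outside the restrictions described above."""
    problems = []
    if filter_name not in ALLOWED_FILTERS:
        problems.append(f"compression {filter_name!r} is not allowed")
    if color not in ALLOWED_COLORS:
        problems.append(f"color format {color!r} is not allowed")
    return problems


if __name__ == "__main__":
    print(image_problems("DCTDecode", "rgb"))
    print(image_problems("FlateDecode", "cmyk"))
```

An empty list means the image passes these two checks; the other restrictions in the list would need separate checks.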

In other words, PDF/raster is intentionally a very limited subset of PDF. It focuses on storing and transmitting scanned page data, and is not intended to support updating or annotating that data. However, because PDF/raster files are always valid PDF files, it’s very easy to annotate or update a PDF/raster file and save it as a general-purpose PDF file.