Glossary of PDF terms

This resource provides end users and non-technical readers with lay-person definitions of the acronyms and terms commonly encountered when discussing or describing the Portable Document Format (PDF). Technical readers should always refer directly to the appropriate ISO publication for precise, technically accurate definitions. Additional terms are defined in many PDF ISO standards, which can be previewed in the ISO Online Browsing Platform (OBP).

Action An action refers to PDF features that enable automatic behaviours triggered by a user interaction or event, such as displaying a different page in a document when a bookmark is clicked, performing a calculation with form data, or playing a sound or video.  For technical details see clause 12.6 in ISO 32000-2:2020.
Annotation Annotations are a special PDF feature most commonly associated with commenting on and reviewing a document, such as highlighting text, striking through text, or sketching on top of a document. However, PDF 2.0 defines 28 different kinds of annotations, which provide a far richer feature set including URLs (Link annotations), watermarks, form widgets, interactive 3D content, redaction, sound, movies and other rich media. For technical details see clause 12.5 in ISO 32000-2:2020.
AT AT means assistive technology and is most commonly referred to in the context of PDF/UA. Assistive technology helps users with disabilities access and navigate PDF documents through tools such as screen readers, color and contrast adjustment, and screen magnifiers.
Bookmarks Bookmarks is an informal term for PDF's Document Outline feature. Bookmarks are commonly displayed in a separate navigation pane to aid document navigation and are technically distinct from headings in content. For technical details see clause 12.3.3 in ISO 32000-2:2020.
Conformance Level PDF Conformance Levels are represented by letter designators with a PDF ISO subset acronym, such as PDF/A-1b, PDF/A-4e, PDF/X-5pg, PDF/VT-2s. Each Conformance Level relates to a specialized definition in the corresponding PDF ISO subset with its own very specific set of rules and requirements. For example, PDF/A-4 is the PDF-for-archival standard supporting PDF 2.0, with PDF/A-4e being a highly specialized refinement of PDF/A-4 supporting engineering workflows with 3D content (hence the "e" designator) while PDF/A-2b (basic) and PDF/A-2u (Unicode) differ in requirements related to Unicode text extraction capabilities.  Not all PDF ISO subsets use conformance levels.
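As an illustration of how these designators are put together, the following minimal Python sketch splits a designator into its subset acronym, part number and conformance level letters. The pattern and the function name split_designator are assumptions made for this example. It only inspects the shape of the string and does not check that a given combination is actually defined by any ISO standard.

```python
import re

# Illustrative pattern only: subset acronym, part number, optional level letters.
_DESIGNATOR = re.compile(r"^PDF/([A-Z]+)-(\d+)([a-z]*)$")


def split_designator(name):
    """Split e.g. 'PDF/A-2b' into ('A', 2, 'b'); return None if the shape differs."""
    match = _DESIGNATOR.match(name)
    if match is None:
        return None
    subset, part, level = match.groups()
    return subset, int(part), level


for example in ("PDF/A-1b", "PDF/A-4e", "PDF/X-5pg", "PDF/VT-2s"):
    print(example, "->", split_designator(example))
```

A designator without level letters yields an empty string for the level, reflecting that not all PDF ISO subsets use conformance levels.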
COS COS is the acronym for "Carousel Object Syntax", which is the syntax used by PDF and FDF files and is fully described in ISO 32000-2:2020. It is what you see if you look inside a PDF file. "Carousel" was the codename for Acrobat 1.0 when this syntax was first introduced.
Direct object
A direct PDF object is an object that occurs inline where it is defined and that does not have its own object identifier. In contrast to indirect objects, direct objects cannot be directly referenced because they have no object identifier of their own.
Fast web view "Fast web view" is an informal term for the Linearized PDF feature that enables the first page of a PDF file to be available for rapid display before the rest of the PDF file is fully downloaded (such as while downloading from the internet).
FDF Forms Data Format is a specialized file format, expressed in the same COS syntax that PDF uses, used for interactive form data that was introduced in PDF 1.2. FDF can be used when submitting form data to a server, receiving the response, and incorporating it into the interactive form. It can also be used to export form data to stand-alone files that can be stored, transmitted electronically, and imported back into the corresponding PDF interactive form. In addition, beginning in PDF 1.3, FDF can be used to define a container for annotations that are separate from the PDF document to which they apply. For technical details see clause 12.7.8 of ISO 32000-2:2020.
Form (AcroForm) PDF supports both interactive and non-interactive forms. For technical details see clause 12.7 in ISO 32000-2:2020. Interactive forms were introduced in PDF 1.2 as a collection of fields for gathering information interactively from the user and are sometimes referred to as "AcroForms". A PDF document may contain any number of fields appearing on any combination of pages, all of which make up a single, global interactive form spanning the entire document. Arbitrary subsets of these fields can be imported or exported from the document as FDF or XFDF. Non-interactive forms (introduced in PDF 1.7) are a static representation of form fields. Such forms may have originally contained interactive fields such as text fields and radio buttons that were later converted into non-interactive content, they may represent form fields and/or data converted from external sources, or they may have been designed to be printed out and filled in manually.
Fragment Identifier
Annex O in ISO 32000-2 defines PDF-specific fragment identifiers that can be added to the end of URLs to provide anchors to specific content or to influence the display of a linked PDF file. Fragment identifiers are defined by the W3C and appear after the # symbol in a URL. As a simple example, a URL can refer to a specific page in a PDF by appending #page=n (where n starts from 1): a URL ending in #page=2 opens to the 2nd page of that PDF.
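The page parameter described above can be read back out of a URL with Python's standard library. This is a minimal sketch: the function name page_from_url and the example.com URLs are hypothetical, and only the page parameter is handled.

```python
from urllib.parse import parse_qs, urldefrag


def page_from_url(url):
    """Return the 1-based page number from a '#page=n' fragment, or None."""
    fragment = urldefrag(url).fragment
    values = parse_qs(fragment).get("page")
    if not values or not (values[0].isascii() and values[0].isdigit()):
        return None
    return int(values[0])


print(page_from_url("https://example.com/sample.pdf#page=2"))  # 2
print(page_from_url("https://example.com/sample.pdf"))         # None
```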
Hybrid-reference PDF file
A Hybrid-reference PDF file is a PDF 1.5 (or later) file containing objects referenced by standard cross-reference tables in addition to objects in object streams that are referenced by cross-reference streams. Only PDF 1.5 and later files can be hybrid-reference PDFs because cross-reference streams were introduced in PDF 1.5. Refer to the clause "Compatibility with applications that do not support compressed reference streams" in ISO 32000-2:2020.
Incremental update A PDF file can be updated incrementally without rewriting the entire file. When updating a PDF file incrementally, changes are appended to the end of the file, leaving the original contents unchanged. For example, a PDF-based document review tool may write PDF annotations as incremental updates, ensuring that a digitally signed original document is not invalidated by the addition of comments. Such technical details are typically not visible to end users. For technical details see ISO 32000-2:2020.
Indirect object
A PDF indirect object is an object that is defined in the body section of a PDF file with an object identifier (comprising an object number and generation number). It will be referenced elsewhere in the PDF file by using an indirect reference (keyword R) with its object identifier.
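To show what an indirect reference looks like in practice, the sketch below scans a short run of COS syntax for the pattern "object number, generation number, R". The sample dictionary and the function name find_indirect_references are made up for illustration. A simple pattern match like this would also fire inside strings or stream data, so it is not a substitute for a real PDF parser.

```python
import re

# Matches "<object number> <generation number> R" in a run of COS syntax.
_INDIRECT_REF = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def find_indirect_references(cos_bytes):
    """Return (object number, generation number) pairs found in cos_bytes."""
    return [(int(number), int(generation))
            for number, generation in _INDIRECT_REF.findall(cos_bytes)]


# /Rotate holds a direct object (the integer 90); /Parent and /Contents
# hold indirect references to objects defined elsewhere in the file.
sample = b"<< /Type /Page /Parent 3 0 R /Contents 12 0 R /Rotate 90 >>"
print(find_indirect_references(sample))  # [(3, 0), (12, 0)]
```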
Integer page index
In PDF the integer page index is a 0-based index of the pages in a PDF file, with the first page having an integer page index of zero. It is commonly used by internal PDF data structures. In contrast, Fragment Identifiers use a 1-based counting system.
Layers Layers is an informal term for Optional Content Groups (OCGs) in PDF. Layers can typically be individually toggled on and off in interactive PDF viewers. Examples include architectural drawings where plumbing, electrical wiring, foundations, etc. might each be represented on separate layers.
Linearized PDF Linearized PDF is the formally defined PDF feature that enables the first page of a PDF file to be available for rapid display before the rest of the PDF is fully downloaded. It is often referred to as "Fast web view". For technical details see Annex F in ISO 32000-2:2020.
OCG Optional Content Groups are the formally defined feature in PDF which enable selectable layers in interactive PDF viewers. For technical details see clause 8.11 of ISO 32000-2:2020.
OCR Optical Character Recognition is the process of recognizing text from an image (photo) of text. It is typically referenced in relation to scan-to-PDF functionality. The accuracy of OCR results can vary depending on the quality of the page image and other factors. PDF does not constrain or limit OCR accuracy in any way.
Page labels
As documents can be long with many pages, humans have invented conventions to label pages more descriptively to assist with navigation. We are used to seeing front matter labelled with Roman numerals: i, ii, iii, iv, etc.; appendices prefixed with uppercase letters such as A.1, A.2, etc. or even chapter/page combinations such as 1-1, 1-2, 2-1, 2-2. In PDF terminology this is what is referred to as a page label - an optional descriptive label of a page that is commonly presented on-screen. This is in contrast to the integer page index used internally in PDF files.
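The relationship between the integer page index and a page label can be sketched in a few lines of Python. The layout here (a given number of front matter pages labelled with lowercase Roman numerals, followed by pages numbered from 1) is an assumption chosen for illustration, as are the function names. The sketch does not read the labelling rules that a real PDF file stores.

```python
def to_roman(number):
    """Return the lowercase Roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def page_label(page_index, front_matter_pages):
    """Label a 0-based page index: i, ii, ... for front matter, then 1, 2, ..."""
    if page_index < front_matter_pages:
        return to_roman(page_index + 1)
    return str(page_index - front_matter_pages + 1)


print([page_label(index, 3) for index in range(6)])
# ['i', 'ii', 'iii', '1', '2', '3']
```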
PDF The Portable Document Format is a random access, binary file format for device-independent, paginated documents that defines an accurate appearance model for rendering fully typeset text, images and vector graphics. Over time PDF has also expanded to include many interactive and specialized features supporting a wide variety of use cases and electronic documents with rich experiences beyond that of "digital paper". It is formally defined by the ISO 32000 family of international standards.
PDF 2.0 PDF 2.0 is the latest version of the PDF specification and is the first PDF specification entirely developed under the ISO consensus-based process. It is formally defined by ISO 32000-2:2020.
PDF/A PDF/A is an ISO-defined formal subset of PDF designed to support long-term preservation and digital archiving. PDF/A focuses on accurate preservation of the static visual representation of page-based electronic documents over time and is defined by the ISO 19005 family of standards. "A" stands for archival.
PDF/E PDF/E is an ISO-defined formal subset of PDF 1.6 designed for the engineering sectors, with support for interactive 3D models. "E" stands for engineering. For technical details see the PDF/E ISO standard ISO 24517-1:2008 Document management — Engineering document format using PDF — Part 1: Use of PDF 1.6 (PDF/E-1). PDF 2.0 support for engineering workflows is now provided via the PDF/A-4e conformance level (see PDF/A).
PDF/R PDF/R is a small subset of PDF targeting multi-page raster image documents, such as scanned documents. It is based on the PDF Association's PDF/Raster 1.0 specification and is specifically designed to be easy to create in low-end, low memory embedded devices such as scanners. It is defined by ISO 23504-1:2020 Document management applications — Raster image transport and storage — Part 1: Use of ISO 32000 (PDF/R-1). "R" stands for raster.
PDF/UA PDF/UA is the ISO-defined formal subset of PDF to support universal access, enabling high levels of accessibility for electronic documents. It is defined by the ISO 14289 family of standards. "UA" stands for universal access.
PDF/VCR PDF/VCR enables variable data printing applications using PDF template-based variable content substitution. A PDF template file containing pages with variable content substitution fields (placeholders) is delivered ahead of a print production run and may be reused across multiple print production runs. PDF-based variable data substitution content is then provided during print production and merged with the PDF template to produce final-form variable content page output. "VCR" stands for variable content replacement. It is defined by ISO 16613-1:2017 Graphic technology — Variable content replacement — Part 1: Using PDF/X for variable content replacement (PDF/VCR-1).
PDF/VT PDF/VT is the ISO-defined formal subset of PDF supporting variable data printing and transactional documents. It builds on the capabilities of PDF/X and is defined by the ISO 16612 family of standards. "VT" stands for "Variable Transactional".
PDF/X PDF/X is defined by the ISO 15930 family of standards which supports the graphic arts and professional printing sectors. The "X" in "PDF/X" is for eXchange, indicating specialized support for the exchange of digital data targeting professionally printed products.
PDF version PDF versions are 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 and 2.0, with each version defined by its own PDF specification document. PDF files are generally backwards- and forwards-compatible, enabling modern software to reliably display old PDFs. Every PDF file identifies its version via the file header %PDF-x.y on its first line, but may also update the version via a special key when an incremental update is applied. Later PDF versions define additional features.
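Reading the version from the file header can be sketched as follows. The function name header_version and the sample bytes are hypothetical. The sketch only looks at the %PDF-x.y header at the very start of the data and does not account for the later version update that the entry above mentions.

```python
import re

# The header is expected at the very start of the file: %PDF-x.y
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(data):
    """Return (major, minor) from a leading %PDF-x.y header, or None."""
    match = _HEADER.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(header_version(b"%PDF-1.7\n"))  # (1, 7)
print(header_version(b"%PDF-2.0\n"))  # (2, 0)
```

With a real file, the same function could be applied to the first few bytes read in binary mode.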
Rolemaps are a core Tagged PDF concept that allows any structure type to be conceptually mapped between namespaces in a manner that enables all PDF processors to understand the basic intention of structure types. For example, a custom structure type called Foo might be rolemapped to a paragraph in the standard structure namespace, indicating that semantically Foo is "best matched" as a paragraph. Rolemaps thus support the use of custom structure types in PDF. Rolemaps and their use are described in clause 14.7 Logical structure and clause 14.8 Tagged PDF in ISO 32000.
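The Foo example can be sketched as a simple lookup that follows rolemap entries until a standard structure type is reached. This is a simplified illustration that ignores namespaces entirely. The set of standard types is a small illustrative subset, and the names resolve_role and Bar are hypothetical.

```python
# Small illustrative subset of standard structure types (P is a paragraph).
STANDARD_TYPES = {"P", "H1", "Span"}


def resolve_role(structure_type, role_map, standard_types=STANDARD_TYPES):
    """Follow rolemap entries until a standard type is found, else None."""
    seen = set()
    current = structure_type
    while current not in standard_types:
        if current in seen or current not in role_map:
            return None  # circular mapping, or no mapping at all
        seen.add(current)
        current = role_map[current]
    return current


print(resolve_role("Foo", {"Foo": "P"}))                # P
print(resolve_role("Foo", {"Foo": "Bar", "Bar": "P"}))  # P
print(resolve_role("Foo", {}))                          # None
```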
startxref is a reserved PDF keyword that occurs just before the %%EOF end-of-file comment marker along with the byte offset to the cross-reference data for the PDF file (expressed as an integer in ASCII).
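A minimal sketch of locating this byte offset is shown below. It searches backwards for the last startxref keyword and reads the integer that follows it. The function name and the sample bytes are made up for illustration, and a real PDF processor performs considerably more validation.

```python
def find_startxref_offset(data):
    """Return the integer following the last 'startxref' keyword, or None."""
    keyword = b"startxref"
    position = data.rfind(keyword)
    if position == -1:
        return None
    tokens = data[position + len(keyword):].split()
    if not tokens or not tokens[0].isdigit():
        return None
    return int(tokens[0])


tail = b"trailer\n<< /Size 8 /Root 1 0 R >>\nstartxref\n1234\n%%EOF\n"
print(find_startxref_offset(tail))  # 1234
```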
Tagged PDF PDF 1.4 introduced "Tagged PDF" to represent the logical reading order (structure) of a document. It defines a set of standard structure types and attributes that allow page content (text, graphics and images, as well as annotations and form fields) to be extracted and reused for other purposes. PDF/UA uses Tagged PDF to ensure electronic documents are fully accessible. For technical details see clause 14.8 of ISO 32000-2:2020.
trailer is a reserved PDF keyword and defines the start of the trailer dictionary. The trailer enables a PDF processor to quickly find certain special objects and data, such as the largest object number in the PDF, the Document Catalog and the optional encryption dictionary (if the PDF is encrypted). It is an essential part of every PDF file.
Widgets A PDF widget is a specialized type of PDF annotation used with interactive forms and represents the GUI widgets through which data entry is done.
XFA XFA stands for "XML Forms Architecture" which is a family of proprietary XML specifications supporting both static and dynamic forms. As a proprietary format with limited support in PDF processors, XFA was deprecated in PDF 2.0 (ISO 32000-2:2020) but was permitted in PDF 1.5 - 1.7.
XFDF XFDF is the XML equivalent of FDF. It is defined by ISO 19444-1:2019 Document management - XML Forms Data Format — Part 1: Use of ISO 32000-2 (XFDF 3.0).
XMP XMP stands for the eXtensible Metadata Platform which is an XML-based standard for metadata used in PDF and required by all ISO PDF subset standards. XMP is defined by ISO 16684-1:2019 Graphic technology — Extensible metadata platform (XMP) — Part 1: Data model, serialization and core properties.
xref is a reserved PDF keyword used to identify the start of a conventional cross-reference table. It is also commonly used colloquially in place of the phrase "cross-reference". PDF 1.5 and later files that only use cross-reference streams do not use this keyword. PDF files that have incremental updates may have multiple instances of this keyword.
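For readers curious about what follows the xref keyword, this sketch parses a small conventional cross-reference table supplied as text. Each subsection begins with a first object number and an entry count, followed by one entry per object: a 10-digit number, a 5-digit generation number, and n (in use) or f (free). The sample table and the function name parse_xref_table are made up for illustration, and the input is assumed to contain the table only, without the trailer that normally follows it.

```python
def parse_xref_table(table):
    """Map object number -> (first field, generation, in_use) for one table.

    For in-use entries the first field is a byte offset; for free entries
    it is the object number of the next free object.
    """
    lines = [line.strip() for line in table.splitlines() if line.strip()]
    if not lines or lines[0] != "xref":
        raise ValueError("expected the xref keyword")
    entries = {}
    index = 1
    while index < len(lines):
        # Subsection header: first object number and number of entries.
        first_object, count = (int(part) for part in lines[index].split())
        index += 1
        for object_number in range(first_object, first_object + count):
            first_field, generation, kind = lines[index].split()
            entries[object_number] = (int(first_field), int(generation), kind == "n")
            index += 1
    return entries


sample = (
    "xref\n"
    "0 3\n"
    "0000000000 65535 f \n"
    "0000000017 00000 n \n"
    "0000000081 00000 n \n"
)
print(parse_xref_table(sample))
# {0: (0, 65535, False), 1: (17, 0, True), 2: (81, 0, True)}
```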