How to tag titles in PDF documents

Klaas Posselt // November 26, 2019

In the summer of 2019, the PDF Association’s PDF/UA Technical Working Group (TWG) released the Tagged PDF Best Practice Guide: Syntax to provide developers and expert users with formal advice and best practices for implementing tagged PDF. The guide is also useful to those generating tagged PDF to create accessible documents.

Since its publication, the Guide’s recommendations for document titles have raised some questions. This article, authorized by the PDF/UA TWG, sets out to answer them.

NOTE: For readability I will use the common term “tag” to refer to PDF’s standard structure types.

Background

The Tagged PDF Best Practice Guide: Syntax, in clause 4.2.2.2, states as follows:

"Since PDF/UA-1 does not require any specific structure type for title content, it is permissible to structure such content with either <H1> or other structure element types (typically, <P> or structure element types mapped to <P>).

Page content representing the title can - especially in publications - appear several times in the document. If <H1> structure elements are used to enclose such content, it is recommended that only one such instance of the title be structured as <H1>.

Since headings commonly appear in tables of contents, and since document titles do not normally appear in tables of contents, a future-proof (PDF 2.0) approach would be to use a <Title> structure type (which is defined in PDF 2.0, see Annex B) mapped to the <P> structure type."

So which is it? Should authors and remediators tag document titles with <H1> or with <P>, or maybe something else? And what are the consequences of one or the other choice?

The role of headings

The typical visual indicators denoting headings of various levels (font size, weight, color, etc.) help readers to quickly locate content of interest, especially in longer or highly structured content. In the PDF and hardcopy contexts, headings are often used to create a table of contents displaying the document’s structure in a hierarchical tree (on a separate page and/or using PDF’s bookmarks feature).

Users who depend on assistive technology (AT) likewise rely on headings to enable navigation by providing them with the page or document’s structure in an accessible form.

Website vs. document

On the left is a table of contents, on the right a website map.

Because WCAG is fundamentally oriented towards web content rather than standalone documents, it can be difficult to apply to PDF. It’s therefore useful to begin by reviewing some basic distinctions:

  • A website generally consists of one or more webpages, each assembled from HTML files, CSS, database entries, scripts and other sources. Although links allow webpages to be presented as parts of a larger body of content, each webpage is an independent entity identified by its URI.
  • A PDF document, by contrast, is a single file containing from one page to thousands (or millions) of pages of text, graphics, annotations and other content.

Accordingly, headings matter less for navigation in HTML, because each HTML page tends to be short, with few headings and rarely any deeply nested heading structure. PDF, on the other hand, must be able to accommodate documents running to thousands of pages, or with deeply nested subheadings.

What does WCAG say?

WCAG is technology-neutral and generally deals with web content; it does not directly address offline documents, which makes applying WCAG rules and techniques to them less straightforward. Nonetheless, the specification is clear (success criterion 2.4.2, Level A) that documents (“pages” in WCAG’s vernacular) must have programmatically identifiable titles. WCAG’s supporting documentation identifies a PDF technique for this purpose that leverages the document’s metadata, a reasonable parallel to HTML’s <title> element in the <head>.

WCAG is silent, however, regarding the document title as expressed in the <body>. The HTML convention is to use <h1> for such content, possibly <h2> for subtitles, and then allocate the remaining heading levels up to <h6> to denote important content.

HTML page ≠ PDF document

The content

In HTML, document titles in the page’s <body> are typically marked <h1>. Subtitles may (or may not) be marked <h2>. HTML’s limit of 6 levels of headings is very rarely considered a significant constraint.

In PDF, documents may have no headings at all, or a thousand. If a document includes title content, it may be repeated on several pages.

There’s simply no meaningful parallel between the most important (<h1>) content in the <body> of an HTML page and document title content on PDF page(s).

The markup

In HTML, <h1> marks the main heading on each webpage, irrespective of whether that page is actually a subsection of a larger body of content.

In PDF, <H1> marks the top-level logical sections of a larger document. There’s no express or implied relationship between <H1> and any specific URL or physical page.

The metadata

In HTML, the <title> concept is limited to the metadata in each webpage’s <head>, enabling coordination with that page’s <h1> content.

In PDF, the document’s title is stored in the document-level XMP metadata as the Dublin Core dc:title property.
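
To make the metadata side concrete, here is a minimal Python sketch that reads the title from an XMP packet using only the standard library. It assumes the packet has already been extracted from the PDF’s document-level Metadata stream (that step is out of scope here); the helper name `xmp_dc_title` and the sample packet are hypothetical. Because dc:title is an XMP language alternative (an rdf:Alt of rdf:li items), the code prefers the `x-default` entry and otherwise falls back to the first one.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs used in XMP packets (fixed values from RDF and Dublin Core).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# ElementTree reports the predefined xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xmp_dc_title(xmp_packet: str, lang: str = "x-default") -> Optional[str]:
    """Return the dc:title entry for `lang`, falling back to the first entry.

    Hypothetical helper: assumes `xmp_packet` is the XML text of the
    PDF's XMP metadata, already extracted from the file.
    """
    root = ET.fromstring(xmp_packet)
    # dc:title is a language alternative: rdf:Alt holding rdf:li items,
    # each tagged with xml:lang ("x-default" marks the default entry).
    items = root.findall(".//dc:title/rdf:Alt/rdf:li", NS)
    for li in items:
        if li.get(XML_LANG) == lang:
            return li.text
    return items[0].text if items else None


# Hypothetical, minimal XMP packet for illustration only.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Annual Report 2019</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(xmp_dc_title(SAMPLE_XMP))  # Annual Report 2019
```

This is the same information the WCAG technique relies on: the title lives in metadata, independently of how (or whether) it is tagged on the page.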

Why using H or Hn tags for document titles in PDF is not recommended

As discussed above, heading tags already have a clear role in document navigation, and it’s not to mark the title! Nothing in WCAG 2.1 or PDF/UA requires, or even implies, that titles in PDF documents should be tagged as if they were headings.

Here are some other reasons why PDF authors and accessibility remediators should shun <H1> for document titles and use <Title> instead.

Consistency is key

Headings allow AT users to move between important subjects in a document. If headings that denote sections of content within the document are confused with titles, which denote the identity of the document itself, the navigational value of headings is compromised.

Assistive technology users don’t benefit from PDF document titles tagged <H1>

AT users rely on headings for navigation within the document. If they need to read the document’s title, they can access the metadata (which PDF/UA requires to be present) directly without losing their place, exactly as suggested in the formal support documentation for WCAG 2.1 success criterion 2.4.2.

PDF authors need more than 6 heading levels

HTML’s limit of 6 heading levels isn’t a problem on most webpages, which tend to be short and/or don’t include deeply nested headings. PDF 1.7, on which PDF/UA-1 is based, has the same limit (its standard heading types run from H1 to H6), and there the limit is problematic, and not only because some PDF documents extend to 7 or 8 heading levels, or even more. If titles are tagged with <H1>, only five heading levels remain with which to structure the document. If the author also uses <H2> to enclose a subtitle (as many do), only four heading levels remain for organizing the document’s content!

PDF 2.0 allows any number of heading levels, just one of many important enhancements to the next-generation PDF format published in 2017.
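
The arithmetic above can be sketched as a small Python helper that maps a section’s depth to a numbered heading type under the PDF 1.7 cap. The function names and the `levels_spent_on_title` parameter are hypothetical, chosen only to illustrate how tagging the title (and subtitle) as headings shifts every section heading down.

```python
# PDF 1.7 (ISO 32000-1), the basis of PDF/UA-1, defines numbered
# heading types only from H1 to H6.
MAX_HEADING_LEVEL_PDF17 = 6


def remaining_levels(levels_spent_on_title: int) -> int:
    """Heading levels left for the document's own sections."""
    return MAX_HEADING_LEVEL_PDF17 - levels_spent_on_title


def heading_tag(depth: int, levels_spent_on_title: int = 0) -> str:
    """Map a section depth (1 = top-level section such as a chapter) to Hn.

    levels_spent_on_title (hypothetical parameter):
      0 if the title is tagged <Title> (or <P>),
      1 if the title is tagged <H1>,
      2 if the title is <H1> and a subtitle is <H2>.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    level = depth + levels_spent_on_title
    if level > MAX_HEADING_LEVEL_PDF17:
        raise ValueError(
            f"section depth {depth} would need H{level}, "
            f"but PDF 1.7 stops at H{MAX_HEADING_LEVEL_PDF17}"
        )
    return f"H{level}"


print(remaining_levels(1), remaining_levels(2))  # 5 4
print(heading_tag(5))     # H5: title kept out of the heading hierarchy
print(heading_tag(4, 2))  # H6: already the last available level
```

With the title and subtitle spent on <H1> and <H2>, a fifth-level section would need H7 and the helper raises an error. Under PDF 2.0, which allows any number of heading levels, the cap would simply be dropped.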

Titles don’t appear in tables of contents

Document authoring tools use headings to build a table of contents; however, these usually don’t include the document’s title, as it’s not part of the document’s structure.

This fact highlights one of the problems, referenced above, with using <H1> tags for titles: AT users have no way to distinguish between the title and the document’s top-level structural headings.
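
The way a table of contents is derived from headings can be modelled with a short Python sketch. `StructElem` is a hypothetical, simplified stand-in for a structure element (not any real PDF library’s API), and a real authoring tool would work from its own document model. The sketch collects only numbered headings, so an element typed Title never enters the list.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StructElem:
    """Hypothetical, simplified stand-in for a structure element."""
    type: str                      # structure type, e.g. "H1", "P", "Title"
    text: str = ""                 # text content, flattened for brevity
    kids: List["StructElem"] = field(default_factory=list)


# Numbered headings H1, H2, ... (PDF 2.0 allows levels beyond H6).
NUMBERED_HEADING = re.compile(r"H([1-9][0-9]*)")


def build_toc(root: StructElem) -> List[Tuple[int, str]]:
    """Collect (level, text) for every numbered heading, in document order."""
    toc: List[Tuple[int, str]] = []

    def walk(elem: StructElem) -> None:
        match = NUMBERED_HEADING.fullmatch(elem.type)
        if match:
            toc.append((int(match.group(1)), elem.text))
        for kid in elem.kids:
            walk(kid)

    walk(root)
    return toc


doc = StructElem("Document", kids=[
    StructElem("Title", "Annual Report 2019"),  # not a heading: skipped
    StructElem("H1", "Introduction"),
    StructElem("Sect", kids=[StructElem("H2", "Scope")]),
    StructElem("H1", "Chapter 1"),
])
print(build_toc(doc))
# [(1, 'Introduction'), (2, 'Scope'), (1, 'Chapter 1')]
```

Had the title been tagged H1, it would appear as `(1, 'Annual Report 2019')` at the head of the list, on the same level as “Introduction” and “Chapter 1”, with nothing to mark it as the title.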

Future-proof documents

PDF 2.0, published in 2017, introduces the <Title> structure type to resolve the matter, providing a semantically appropriate tag for title content in PDF documents.

Conclusion

How to tag document titles in PDF

The best solution for titles is to tag them with a <Title> tag. For those following PDF/UA-1, the first ISO standard for accessible PDF (based on ISO 32000-1, PDF 1.7), this <Title> tag should be role-mapped to <P>. For those using PDF 2.0, role mapping is unnecessary.

If the document’s metadata clearly identifies the document (as PDF/UA-1 requires), the title on the page may also safely be tagged with <P>, as this information is redundant with the metadata.

Subtitles, if any, should simply be included within the <Title> tag.
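
In PDF syntax, role mapping is done through the RoleMap dictionary in the structure tree root, for example `/RoleMap << /Title /P >>`. The Python sketch below models how a consumer might resolve such a mapping under PDF 1.7 rules. `resolve_role` is a hypothetical helper and the set of standard types is deliberately abbreviated; since mappings may chain, it follows entries until it reaches a standard type and rejects cycles.

```python
from typing import Dict, FrozenSet

# Illustrative subset of the PDF 1.7 standard structure types
# (the complete list is in ISO 32000-1, clause 14.8.4).
STANDARD_TYPES_PDF17: FrozenSet[str] = frozenset({
    "Document", "Part", "Sect", "Div", "P", "H",
    "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Table", "Figure", "Span", "TOC", "TOCI",
})


def resolve_role(struct_type: str, role_map: Dict[str, str],
                 standard: FrozenSet[str] = STANDARD_TYPES_PDF17) -> str:
    """Follow role-map entries until a standard structure type is reached."""
    seen = set()
    current = struct_type
    while current not in standard:
        if current in seen:
            raise ValueError(f"circular role mapping at {current!r}")
        seen.add(current)
        if current not in role_map:
            raise ValueError(f"{current!r} is neither standard nor role-mapped")
        current = role_map[current]
    return current


# Python equivalent of /RoleMap << /Title /P >> in the structure tree root.
role_map = {"Title": "P"}
print(resolve_role("Title", role_map))  # P
```

A PDF 1.7 consumer therefore treats the title as a paragraph, which is exactly the PDF/UA-1 recommendation. Under PDF 2.0, Title is itself a standard structure type, so no mapping entry would be needed for it.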

How to tag headings in PDF

Use <H1> to tag the top-most level of organization in the document such as “Introduction”, “Table of Contents”, “Chapter 1”, “Chapter 2” and so on.

Use <H2> and subsequent levels to tag the 2nd and subsequent heading levels below each <H1> in the document.


ABOUT THE AUTHORS

Klaas Posselt

Klaas Posselt is a graduate engineer in printing and media technology who, following a number of lines of inquiry, eventually landed on the subject of universally accessible PDF documents. He trains, assists and supports clients as they implement and optimize publication processes and move towards new digital output channels including ebooks, accessible PDFs and web platforms. As a member of the PDF …

© 2019 Association for Digital Document Standards e.V.