The idea behind an improved PDF/A-1b is to enhance the conformance level by encompassing the advantages of Unicode. Unicode is obligatory only in PDF/A-1a, but that does not rule out the voluntary usage of Unicode in PDF/A-1b.
PDF/A-1 has two conformance levels that differ with regard to requirements and functions offered.
Level B conformance is the minimum requirement for PDF/A compliance. The focus here is on reliable rendered visual appearance.
Level A conformance is a superset of PDF/A-1b. Over and above the features offered by level B, level A requires the following features, which are important for providing accessible content:
+ Tagged PDF
+ Structure tree (hierarchy)
+ Language specification
+ Unicode mappings
Accessibility means that content (text, images, and graphics) is also accessible for visually impaired users via, for example, screen reader software. In addition, accessibility makes it easier to reuse content than with conventional PDF, thanks to functions such as text export.
Overview of accessibility levels
Accessibility levels can be schematically depicted as shown below. The top level is accessible PDF, which can be generated from the level below it – PDF/A-1a. Any PDF/A-1a-compliant file is also a valid PDF/A-1b file, and all variants are based on PDF 1.4 (the PDF version that was introduced with Acrobat 5).
Both PDF/A-1a and PDF/A-1b are based on PDF 1.4. Only PDF/A-1a files that meet additional criteria can be classified as accessible.
Accessibility (primarily for Internet content, but increasingly for other documents, too) is regulated internationally by guidelines and laws.
In Germany, for example, the following decree governs accessibility: BITV (Barrierefreie Informationstechnik-Verordnung).
In the US, accessibility has been grounded in law for several years already: Section 508 of the Rehabilitation Act.
Tagged PDF involves tagging text spans and giving them an ID. Alternative text can be given to graphics and images (alternative image captions as in HTML). Replacement text is prescribed for glyphs that do not represent letters, such as the telephone symbol or a logo character.
In the case of structured PDF, content is assigned to a structure tree. The structural elements are predetermined (Document, Header, Article, List, Table, and so on).
Structured PDF: An example layout
Additional information identifies artifacts: elements that do not constitute part of the document content. These include page numbers, headers and footers, footnotes, backgrounds, and crop marks.
When ensuring that the content can be used unambiguously later on, the specification of the language is also important. This is governed by the lang attribute, with an entry such as de-CH for Swiss German. The specification must be defined in the structure tree and in the spans.
A formatter can only write structural information if it knows what each element is (for example, that a given piece of text is a caption). Converters that work from finished page data do not have this information, so no assignment is possible. A later automatic interpretation is neither sensible nor permitted by the PDF/A-1a standard. Any later assignment must therefore be carried out manually, which can involve a large amount of effort.
PDF/A-1b files do not have to use Unicode, but they are permitted to do so. The use of Unicode brings advantages with regard to reusing PDF/A-1b files.
The term Unicode has several meanings. Unicode is an organization, a system developed by a global team of experts, a large character table, a small database, and the description of many scripts.
Unicode is subdivided into numerous blocks that help to arrange international and historical characters. The table below provides examples of these blocks and character ranges.
Unicode block             Character range    Number of code points
Basic Latin               U+0000 – U+007F    128
Latin-1 Supplement        U+0080 – U+00FF    128
Currency Symbols          U+20A0 – U+20CF    48
Miscellaneous Symbols     U+2600 – U+26FF    256
CJK Unified Ideographs    U+4E00 – U+9FFF    20,992
For a complete list of blocks, see the Unicode website.
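The sizes in the table follow directly from the ranges. The following is a minimal Python sketch, not part of any PDF tooling, that counts the code points in each listed range and looks up the block for a given character. The names `BLOCKS`, `block_size`, and `block_of` are hypothetical, the block names are the official Unicode names, and the count covers every position in the range whether or not a character is assigned to it.

```python
# Block ranges taken from the table above (illustrative subset only).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Currency Symbols": (0x20A0, 0x20CF),
    "Miscellaneous Symbols": (0x2600, 0x26FF),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
}


def block_size(name):
    """Number of code points in the block, assigned or not."""
    first, last = BLOCKS[name]
    return last - first + 1


def block_of(char):
    """Return the name of the listed block containing char, or None."""
    code_point = ord(char)
    for name, (first, last) in BLOCKS.items():
        if first <= code_point <= last:
            return name
    return None
```

For example, `block_of("€")` falls into Currency Symbols because the euro sign is U+20AC.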
Standard encodings (max. 256 characters), such as WinAnsi and MacRoman, can be easily mapped to Unicode. Other than symbol fonts, all fonts must have a CMap entry (a character mapping table) that references Unicode characters. This includes ligatures, for example:
<005F> <0061> [<00660066> <00660069> <00660066006C>]
Ligature    Character code    Individual characters
ff          <005F>            U+0066 U+0066
fi          <0060>            U+0066 U+0069
ffl         <0061>            U+0066 U+0066 U+006C
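To make the mapping concrete, here is a hedged Python sketch that expands one such range entry into individual code-to-text pairs. It assumes the array form of a range entry, in which the first two values are the first and last character code and the bracketed values are the Unicode targets written as UTF-16BE hex strings. The function name `expand_bfrange` is hypothetical, and the sketch handles only this one-line form, not a complete CMap.

```python
import re


def expand_bfrange(line):
    """Expand one '<lo> <hi> [<dst> ...]' range line into {code: text}."""
    match = re.fullmatch(
        r"\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*\[(.*)\]\s*", line
    )
    if match is None:
        raise ValueError("not an array-style range line")
    low, high = int(match.group(1), 16), int(match.group(2), 16)
    targets = re.findall(r"<([0-9A-Fa-f]+)>", match.group(3))
    if len(targets) != high - low + 1:
        raise ValueError("array length does not match the code range")
    # Each target is a hex string of UTF-16BE code units.
    return {
        low + offset: bytes.fromhex(target).decode("utf-16-be")
        for offset, target in enumerate(targets)
    }


mapping = expand_bfrange("<005F> <0061> [<00660066> <00660069> <00660066006C>]")
```

Here `mapping` assigns the character codes 0x5F, 0x60, and 0x61 to the strings "ff", "fi", and "ffl", which is what a text extractor or screen reader needs in order to recover the individual letters behind each ligature glyph.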
Without Unicode, text cannot be read out loud by screen reader software. A precise and complete text search can only be ensured using Unicode. If Unicode is not used, content analysis is not possible.
This rules out functions such as inserting page breaks after the fact, indexing, and address reading.
PDF/A-1a cannot always be achieved. This is related to factors such as the source material not being suitable for conversion to PDF/A-1a or the effort involved in converting to level A being too high.
However, the PDF/A-1b format, which is easier to generate, can almost always contain Unicode. There are a few prerequisites for this. Suitable fonts must always be used when creating or converting the files. It may be necessary to provide Unicode information via additional mapping tables. Lastly, the output drivers must be configured correctly.
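As an illustration of what such an additional mapping table might look like, the following Python sketch renders a dictionary of character codes and their Unicode text as single-character mapping entries with UTF-16BE hex targets. This is a sketch under stated assumptions, not a complete CMap writer: the function name `to_unicode_bfchar` is hypothetical, character codes are assumed to be two bytes wide, and the surrounding CMap header and any splitting of long tables into several sections are left out.

```python
def to_unicode_bfchar(mapping):
    """Render {character code: Unicode text} as one bfchar section.

    Assumes two-byte character codes. Does not split long tables into
    multiple sections or emit the rest of the CMap.
    """
    lines = [f"{len(mapping)} beginbfchar"]
    for code in sorted(mapping):
        # Targets are written as UTF-16BE code units in uppercase hex.
        target = mapping[code].encode("utf-16-be").hex().upper()
        lines.append(f"<{code:04X}> <{target}>")
    lines.append("endbfchar")
    return "\n".join(lines)


section = to_unicode_bfchar({0x5F: "ff", 0x60: "fi"})
```

Printing `section` shows one line per character code, each pairing the code with the letters it stands for.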
A new compliance level would give users a better overview of their options. In addition to compliance levels A and B, PDF/A-2, which is currently being compiled, will also include PDF/A-2u (u for Unicode).