
DocEng 2019 – PDF Association Tutorial

About the author: Tamir Hassan has over a decade of experience in the area of document engineering. After writing his doctoral thesis on the topic “User-Guided Information Extraction from Print-Oriented Documents,” he worked …

PDF is the most common file format on the Web after HTML, and everyone in the Document Engineering community has to deal with the format in one way or another. Just about everyone is familiar with PDF as “digital paper”; after all, that is how it started out. PDF has become “smarter” in recent years, yet it still struggles to shake off its reputation as a purely end-of-line, “dumb” file format that is unsuitable for further machine processing. Comparatively few people, including many DocEng participants who do not work directly with PDF, are familiar with the additional features PDF has gained over this period.

The aim of this tutorial is to introduce the audience to the most important of these features and to give practical examples of how they can benefit from generating and exchanging PDF files that go beyond digital representations of the printed page.

On September 23, we will present a half-day tutorial covering the following proposed topics.

However, we also plan to ask the audience which topics interest them most and to spend more time on those.

Topics

Tagging/Structure

  • Tagged PDF: Embedding logical structure in PDF for:
    • Accessibility
    • Repurposing of content
  • Hidden, selectable text for scanned documents via OCR

Coffee break, 30 min

Richer embedded files

  • Structured data (measurements, statistical data) as embedded files
  • DocEng publications (JATS)
  • Object-level metadata; keeping source information for history and license enforcement

Workflow features

  • Digital, fillable forms in PDF
  • Commenting workflows (annotations)
  • Security: Encryption and signatures

Special topics

  • Compression algorithms enabling small file sizes for high-quality images
  • Colour reproduction (if requested by audience)


If you have any specific questions, please let us know beforehand: DocEng@pdfa.org

Speakers

Dr. Tamir Hassan


Klaas Posselt


Dietrich von Seggern


Thomas Zellmann

