How to overcome the different challenges - fields of application
Document understanding is a long-standing topic that has moved to the forefront in recent years with advances in deep learning and NLP. The PDF format is by nature unstructured, so extracting and qualifying information from such documents requires sophisticated processing.
In this presentation, we will discuss four ways to address the challenges posed by PDF (layout and text understanding, and the hierarchy and relationships between the different structures).
We will then discuss the many fields of application of such technologies, including OCR, automatic indexing, tagging and labeling, structured layout conversion, and automatic redaction.
Slides download: https://www.pdfa.org/wp-content/uploads/2022/05/1330-Tellier.pdf