PDF Association: At the PDF Days Online 2021, you will be hosting a presentation titled “Deriving HTML from PDF – lessons learned” – what’s that about?
Roman Toda: In recent years, the interoperability of PDF has become an important topic. Everyone wants to access content. And we are not talking about simple text extraction, but about the whole document hierarchy, including the identification of more complex structures like tables and forms.
Two years ago, the PDF Association published a document, “Deriving HTML from PDF”, which describes an algorithm for producing HTML from tagged PDF. After authoring many PDFs and implementing the algorithm in various scenarios, from PDF consumption and annotation in HTML environments to data mining, we decided to share our experiences with the current state. We will identify where there are gaps in tooling and where we need updates to the PDF specification.
PDF Association: Who is your presentation aimed at?
Roman Toda: Mainly software architects, integrators and developers who want to design their systems in an interoperable way. But in principle, it is for everyone interested in knowing how real-world use cases change our file format specifications.
PDF Association: What will the people who attend your presentation be able to take away from it?
Roman Toda: Hopefully people already know that PDF can be interoperable, and I believe they will learn a lot of details: best practices in authoring PDFs and lessons learned from implementing the derivation algorithm.
PDF Association: The PDF Days Online 2021 has become the leading PDF event. What makes the PDF Days so unique in your mind?
Roman Toda: The unique combination of very business-oriented and very technical talks, and people who never fail to impress. The openness and friendliness of all participants make this event so special.
PDF Association: Thank you! We look forward to seeing you at the PDF Days Online 2021.
Check out the overall PDF Days agenda and register for Roman’s session.
The staff of the PDF Association are dedicated to delivering the information, services and value members have come to expect. Staff members of the PDF Association include: Alexandra Oettler (Editor), Betsy Fanning (Standards Director), Duff Johnson (Chief Executive Officer), Matthias Wagner (CFO & Operations Director), Peter Wyatt (Chief Technology Officer), Thomas Zellmann (PDF Evangelist).