Can computers understand PDF documents as humans, or better?

About the author: Nicole Gauger is the managing partner of good news!, a communications agency based in Stockelsdorf near Lübeck, Germany. Communications and IT are in her blood. A graduate (BA) in information … Read more

April 10, 2018

Presentation preview

Alexey Subach, technical lead at Dual Lab, will give a presentation titled “Can computers understand PDF documents as humans, or better?” at PDF Days Europe 2018.

Session Description: In 2015, computers beat the human record in image recognition for the first time. Artificial intelligence has had many successes recently, including beating the world's best Go player in 2016, recognizing phone speech better than humans in 2016, the first self-driving taxis in Phoenix in 2017, and pneumonia detection at a level exceeding that of practicing radiologists. Still, computers are not flawless and can struggle even with relatively simple tasks, not to mention the adversarial examples that are being developed against several artificial intelligence applications. Understanding digital documents, and PDF in particular, is a complex yet very important problem for accessibility, automation and information retrieval. What is the state of the art, and what are the short-term goals? How can we, as PDF producers and consumers, speed up the process? What are the limitations, and how can they be overcome? Let's try to answer these questions together. We propose an approach that includes:

building a database of PDF documents suitable for training
preparing the tagging tree to serve as ground truth
evaluating the performance of the trained model
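
As a rough illustration of the evaluation step only, and not the presenter's actual method, a trained model's predicted structure tags could be compared with the tags taken from a document's tagging tree. The sketch below assumes each page element has a hypothetical id mapped to a tag name such as "H1", "P" or "Table"; the function name and the sample data are made up for this example.

```python
from collections import Counter


def per_tag_scores(truth, predicted):
    """Return {tag: (precision, recall)} for predicted vs. ground-truth tags.

    Both arguments map a hypothetical element id to a structure tag.
    Elements missing from `predicted` count as misses for their true tag.
    """
    truth_count = Counter(truth.values())
    pred_count = Counter(predicted.values())

    # Count elements whose predicted tag matches the ground-truth tag.
    true_pos = Counter()
    for element_id, tag in truth.items():
        if predicted.get(element_id) == tag:
            true_pos[tag] += 1

    scores = {}
    for tag in set(truth_count) | set(pred_count):
        precision = true_pos[tag] / pred_count[tag] if pred_count[tag] else 0.0
        recall = true_pos[tag] / truth_count[tag] if truth_count[tag] else 0.0
        scores[tag] = (precision, recall)
    return scores


# Made-up sample: the model mistakes one paragraph for a heading.
ground_truth = {"e1": "H1", "e2": "P", "e3": "P", "e4": "Table"}
model_output = {"e1": "H1", "e2": "P", "e3": "H1", "e4": "Table"}
scores = per_tag_scores(ground_truth, model_output)
```

Reporting precision and recall per tag, rather than a single accuracy figure, shows which structure types the model confuses, which is often what matters for accessibility use cases.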

Presenter: Alexey Subach is a technical lead at Dual Lab, a service provider known for its expertise in PDF, graphic arts and document workflow systems. Passionate about PDF, Alexey focuses on providing users with straightforward APIs for the low-level features of the specification without losing the richness and flexibility of the format. He is curious about new areas of technology and makes an effort to use his mathematical and analytical background to dive deeper into their foundations and look for promising new applications.

Check out the detailed programme: https://pdfa.org/pdf-days-europe-2018-schedule-of-sessions/

Direct link for registration: https://en.xing-events.com/pdf-days-europe-2018.html
