
Can computers understand PDF documents as humans, or better?

Alexey Subach, technical lead at Dual Lab, will be hosting a presentation titled “Can computers understand PDF documents as humans, or better?” at PDF Days Europe 2018.

Session Description: Back in 2015, computers beat the human record in image recognition for the first time.
Artificial intelligence has seen a string of successes recently, including beating the world’s best Go player in 2016, recognizing phone speech better than humans in 2016, the first self-driving taxis in Phoenix in 2017, and pneumonia detection at a level exceeding that of practicing radiologists.
Still, computers are not flawless and can struggle even with relatively simple tasks, not to mention the adversarial examples that are being developed against a few artificial intelligence applications.
Understanding digital documents – and PDF in particular – is a complex yet very important topic for the fields of accessibility, automation and information retrieval.
What is the state of the art and what are the short-term goals? How can we as PDF producers and consumers speed up the process? What are the limitations and ways to overcome them? Let’s try to answer these questions together.
We propose an approach that includes:

  • building a database of PDF documents suitable for training
  • preparing the tagging tree to serve as ground truth
  • evaluating the performance of the trained model
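
The last two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the presenter's method: the nested-tuple tree format, the content IDs (c0, c1, ...) and the function names are all hypothetical, and the "model output" is a hand-made stand-in. Only the tag names (H1, P, L, LI, Table, TR, TD) are standard PDF structure types.

```python
from collections import Counter

# Hypothetical ground-truth tagging tree: (tag, children), where a leaf child
# is a string ID standing in for one piece of page content.
ground_truth_tree = ("Document", [
    ("H1", ["c0"]),
    ("P", ["c1", "c2"]),
    ("L", [("LI", ["c3"]), ("LI", ["c4"])]),
    ("Table", [("TR", [("TD", ["c5"]), ("TD", ["c6"])])]),
])

def leaf_labels(node, labels=None):
    """Map each content ID to the tag of its nearest enclosing structure element."""
    if labels is None:
        labels = {}
    tag, children = node
    for child in children:
        if isinstance(child, str):
            labels[child] = tag
        else:
            leaf_labels(child, labels)
    return labels

def evaluate(truth, predicted):
    """Return overall accuracy and per-tag (precision, recall)."""
    correct = Counter()
    truth_total = Counter(truth.values())
    pred_total = Counter()
    for content_id, true_tag in truth.items():
        pred_tag = predicted.get(content_id)
        if pred_tag is not None:
            pred_total[pred_tag] += 1
        if pred_tag == true_tag:
            correct[true_tag] += 1
    accuracy = sum(correct.values()) / len(truth)
    per_tag = {}
    for tag in truth_total:
        precision = correct[tag] / pred_total[tag] if pred_total[tag] else 0.0
        recall = correct[tag] / truth_total[tag]
        per_tag[tag] = (precision, recall)
    return accuracy, per_tag

truth = leaf_labels(ground_truth_tree)
# Stand-in for a trained model's output: it mislabels c2 as a heading.
predicted = dict(truth, c2="H1")
accuracy, per_tag = evaluate(truth, predicted)
```

Scoring per content item against the nearest enclosing tag is a deliberately simple choice; it ignores nesting and reading order, which a fuller evaluation of a tagging tree would probably also need to compare.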

Presenter: Alexey Subach is a technical lead at Dual Lab, a service provider known for its expertise in PDF, graphic arts and document workflow systems. Passionate about PDF, Alexey focuses on providing users with straightforward APIs for low-level features of the specification without losing the richness and flexibility of the format. He is curious about new areas of technology and draws on his mathematical and analytical background to dive deeper into their foundations and look for promising new applications.
