We’ve done PDF Day events and technical conferences across Europe, in the US, in Australia, and elsewhere. This Electronic Document Conference is the first PDF Association event that’s open to all technologies pertaining to documents. It’s about explor …

Happy new logo!
2006: The PDF/A Competence Center
A new year brings new things, and 2019 is no exception! The “four red blocks” logo was first created for the PDF/A Competence Center in 2006. When that organization became the PDF Association in 2011, the design was ad …

Save-The-Date: PDF Day France, Toulouse, April 4, 2019
PDF Day France will be the first French-speaking event of the PDF Association, organised by our member ORPALIS. It will take place in Toulouse which is the home ground of Airbus and we are very happy that Airbus will present a case study around its usage of PDF in their document management environment!

Electronic Document Conference: Call for Papers
Prospective presenters at the Electronic Document Conference 2019 are invited to submit high-quality original proposals for 25-minute presentations on subjects of interest to developers and technical product managers concerned with electronic document implementations.

Have we passed ‘peak PDF’?
How do we gain insight into how users’ views of documents are shifting? Google Trends is an increasingly interesting source of high-level marketplace data. It aggregates Google’s search data over time, reporting a term’s popularity as compared with all other searches.
As we noted, the Sophos blog has a long piece about modern-day link-farming with PDF documents. Less scrupulous marketers have discovered that Google trusts PDF documents more than HTML pages; they’ve been “poisoning Google search results” accordingly.
The notion that authors of PDF documents are innately pure of heart as compared to authors of HTML pages is doubtless being re-evaluated right now, especially since PDF files are an enormous proportion of important web content, and interest in PDF continues to grow.
Apart from tweaking search algorithms so that PDF files aren’t receiving undue credit just because they are PDF files, what should Google (or other search engine developers) do about PDF? What are, for example, the benefits awaiting search-engine and other application developers that leverage high-quality PDF files?
Once the PDF specification is fully supported (it’s an ISO standard; it won’t bite!), lots of things get both easier and better.
An idea for Google and other search engine developers: to really impress people with your acumen in handling PDF documents, go beyond simply treating PDF as a page-description model, and support high-quality tagged PDF!
What might be possible if search engines were savvy to PDF’s model for semantics and logical reading order?
Although PDF is a page description format, it can include all the instructions that consuming software needs to present the content in other ways. Support for tagged PDF (ISO 32000-1:2008, 14.8; download it for free) would, by itself, enable fairly dramatic new features for browsers.
Accurate abstraction of tagged PDF’s content to vanilla HTML, much as callas’s pdfGoHTML does today (sadly, it requires Adobe Acrobat), would facilitate total flexibility in using tagged PDF on mobile devices. Apple’s iOS browser, Safari, effectively does this today on some HTML pages with “Reader View” – why not also for PDF?
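To make the idea concrete, here is a minimal Python sketch of abstracting a tagged PDF's logical structure to vanilla HTML. It is an illustration under stated assumptions, not a description of how pdfGoHTML or any browser works. The `StructElem` class, the `TAG_TO_HTML` table and the `to_html` function are hypothetical names invented for this example; in practice the structure tree would be read from the file by a PDF library. The tag names are standard structure types from ISO 32000-1, 14.8.4, but the choice of HTML element for each is only one plausible mapping.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Union

# Illustrative (assumed) mapping from a few standard structure types
# to HTML elements. Unlisted types fall back to <div>.
TAG_TO_HTML = {
    "Document": "article",
    "Sect": "section",
    "P": "p",
    "H1": "h1",
    "H2": "h2",
    "L": "ul",
    "LI": "li",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
    "Span": "span",
}


@dataclass
class StructElem:
    """Hypothetical stand-in for one node of a PDF structure tree."""
    tag: str
    kids: List[Union["StructElem", str]] = field(default_factory=list)


def to_html(node: Union[StructElem, str]) -> str:
    """Walk the tree in logical reading order and emit HTML."""
    if isinstance(node, str):
        # Leaf text: escape it so it cannot be mistaken for markup.
        return escape(node)
    element = TAG_TO_HTML.get(node.tag, "div")
    inner = "".join(to_html(kid) for kid in node.kids)
    return f"<{element}>{inner}</{element}>"


# Usage: a tiny hand-built tree standing in for a real tagged PDF.
example = StructElem("Document", [
    StructElem("H1", ["Annual report"]),
    StructElem("P", ["Revenue grew in 2018."]),
])
print(to_html(example))
```

Because the walk follows the structure tree rather than the position of text on the page, the output keeps the author's logical reading order and can reflow to any screen width.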
Besides improved indexing for search and the ability to reliably reuse PDF content in web browsers, there are many ways in which complete support for PDF technology would deliver substantial value to content management systems and end users alike:
It’s all ISO-standardized, and thus, inherently interoperable.
PDF is here to stay, and tagged PDF offers tremendous advantages for both search and re-use applications. It’s high time that search engine, browser and other application developers decided to think again about the crusty old format users have loved for over 20 years.