Digging for information by extracting data from a PDF document

Nadine Schuppisser // December 8, 2016

Member News


Extracting text from a PDF document is one of the most popular information retrieval functions. But what about other information, such as images, metadata and more? Extraction can be simple, but it can also be tricky.

Metadata is among the easiest things to extract. The document metadata can usually be extracted as a short XMP stream. Even if the document contains an old-fashioned information dictionary, extracting its key/value pairs is not a big deal. The same goes for outlines (bookmarks) and navigation aids such as named destinations, links and the like.
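To illustrate why XMP metadata counts as easy, here is a minimal Python sketch that locates the XMP packet in the raw bytes of a file and reads a few common fields. It assumes the metadata stream is stored uncompressed and unencrypted, which is common but not guaranteed; otherwise a real PDF library is needed. The function names `find_xmp_packet` and `read_xmp_fields` are hypothetical, chosen for this example only.

```python
import re
import xml.etree.ElementTree as ET

# An XMP packet is wrapped in <?xpacket begin=...?> and <?xpacket end=...?> markers.
XPACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Standard namespace URIs used by RDF, Dublin Core and the XMP basic schema.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}


def find_xmp_packet(pdf_bytes):
    """Return the body of the first XMP packet in the file, or None.

    Assumption: the packet is visible as plain text in the file,
    i.e. the metadata stream is neither compressed nor encrypted.
    """
    match = XPACKET_RE.search(pdf_bytes)
    return match.group(1).strip() if match else None


def read_xmp_fields(packet):
    """Extract title, creators and creation date from an XMP packet body."""
    root = ET.fromstring(packet)
    fields = {}

    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    if title is not None and title.text:
        fields["title"] = title.text

    creators = [
        li.text
        for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
        if li.text
    ]
    if creators:
        fields["creators"] = creators

    # Simple properties may be written as child elements or as attributes
    # of rdf:Description; check both forms.
    created = root.find(".//xmp:CreateDate", NS)
    if created is not None and created.text:
        fields["created"] = created.text
    else:
        for desc in root.iter("{%s}Description" % NS["rdf"]):
            value = desc.get("{%s}CreateDate" % NS["xmp"])
            if value:
                fields["created"] = value
                break

    return fields
```

A caller would read the file in binary mode, pass the bytes to `find_xmp_packet`, and hand any non-`None` result to `read_xmp_fields`. The information dictionary, outlines and named destinations live in ordinary PDF objects, so reading them reliably takes a proper PDF parser and not a byte search like this one.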

Read more on how to extract information from a PDF document in our PDF expert blog


PDF Tools counts more than 5,000 companies and organizations in 70 countries among its customers, making it one of the world’s leading producers of software solutions and programming components for PDF and PDF/A products. The portfolio of PDF Tools ranges from components to services and solutions. The product range support …


ABOUT THE AUTHORS

Nadine Schuppisser