PDFlib GmbH
Status: Full Member
Country: DE
Sector: All industries
Joined at: Sep 06
Website: http://www.pdflib.com/

Linked User
Rainer Plöckl
Stephan Mühlstrasser
Thomas Merz

XMP metadata support in PDFlib products

Metadata has been described as the business card of a digital document. It usually comprises a set of properties, each with a specific meaning in the context of the document, such as the title and creator of a PDF document or the GPS position where an image was taken. Metadata plays a crucial role in handling digital data throughout its lifetime.

A standard format for metadata – XMP

The common format for metadata, Extensible Metadata Platform (XMP), is based on XML. It was designed by Adobe in 2001 and standardized as ISO 16684-1:2012. XMP metadata travels with the file and can be embedded in many file formats besides PDF, such as TIFF or JPEG. With XMP, metadata can even survive format conversions, e.g. from scanned TIFF to PDF. XMP is implemented in all Adobe publishing products and supported by dozens of independent software vendors and user groups.
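Because XMP is plain XML (with properties serialized as RDF), a packet can be read with any XML parser. The following is a minimal sketch using only the Python standard library. The sample packet, its values, and the helper name `read_dublin_core` are illustrative assumptions and are not taken from any PDFlib product.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the Dublin Core schema.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample packet body (values invented for illustration).
SAMPLE_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq>
      </dc:creator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_dublin_core(packet):
    """Return the dc:title and dc:creator values found in an XMP packet body."""
    root = ET.fromstring(packet)
    # dc:title is a language alternative (rdf:Alt); take the first entry.
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    # dc:creator is an ordered list (rdf:Seq) of names.
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "creators": [c.text for c in creators],
    }


info = read_dublin_core(SAMPLE_PACKET)
```

The same parsing logic applies whether the packet came from a PDF, a TIFF, or a JPEG file, which is what allows metadata to be carried across format conversions.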

Metadata properties are grouped in schemas. In addition to predefined schemas (e.g. Dublin Core), custom schemas can be defined to cover company- or industry-specific metadata requirements. Various ISO standards specify PDF subsets for certain application domains, such as archiving (PDF/A) or printing (PDF/X-4 and PDF/X-5). All of them include XMP metadata (except for the older standards PDF/X-1 and PDF/X-3), and most of them make it mandatory.
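In XMP, each schema is identified by an XML namespace, so predefined and custom properties can sit side by side in one packet. The sketch below builds such a packet with the Python standard library. The `acme` namespace URI, the `invoiceNumber` property, and the function name `build_xmp_packet` are hypothetical. Note also that PDF/A expects custom properties to be accompanied by an extension schema description, which this sketch omits.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",       # predefined schema: Dublin Core
    "acme": "http://ns.example.com/acme/1.0/",      # hypothetical custom schema
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix, local):
    """Build an ElementTree qualified name such as {uri}local."""
    return "{%s}%s" % (NS[prefix], local)


def build_xmp_packet(title, creator, invoice_number):
    """Return an XMP packet with two Dublin Core properties and one custom one."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)

    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list of names.
    seq = ET.SubElement(ET.SubElement(desc, q("dc", "creator")), q("rdf", "Seq"))
    ET.SubElement(seq, q("rdf", "li")).text = creator

    # Simple text property from the hypothetical custom schema.
    ET.SubElement(desc, q("acme", "invoiceNumber")).text = invoice_number

    body = ET.tostring(xmpmeta, encoding="unicode")
    # Wrap the XML in the standard xpacket processing instructions.
    return ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
            + body + '\n<?xpacket end="w"?>')


packet = build_xmp_packet("Invoice 4711", "Jane Doe", "4711")
```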

XMP support in PDFlib GmbH products

PDFlib products offer extensive support for XMP in PDF:

PDFlib product family: With PDFlib you can create PDF documents with XMP metadata on document, page or image level. PDFlib adds user-friendly support for XMP extension schemas according to PDF/A without any struggle with XMP internals. Advanced users can directly feed all predefined XMP metadata schemas to PDFlib to be included in the generated PDF documents. The output is guaranteed to conform to PDF/A. Since PDFlib is available on all relevant operating systems and does not require any third-party products, it brings XMP support to all platforms.

Injecting XMP in PDF with PLOP and PLOP DS: With PDFlib PLOP and PLOP DS you can insert XMP in existing PDF documents in case PDF documents do not contain all required metadata properties. This is particularly useful in PDF/A workflows since XMP support in PLOP and PLOP DS is PDF/A-aware. For example, custom XMP with extension schemas can be injected in PDF/A documents from workflows which do not support extension schemas.

Extracting XMP with pCOS: PDFlib pCOS is the PDFlib tool for retrieving all kinds of information from PDF documents. pCOS offers a simple programming method for extracting XMP metadata from PDF on document, page or image level. XMP metadata is normalized to Unicode so that you don’t have to worry about encoding issues. XMP retrieval works regardless of compression, encryption, and PDF object structure. As pCOS follows the PDF object structure in all cases, the correct XMP metadata blocks are always retrieved.

Searching for XMP metadata with TET PDF IFilter: PDFlib TET PDF IFilter implements Microsoft’s IFilter interface and makes XMP metadata searchable with various Microsoft and third-party desktop and enterprise search products, such as Microsoft Search, Microsoft SharePoint, or SQL Server. In addition to page contents, TET PDF IFilter indexes XMP metadata as well as standard or custom document info entries. TET PDF IFilter optionally integrates metadata in the indexed raw text. As a result, even full-text search engines without metadata support (e.g. SQL Server) can search for metadata.

PDFlib Text and Image Extraction Toolkit (TET): TET includes XMP metadata in the XML output that it creates from PDF documents.
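To illustrate why extraction that follows the PDF object structure matters, consider the naive alternative: scanning the raw file bytes for the `xpacket` markers that delimit an XMP packet. The sketch below does exactly that, using only the Python standard library. It is not how pCOS works and does not use any PDFlib API; the function name and the assumption that packets are UTF-8 encoded are illustrative only.

```python
import re
from typing import List

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def scan_for_xmp_packets(data: bytes) -> List[str]:
    """Return the body of every XMP packet visible in the raw bytes.

    Assumes UTF-8 packets. Packets inside compressed or encrypted streams
    are invisible to this scan, and the result does not say whether a packet
    belongs to the document, a page, or an image.
    """
    return [m.group(1).decode("utf-8").strip() for m in PACKET_RE.finditer(data)]


# Hypothetical byte fragment standing in for part of a PDF file.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"/>\n'
    b'<?xpacket end="w"?>\nendstream\nendobj\n'
)
packets = scan_for_xmp_packets(sample)
```

A scan like this can also pick up stale packets left behind by incremental updates, which is one reason a tool that resolves the actual object structure gives more reliable results.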

Related Products
PDFlib FontReporter

PDFlib FontReporter is a free plugin for analyzing fonts in PDF documents.

PDFlib Products for Mobile Devices and Embedded Platforms
PDFlib products for generating and processing PDF documents are available for mobile devices, such as smartphones and tablets, and for embedded platforms.

PDFlib pCOS – PDF Information Retrieval Tool

PDFlib pCOS provides a simple and elegant facility for retrieving any information from a PDF document which is not part of the page contents.

PDFlib PLOP DS - PDF Linearization, Optimization, Protection, Digital Signature

PLOP DS (Digital Signature) is a versatile tool for linearizing, optimizing, repairing, analyzing, encrypting, decrypting, and digitally signing PDF documents.

PDFlib PLOP - PDF Linearization, Optimization, Protection

PDFlib TET Plugin

The free TET Plugin provides easy access to the PDFlib Text Extraction Toolkit (TET).

PDFlib TET PDF IFilter - Enterprise PDF Search for Windows


PDFlib TET (Text and Image Extraction Toolkit) reliably extracts text, images and metadata from PDF documents. TET makes available the text contents of a PDF as Unicode strings, plus detailed colour, glyph and font information as well as the position on the page.

PDFlib Personalization Server (PPS)

The PDFlib Personalization Server (PPS) includes PDFlib+PDI plus additional functions for variable data processing using PDFlib Blocks.


PDFlib+PDI includes all PDFlib functions, plus the PDF Import Library (PDI).


PDFlib is the leading developer toolbox for generating and manipulating files in the Portable Document Format (PDF).