PDFlib GmbH
Status: Full Member
Country: DE
Sector: All industries
Contact:
Joined at: Sep 06
Website: http://www.pdflib.com/

Linked User
Rainer Plöckl
Stephan Mühlstrasser
Thomas Merz

PDFlib TET



PDFlib TET (Text and Image Extraction Toolkit) reliably extracts text, images and metadata from PDF documents. TET makes available the text contents of a PDF as Unicode strings, plus detailed colour, glyph and font information as well as the position on the page. Raster images are extracted in common image formats. TET optionally converts PDF documents to an XML-based format called TETML which contains text and metadata as well as resource information.

TET contains advanced content analysis algorithms for determining word boundaries, grouping text into columns and removing redundant text. Using the integrated pCOS interface you can retrieve arbitrary objects from PDF, such as metadata, interactive elements, etc.
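To make the idea of word boundary detection concrete, here is a minimal sketch of a gap-based heuristic. It is not TET's actual algorithm, which is not described here. The `Glyph` type, the `group_into_words` function and the `gap_factor` threshold are all hypothetical names introduced for illustration, and the sketch assumes a single left-to-right line of text with known glyph positions.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph record: a character plus its horizontal placement."""
    char: str
    x: float      # left edge of the glyph box, in points
    width: float  # horizontal extent of the glyph, in points


def group_into_words(glyphs, gap_factor=0.3):
    """Split one line of glyphs into words based on horizontal gaps.

    Assumption: a gap wider than gap_factor times the previous glyph's
    width marks a word boundary. Real extractors use far richer signals.
    """
    words = []
    current = ""
    prev = None
    for g in sorted(glyphs, key=lambda g: g.x):
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            if gap > gap_factor * prev.width:
                # Gap is large enough: close the current word.
                words.append(current)
                current = ""
        current += g.char
        prev = g
    if current:
        words.append(current)
    return words


glyphs = [
    Glyph("P", 0, 6), Glyph("D", 6, 6), Glyph("F", 12, 6),
    Glyph("t", 22, 4), Glyph("e", 26, 4), Glyph("x", 30, 4), Glyph("t", 34, 4),
]
print(group_into_words(glyphs))
```

PDF content streams often place glyphs individually without explicit space characters, which is why an extractor has to infer word boundaries from geometry in the first place.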

With PDFlib TET you can:

  • Implement the PDF indexer for a search engine
  • Repurpose text and images in PDFs
  • Convert the contents of PDFs to other formats
  • Process PDFs based on their contents, e.g. splitting based on headings (requires PDFlib+PDI in addition to TET)
  • Check whether an area on the page is empty or contains any text, images, or vector graphics

TET Product Family

The TET family comprises the following products:

Text and Image Extraction Toolkit (TET), the core product for extracting text, images, metadata and other elements from PDF.

TET PDF IFilter extracts text and metadata from PDF documents and makes them available to search and retrieval software on Windows. It is available as a separate product and is suitable for use with Microsoft search products, e.g. Windows Search, SharePoint and SQL Server.

TET Plugin for Adobe Acrobat, a free utility for extracting text and images from PDF. It can be used to evaluate TET interactively.

Location
Franziska-Bilek-Weg 9, 80339 München, Deutschland



Related Products
PDFlib FontReporter


PDFlib FontReporter is a free plugin for analyzing fonts in PDF documents.

PDFlib Products for Mobile Devices and Embedded Platforms
PDFlib products for generating and processing PDF documents are available for mobile devices such as smartphones and tablets, and for embedded platforms.

PDFlib pCOS – PDF Information Retrieval Tool


PDFlib pCOS provides a simple and elegant facility for retrieving any information from a PDF document which is not part of the page contents.

PDFlib PLOP DS - PDF Linearization, Optimization, Protection, Digital Signature


PLOP DS (Digital Signature) is a versatile tool for linearizing, optimizing, repairing, analyzing, encrypting, decrypting and digitally signing PDF documents.

PDFlib PLOP - PDF Linearization, Optimization, Protection



PDFlib TET Plugin


The free TET Plugin provides easy access to the PDFlib Text Extraction Toolkit (TET).

PDFlib TET PDF IFilter - Enterprise PDF Search for Windows



PDFlib Personalization Server (PPS)


The PDFlib Personalization Server (PPS) includes PDFlib+PDI plus additional functions for variable data processing using PDFlib Blocks.

PDFlib+PDI


PDFlib+PDI includes all PDFlib functions, plus the PDF Import Library (PDI).

PDFlib


PDFlib is the leading developer toolbox for generating and manipulating files in the Portable Document Format (PDF).