Datalogics
Status: Partner Member
Country: US
Sector: All industries
Contact:
Joined at: Feb 08
Website: http://www.datalogics.com/


Creating Comment Annotations using the Datalogics PDF Java Toolkit

Sample of the Week:

One of the more interesting features of Adobe Acrobat is the ability to review and mark up documents using electronic versions of the tools commonly used to mark up paper documents. Acrobat gives you a highlighter, a sticky note tool, even a paperclip to attach additional documents or files. These marks are stored as annotations on top of the PDF pages. The Datalogics PDF Java Toolkit allows developers to create, modify, extract, and manage these annotations in a way that is completely interoperable with Adobe Acrobat.

What You Need to Know First:

Every object on a PDF page that isn’t content is, technically, an annotation: links, printer marks, some types of watermarks, embedded movies and 3D models, as well as lines, arrows, and highlights. All are annotations, but this article covers only the annotations that appear in the Comments panel of Adobe Acrobat.

All of the annotation types can appear anywhere on the PDF page, but some of them only make sense when they are associated with words. For example, in Acrobat DC, you can draw on the page with the highlighter tool. When you are not over a word, an Ink annotation is created and defaults to yellow, just like a highlighter. When you are over a word, however, the highlighter acts as a selection tool and you can highlight specific words. Acrobat then creates Highlight annotations and uses the bounding rectangle of the selection as the bounding rectangle of the highlight.

The same effect can be created programmatically using the Datalogics PDF Java Toolkit, as demonstrated in the Gist referenced lower in this article. We can use the ReadingOrderTextExtractor and WordsIterator classes to gather the words in the document, locate the words we want to highlight, and then use the coordinate information in the Word objects to create new Highlight annotations.
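The geometry behind that last step can be sketched in plain Java. This is a minimal illustration only: `WordBox`, `findWords`, and `boundingRect` are hypothetical names invented for this sketch and are not part of the Datalogics PDF Java Toolkit, and the coordinates are made up. It assumes each word comes with a lower-left and upper-right corner, and shows how a rectangle for a highlight could be derived from one word or from a run of adjacent words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an extracted word and its page coordinates.
// This is NOT the toolkit's Word class.
final class WordBox {
    final String text;
    final double llx, lly, urx, ury; // lower-left and upper-right corners

    WordBox(String text, double llx, double lly, double urx, double ury) {
        this.text = text;
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }
}

final class HighlightRectSketch {

    // Collect every word whose text matches the target, ignoring case.
    static List<WordBox> findWords(List<WordBox> words, String target) {
        List<WordBox> hits = new ArrayList<WordBox>();
        for (WordBox w : words) {
            if (w.text.equalsIgnoreCase(target)) {
                hits.add(w);
            }
        }
        return hits;
    }

    // Smallest rectangle {llx, lly, urx, ury} that encloses all the given boxes.
    static double[] boundingRect(List<WordBox> boxes) {
        if (boxes.isEmpty()) {
            throw new IllegalArgumentException("no boxes to enclose");
        }
        double llx = Double.POSITIVE_INFINITY, lly = Double.POSITIVE_INFINITY;
        double urx = Double.NEGATIVE_INFINITY, ury = Double.NEGATIVE_INFINITY;
        for (WordBox b : boxes) {
            llx = Math.min(llx, b.llx);
            lly = Math.min(lly, b.lly);
            urx = Math.max(urx, b.urx);
            ury = Math.max(ury, b.ury);
        }
        return new double[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        // Made-up word positions on a single line of text.
        List<WordBox> words = Arrays.asList(
            new WordBox("PDF", 72, 700, 95, 712),
            new WordBox("Java", 100, 700, 128, 712),
            new WordBox("Toolkit", 133, 698, 175, 712));

        // One rectangle per matching word.
        for (WordBox hit : findWords(words, "toolkit")) {
            System.out.println(Arrays.toString(boundingRect(Arrays.asList(hit))));
        }
        // One rectangle for a selection that spans adjacent words.
        System.out.println(Arrays.toString(boundingRect(words.subList(1, 3))));
    }
}
```

In a real program the word boxes would come from the toolkit's text extraction, and the resulting rectangle would be handed to whatever call creates the Highlight annotation; the Gist shows the actual API.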

There are basically two ways to position an annotation on a PDF page.

  1. A rectangle. The rectangle can either describe the lower-left and upper-right corners of the annotation's bounding box, or it can be a single point where the lower-left and upper-right corners are the same point. This second form suits Sticky Note annotations, where creating the appearance of the annotation is the responsibility of the viewer.
  2. A vertex array. Polylines, Polygons, and Ink annotations use an array of X,Y coordinates to define a path that then gets drawn by the AppearanceService. The way the vertices are connected is defined by the annotation type: Ink annotations are connected with curves, while the other two are connected with straight lines.
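The two positioning styles in the list above can be illustrated with a few lines of plain Java. This is only a sketch under stated assumptions: the class and method names are hypothetical and do not come from the toolkit, rectangles are represented as `{llx, lly, urx, ury}` arrays, and a vertex array is a flat `{x0, y0, x1, y1, ...}` array.

```java
final class AnnotationGeometrySketch {

    // A degenerate rectangle whose lower-left and upper-right corners coincide,
    // the form described above for Sticky Note style annotations.
    static double[] pointRect(double x, double y) {
        return new double[] {x, y, x, y};
    }

    // True when both corners of {llx, lly, urx, ury} are the same point.
    static boolean isPoint(double[] rect) {
        return rect[0] == rect[2] && rect[1] == rect[3];
    }

    // Bounding box {llx, lly, urx, ury} of a flat vertex array {x0, y0, x1, y1, ...}.
    static double[] boundsOf(double[] vertices) {
        if (vertices.length < 2 || vertices.length % 2 != 0) {
            throw new IllegalArgumentException("expected one or more x,y pairs");
        }
        double llx = vertices[0], lly = vertices[1];
        double urx = vertices[0], ury = vertices[1];
        for (int i = 2; i < vertices.length; i += 2) {
            llx = Math.min(llx, vertices[i]);
            urx = Math.max(urx, vertices[i]);
            lly = Math.min(lly, vertices[i + 1]);
            ury = Math.max(ury, vertices[i + 1]);
        }
        return new double[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        double[] note = pointRect(100, 650);          // position only, no extent
        double[] path = {10, 20, 50, 5, 30, 40};      // three made-up vertices
        System.out.println(isPoint(note));
        System.out.println(java.util.Arrays.toString(boundsOf(path)));
    }
}
```

A path-based annotation still occupies a region of the page, which is why the sketch derives a bounding box from the vertices; whether a given toolkit call computes that box for you is something to confirm in the Gist or the API documentation.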

It’s important to note that several of the Acrobat Commenting tools map to the same PDFAnnotation subclasses. For example, lines and arrows both use the PDFAnnotationLine class; arrows simply get a couple of extra properties that define where the arrowhead, or line ending, goes and what it looks like. The Free Text Tool, Typewriter Tool, Text Box, and Callout all use the PDFAnnotationFreeText class.
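As a quick reference, the tool-to-class relationships named above can be written down as a lookup table. The sketch below is plain Java that stores the class names as strings only, so it does not depend on the toolkit; `ToolMappingSketch` and its methods are hypothetical helpers, and the table covers only the tools mentioned in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ToolMappingSketch {

    // Acrobat tool name -> annotation class name, as described in the article.
    static Map<String, String> toolToClass() {
        Map<String, String> m = new LinkedHashMap<String, String>();
        m.put("Line", "PDFAnnotationLine");
        m.put("Arrow", "PDFAnnotationLine");
        m.put("Free Text Tool", "PDFAnnotationFreeText");
        m.put("Typewriter Tool", "PDFAnnotationFreeText");
        m.put("Text Box", "PDFAnnotationFreeText");
        m.put("Callout", "PDFAnnotationFreeText");
        return m;
    }

    // Invert the table: annotation class name -> the tools that produce it.
    static Map<String, List<String>> toolsByClass(Map<String, String> toolToClass) {
        Map<String, List<String>> grouped = new LinkedHashMap<String, List<String>>();
        for (Map.Entry<String, String> e : toolToClass.entrySet()) {
            List<String> tools = grouped.get(e.getValue());
            if (tools == null) {
                tools = new ArrayList<String>();
                grouped.put(e.getValue(), tools);
            }
            tools.add(e.getKey());
        }
        return grouped;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<String>> e : toolsByClass(toolToClass()).entrySet()) {
            System.out.println(e.getKey() + " <- " + e.getValue());
        }
    }
}
```

The practical consequence is that code reading annotations back cannot tell an arrow from a line, or a callout from a typewriter comment, by class alone; it has to inspect the extra properties on the annotation.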

Finally, not all viewers are equally capable of interpreting annotation objects and displaying them properly, so it’s good practice to have the Datalogics PDF Java Toolkit generate the appearances for the annotations at the time of creation. This way, even if a viewer cannot work with an annotation, the annotation will still be visible.

Creating Comment Annotations using the Datalogics PDF Java Toolkit

The code snippets below show how to add most of the annotation types found in Adobe Acrobat. The full Gist shows more and is commented in greater detail.

Sticky Notes:

[Image: Sticky Notes snippet]

Highlights:

[Image: Highlights snippet]

Strikeout:

[Image: Strikeout snippet]

Typewriter:

[Image: Typewriter snippet]

Free Form Draw:

[Image: Free Form Draw snippet]

Arrow:

[Image: Arrow snippet]

To get started working with PDF, download this Gist and request an evaluation copy of the Datalogics PDF Java Toolkit.

Related Products
Adobe PDF Library


The Adobe PDF Library SDK is a low-level PDF library that contains a powerful set of native C/C++ APIs with interfaces for .NET and Java. Systems integrators, independent software vendors (ISVs), enterprise IT developers, and others can integrate Adobe PDF functionality within custom applications in a client and/or server environment.

PDF Java Toolkit


Datalogics PDF Java Toolkit is a native Java library that provides high-level APIs for automating PDF workflows like processing PDF forms, verifying digital signatures, and extracting text. It also offers low-level APIs for working directly with the structure of the PDF for those times you need it.

Adobe Normalizer


Adobe Normalizer is an API that allows developers to quickly and easily convert Encapsulated PostScript (EPS) and PostScript (PS) files to Adobe’s Portable Document Format (PDF). The industry-standard Adobe Distiller and Distiller Server are themselves built upon PDF Converter SDK, and now this API is available separately to application developers.

Adobe PDF Print Engine


The Adobe PDF Print Engine is a common rendering engine technology, packaged as a software development kit (SDK). It can be the basis for a variety of products for previewing and printing Adobe Portable Document Format (PDF) documents at different stages of the professional print workflow.

PDF2IMG


Datalogics PDF2IMG is a command-line utility that converts PDF files to a variety of image formats including PNG, JPG, TIFF, BMP, and more. It is built upon the Adobe PDF Library and uses Adobe technology for unrivaled color management during the PDF conversion process.

PDF Alchemist


Datalogics PDF Alchemist is a new (C/C++) SDK for intelligently extracting text and images from PDFs and exporting to HTML 5 or EPUB. It employs sophisticated techniques to identify and reconstruct “text flows” within the PDF.