Datalogics
Status: Partner Member
Country: US
Sector: All industries
Contact:
Joined at: Feb 08
Website: http://www.datalogics.com/

Redaction with Overlay Text using Datalogics PDF Java Toolkit

When you publish documents online, you have to assume that someone, somewhere, has made a copy and that the copy will exist forever. Because of that, we need to take extra care to remove sensitive data from those files before they go out onto the Internet. This is why redaction (the removal of sensitive content) is such an important feature. In a PDF file, you create redaction annotations that mark the content to remove; when the annotations are applied, the marked content is removed and something else takes its place. You can specify what fills the space left behind, which is typically a black rectangle. In some cases, though, you may be required to provide additional information in the document to indicate what type of content was removed, or why it was removed.

Redaction annotations have a property designed for exactly those cases, and it is called Overlay Text. By setting the Overlay Text on a redaction annotation, you are saying “this text should be displayed on top of the redacted area,” so that a reader can understand what type of content was removed, and why, without seeing the content itself. If you are redacting content in PDFs that will be published in response to a Freedom of Information Act request, Overlay Text is extremely important to you. The Freedom of Information Act requires that redacted content be replaced by one of the defined redaction codes to indicate what was redacted. We recently updated the RedactAndSanitize sample for our PDF Java Toolkit to demonstrate setting the Overlay Text property so that those who need to comply with the Freedom of Information Act can do so with ease! Let’s take a look at the updates to the sample so you know how to update your application to specify overlay text.


Specifying the text to be used as Overlay Text is as straightforward as it gets; here is the one line that does it, assuming you already have a PDFAnnotationRedaction object (annot in the snippets that follow) to work with:

        annot.setOverlayText("Redacted");

There is one issue with this, though. Earlier we said that the most common replacement for redacted content is a black rectangle. If you only specify the text to use as Overlay Text, you end up with black text on a black rectangle! Nobody can read that, so you still would not be meeting the requirements of the Freedom of Information Act. To change the appearance of the Overlay Text, we need to work with a few more objects so that our PDFAnnotationRedaction object does not use black text on a black rectangle. Let’s start by setting up the resources required to use Helvetica as the font for the Overlay Text:

        final PDFResources resources = document.requireCatalog().procureInteractiveForm().procureResources();
        final PDFContents contents = PDFContents.newInstance(document);
        final ModifiableContent content = ModifiableContent.newInstance(contents, resources);
        final PDFFontSimple font = PDFFontSimple.newInstance(document, ASName.k_Helvetica, ASName.k_Type1);
        final ASName fontName = content.addResource(font);

Then we create a new PDFDefaultAppearance using our resources:

        // Helvetica 8pt, color green (3 values == RGB, 4 == CMYK, 1 == grayscale)
        final PDFDefaultAppearance pdfDefaultAppearance = PDFDefaultAppearance.newInstance(document, fontName, 8.0,
                                                                                           new double[] { 0, 1, 0 });

Now, with our resources constructed and our PDFDefaultAppearance set up to use Helvetica at a point size of 8 with green text, we just need to set the appearance on our PDFAnnotationRedaction object using the PDFDefaultAppearance object we just constructed:

        annot.setDictionaryValue(ASName.k_DA, pdfDefaultAppearance);

With those changes made, the Overlay Text is visible once the content is redacted, because it is now green text on a black rectangle.
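
If you are curious what a default appearance looks like inside the file, the PDF specification defines the DA entry as a string of content stream operators: a Tf operator that selects the font resource and size, plus a color operator chosen by the number of components (g for grayscale, rg for RGB, k for CMYK). The sketch below is plain Java and does not use PDF Java Toolkit. The helper buildDefaultAppearance and the resource name Helv are hypothetical stand-ins, and the exact string the toolkit writes may differ.

```java
import java.util.Locale;

final class DefaultAppearanceSketch {

    /**
     * Builds a DA-style string, for example "/Helv 8 Tf 0 1 0 rg".
     * Hypothetical helper for illustration; not part of PDF Java Toolkit.
     */
    public static String buildDefaultAppearance(String fontResourceName, double fontSize, double[] color) {
        // The number of color components selects the PDF color operator.
        final String colorOperator;
        switch (color.length) {
            case 1:
                colorOperator = "g";   // grayscale
                break;
            case 3:
                colorOperator = "rg";  // RGB
                break;
            case 4:
                colorOperator = "k";   // CMYK
                break;
            default:
                throw new IllegalArgumentException(
                        "Expected 1, 3, or 4 color components, got " + color.length);
        }

        final StringBuilder da = new StringBuilder();
        // Font selection: /<resource name> <size> Tf
        da.append('/').append(fontResourceName).append(' ').append(format(fontSize)).append(" Tf");
        // Color operands come before the operator, each in the range 0..1.
        for (double component : color) {
            if (component < 0.0 || component > 1.0) {
                throw new IllegalArgumentException("Color components must be between 0 and 1");
            }
            da.append(' ').append(format(component));
        }
        da.append(' ').append(colorOperator);
        return da.toString();
    }

    // Writes whole numbers without a decimal point and everything else with three decimals.
    private static String format(double value) {
        if (value == Math.rint(value)) {
            return String.valueOf((long) value);
        }
        return String.format(Locale.ROOT, "%.3f", value);
    }

    public static void main(String[] args) {
        // Same font size and color as the toolkit snippet above: 8pt, RGB green.
        System.out.println(buildDefaultAppearance("Helv", 8.0, new double[] { 0, 1, 0 }));
    }
}
```

With the size and color used in this post, the sketch produces /Helv 8 Tf 0 1 0 rg. Passing a four-element array would switch the operator to k, which is the same three-versus-four-versus-one rule noted in the comment on the PDFDefaultAppearance call.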

In the screenshot, you might notice that the text specified as Overlay Text is not fully displayed (“Redact” instead of “Redacted”). Because we are replacing existing content with a new piece of content, the two may differ in size. This is why the Freedom of Information Act uses short redaction codes (like “(b)(1)(a)”) instead of the words that describe what was redacted.
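
One way to catch this kind of truncation before publishing is to estimate the width of the Overlay Text and compare it with the width of the area being redacted. The sketch below is plain Java and does not use PDF Java Toolkit. The class, the method names, and the 30-point rectangle width are hypothetical. The glyph widths are the standard Helvetica metrics for just the characters used here, and any other character falls back to an assumed 556 units. How the toolkit actually lays out or clips the text is not covered in this post, so treat the result as a rough pre-check only.

```java
import java.util.HashMap;
import java.util.Map;

final class OverlayTextFitSketch {

    // Advance widths in 1/1000 of the font size, from the standard Helvetica metrics.
    // Only the glyphs needed for this example are listed.
    private static final Map<Character, Integer> HELVETICA_WIDTHS = new HashMap<>();
    static {
        HELVETICA_WIDTHS.put('R', 722);
        HELVETICA_WIDTHS.put('a', 556);
        HELVETICA_WIDTHS.put('b', 556);
        HELVETICA_WIDTHS.put('c', 500);
        HELVETICA_WIDTHS.put('d', 556);
        HELVETICA_WIDTHS.put('e', 556);
        HELVETICA_WIDTHS.put('t', 278);
        HELVETICA_WIDTHS.put('1', 556);
        HELVETICA_WIDTHS.put('(', 333);
        HELVETICA_WIDTHS.put(')', 333);
    }

    // Assumption: glyphs missing from the table are treated as 556 units wide.
    private static final int FALLBACK_WIDTH = 556;

    /** Estimated width of the text in points at the given font size. */
    public static double textWidth(String text, double fontSize) {
        int units = 0;
        for (char ch : text.toCharArray()) {
            units += HELVETICA_WIDTHS.getOrDefault(ch, FALLBACK_WIDTH);
        }
        return units * fontSize / 1000.0;
    }

    /** True when the estimated text width does not exceed the rectangle width. */
    public static boolean fits(String text, double fontSize, double rectWidth) {
        return textWidth(text, fontSize) <= rectWidth;
    }

    public static void main(String[] args) {
        final double fontSize = 8.0;    // same point size as the default appearance above
        final double rectWidth = 30.0;  // hypothetical width of the redacted area, in points

        for (String overlay : new String[] { "Redacted", "(b)(1)" }) {
            System.out.println(overlay + ": " + textWidth(overlay, fontSize) + "pt wide, fits="
                    + fits(overlay, fontSize, rectWidth));
        }
    }
}
```

At 8 points, “Redacted” is estimated at a little over 34 points wide while “(b)(1)” comes in under 20, so in this hypothetical 30-point rectangle only the short code fits. That is the practical argument for short redaction codes.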

If you redact documents today, or expect to in the future, try out Datalogics PDF Java Toolkit to automate the process.

Related Products
Adobe PDF Library


The Adobe PDF Library SDK is a low-level PDF library that contains a powerful set of native C/C++ APIs, with interfaces for .NET and Java. Systems integrators, independent software vendors (ISVs), enterprise IT developers, and others can integrate Adobe PDF functionality within custom applications in a client and/or server environment.

PDF Java Toolkit


Datalogics PDF Java Toolkit is a native Java library that provides high-level APIs for automating PDF workflows like processing PDF forms, verifying digital signatures, and extracting text. It also offers low-level APIs for working directly with the structure of the PDF for those times you need it.

Adobe Normalizer


Adobe Normalizer is an API that allows developers to quickly and easily convert Encapsulated PostScript (EPS) and PostScript (PS) files to Adobe’s Portable Document Format (PDF). The industry-standard Adobe Distiller and Distiller Server are themselves built upon PDF Converter SDK, and now this API is available separately to application developers.

Adobe PDF Print Engine


The Adobe PDF Print Engine is a common rendering engine technology, packaged as a software development kit (SDK). It can be the basis for a variety of products for previewing and printing Adobe Portable Document Format (PDF) documents at different stages of the professional print workflow.

PDF2IMG


Datalogics PDF2IMG is a command-line utility that converts PDF files to a variety of image formats including PNG, JPG, TIFF, BMP, and more. It is built upon the Adobe PDF Library and uses Adobe technology for unrivaled color management during the PDF conversion process.

PDF Alchemist


Datalogics PDF Alchemist is a new (C/C++) SDK for intelligently extracting text and images from PDFs and exporting to HTML 5 or EPUB. It employs sophisticated techniques to identify and reconstruct “text flows” within the PDF.