Screenshot from Biden PDF.

Hunter Biden’s “email” and the potential for deepfakes with PDF

Duff Johnson, Peter Wyatt // October 19, 2020


Introduction

Akin to our earlier series on the Mueller Report PDF, this article provides cultural framing and technical background for considering the evidence provided to date by the New York Post regarding Hunter Biden’s alleged email from Ukraine.

Some journalists are covering this PDF document as if it represents an email from Hunter Biden’s computer; it may or it may not. As we, the CEO and Principal Scientist of the PDF Association and the ISO Project co-Leaders for ISO 32000 (the PDF standard), will explain, the episode offers many lessons for digital forensics with respect to PDF technology.

The article is intended for journalists, researchers, attorneys, law-enforcement, application developers and other professionals.

Background

On Wednesday, October 14, 2020 the New York Post published an article in which it claimed to be in possession of a copy of a hard drive belonging to Hunter Biden, the son of the former Vice President of the United States and current Democratic candidate for president, Joe Biden.

As evidence, the newspaper posted a PDF of an email on Scribd. In its story, the New York Post claimed the source of the information as follows:

“Steve Bannon, former adviser to President Trump, told The Post about the existence of the hard drive in late September and Giuliani provided The Post with a copy of it on Sunday.”

The PDF presented by the Post did not include the email’s header information that would allow technical experts to assess whether the email might be genuine.

Instead, the newspaper relied on the PDF to substantiate its claim of possessing Hunter Biden’s hard drive, and has to date refused to allow any independent analysis of its holdings.

Who made the PDF?

This PDF file is not itself an email; it’s a representation of an email. A cursory examination of the PDF document shows a creation date of October 10, 2019.

Screenshot of Biden PDF showing metadata with October, 2019 date.

The New York Post’s story says the laptop from which this “email” originated was dropped off for repair in April, 2019.

“The computer was dropped off at a repair shop in Biden’s home state of Delaware in April 2019, according to the store’s owner.”

So if the PDF document was made in October, 2019… who made the PDF? The Post says that they received the hard drive copy on Sunday (October 11, 2020), so the Post didn’t make it.

It wasn’t Hunter Biden, because he wasn’t in possession of the laptop. Notably though, Hunter Biden was in the news at precisely that time. Indeed, on the exact day the PDF was created, news of Rudy Giuliani’s interest in Hunter Biden and Ukraine featured in the New York Times.

Screenshot of Google Trends search for "Hunter Biden" showing a spike in October 2019.
Source: Google Trends

Beware PDF deepfakes

In observing the media reaction to the New York Post’s story it is clear that most observers refer to the PDF document as if it were literally an email instead of merely a representation of an email. Journalists might well consider taking real care in their terminology when referencing digital media to ensure that the public understands the distinction between an “email” and a “PDF of an email”.

Without email header metadata or digital signatures there is insufficient evidence to assume that this PDF document originated on Hunter Biden’s laptop at all, especially if the chain-of-custody is suspect. Such a PDF could be readily constructed without ever coming into contact with Hunter Biden’s laptop.

PDF is a ubiquitous, reliable format that excels at delivering a precise visual representation. In 2020, PDF documents have in many cases entirely replaced paper documents, and the format is rightly trusted… but that trust can be easily manipulated. This large subject is being addressed through the work of the Content Authenticity Initiative.

Whoever decided to use the PDF file as “evidence” chose to leverage the trust users typically place in PDF technology. It’s typical to find that end-users believe PDF documents are difficult to fake or are unchangeable; neither is true. The Biden PDF is an example of a document that is exceptionally easy to fake with minimal technical skills.

Fundamentally, a PDF document is not to be confused with its source; a PDF is always a representation of something else. If there’s any question of authenticity a PDF document should not be assumed to be a true representation absent chain-of-custody, digital signatures and/or other assurances. In the case of a PDF purporting to represent an email, minimally, email header information should also be present.

There are efforts towards a specification for email capture to PDF, but the Biden PDF is just… a plain PDF.

How the Biden PDF was made

Every PDF creation process uses certain structures that are typical for it. Due to the rich metadata capabilities in the PDF file format it is often possible to derive hints about its origin.

As part of the PDF Association's effort to educate the professionals, the media and the public on the capabilities, limitations and significance of PDF technology, the Biden PDF was analyzed by PDF Association Principal Scientist Peter Wyatt. We offer this technical assessment to enlighten journalists and others.

Metadata and file creation

Metadata in the Biden PDF indicates that it was created by the “Mail” application on an Apple Mac, which is consistent with the story offered by the New York Post. As mentioned above the file’s metadata includes a creation date in October, 2019.

It should be noted that computer clocks are unreliable and easily manipulated by those who wish to have documents appear to come from a different time. The file doesn’t include a timestamped digital signature, so we simply cannot know when this file was really created.
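To make the point about dates concrete, here is a minimal sketch of how a creation date stored in a PDF's Document Information dictionary can be decoded. It assumes the ISO 32000 date string layout (D:YYYYMMDDHHmmSS followed by an optional time zone offset). The function name parse_pdf_date and the sample timestamps in the usage note are hypothetical illustrations, not values taken from the Biden PDF. Note that the result only tells us what the creating computer's clock claimed, which is exactly why it cannot be treated as proof.

```python
import re
from datetime import datetime, timedelta, timezone

# ISO 32000 date strings look like D:YYYYMMDDHHmmSSOHH'mm'.
# Everything after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Decode a PDF date string into a datetime (hypothetical helper)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()

    # Build the UTC offset; "Z" means zero offset, "-" means west of UTC.
    offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    tz = timezone(offset) if sign else None  # no sign: local time unknown

    # Missing month and day default to 1, missing time fields to 0.
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=tz,
    )
```

For example, parse_pdf_date("D:20191010153000Z00'00'") yields a UTC datetime on October 10, 2019; the time of day in that string is invented for illustration.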

There are two methods of making a PDF document on a Mac:

  • “Print” to PDF or “Save as PDF” using Apple’s Quartz engine.
  • “Save as PDF” from an application which uses its own internal PDF technology (i.e. not via Apple’s Quartz). This would be the least common approach on macOS, as Quartz is highly convenient, and ensures the PDF exactly matches graphics drawn using the macOS graphics system.

If the file’s metadata is to be trusted, the Biden PDF was created by "Mac OS X 10.13.6 Quartz PDFContext" which is Apple proprietary technology with a public and entirely predictable API. There is no information in the PDF document that would allow us to identify its source beyond the identity of the PDF producer technology and creating application. Our technical assessment of the internal structures of the Biden PDF is that it is highly consistent with other PDFs created by Apple Quartz.

The Biden PDF has no incremental updates, so it has not been obviously edited or modified by another standard PDF editor application after creation. If it had been modified, one would very likely see either incremental updates or a rewritten Document Information dictionary with Creator/Producer metadata updated to reflect the new program.
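One rough way to look for incremental updates is to count end-of-file markers, since each incremental save appends a new section that ends with its own %%EOF. The sketch below is a heuristic only, not the method used in our analysis: the function names are hypothetical, a linearized PDF can legitimately contain two markers without having been edited, and the marker bytes could in principle occur inside stream data.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers in the raw bytes of a PDF file."""
    return pdf_bytes.count(b"%%EOF")


def looks_incrementally_updated(pdf_bytes: bytes) -> bool:
    """Heuristic: more than one %%EOF hints at appended update sections.

    A result of False does not prove the file is unmodified; a tool can
    rewrite the whole file and leave a single marker behind.
    """
    return count_eof_markers(pdf_bytes) > 1
```

A file could be checked with looks_incrementally_updated(open(path, "rb").read()), where path is whatever local copy is being examined.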

Bad image

The PDF features an image in the top-right-hand corner. As others have pointed out, this image is of remarkably poor quality. From Apple we would expect a clean, clear image, not this mess. There’s more on this in the technical annex, below.

Worth asking Apple...

We note that as of October 2020 under macOS 10.15.7 a circle with senders’ initials appears only when the email displays in the Mac’s Mail client, and never in documents exported or printed, as the Biden PDF claims to be.

For those interested in validating the authenticity of this PDF, we’d recommend asking Apple engineers whether or not in October 2019 their Mail application driving the "Mac OS X 10.13.6 Quartz PDFContext" engine included the circle in printed or saved PDF files.

UPDATE: Several users have contacted us to confirm that they have recreated this environment and determined that the image's appearance is consistent with the creation software indicated by the file's metadata.

Bad mailto: links on email addresses

We noticed that the links on the email addresses at the top of the page seemed very strange for PDF documents produced from such a common application, as each mailto: link targets an incorrect email address!

For example, the mailto: for the annotation on Hunter Biden’s email address is actually: “Bidenhbiden@rosemontseneca.com”; obviously not his actual email address. The software that creates the link is mistakenly also grabbing the text immediately prior to the actual email address. Just click on the link in the PDF to find out for yourself.
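The mismatch described above can be expressed as a simple check: compare the address inside a link's mailto: target with the address text the reader actually sees. The sketch below is illustrative only. The helper names are hypothetical, and the visible address "hbiden@rosemontseneca.com" is our inference from the link target quoted above (the target minus the stray leading text), not a value read programmatically from the file.

```python
from typing import Optional
from urllib.parse import unquote


def mailto_address(uri: str) -> str:
    """Return the address part of a mailto: URI, dropping any ?query."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError(f"not a mailto: URI: {uri!r}")
    return unquote(rest.split("?", 1)[0])


def leaked_prefix(uri: str, visible_text: str) -> Optional[str]:
    """Report stray text captured in front of the visible address.

    Returns "" when target and visible text agree, the extra leading
    text when the target merely ends with the visible address, and
    None when the two are unrelated.
    """
    target = mailto_address(uri).strip()
    shown = visible_text.strip()
    if target.lower() == shown.lower():
        return ""
    if target.lower().endswith(shown.lower()):
        return target[: len(target) - len(shown)]
    return None
```

With the link target quoted above, leaked_prefix("mailto:Bidenhbiden@rosemontseneca.com", "hbiden@rosemontseneca.com") reports the stray text "Biden".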

Worth asking Apple...

This is a straightforward bug in Apple’s layout engine that apparently (if the Biden PDF’s metadata is to be believed) existed in October 2019. All we can say for sure is that the bug does indeed exist a year later in October 2020 under macOS 10.15.7.

As with the unfortunate image in the upper-right corner (see above), it’s not really feasible for us to look back in time to conduct experiments on a Mac running the older Quartz 10.13.6 engine.

UPDATE: As with the bad image bug noted above, we have confirmed that this problem existed in the 10.13.6 iteration of Quartz.

Conclusion

Without any comment on its authenticity, the Hunter Biden “email” was a PDF, not an email, and it could easily have come from anywhere. With the timeline they provide, the Post’s own story eliminates the possibility that the PDF file was created by Hunter Biden.

One cannot expect the public to be cognizant of the full significance of specific digital media types, whether web pages, email, video or PDF documents. We do believe, however, that it’s important to describe such media carefully, and that straightforward minimum requirements for credibility, such as the provision of email header data, should be fundamental to any investigation or news story.

Technical Annex: Quartz output

We have not previously had reason to deep-dive into the Quartz engine’s output. What we found led us to some issues worthy of Apple’s attention.

Inefficiency

Quartz generates unnecessary whitespace before each PDF name and unnecessary line breaks; the PDF could be written out much more compactly.

Quartz generates unnecessary indirect references for each Document Information dictionary key value, and also empty values for various keys. Normal practice is to use direct objects for such metadata and eschew writing empty metadata to improve efficiency in storage and retrieval.

In the Biden PDF the circular image’s appearance is created by FLATE-compressed 48x48 RGB raw pixel data with a gray softmask (also 48x48) applied to give the circular outline, most likely implying that it originated from a PNG image. This ultra-low resolution image is then scaled up when painting the PDF page.
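The figures above imply very small amounts of pixel data, which a few lines of arithmetic can confirm. The sketch below assumes 8 bits per component and no PNG-style predictor on the FLATE data; neither detail is stated above, so treat both as assumptions. The function names are hypothetical.

```python
import zlib


def raw_image_size(width: int, height: int, components: int,
                   bits_per_component: int = 8) -> int:
    """Bytes of uncompressed sample data for a PDF image.

    Each row of samples is padded to a whole number of bytes.
    """
    row_bytes = (width * components * bits_per_component + 7) // 8
    return row_bytes * height


def matches_declared_size(flate_data: bytes, width: int, height: int,
                          components: int, bits_per_component: int = 8) -> bool:
    """Check that FLATE data inflates to the size the dimensions imply.

    Assumes no predictor is in use (an assumption, see above).
    """
    expected = raw_image_size(width, height, components, bits_per_component)
    return len(zlib.decompress(flate_data)) == expected
```

Under these assumptions the 48x48 RGB image holds 6,912 bytes of pixel data and its gray softmask another 2,304 bytes, which helps explain why the result looks so coarse once scaled up on the page.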

The correct and more efficient way to encode such a graphic in PDF would be to create a filled vector graphic circle and then center the 2 letters of text (as text, not rendered to an image). This would result in a simpler and smaller PDF file as well as an appearance that can scale to any resolution or zoom factor without any pixelation.
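As an illustration of the vector approach, the sketch below generates a PDF content stream fragment that fills a circle (approximated with four cubic Bezier curves, the usual technique since PDF has no circle operator) and places two letters over it as real text. Everything specific here is an assumption for illustration: the function names, the font resource name F1, the gray levels, the initials "AB", and the rough 0.35 baseline factor. The text width must be supplied by the caller from real font metrics. This is not what Quartz does; it only shows how compact the alternative can be.

```python
# Control-point offset that makes a cubic Bezier approximate a quarter circle.
KAPPA = 0.5522847498


def circle_path(cx: float, cy: float, r: float) -> list:
    """PDF path operators for a filled circle centred at (cx, cy)."""
    k = KAPPA * r
    return [
        f"{cx + r:.2f} {cy:.2f} m",  # start at the rightmost point
        f"{cx + r:.2f} {cy + k:.2f} {cx + k:.2f} {cy + r:.2f} {cx:.2f} {cy + r:.2f} c",
        f"{cx - k:.2f} {cy + r:.2f} {cx - r:.2f} {cy + k:.2f} {cx - r:.2f} {cy:.2f} c",
        f"{cx - r:.2f} {cy - k:.2f} {cx - k:.2f} {cy - r:.2f} {cx:.2f} {cy - r:.2f} c",
        f"{cx + k:.2f} {cy - r:.2f} {cx + r:.2f} {cy - k:.2f} {cx + r:.2f} {cy:.2f} c",
        "f",  # fill the path
    ]


def initials_badge(cx: float, cy: float, r: float, initials: str,
                   font_size: float, text_width: float,
                   font_name: str = "F1") -> str:
    """Content stream fragment: gray filled circle with white initials."""
    # Escape characters that are special inside PDF literal strings.
    escaped = (initials.replace("\\", "\\\\")
                       .replace("(", "\\(")
                       .replace(")", "\\)"))
    x = cx - text_width / 2      # centre horizontally using caller's width
    y = cy - 0.35 * font_size    # rough vertical centring (assumption)
    lines = [
        "q",                     # save graphics state
        "0.6 g",                 # gray fill for the circle
        *circle_path(cx, cy, r),
        "1 g",                   # white fill for the text
        "BT",
        f"/{font_name} {font_size:.2f} Tf",
        f"{x:.2f} {y:.2f} Td",
        f"({escaped}) Tj",
        "ET",
        "Q",                     # restore graphics state
    ]
    return "\n".join(lines)
```

A call such as initials_badge(100, 100, 24, "AB", 20, 26) produces roughly a dozen short lines of operators, and the result stays sharp at any zoom level because nothing in it is a raster image.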

Very old PDF specification

%PDF-1.3 appears as the file header, but is overridden by a Version key in the document catalog with the value 1.4. The Version key needed to be added to indicate that the softmask transparency effect being used on the circular image was introduced in PDF 1.4 (see above). Why not just move to a PDF specification from this millennium 😊, with many new features including smaller file sizes? PDF 1.3 was published in 1999 with PDF 1.4 published in 2001.
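The interplay between the header and the catalog's Version key can be sketched as follows: a reader takes the later of the two as the file's effective version. The code scans raw bytes with regular expressions, which is a shortcut rather than real PDF parsing. It only works when the catalog is stored uncompressed (object streams did not exist before PDF 1.5), and it would be fooled by a stray "/Version" elsewhere in the file. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def header_version(pdf_bytes: bytes) -> Optional[Version]:
    """Version from the %PDF-x.y header near the start of the file."""
    match = re.search(rb"%PDF-(\d)\.(\d)", pdf_bytes[:1024])
    return (int(match.group(1)), int(match.group(2))) if match else None


def catalog_version(pdf_bytes: bytes) -> Optional[Version]:
    """Version from a /Version name, e.g. '/Version /1.4' (raw scan)."""
    match = re.search(rb"/Version\s*/(\d)\.(\d)", pdf_bytes)
    return (int(match.group(1)), int(match.group(2))) if match else None


def effective_version(pdf_bytes: bytes) -> Version:
    """The catalog value wins only when it is later than the header."""
    header = header_version(pdf_bytes)
    if header is None:
        raise ValueError("no %PDF header found")
    catalog = catalog_version(pdf_bytes)
    if catalog is not None and catalog > header:
        return catalog
    return header
```

For a file that starts with %PDF-1.3 and carries a catalog entry of /Version /1.4, effective_version returns (1, 4), matching the behaviour described above.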


ABOUT THE AUTHORS

Duff Johnson

Duff serves the PDF industry as ISO Project co-Leader and US TAG chair for both ISO 32000 (the PDF specification) and ISO 14289 (PDF/UA). As Executive Director of the PDF Association, Duff coordinates several working groups, speaks at a wide variety of industry events and promotes the advancement and adoption of PDF technology worldwide. An independent consultant, Duff Johnson is a veteran …


Peter Wyatt

Peter Wyatt is an independent technology consultant with deep file format and parsing expertise, who is a developer and researcher actively working on PDF technologies for more than 18 years. He is currently Project Co-Leader of ISO 32000 (the core PDF standard), a member of the Board of the PDF Association, and co-Chairs the PDF Association PDF TWG. He is …

© 2020 Association for Digital Document Standards e.V. | Privacy Policy | Imprint