About the contributor
Lori DeFurio


Life Without Metadata?

What does “without metadata” mean? Metadata is everywhere. We deal with it every day; sometimes it is critical for us, sometimes it gets in our way. Although metadata is ubiquitous, the term itself is not widely known in our society. Ask your friends what metadata is all about, and unless they are IT specialists you might not get an answer. Yet you will find metadata on every milk bottle, on every pharmaceutical product and on the passport in your wallet. But what would happen if there were no metadata at all? Search for an answer on Google and you will come across an interesting article by R. Todd Stephens, Ph.D., in which he outlines three scenarios:

  • Scenario 1: Absence of metadata
  • Scenario 2: Metadata with no context
  • Scenario 3: Too much information

This is an interesting perspective, because it highlights not only the scenario in which metadata is absent but also how a lack of context or an overload of information can render metadata useless.

The first scenario is obvious. For example, if you were in a supermarket and there were no information on any of the cans, you would be in trouble. Without data about the ingredients it would be impossible to pick the right products, and without prices the cashier would struggle to total the amount you have to pay. The same holds for digital assets: how useful would a collection of MP3 files be without information such as title, album and artist?

In the second scenario the metadata has no context, or has lost its context. Usually, metadata is closely attached to the content. In other words, there is no semantic difference between the label on a can of beans and the metadata properties stored within an MP3 file. However, whenever metadata loses its context it becomes useless. For example, when the label is peeled off a can and you cannot find the corresponding can on the shelf, or when descriptive metadata is stored only in a separate database and the assets are moved out of that environment, you are lost.
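One way to keep metadata from losing its context is to embed it in the asset itself, which is what Adobe’s Extensible Metadata Platform (XMP) is designed for. The following Python sketch, using only the standard library, serializes a minimal XMP packet with Dublin Core title, creator and keyword properties and parses it back. It is an illustration of the packet structure under simplifying assumptions, not a complete XMP implementation: actually writing the packet into a JPEG, PDF or MP3 file is format-specific and not shown, and the helper names `build_xmp_packet` and `read_dc` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the XMP specification (x, rdf) and Dublin Core (dc).
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)


def q(prefix: str, local: str) -> str:
    """Return an ElementTree qualified name such as '{uri}local'."""
    return f"{{{NS[prefix]}}}{local}"


def build_xmp_packet(title: str, creators: list, keywords: list) -> str:
    """Serialize a minimal XMP packet meant to travel inside the asset file (hypothetical helper)."""
    meta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(meta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative (rdf:Alt) in XMP.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list (rdf:Seq); dc:subject is an unordered bag (rdf:Bag).
    arrays = (("creator", "Seq", creators), ("subject", "Bag", keywords))
    for prop, container, values in arrays:
        holder = ET.SubElement(ET.SubElement(desc, q("dc", prop)), q("rdf", container))
        for value in values:
            ET.SubElement(holder, q("rdf", "li")).text = value

    body = ET.tostring(meta, encoding="unicode")
    # The xpacket wrapper lets tools locate the packet by scanning the file's bytes.
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        + body
        + '\n<?xpacket end="w"?>'
    )


def read_dc(packet: str) -> dict:
    """Parse the packet back and return its Dublin Core title, creators and keywords."""
    root = ET.fromstring(packet)
    return {
        "title": root.find(".//dc:title/rdf:Alt/rdf:li", NS).text,
        "creators": [li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)],
        "keywords": [li.text for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)],
    }


packet = build_xmp_packet("Class trip 2006", ["A. Parent"], ["school", "class trip"])
print(read_dc(packet))
# {'title': 'Class trip 2006', 'creators': ['A. Parent'], 'keywords': ['school', 'class trip']}
```

Because the description sits inside the file rather than in an external database, it survives when the asset is copied out of its original environment, provided the tools along the way preserve it.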

Finally, the third scenario highlights that there is either too much information available or the information is not well understood, which makes it difficult to identify the content by its metadata alone. Beyond the examples in the referenced article, this may also be true for the package insert of a pharmaceutical product, or for a long-term archival document if no meta-metadata (for example, a schema description) is available by which a user can judge the meaning of the embedded metadata in the future.
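To make the idea of meta-metadata a little more concrete: in XMP, every property belongs to a schema identified by a namespace URI, so a reader can at least tell which schema a value comes from and look up its definition. The Python sketch below is a hypothetical illustration using only the standard library; the registry `SCHEMA_REGISTRY`, the function `describe_properties` and the namespace `http://example.com/ns/school/1.0/` are invented for the example. Properties from a namespace the registry does not know are flagged, which mirrors the situation of an archive in which no schema description has survived.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical registry of known schemas: namespace URI -> human-readable description.
SCHEMA_REGISTRY = {
    "http://purl.org/dc/elements/1.1/": "Dublin Core (general descriptive properties)",
    "http://ns.adobe.com/xap/1.0/": "XMP Basic (dates, creator tool, etc.)",
}

# Sample packet body; the "sch" namespace is made up and deliberately unregistered.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmlns:sch="http://example.com/ns/school/1.0/">
   <dc:format>image/jpeg</dc:format>
   <xmp:CreateDate>2006-06-30T10:15:00</xmp:CreateDate>
   <sch:grade>4b</sch:grade>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def describe_properties(packet: str) -> list:
    """Return (namespace URI, property name, schema description) for each top-level property."""
    root = ET.fromstring(packet)
    result = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        for prop in desc:
            # ElementTree tags look like '{namespace-uri}localname'.
            uri, _, local = prop.tag[1:].partition("}")
            schema = SCHEMA_REGISTRY.get(uri, "UNKNOWN schema: meaning cannot be verified")
            result.append((uri, local, schema))
    return result


for uri, name, schema in describe_properties(SAMPLE):
    print(f"{name:12} {schema}")
```

The registry here is only a lookup table; in a real archive the equivalent role would be played by published schema documentation kept alongside the documents.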

So the overall goal is to find the right level of metadata: descriptive but not overwhelming, allowing the user to find and understand the content in any situation.

Interestingly enough, the European Parliament is currently running the following advertising campaign for the 2009 elections:

“The Metadata Dilemma”

But let’s take a closer look at the world of digital asset management, an area where you will usually find a lot of metadata-savvy people. Here is an interesting statement about the challenge of living with metadata in asset management products:

From Matt Kloskowski – “Confessions of a Lightroom Addict”:

“I hate metadata. I’m sorry, I had to say it. I can’t stand it when I look at feature lists for Lightroom (or any other product for that matter) and I see anything with the word ‘metadata’ listed as a feature. It’s important stuff, I know, but it’s also very boring. I just assume it should be there but don’t try to sell it to me as a feature.”

But in the very next paragraph he says:

“I love the benefits of metadata. The Metadata panel rocks and the benefits of good metadata support is very important. That’s what makes me feel bad about the other confession above. It’s an inner struggle I deal with daily ;)”

Get the point?

On the one hand, metadata is the glue that holds our world and our processes together; on the other hand, it is often cumbersome to deal with. Let me tell you a quick story:

A couple of years ago, when my kids were finishing elementary school, the parents got together and discussed how to thank the teacher. A friend of mine had the idea of producing a small video as a gift for everybody, and he also suggested who should do the work…

As a result, I took my laptop and visited several parents, asking them for analog and digital photo and video material they had taken during school events. Whenever we tried to find suitable material stored on a computer (or on backup CDs), a journey began that took me through years of private family events and vacations before we actually found the relevant material showing the kids at school events. The only descriptive metadata were the names of the folders containing the images. Searching for “school photos” or “class trip 2006” was not possible. Yet each of them already had a considerable number of images on disk, and going forward I expect that searching through their images will become more and more painful.

What is true for personal use sometimes applies to professionals as well. IT-savvy people know that an early investment in metadata pays off in the future, but what actually makes it so difficult to make this investment early in the cycle?

As a starting point let me ask the following questions:

  • What are the concerns about metadata?
  • What are the benefits of metadata?

The following table highlights some of the important themes. It is not meant to be complete, but it should give an impression of the landscape and some of its challenges:


Concerns                       | Benefits
-------------------------------|-----------------------------------------
Manual input of information    | Rich metadata through automatic creation
No awareness                   | Long-term knowledge
No immediate benefit           | Faster access to digital content
Bad quality and lack of trust  | Enabling a lot of workflows


Manual metadata entry is time-consuming and error-prone. Metadata therefore works best when it is generated automatically (and accurately) and the user experience is adequate. In the domain of digital images, the Exif standard is a good example of metadata that is generated automatically and kept in context with the asset. For example, the date and time a picture was taken is recorded by virtually every digital camera, and modern software products honor these properties correctly in subsequent workflows. Going forward, devices and tools will include smarter content analysis techniques to add metadata to an asset automatically. These range from automatic geotagging to face detection and recognition, along with advanced techniques such as smile detection, age grouping and identification of predominant colors. As an example, Adobe supports automatic speech-to-text transcription in Adobe Premiere Pro CS4, built on top of Adobe’s Extensible Metadata Platform (XMP), to enable enhanced metadata workflows across the production value chain.
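The Exif capture date shows how automatically generated metadata can drive a workflow without any manual input. The following Python sketch assumes the tags have already been extracted from the image files (for instance with an imaging library, which is not shown) into dictionaries keyed by Exif tag ID. It groups photos by month using the DateTimeOriginal tag (0x9003), whose value Exif stores as a string of the form "YYYY:MM:DD HH:MM:SS". The function names and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Exif tag ID for DateTimeOriginal and the string format Exif uses for it.
DATE_TIME_ORIGINAL = 0x9003
EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_datetime(value: str) -> datetime:
    """Convert an Exif date string such as '2006:06:30 10:15:02' into a datetime."""
    return datetime.strptime(value, EXIF_DATETIME_FORMAT)


def group_by_month(exif_by_file: dict) -> dict:
    """Group file names by capture month ('YYYY-MM'), using camera-generated metadata only."""
    groups = defaultdict(list)
    for name, tags in sorted(exif_by_file.items()):
        raw = tags.get(DATE_TIME_ORIGINAL)
        key = parse_exif_datetime(raw).strftime("%Y-%m") if raw else "undated"
        groups[key].append(name)
    return dict(groups)


# Hypothetical tag dictionaries, as an Exif reader might return them (tag ID -> value).
photos = {
    "IMG_0412.jpg": {DATE_TIME_ORIGINAL: "2006:06:30 10:15:02"},
    "IMG_0413.jpg": {DATE_TIME_ORIGINAL: "2006:08:12 16:40:55"},
    "scan_001.jpg": {},  # e.g. a scanned print without a capture date
}
print(group_by_month(photos))
# {'2006-06': ['IMG_0412.jpg'], '2006-08': ['IMG_0413.jpg'], 'undated': ['scan_001.jpg']}
```

Nothing here required anyone to type a date; the only gap is the file that never received automatically generated metadata in the first place.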

Although people live with metadata, it is often not obvious to them how to use it to manage their environment effectively. The school example above shows that personal use of metadata is often restricted to the metadata generated by devices. But even in business, metadata is often not seen as a separate area of investment to which budget and resources can be assigned. Metadata is not a “feature” in itself, but it enables features and allows workflows to connect to each other. Metadata is particularly critical for digital media: we cannot put a sticker on a digital photo or turn it over to write on the back, so it is all the more important that metadata be embedded in the media itself.

One huge advantage of attaching metadata early in the workflow is the ability to manage digital assets effectively across the various production steps, for example to find assets quickly at any time. The return on investment (ROI) of metadata becomes more obvious if you compare a digital asset library with and without adequate metadata assigned to its assets. Ultimately, metadata is one of the most important time-savers in production and will reduce costs in the long term.
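The comparison of a library with and without adequate metadata can be illustrated with a toy example modeled loosely on the school-video story: searching a small photo library by folder names alone versus by assigned keywords and capture year. This is a hypothetical Python sketch; the `Asset` class, the sample paths and the two search functions are invented for illustration and do not reflect any particular asset management product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Asset:
    path: str
    keywords: set = field(default_factory=set)  # descriptive tags, e.g. XMP dc:subject values
    year: Optional[int] = None                   # e.g. derived from the capture date


def search_by_path(assets, term):
    """Folder-name 'search': the only option when paths are the sole metadata."""
    return [a.path for a in assets if term.lower() in a.path.lower()]


def search_by_metadata(assets, keyword, year=None):
    """Match on assigned keywords and, optionally, on the capture year."""
    return [
        a.path
        for a in assets
        if keyword in a.keywords and (year is None or a.year == year)
    ]


library = [
    Asset("Photos/2006/Summer/IMG_0412.jpg", {"school", "class trip"}, 2006),
    Asset("Photos/2006/Summer/IMG_0413.jpg", {"vacation", "beach"}, 2006),
    Asset("Backup CD 3/misc/DSC_0071.jpg", {"school", "sports day"}, 2007),
    Asset("Photos/2007/Garden/IMG_0900.jpg", {"family"}, 2007),
]

print(search_by_path(library, "class trip"))
# []
print(search_by_metadata(library, "class trip", 2006))
# ['Photos/2006/Summer/IMG_0412.jpg']
print(search_by_metadata(library, "school"))
# ['Photos/2006/Summer/IMG_0412.jpg', 'Backup CD 3/misc/DSC_0071.jpg']
```

The folder names simply do not contain the event, so the path-based search finds nothing, while a few keywords assigned once make the same photos retrievable in a single query.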

Beyond that, content is still king. In other words, you often would not need metadata if interpreting the content were fast and convenient enough to be done in real time. For example, detected faces would not have to be stored as metadata if every system and tool could detect them on the fly, and speech-to-text metadata could be dropped if spoken words could be searched directly within the audio. This is the approach taken by search engines, which gather information about assets mainly by analyzing their content. Because metadata can be wrong or simply altered, some workflows and systems do not consider it a trustworthy source of information.

In the end, we will have to deal with metadata and respect it within our workflows as seamlessly as possible.



References

  • European Parliament Elections 2009: http://www.europarl.europa.eu/elections2009/default.htm
  • R. Todd Stephens, Ph.D., “Life Without Metadata”: http://www.information-management.com/issues/20070301/1076522-1.html
  • Matt Kloskowski, “Confessions of a Lightroom Addict”, Adobe Lightroom Killer Tips: http://www.lightroomkillertips.com/2008/confessions-of-a-lightroom-addict

Tags: 3rd International PDF/A Conference, Proceedings, XMP, metadata
Categories: PDF/A, XMP