About the contributor
Lori DeFurio

Life Without Metadata?


What does it mean – “without metadata”? Metadata is everywhere. We have to deal with it each and every day; sometimes it’s critical for us, sometimes it’s in our way. Although metadata is ubiquitous, the term itself is not widely understood. Ask your friends what metadata is and, unless they are IT specialists, you might not get an answer. Yet you’ll find metadata on every milk bottle, every pharmaceutical product and on the passport in your wallet. But what would happen if there were no metadata at all? Search for an answer on Google and you’ll find an interesting article written by R. Todd Stephens, Ph.D. In it, he points out three scenarios:

  • Scenario 1: Absence of metadata
  • Scenario 2: Metadata with no context
  • Scenario 3: Too much information

This is an interesting perspective because it highlights not only the scenario where metadata is absent but also the fact that a lack of context, or information overload, can make metadata useless.

The first scenario is obvious. For example, if you were in a supermarket and there were no information on any of the cans, you’d be in trouble. Without data about the ingredients it is impossible to pick the right products, and the missing price makes it a challenge for the cashier to total the amount you have to pay. The same holds for digital assets: how useful would a collection of MP3 sound files be without any further information such as title, album and artist?

In the second scenario the metadata has no context, or has lost it. Usually, metadata is closely attached to the content. In other words, there’s no semantic difference between the label on a can of beans and the metadata properties stored within an MP3 file. However, whenever metadata loses its context it becomes useless. For example, when the label gets peeled off a can and you can’t find the corresponding can on the shelf, or when descriptive metadata is stored only in a separate database and the assets are transferred out of that environment, you’re lost.
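The loss of context can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Asset` class, the `catalog` dictionary, the paths and the `describe` helper are all hypothetical and do not correspond to any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A digital asset that may carry its own (embedded) metadata."""
    filename: str
    embedded: dict = field(default_factory=dict)


# Hypothetical external catalog: metadata kept apart from the asset and
# keyed by the asset's location inside one particular system.
catalog = {"/dam/library/0001.mp3": {"title": "Song A", "artist": "Band X"}}


def describe(path: str, asset: Asset) -> dict:
    """Prefer embedded metadata; fall back to the external catalog."""
    return asset.embedded or catalog.get(path, {})


tagged = Asset("0001.mp3", {"title": "Song A", "artist": "Band X"})
untagged = Asset("0001.mp3")

# Inside the original environment the untagged file can still be described.
inside_untagged = describe("/dam/library/0001.mp3", untagged)

# After export, the catalog key no longer matches the file's location.
outside_tagged = describe("/export/0001.mp3", tagged)
outside_untagged = describe("/export/0001.mp3", untagged)

print(outside_tagged)    # embedded metadata travelled with the file
print(outside_untagged)  # empty: the context was left behind
```

The file with embedded properties remains describable wherever it goes, while the file that relied on the separate store becomes the digital equivalent of a can without a label.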

Finally, the third scenario highlights that there is either too much information available or the information is not well understood, which makes it difficult to identify the content just by its metadata. Beyond the examples in the referenced article, this might also be true for the package insert of a pharmaceutical product, or for a long-term archival document if no meta-metadata (for example, a schema description) is available by which a future user can judge the meaning of the embedded metadata.

So, the overall goal is to determine the right level of metadata: descriptive but not overwhelming, allowing the user in any situation to find and/or understand the content.

Interestingly enough, the European Parliament Elections 2009 campaign is currently running the following advertisement:

“The Metadata Dilemma”

But let’s take a closer look at the world of digital asset management, an area where you’ll usually find a lot of metadata-savvy people. Here’s an interesting statement about the challenge of living with metadata in asset management products:

From Matt Kloskowski – “Confessions of a Lightroom Addict”:

“I hate metadata. I’m sorry, I had to say it. I can’t stand it when I look at feature lists for Lightroom (or any other product for that matter) and I see anything with the word “metadata” listed as a feature. It’s important stuff, I know, but it’s also very boring. I just assume it should be there but don’t try to sell it to me as a feature.”

But in the very next paragraph he says:

“I love the benefits of metadata. The Metadata panel rocks and the benefits of good metadata support is very important. That’s what makes me feel bad about the other confession above. It’s an inner struggle I deal with daily ;)”

Get the point?

On the one hand, metadata is the glue that holds our world and processes together, but on the other hand it’s often cumbersome to deal with. Let me tell you a quick story:

A couple of years ago, as my kids were finishing elementary school, the parents got together and the question came up of how to thank the teacher. A friend of mine had the idea of producing a small video as a giveaway for everybody, and he also suggested who should do the work…

As a result, I took my laptop and visited some of the parents, asking them for analog and digital photo and video material they had taken during school events. Whenever we tried to find suitable material stored on a computer (or on backup CDs), a journey began that took me through years of private family events and vacations before we actually found the relevant material showing the kids at school events. The only descriptive metadata was the names of the folders containing the images. Searching for “school photos” or “class trip 2006” was not possible. Each of the parents already had a serious number of images on disk, and I expect that searching through them will only get more painful as the collections grow.

What is true for personal use sometimes applies to professionals as well. People who know IT understand that an early investment in metadata pays off in the future, so what actually makes it so difficult to make this investment early in the cycle?

As a starting point let me ask the following questions:

  • What are the concerns about metadata?
  • What are the benefits of metadata?

The following list highlights some of the important themes. It’s not meant to be complete but should give an impression of the landscape and some of its challenges:

 

Concerns                       | Benefits
-------------------------------|-----------------------------------------
Manual input of information    | Rich metadata through automatic creation
No awareness                   | Long-term knowledge
No immediate benefit           | Faster access to digital content
Bad quality and lack of trust  | Enabling a lot of workflows

 

Manual metadata entry is time-consuming and error-prone. Metadata therefore works best when it is generated automatically (and accurately) and the user experience is adequate. In the domain of digital images, the Exif standard is a good example of metadata being generated automatically and put into context with an asset. For example, the time at which a picture was taken is recorded by all cameras, and modern software products honor this property correctly in subsequent workflows. Going forward, devices and tools will include smarter content analysis techniques to add metadata to an asset automatically. These range from automatic geotagging to face detection and recognition, along with more advanced analysis such as smile detection, age grouping and predominant colors. As an example, Adobe supports automatic speech-to-text transcription in Adobe Premiere Pro CS4, built on top of Adobe’s Extensible Metadata Platform (XMP), to enable enhanced metadata workflows across the production value chain.
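To give an impression of what such embedded metadata looks like to software, here is a minimal sketch that reads a capture date and keywords from an XMP packet using only Python’s standard library. The sample packet and the `read_xmp` helper are illustrative assumptions: the packet is hand-written, it omits the `<?xpacket ...?>` wrapper found in real files, and the sketch only handles properties written as elements, not the attribute shorthand that XMP also allows.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, RDF and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hand-written sample packet (an assumption, not taken from a real file).
SAMPLE_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreateDate>2006-06-14T10:32:00+02:00</xmp:CreateDate>
      <dc:subject>
        <rdf:Bag>
          <rdf:li>school</rdf:li>
          <rdf:li>class trip 2006</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(packet: str) -> dict:
    """Return the creation date and keyword list found in an XMP packet."""
    root = ET.fromstring(packet)
    desc = root.find("rdf:RDF/rdf:Description", NS)
    created = desc.findtext("xmp:CreateDate", default=None, namespaces=NS)
    keywords = [li.text for li in desc.findall("dc:subject/rdf:Bag/rdf:li", NS)]
    return {"created": created, "keywords": keywords}


info = read_xmp(SAMPLE_PACKET)
print(info["created"])
print(info["keywords"])
```

Because the properties live in well-known namespaces, any tool further down the workflow can pick them up without knowing which device or application wrote them.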

Although people live with metadata, it’s often not obvious to them how to use it to manage their environment effectively. The school example above shows that personal use of metadata is often restricted to what the devices generate. But even within businesses, metadata is often not seen as a separate area of investment to which budget and resources should be assigned. Metadata is not a “feature” in itself, but it enables features and allows workflows to connect to each other. Metadata is particularly critical for digital media: we cannot add a sticker to a digital photo or turn it over to write on the back, so it is all the more important that metadata be embedded in the media.

One huge advantage of metadata attached early in the workflow is the ability to manage digital assets effectively across the various production steps and, for example, to find an asset quickly at any time. The return on investment (ROI) of metadata becomes more obvious if you compare a digital asset library with and without adequate metadata assigned to its assets. In the end, metadata is one of the most important production time-savers and will reduce your costs in the long term.
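The comparison can be made concrete with a small sketch. It assumes a hypothetical in-memory library in which each photo has a path and an optional set of keywords; the `Photo` class, the sample entries and both search helpers are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    path: str
    keywords: set = field(default_factory=set)


# Hypothetical library: the folder names say little about the content.
library = [
    Photo("2006/vacation_italy/img_0412.jpg", {"vacation", "beach"}),
    Photo("2006/misc/img_0587.jpg", {"school", "class trip 2006"}),
    Photo("2007/birthday/img_1033.jpg", {"school", "birthday"}),
]


def find_by_path(photos, term):
    """Search without metadata: only folder and file names can match."""
    return [p.path for p in photos if term.lower() in p.path.lower()]


def find_by_keyword(photos, term):
    """Search with metadata: match against the assigned keywords."""
    wanted = term.lower()
    return [p.path for p in photos if wanted in {k.lower() for k in p.keywords}]


by_path = find_by_path(library, "school")
by_keyword = find_by_keyword(library, "school")

print(by_path)     # nothing is found through folder names alone
print(by_keyword)  # both school photos are found via their keywords
```

Without keywords, someone has to open every folder by hand; with them, the same question is answered by a single lookup.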

Beyond that, content is still king. In other words, you often wouldn’t need metadata if the content could be interpreted quickly and conveniently enough in real time. For example, detected faces would not have to be stored as metadata if every system and tool were capable of calculating them on the fly, and speech-to-text metadata could be ignored if spoken words could be searched directly in the audio. This is related to the approach of search engines, which mainly try to gather information about assets by analyzing their content. Because metadata can be wrong, or can simply be changed, some workflows and systems do not regard it as a trustworthy source of information.

In the end, we will have to deal with metadata and respect it within our workflows as seamlessly as possible.

 

References

  • European Parliament Elections 2009: http://www.europarl.europa.eu/elections2009/default.htm
  • “Life Without Metadata” (R. Todd Stephens, Ph.D.): http://www.information-management.com/issues/20070301/1076522-1.html
  • Adobe Lightroom Killer Tips, “Confessions of a Lightroom Addict” (Matt Kloskowski): http://www.lightroomkillertips.com/2008/confessions-of-a-lightroom-addict

Tags: 3rd International PDF/A Conference, Proceedings, XMP, metadata
Categories: PDF/A, XMP