For publishers, it's a question of semantics
Through semantic technology, publishers are creating content that's easier to mold into new products and easier for users to find.
Over the last few years, larger publishers have invested in semantics, but semantic technology is now much more mainstream, particularly as vendors offer inexpensive tools that automatically tag content with metadata. Instead of editors manually labeling an article as being about trucks, semantic tools can do the labeling for them.
For the uninitiated, the semantic web is a framework of meanings (or metadata) on the Internet that allows machines to understand the context of information more as humans do. Metadata, a kind of DNA for data, provides the backstory of a piece of information. So instead of seeing only the keyword “truck,” a computer can use metadata to understand what the word actually means, when and where the content was created, and how it might relate to words such as “Ford” and “car.”
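As a rough illustration of that idea, the sketch below models metadata as subject, predicate, object statements (the shape used by the semantic web's RDF standard) so that a program can connect “truck” to “Ford” and “car.” The terms and predicate names are hypothetical, made up for this example; real systems use shared vocabularies and URIs rather than plain strings.

```python
# A minimal sketch: metadata modeled as (subject, predicate, object) triples.
# The terms and predicate names are hypothetical, not from a real vocabulary.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("truck", "is_a", "motor vehicle"),
    ("car", "is_a", "motor vehicle"),
    ("Ford", "manufactures", "truck"),
    ("Ford", "manufactures", "car"),
    ("article-123", "about", "truck"),
    ("article-123", "published_in", "Detroit"),
]


def related_terms(term: str, triples: List[Triple]) -> Set[str]:
    """Return every term linked to `term` by a triple, in either direction."""
    related: Set[str] = set()
    for subject, _predicate, obj in triples:
        if subject == term:
            related.add(obj)
        elif obj == term:
            related.add(subject)
    return related


def share_category(a: str, b: str, triples: List[Triple]) -> bool:
    """True if both terms point to a common category via an 'is_a' triple."""
    cats_a = {o for s, p, o in triples if s == a and p == "is_a"}
    cats_b = {o for s, p, o in triples if s == b and p == "is_a"}
    return bool(cats_a & cats_b)


if __name__ == "__main__":
    print(sorted(related_terms("truck", TRIPLES)))
    print(share_category("truck", "car", TRIPLES))
```

A bare keyword match would treat “truck” and “car” as unrelated strings; the triples let the program see that both are motor vehicles made by Ford.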
For media companies, whose businesses are built on content and context, using semantic technology to better convey what their content means to the web and other emerging platforms is a no-brainer. Semantic technology can be integrated into a content management system and can automatically generate metatags from the content that editors enter. Publishers can choose among semantic solutions from many providers, such as Nstein (now owned by Open Text) and SmartLogic.
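What automatically generating metatags might look like inside a CMS can be sketched as a save hook. This version is deliberately simplified and hypothetical: commercial tools such as those named above rely on natural-language processing rather than a fixed word list, and the vocabulary, function names and hook shown here are invented for illustration.

```python
# Hypothetical sketch of a CMS save hook that derives tags from article text
# and renders them as an HTML meta element. Real semantic tools use NLP;
# this version only matches words against a small hand-built vocabulary.
import html
import re
from typing import Dict, List

# Hypothetical controlled vocabulary: surface forms mapped to canonical tags.
VOCABULARY: Dict[str, str] = {
    "pickup": "Trucks",
    "truck": "Trucks",
    "trucks": "Trucks",
    "ford": "Ford Motor Company",
    "f-150": "Ford Motor Company",
    "detroit": "Detroit",
}


def auto_tag(text: str) -> List[str]:
    """Return canonical tags found in `text`, in order of first appearance."""
    tags: List[str] = []
    for word in re.findall(r"[a-z0-9-]+", text.lower()):
        tag = VOCABULARY.get(word)
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def render_meta(tags: List[str]) -> str:
    """Render tags as a keywords meta element, escaping HTML special chars."""
    content = html.escape(", ".join(tags), quote=True)
    return f'<meta name="keywords" content="{content}">'


def on_article_save(body: str) -> Dict[str, object]:
    """Hypothetical CMS hook: return the tags and the rendered meta markup."""
    tags = auto_tag(body)
    return {"tags": tags, "meta": render_meta(tags)}


if __name__ == "__main__":
    print(on_article_save("Ford unveiled a new F-150 pickup truck in Detroit."))
```

Note that major search engines largely ignore the keywords meta element for ranking, so in practice the same tags would more usefully feed topic pages, internal search or richer structured markup.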
But while automated tagging tools were once a luxury of large publishers, a new generation of free tools is helping smaller publishers augment their manual tagging. For instance, publishers can use OpenCalais, a semantic web service from Thomson Reuters, free of charge up to a certain usage threshold. Value Added News offers a free system for organizing metadata, using the hNews format developed by the Associated Press and the Media Standards Trust.
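For readers curious what hNews markup involves, here is a minimal sketch that renders a story as HTML carrying hNews class names. hNews builds on the hAtom microformat and adds news-specific properties such as source-org and dateline; the class names below reflect our understanding of the draft and should be checked against the published specification. The story data and the `to_hnews` function are hypothetical.

```python
# Sketch: render a story as hNews-style HTML. Class names follow the hNews
# microformat draft (an extension of hAtom) as we understand it; verify them
# against the spec before relying on them. Story data is made up.
import html
from dataclasses import dataclass


@dataclass
class Story:
    headline: str
    author: str
    source_org: str
    dateline: str
    published_iso: str  # ISO 8601 timestamp, e.g. "2010-11-01T09:00:00Z"
    body: str


def to_hnews(story: Story) -> str:
    """Return an HTML fragment whose elements carry hNews class names."""
    e = html.escape  # escapes &, <, > and quotes
    return (
        '<div class="hnews hentry">\n'
        f'  <h1 class="entry-title">{e(story.headline)}</h1>\n'
        f'  <span class="author vcard"><span class="fn">{e(story.author)}</span></span>\n'
        f'  <span class="source-org vcard"><span class="org fn">{e(story.source_org)}</span></span>\n'
        f'  <abbr class="published" title="{e(story.published_iso)}">{e(story.published_iso[:10])}</abbr>\n'
        f'  <span class="dateline">{e(story.dateline)}</span>\n'
        f'  <div class="entry-content">{e(story.body)}</div>\n'
        '</div>'
    )


if __name__ == "__main__":
    print(to_hnews(Story("Truck sales rise", "Jane Doe", "Example Daily",
                         "DETROIT", "2010-11-01T09:00:00Z",
                         "Sales of pickups rose in October.")))
```

The draft also defines properties such as geo, item-license and principles, which are omitted here for brevity.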
Here are four ways publishers can harness semantic metadata tools to improve their digital initiatives.
Better search
For many publishers, improving search results is the most obvious starting point for leveraging semantic technology. Search engines mine the metadata of content in order to improve search results.
“The underlying principle of semantic technology and applying it to publishing is giving your content enough structure and enough metadata so that the machines out there doing the legwork can more easily find and make sense of their content,” said Rachel Lovinger, content strategy lead at the interactive agency Razorfish.
Improving search results was the primary driver for Morris Communications's decision to launch a major semantic initiative, as part of a switch to a new Drupal CMS. Morris is using OpenCalais to tag the archives of its 13 daily newspapers in an effort to “monetize these stories that have really been sitting around collecting dust,” said Ryan Foust, principal engineer at Morris DigitalWorks, the company's technology arm.
Morris is also tagging new stories as they are published, with plans to eventually use the tagging system to build robust topic pages.
“Using OpenCalais, we're getting more users to our site and we’re holding onto those users longer ... and we want to keep them on our site as long as we can,” Foust said.
Improving searchability becomes even more important as publishers expand into mobile platforms, said Tom Wilde, CEO of RAMP, a content optimization company. “The form factor of mobile means you're even more constrained in what you can tolerate in terms of a search experience,” he said.
Flexible content for mobile — and beyond
Mobile platforms are accelerating a shift toward focusing on the content itself rather than the delivery method, said Wilde. Well-marked-up content is good “raw material” for new and shifting modes of consumption, he said.
“A reality is that the containers of the content are no longer the most important thing … It’s the objects within those containers,” Wilde said. “That’s putting a lot of pressure on publishers to make sure that their content is semantically seen on the web.”
To see how raw data can be molded into new “containers,” look at The New York Times, which publishes its tags to the Linked Data community and allows developers to build applications from its content. Using an application that tracks word trends in NYT articles, a user could find out just how many times the paper overused the word “hipster” this year.
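Publishing tags as Linked Data generally means giving each tag a stable URI, describing it in RDF, and often linking it to the same concept in other datasets. The sketch below serializes one tag in the N-Triples format using the standard RDF, SKOS and OWL property URIs. The example.org tag URI and the function names are hypothetical, and this shows a common Linked Data pattern rather than how the Times structured its own release.

```python
# Sketch: publish a subject tag as Linked Data in N-Triples.
# The tag URI is hypothetical; the RDF, SKOS and OWL URIs are the standard ones.
from typing import List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def tag_to_ntriples(tag_uri: str, label: str, same_as: List[str]) -> str:
    """Describe a tag as a SKOS concept with an English label and sameAs links."""
    lines = [
        f"<{tag_uri}> <{RDF_TYPE}> <{SKOS_CONCEPT}> .",
        f'<{tag_uri}> <{SKOS_PREF_LABEL}> "{escape_literal(label)}"@en .',
    ]
    lines += [f"<{tag_uri}> <{OWL_SAME_AS}> <{uri}> ." for uri in same_as]
    return "\n".join(lines)


if __name__ == "__main__":
    print(tag_to_ntriples(
        "http://example.org/tags/ford-motor-company",
        "Ford Motor Company",
        ["http://dbpedia.org/resource/Ford_Motor_Company"],
    ))
```

The owl:sameAs link is what lets outside developers join a publisher's tags with other datasets, such as DBpedia, when building applications.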
Lovinger said having meaningful data helps publishers create new business models, such as content syndication or e-commerce. “To me the main benefit of using semantic technology in publishing is that it helps get the content in better shape structurally and gives it more meaningful metadata, so that publishers can then create new content products more quickly and easily,” she said.
When properly tagged, archives are readily available to be repurposed, said Seth Earley, CEO of consultancy Earley & Associates, Inc. He gave the example of broadcast companies that can tag and sell excess B-roll footage to stock companies.
“Being able to label that piece of content in such a way that we understand how it can stand on its own and how it can be used in other formats is a way of getting greater utilization of content,” Earley said.
In a recent blog post, MediaShift's Mark Glaser argued that metadata actually eliminates the need for paywalls by opening up content to new revenue opportunities. “Structuring data creates an environment in which invention becomes possible, in the same way, for example, that library catalogues do,” he wrote.
Glaser noted that the Associated Press uses the hNews format to better track its news around the web. That tracking yields better metrics, which help the AP charge more accurately and work out revenue-sharing agreements for advertising.
Better site functions
Wilde notes that a complete audience engagement strategy has two halves: first, help readers find you, and second, maximize their engagement once they do. In addition to improving SEO, semantics can help publishers improve the internal search functions on their sites, as well as the way content is organized and presented, for example through “related content” boxes or topic pages. Topic hubs (such as those on the San Francisco Chronicle's website) are becoming the model for many publishers because they earn good search juice in Google and extend the time users spend on sites.
Some publishers use curation platforms to power “related content” suggestions and topic pages, relying either on automation alone or on a mix of automation and human curation. This can be as simple as displaying the tags from OpenCalais, which essentially creates a “related content” page, or simpler still, using semantic tools like Zemanta that let bloggers manually find related content to link to.
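Turning shared tags into “related content” suggestions can be approximated by scoring how much two articles' tag sets overlap. The sketch below uses Jaccard similarity (shared tags divided by all distinct tags across both articles), one simple choice among many; the article slugs, tags and function names are invented, and production platforms typically blend such scores with editorial curation.

```python
# Sketch: rank "related content" by Jaccard similarity of tag sets.
# Article slugs and tags are made up for illustration.
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size divided by union size (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_articles(target: str, tags: Dict[str, Set[str]],
                     limit: int = 3) -> List[Tuple[str, float]]:
    """Return up to `limit` other articles sharing tags with `target`, best first."""
    scores = [
        (other, jaccard(tags[target], other_tags))
        for other, other_tags in tags.items()
        if other != target
    ]
    scores = [(name, score) for name, score in scores if score > 0]
    # Highest score first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:limit]


if __name__ == "__main__":
    TAGS = {
        "united-wins": {"Manchester United", "Premier League", "Soccer"},
        "united-signing": {"Manchester United", "Transfers", "Soccer"},
        "league-table": {"Premier League", "Soccer"},
        "truck-sales": {"Trucks", "Ford Motor Company"},
    }
    print(related_articles("united-wins", TAGS))
```

Jaccard rewards overlap relative to total tags, so a short, focused article is not drowned out by a long one that happens to carry many tags.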
Advertising could also benefit from the aggregation enabled by semantic technology. For instance, the most successful ad on Telegraph.co.uk sits next to automatically generated scores on the topic page for the Manchester United soccer team, said Toby Conrad, director of North America for SmartLogic, which provides the semantic platform used by The Daily Telegraph.
More efficient staff
Semantic technology has the potential to lighten the load on both technology and editorial staff. While deploying semantic technology takes development work, it's far less than building a tagging system from scratch. As Foust explained, it's the difference between tying up three engineers for three weeks to develop something in-house and having one developer knock out the integration of a third-party platform in a week.
Once the technology is in place, publishers should begin to see their content creators become more efficient.
“A number of the larger organizations began by asking journalists to manually tag content to push together topic pages, and then realized the manual process wouldn't scale,” said Conrad. “That’s what we’ve seen at the higher end and I think now it’s starting to trickle through to the smaller organizations.”
In addition to making journalists more efficient, semantic technology might make them more productive as well. Tom Tague, the OpenCalais initiative lead at Thomson Reuters, believes journalists can better leverage tools such as OpenCalais to comb through documents for research and investigative purposes. For instance, reporters could use CalaisViewer, an entity extractor (a fancy name for a tool that takes in text and returns its metadata, such as the people, companies and places it mentions), to pull out relevant information about a topic. Tague said the tool could help reporters spot nepotism in contracts, for example, or important relationships in death notices or wedding announcements.
“It can't write the article for you, but in the future this sort of technology might be able to paint a heat map where you might want to investigate some of your resources,” he said.
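To show the kind of pattern Tague describes, here is a toy sketch that finds pairs of names appearing in the same sentence, a crude signal a reporter might then check by hand. It matches names against a small hypothetical list (a “gazetteer”) and splits sentences naively; it is not how CalaisViewer or OpenCalais work internally, and the names, text and function names are fictional.

```python
# Toy sketch: count entity pairs that co-occur in the same sentence.
# Real entity extractors use far more sophisticated NLP; the gazetteer,
# names and sample text here are hypothetical.
import re
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical gazetteer: entity names mapped to entity types.
GAZETTEER: Dict[str, str] = {
    "John Smith": "Person",
    "Mary Smith": "Person",
    "Acme Paving": "Company",
    "Springfield": "City",
}


def extract_entities(sentence: str) -> List[Tuple[str, str]]:
    """Return (name, type) pairs for gazetteer entries found in the sentence."""
    return [(name, kind) for name, kind in GAZETTEER.items() if name in sentence]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurrences(text: str) -> Counter:
    """Count how often each alphabetically ordered pair shares a sentence."""
    counts: Counter = Counter()
    for sentence in split_sentences(text):
        names = sorted({name for name, _kind in extract_entities(sentence)})
        for pair in combinations(names, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    TEXT = ("Springfield awarded a paving contract to Acme Paving. "
            "Acme Paving is owned by Mary Smith. "
            "Mary Smith is the sister of council member John Smith.")
    for pair, n in cooccurrences(TEXT).most_common():
        print(pair, n)
```

Chaining the pairs (city to company, company to owner, owner to council member) is the sort of link a reporter would then verify through records and interviews.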
What's next for metadata
What's next for semantic technology in publishing? Possibly audience targeting, as well as competitive analysis.
“We’re seeing some interest now from media firms and publishers to analyze the website of their competitors and peers because it helps them to understand the coverage that they are providing,” Conrad said.
Whatever the objective, publishers should carefully evaluate their options before investing heavily in semantic technology.
Razorfish's Lovinger said she looks at what her publishing clients are trying to accomplish and determines if semantic technology can help. “I don't feel like anyone has to be using semantic technology for the sake of semantic technology,” she said.
On the other hand, publishers who stay away from semantic technology tools could lag behind their competitors. “We're rapidly going to get to a point where if you're not displaying your metadata on your page somewhere, you're going to start to lose the SEO war,” OpenCalais' Tague said. “It's going to become the standard.”