News headlines & the semantic web

By Chris Blatchford, Platform Group, Thomson Reuters

My recent thesis focused on news headline development, specifically the ‘best practice’ elements that make a headline objective, relevant, descriptive and cognitively cost-effective, and whether a best practice framework, or methodology, could be derived from user preference. Initial results appeared to confirm my hypotheses, but along the way I began to explore an interesting parallel with our linked data initiatives.

Typically, news headlines follow the subject-verb-object (S-V-O) structure described in linguistic typology. Taking a collection of Reuters headlines, for example, we can see the same structure:

  • Islamic State battling Kurdish forces in Northeast Syria
  • Pakistan paramilitary raids HQ of major party MQM in volatile Karachi
  • Obama announces changes for student loan repayment
  • PayPal sets up Israeli security center, buys CyActive

Each of these headlines opens with an S-V-O triple, with additional information appended after it. Prior research suggests that this structure is more immediately ‘obvious’ to readers and easier to process cognitively. Interestingly, the same structure underpins RDF (the Resource Description Framework), a W3C standard influenced by ideas from knowledge representation. In RDF, information is expressed as subject-predicate-object triples, mirroring the linguistic subject-verb-object triple.
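To make the parallel concrete, here is a minimal sketch that models both a headline and an RDF statement with the same three-slot type. The split of each headline into a core triple plus appended detail is annotated by hand here (no parsing is attempted), and the example.org URIs are hypothetical placeholders rather than any real vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A subject-predicate-object statement; 'predicate' plays the role of the verb."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Headline:
    """A headline split into its core S-V-O triple plus any appended detail."""
    core: Triple
    modifier: str = ""

    def text(self) -> str:
        # Reassemble the headline: the triple first, then the extra information.
        parts = [self.core.subject, self.core.predicate, self.core.obj, self.modifier]
        return " ".join(p for p in parts if p)


# Hand-annotated splits of two headlines from the list above.
headlines = [
    Headline(Triple("Islamic State", "battling", "Kurdish forces"), "in Northeast Syria"),
    Headline(Triple("Obama", "announces", "changes"), "for student loan repayment"),
]

# An RDF statement has exactly the same three slots (hypothetical URIs).
rdf_statement = Triple(
    "http://example.org/company/PayPal",
    "http://example.org/vocab/acquired",
    "http://example.org/company/CyActive",
)

for h in headlines:
    print(h.text())
print(rdf_statement)
```

The point of sharing one `Triple` type is that nothing in the structure distinguishes the linguistic case from the RDF case; only the vocabulary in the slots differs.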

For example, if I take one of the headlines above (the PayPal entry), run it through an entity extraction tool such as Calais Viewer, and parse the resulting RDF using the W3C RDF Validation Service, I end up with a set of triples that look very similar to the linguistic subject-verb-object triple.

One of these triples, roughly translated from its URIs, tells us “PayPal Inc ticker symbol is EBAYP”.
So within the news headline triple we can see embedded RDF triples about the particular entity, and in many cases RDF triples can be transposed directly into headlines themselves (although this particular triple is probably not exactly newsworthy!).
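The ‘rough translation’ step can be sketched in a few lines: take a single N-Triples statement, reduce each URI to its local name, and read the result back in subject-predicate-object order. The URIs below are hypothetical stand-ins (the actual Calais vocabulary is not reproduced here), and the regex is deliberately simplified: it ignores escapes, language tags and datatypes, which a proper RDF parser such as rdflib would handle.

```python
import re

# A single N-Triples statement. These URIs are hypothetical stand-ins.
NTRIPLE = '<http://example.org/company/PayPal_Inc> <http://example.org/vocab/tickerSymbol> "EBAYP" .'

# Subject URI, predicate URI, then either a URI or a plain string literal as object.
STATEMENT = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.$')


def local_name(uri: str) -> str:
    """Last path segment or fragment of a URI, e.g. '.../tickerSymbol' -> 'tickerSymbol'."""
    return re.split(r"[/#]", uri.rstrip("/#"))[-1]


def label(uri: str) -> str:
    """Entity label: the local name with underscores turned into spaces."""
    return local_name(uri).replace("_", " ")


def predicate_words(uri: str) -> str:
    """Split a camelCase predicate name into lower-case words: 'tickerSymbol' -> 'ticker symbol'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", local_name(uri)).lower()


def triple_to_sentence(line: str) -> str:
    match = STATEMENT.match(line.strip())
    if match is None:
        raise ValueError(f"not a simple N-Triples statement: {line!r}")
    subj, pred, obj_uri, obj_literal = match.groups()
    obj = obj_literal if obj_literal is not None else label(obj_uri)
    # Subject-predicate-object keeps headline word order. The "is" template
    # suits attribute-style predicates such as tickerSymbol; action-style
    # predicates would need a different phrasing.
    return f"{label(subj)} {predicate_words(pred)} is {obj}"


print(triple_to_sentence(NTRIPLE))  # PayPal Inc ticker symbol is EBAYP
```

Transposing a triple into a headline is then mostly a matter of choosing a phrasing template per predicate; the word order comes for free.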

I’m not breaking any new ground here, simply restating what is already known, but thinking conceptually about how we access triples helps us understand how we can derive value from that information. In the same way that we use language to retrieve information from an S-V-O triple in everyday interactions (e.g. “Chris works at Thomson Reuters”), we can retrieve similar information from RDF triples using a query language such as SPARQL, the W3C query language for RDF.
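The core idea behind a SPARQL query is the triple pattern: a triple in which some slots are variables. The toy matcher below is not a SPARQL engine (real queries would go to a triple store or a library such as rdflib); it only illustrates how a pattern like “Chris works at ?org” binds a variable against a set of triples. The `ex:` names and the small graph are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# A toy graph; the "ex:" prefixed names are hypothetical, not a real vocabulary.
GRAPH: List[Triple] = [
    ("ex:Chris", "ex:worksAt", "ex:ThomsonReuters"),
    ("ex:PayPal", "ex:acquired", "ex:CyActive"),
    ("ex:PayPal", "ex:tickerSymbol", '"EBAYP"'),
]


def match(pattern: Triple, graph: Iterable[Triple]) -> List[Dict[str, str]]:
    """Return one variable binding per triple that fits a single triple pattern.

    Terms starting with '?' are variables; every other term must match exactly.
    """
    results: List[Dict[str, str]] = []
    for triple in graph:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(term[1:], value) != value:
                    break
            elif term != value:
                break
        else:  # no break: every position matched
            results.append(binding)
    return results


# Roughly: SELECT ?org WHERE { ex:Chris ex:worksAt ?org }
print(match(("ex:Chris", "ex:worksAt", "?org"), GRAPH))  # [{'org': 'ex:ThomsonReuters'}]
# Roughly: SELECT ?what WHERE { ex:PayPal ex:acquired ?what }
print(match(("ex:PayPal", "ex:acquired", "?what"), GRAPH))  # [{'what': 'ex:CyActive'}]
```

A real query usually combines several such patterns and joins their bindings on shared variables; that joining, plus filtering and ordering, is what a SPARQL engine adds on top.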

And that’s really the idea behind the semantic web: promoting a common framework that allows us to share data, and making the connection between something ‘real’, like our Reuters news editorial function, and the work our Big, Open, Linked Data (BOLD) team is undertaking around linked data.


Writing for the web: An editorial framework for the development of objective news headlines

By Chris Blatchford
Submitted to
The University of Liverpool
in partial fulfilment of the requirements for the degree of

Much has been written on how to write for the web, and in some cases this has extended to news headline development, but for the most part the literature has focused on generic best practice writing principles. These principles can be applied to digital news headline development, along with additional subjective factors that influence the structure of the text; these influential elements are manifested through structural elements such as content type, layout and format, but on the whole are applied randomly. This paper investigates the assumption that these elements can be identified, ranked and ordered in a particular sequence which, when applied to a news headline or story, results in the most objective, consistent and factually accurate news headline for the associated news content. It is also the assumption of this paper that these ‘best practice’ headlines are the ones most preferred by end users. To test this assumption, an interview was conducted with Thomson Reuters employees, the results of which are presented in this paper and used to build a best practice framework.
A subsequent survey of Thomson Reuters employees was conducted to test the best practice framework, by generating news headline options for end users to indicate a preference. Results show that certain best practice elements are more influential on end users than others, that certain combinations of elements are more effective than others, and that the content of the news story (financial versus scientific in this case) can influence end user preference, and therefore which element should be considered ‘high priority’.
