Advertising testing remains controversial, especially among creatives, for its tendency to push work towards consensus - and as it becomes ever more automated, Faris Yakob tests the question of testing.

It is a truth universally acknowledged, that advertising creatives hate testing. In his classic guide to being one, Luke Sullivan makes his feelings clear: 

“There are few things I hate more than listening to focus group people complain about my ideas. I think pretesting concepts by showing rough layouts and storyboards to people off the street is a bane to the industry.” [Hey Whipple, Squeeze This] 

He believes this because “when it’s used to determine how to say it [rather than what to say], great ideas suffer horribly.” He is particularly dismissive of “permission research”, in which “agencies show advertising concepts to customers and ask if they like them or not” in order to get ‘permission’ to run them. [As a point of information, an ARF meta-analysis of pretesting showed that likability was the most predictive of all historically used metrics, likely because it doesn’t ask consumers to explain how advertising works or to predict their own future behavior.]

This point of view has historically been enshrined as a point of principle at Wieden & Kennedy, who were (are?) famous for refusing to do creative testing (the power of strong brands applies to agencies, too). “We don’t focus-group because pretesting gets you a consensus view. Someone likes red, another blue, so you get beige.” [W+K Global Book, 2010].

Like all debates in advertising, this one is older than anyone currently working in the industry. In 1974 the IPA published the classic “Testing to Destruction”, Alan Hedges’ reflection on decades of research. Hedges had a very specific goal, one he revisited in the preface to the updated edition: “In one respect my book has been a signal failure. I sought to delete the word ‘testing’ from the advertising research vocabulary. A quarter century on it is still in widespread use.” It is arguably more widespread today, another quarter century later, so perhaps using it in the title was a creative misstep, strategically speaking.

In that 1997 preface, he laments that “over the past 10 years or so we have seen a serious loss of collective confidence in our society - a loss of faith in our ability to shape the future to our own ends, increasing short-termism and a withering of vision”, and thus an increasing reliance on metrics. This strongly suggests we have been recycling the same doom-loop narrative for decades, despite the halcyon glow the 90s have acquired in retrospect. The Golden Age fallacy always applies, because there never was one.

Hedges’ point was that claiming to ‘test’ advertising without actually running it was misleading, because the world is complex and inherently unpredictable. He considered the only acceptable use of research during creative development to be ‘illumination’ about consumers - qualitative feedback for creatives to consider - and despaired of the use of “numbers-based testing services”.

Despite his insight that we were moving towards a short-term, metrics-driven, financialised economy, he valiantly attempted to hold the global corporation back - to fight for nuance and creative risk against the inherent fungibility of numbers - just as everything was about to go digital and global. He was thus, inevitably, unsuccessful. Supermassive corporate entities cannot communicate functionally across hundreds of markets and teams except with numbers, so numbers will always prevail, and the task becomes finding ways to get to better, more directional, more predictive ones. That said, it’s important to understand the difference between evaluative and diagnostic research. Hedges argues that “evaluative judgements about advertising can only to a limited extent be based on research; the best and most useful basis comes from a proper programme of research which is ‘diagnostic’ in the sense that it attempts to heighten understanding rather than to provide a verdict.”

When we develop tools and frameworks to help clients consider creative work, we build them from the best marketing science and academic and industry thinking available. When we develop them for FMCG conglomerates that own dozens of brands, they need to be broadly applicable. However, we constantly reinforce that they are tools for better creative discussions that get to better ideas. Whether numeric or not, we cannot outsource judgement to research, as much as we’d like to. Critical thinking is required.

As every generation of practitioners works out, some things change faster than others, and people change most slowly of all. What drives a person has remained the same since we became people, but the context has changed rapidly, even if we are still mostly using email to coordinate and deliver work. The guardrails and incentives of the industry depend on what we value and measure. As WARC has pointed out, the overlap between work awarded for creativity and work awarded for effectiveness is now only 18%.

Hedges believed that “striving for attention at all costs takes over from relevance of message”, but the attention environment is an order of magnitude more cluttered and complex than it was in the 1990s. As Hedges establishes in his opening line, the function of research is “to make advertising expenditure more effective”, and more recent developments in attention tracking have shown that attention clearly does drive effectiveness - but that ‘award-winning’ creative no longer seems to.

We now have more sophisticated methods for measuring and modeling creative work at various stages of development. Beyond top-boxing a Likert scale - rendering an opinion into a weighted but perhaps misleading metric, since it discards much of the data from the test itself - various research methodologies are now used to pry more insight out of people and turn it into numbers: eye and facial tracking, implicit-association response times in online surveys, machine learning and, of course, AI.
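To see why a single top-box number can mislead, consider a minimal sketch (with entirely invented figures): two very different response distributions collapse to the same score, and the difference between them is precisely the data the metric leaves out.

    # Minimal sketch (hypothetical data): top-boxing a 1-5 Likert scale
    # discards the shape of the distribution behind the score.
    from collections import Counter

    def top_box_pct(responses, top=5):
        """Share of respondents choosing the top point of the scale."""
        return sum(r >= top for r in responses) / len(responses)

    polarising = [5] * 30 + [1] * 70   # loved by a few, disliked by most
    lukewarm   = [5] * 30 + [4] * 70   # mildly liked by everyone

    for name, data in [("polarising", polarising), ("lukewarm", lukewarm)]:
        dist = dict(sorted(Counter(data).items()))
        print(f"{name}: top-box = {top_box_pct(data):.0%}, distribution = {dist}")
    # Both report a 30% top-box score, yet the two ads would land very differently.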

Kantar’s LINK provides a tripartite score based on how the ad is predicted to drive awareness, short-term sales impact and brand equity, through various proxies, and has the validation of the largest number of tests under its belt against which to calibrate. System1 offers equivalent metrics in its Star, Spike and Fluency ratings.

Newer entrants to the category are pioneering new approaches and business models to create new value propositions for clients. Zappi worked with Pepsi to scale its approach, developing “solutions that could be applied to the thousands of pieces of content that our brands create daily” by building a global platform to get fast consumer feedback at every stage of development. Its Amplify product shows creative in context, albeit via forced exposure, to better replicate advertising in the wild.

With the volume of creative increasing with every new channel (whether or not it should), the time to produce it shrinking and budgets under ever more pressure, the key benefits many new-model research companies offer are being cheaper, faster and (hopefully) better, through up-to-date thinking, research and analysis techniques. AI clearly comes into play here, and it powers the creative effectiveness company DAIVID [get it?].

DAIVID also measures attention, emotion and brand impact using eye tracking, facial coding and surveys, and then uses that data to train and continually update its AI, which allows it to make assessments and predictions. As practitioners we should be cognizant of the tools at our disposal and have experimented enough to develop a point of view.

Perhaps machine learning will eventually uncover an insight that makes it into an IPA effectiveness award. Because it was generated through statistical inference, it may have remained invisible to a human planner - but we should also remind ourselves what planning was for: to keep us as close as possible to the consumer, and to impose the rigor of efficacy. An often overlooked but important role for creative research is de-risking advertising - balancing out the unfortunate flavors of homogeneity that still dominate the industry and avoiding another one of those Pepsi disasters. The opposite of an insight is a blind spot.

Ultimately, as Hedges intuited, the word ‘test’ is misleading. To test something is to try it and see if it works, which means creating the effect you are looking for in the world. You cannot run an experiment on the world in ChatGPT; it is only a blurry snapshot of humanity’s utterances, not the world itself, though you can still learn things there. True ‘tests’ in advertising are done by spending money, in markets and channels, against control groups, over time, and measuring the intermediate and commercial effects.
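To make that concrete, here is a minimal sketch (all figures hypothetical) of the core arithmetic behind a matched-market holdout test: run the campaign in some markets, withhold it from comparable control markets, and read the difference as incremental lift.

    # Minimal sketch of a matched-market holdout test (hypothetical figures).
    # Exposed markets run the campaign; matched control markets do not.
    exposed_sales = [112.0, 108.5, 121.3, 117.8]   # weekly sales index, test markets
    control_sales = [101.2, 99.8, 104.1, 102.6]    # weekly sales index, control markets

    def mean(xs):
        return sum(xs) / len(xs)

    # Incremental lift: how far exposed markets ran ahead of control.
    lift = (mean(exposed_sales) - mean(control_sales)) / mean(control_sales)
    print(f"Estimated incremental lift: {lift:.1%}")   # ~12.7% on these numbers

A real test adds careful market matching, enough weeks to capture lagged effects and significance testing; this shows only the core comparison that makes it a test rather than a prediction.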

The only legendary creative to have been a researcher, David Ogilvy, was, perhaps unsurprisingly, the exception to the aforementioned universally acknowledged truth. He was a passionate supporter of research and suggested it led him to good ideas and stopped him making “horrendous mistakes”. He was equally bullish on testing and favored a holistic approach: “The most important word in the vocabulary of advertising is TEST. If you pretest your product with consumers, and pretest your advertising, you will do well in the marketplace… Test your promise. Test your media. Test your headlines and your illustrations. Test the size of your advertisements. Test your frequency. Test your level of expenditure. Test your commercials. Never stop testing, and your advertising will never stop improving.”