With the arrival of virtual reality technology, the Star Trek Holodeck does not appear to be so far into the future. The Holodeck is a good, new metaphor for the human imagination, aka Memory Theater.
In face-to-face communication, two human beings continuously react and adapt to the non-verbal signals in each other's facial expressions. For example, a good salesperson will adapt her sales pitch, in mid-stream, based on the emotional responses playing across the prospect's face. In the future, robots will be able to do the same thing—but right now they are still Artificial Infants.
Human-machine interaction, such as watching television or looking at the screen of a computer, smartphone or tablet, is like face-to-face communication in that a human user reads a screen much like they read a face. We rapidly take in information and images conveying emotional content on a non-verbal level, in a process that is highly mediated by System 1, or the unconscious mind.
And, just as humans can deconstruct screens into meaningful chunks of useful information through pre-conscious processing, machines can deconstruct faces with eye-tracking cameras, facial tracking software and AI. There’s a reason the most important meetings are held face-to-face—because face-to-face communication is the most effective way evolution could come up with for reading human minds.
An interesting theory about how we learn to do just that is described in the book The First Idea, by Stanley Greenspan, a child development psychologist, and Stuart Shanker, an evolutionary biologist. They describe the process of training a brain to be human as co-creation.
On Day One, we start by staring into our mother’s face.
As our face resonates with the vibrations of expressions appearing in her face, she (or the caregiver) teaches us:
This is the difference between love and fear.
These are the emotions you are feeling—anger, disgust, sadness, fear and joy.
This is the law of causality—if you smile, you can make Mommy smile!
These are the sounds that go with the different parts of You—"nose", "mouth", "chin", "hand."
These are the names of different things in a picture book —“cat”, “tree”, “train”, “running”, “climbing”, “working”, “pretending.”
And the learning process continues, by example, through songs and stories, with trial and error, from other people we meet, as we move to higher and higher levels of skill in reading other people’s minds—and learn, through co-creation, how to share with others our own memories, thoughts and feelings, voices and images from our inner world of experience.
At this stage of development, AI in the business world is like an infant learning to correlate the many sequences of signals and body actions needed to live and act in the physical world. It’s focused on improving System 1 performance.
Learning how to live in the social world of people requires System 2.
For example, in media applications, AI is fast developing its System 1 abilities in addressable media, delivering the right ad to the right audience at the right time, optimizing opportunities-to-see.
But AI is a long way from being able to tell a story, or interpret the meaning of a movie, or adapt the sales message in an ad to the sequence of emotions displayed on a consumer’s face as they stare into a screen. I, personally, think AI will get there—we will someday have sentient robots. But getting there is not a “big data” problem. It’s a training problem.
Facial tracking software, for example, is still at the baby stage of recognizing the primary colors of emotion—joy, anger, sadness, disgust and fear. But humans can recognize a palette of hundreds, if not thousands, of emotions on a face—disappointment, curiosity, confusion, skepticism, wistfulness, irony, and so on. How can AI be taught to recognize these nuanced, compound emotions?
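One hedged way to think about the problem: treat a compound emotion as a weighted blend of the primary-emotion scores a facial-tracking model already produces. The blend recipes and weights below are purely illustrative assumptions, not established mappings:

```python
# Sketch: scoring compound emotions as weighted blends of the five
# primary-emotion intensities. Recipes here are hypothetical examples.

PRIMARY = ["joy", "anger", "sadness", "disgust", "fear"]

# Hypothetical recipes: each compound emotion as a blend of primaries.
COMPOUND_RECIPES = {
    "disappointment": {"sadness": 0.7, "anger": 0.3},
    "skepticism":     {"disgust": 0.5, "anger": 0.5},
    "wistfulness":    {"joy": 0.4, "sadness": 0.6},
}

def score_compounds(primary_scores: dict) -> dict:
    """Score each compound emotion as the dot product of its recipe
    with the observed primary-emotion intensities (each in 0..1)."""
    return {
        name: sum(weight * primary_scores.get(emotion, 0.0)
                  for emotion, weight in recipe.items())
        for name, recipe in COMPOUND_RECIPES.items()
    }

# One frame's primary-emotion readings from a facial-tracking model.
frame = {"joy": 0.1, "anger": 0.2, "sadness": 0.8, "disgust": 0.1, "fear": 0.0}
scores = score_compounds(frame)
best = max(scores, key=scores.get)   # the strongest compound emotion
```

Training, of course, would mean learning those weights from labeled examples rather than writing them by hand—which is exactly where the zipper described below comes in.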
The film editor Walter Murch, who has worked on such movies as American Graffiti, The Godfather Part II, Apocalypse Now, Ghost, and The English Patient, describes his method for keeping track of the emotions in a film he is working on—
“But in addition to the usual procedures, I also would select at least one representative frame from every set-up and take a still photograph of it off the workprint…and put them into panels arranged according to scene…The most interesting asset of the photos for me was that they provided the hieroglyphs for a language of emotions. What word expresses the concept of ironic anger tinged with melancholy? There isn’t a word for it, in English anyway, but you can see that specific emotion represented in this photograph.”—Walter Murch
When we research a piece of advertising film at Ameritest, we also use still photographs, to probe the emotional meaning conveyed by the film. We have respondents from the target audience sort these pictures into different categories of meaning, based on the specific emotions and strategic ideas intended by the advertiser.
These data are reports from the interior, from inside the mind of the audience—the most we can actually know about what another person thought and felt as they watched a movie.
Moreover, the data are very granular, covering the moment-by-moment experience that each viewer in the audience has filtered for and how it is labeled in memory.
And, because each picture represents a fixed point of time in the story, it’s easy to correlate with the moments when tracking cameras capture different expressions playing on the faces of the people watching the movie.
Therefore, for training purposes, the film strip can be used as a zipper, linking together two fundamentally different kinds of data:
(self-reported data from inside a human brain) + (what a camera can see on a face from the outside)
This is what a training loop for your AI robot might look like—
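As a rough Python sketch of that zipper and loop, assuming hypothetical record formats and a generic model interface (the `predict`/`update` methods stand in for whatever learner is actually used):

```python
# Sketch: the film strip as a zipper. Frame times link self-reported
# emotion labels (picture sorts) to camera-derived facial features
# captured at the same moments, yielding (face, feeling) training pairs.

from dataclasses import dataclass

@dataclass
class Frame:
    time_s: float            # position in the film
    reported_emotion: str    # inside view: what viewers said they felt
    face_features: list      # outside view: what the camera saw

def zip_streams(picture_sorts, camera_log, tolerance=0.5):
    """Join the two data streams on time, tooth by tooth."""
    pairs = []
    for t, emotion in picture_sorts:
        # find the camera reading closest in time to this frame
        t_cam, features = min(camera_log, key=lambda rec: abs(rec[0] - t))
        if abs(t_cam - t) <= tolerance:
            pairs.append(Frame(t, emotion, features))
    return pairs

def training_loop(pairs, model, epochs=10):
    """Show the model a face reading, let it guess the feeling, then
    correct the guess with the self-reported label; repeat."""
    for _ in range(epochs):
        for frame in pairs:
            guess = model.predict(frame.face_features)
            model.update(frame.face_features, frame.reported_emotion, guess)
    return model

# Toy data: (time, reported emotion) and (time, facial features).
picture_sorts = [(1.0, "joy"), (5.0, "sadness")]
camera_log = [(1.2, [0.9, 0.1]), (4.8, [0.2, 0.7]), (9.0, [0.5, 0.5])]
pairs = zip_streams(picture_sorts, camera_log)
```

The design choice worth noting is the join key: because both streams are anchored to the same film strip, a simple nearest-in-time match is enough to pair an inside report with an outside observation.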
So, how can we teach an advertising robot to recognize more complex human emotions?
By watching movies together.
Thanks to the rapid growth in technology, the advertising research ecosystem is expanding with a bewildering variety of new ideas and methods. Users just want to know, "Which is the right approach for the job I want it to do?"
For the Experiencer Self, each “now” is equally important, while the Remembered Self focuses on peaks and endings. So, how do you decide which moment is the right time to take a Selfie?
Imagine you were going to make a documentary called, “A Day of My Life.” How would you do it?
You could wear a body camera and film every second of your day, but the result would take another whole day to watch. Next, you might make a sequel: a movie of you watching the movie of how you spent that day of your life.
Instead of video, you might make a documentary movie like Ken Burns and use still pictures. That works because many of our memories come to mind in lightning flashes of still images, and not full motion video.
With your smartphone, it's easy to take selfies juxtaposed to things and scenes that you find interesting as you move through your day.
You could take a selfie looking back from the mirror as you brush your teeth. Click! A picture of your closet as you pick out the clothes you’re going to wear today. Click! A picture of what you’re eating for breakfast. Click! But again, by the end of the day, there are still hundreds of pictures—far too many to post on your Facebook Timeline.
As the curator of your story, how would you sort through all the pictures?
You could sort them chronologically.
You could sort through them by categories—work, play, friends, things I like, things I don’t like.
Or, you could sort through them by emotion—happy, sad and angry moments.
As you sort through the pictures, you realize that many of the pictures look very similar. Perhaps one or two will be enough to document the parts of your day, and you can delete the rest from your camera’s memory.
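Those curation steps can be sketched in code. The photo records, categories, and near-duplicate rule below are illustrative assumptions:

```python
# Sketch: three ways to sort a day's photos, then drop near-duplicates.
# A "near-duplicate" here is simply another photo of the same scene in
# the same category—a deliberately crude stand-in for image similarity.

photos = [
    {"time": "07:05", "category": "home",    "emotion": "happy", "scene": "mirror"},
    {"time": "07:06", "category": "home",    "emotion": "happy", "scene": "mirror"},
    {"time": "08:30", "category": "work",    "emotion": "sad",   "scene": "desk"},
    {"time": "12:15", "category": "friends", "emotion": "happy", "scene": "lunch"},
]

by_time     = sorted(photos, key=lambda p: p["time"])      # chronological
by_category = sorted(photos, key=lambda p: p["category"])  # work, play, friends...
by_emotion  = sorted(photos, key=lambda p: p["emotion"])   # happy, sad, angry...

def dedupe(photos):
    """Keep the first photo per (category, scene); delete the rest."""
    kept, seen = [], set()
    for p in sorted(photos, key=lambda p: p["time"]):
        key = (p["category"], p["scene"])
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept

keepers = dedupe(photos)   # one mirror shot survives; the duplicate is gone
```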
These are the hard problems that the episodic memory system in your brain rapidly solves, all the time, as it makes the autobiographical movie of your life.
Along with these images, your episodic memory also sorts through every sound and word you heard, all the smells you noticed, all the things that physically touched you in order to record all that you experienced. It’s a lot of information to store!
One theory of memory is that your Remembered Self doesn’t actually record experience. Rather, it documents the meaning (or lessons learned) from each experience.
In the words of T.S. Eliot—
“We had the experience but missed the meaning, and approach to the meaning restores the experience in a different form.”
How the brain converts experience into meaningful memories is one of the great mysteries.
To gain insight into that mystery, researchers have, in general, two approaches. One is observing someone's reactions to a new experience or stimulus and correlating that reaction with past and future behavior. The second is testing what someone remembers or takes away from an experience or "episode" after it's over.
In advertising and media research, there are a number of new techniques that observe audience reactions to creative in real time, including facial response, heart rate, skin conductance, EEG, eye-tracking, etc. Collectively, you might think of these techniques as measures of the Experiencer Self—that self that lives in the now.
There are also a variety of classic approaches that measure the effectiveness of advertising and communications by interviewing respondents after exposure, in order to make predictions about future advertising ROI, based on the memory effects. Researchers look to likability, recall, message takeaway, changes in perceptions, purchase intent, and brand linkage. These techniques focus on probing the Remembered Self, or what the mind kept after the experience was over.
The research we do at Ameritest primarily uses the after-exposure research method. We interrogate the short-term memories left behind by a video ad, for example. In part, we use our Picture Sorts® (Flow of Attention, Flow of Emotion and Flow of Meaning) to measure, image by image, the various factors that impact the long-term, brand-building memories left behind.
To gain insight into how the Experiencer Self becomes the Remembered Self, let's combine two research approaches, facial response and picture sorts, and look at a case study.
In the limited number of experiments where I’ve had access to both kinds of data, I’ve found a lot of similarities.
As an example, here is a Super Bowl ad called "The Dog Strikes Back," which we tested with both approaches—
Below are two graphs of audience response to this ad. One graph is observational—a facial response measure of overall positive reaction to the ad as the audience watched the ad. The other graph is the audience reconstruction from memory, frame-by-frame, of their positive emotional responses to the images in the ad. Can you tell which graph was constructed from memory?
The nuanced graph at the top shows the level of positive emotion people experienced at different times in the ad, reconstructed from their memories of it.
People are actually quite good at remembering how strongly they feel about an experience (but not why), especially after it recently happened.
In general, both approaches seem to tell the same overall story. But it only seems that way.
If you dig below the level of overall positive or negative response, to get at the meanings associated with these emotions, you get a different story.
In Ameritest's test of this ad, one of the things we looked for was the moments when the audience remembered feeling "happiest." We then compared this to the moments the experts at Real Eyes identified as "happy" faces in the observational data.
The observational data, in the bottom chart, shows that the moment when the audience shows the happiest emotions on their faces comes about a third of the way into the spot, when the dog begins to work out. That was the point at which the audience began to anticipate the joke that was coming.
The happiest memory, however, comes at the end of the story.
We cannot, after all, interpret the true meaning of an experience, or episode, of our lives until we know how things end. For the ending provides the context for interpreting the meaning of the experience.
The meaning of this ad was that VW now sells powerful cars. So, the payoff moment for memory is the image of the happy, slimmed down, newly-empowered dog chasing the VW.
And that is the moment when VW chose to place its brand-identifying “selfie” in the ad.
The meaning of an image is determined by layers of memories that filter our perceptions. An image must be interpreted in the context of an ad, and an ad in the context of a program, and a program in the context of a culture.
There were two main ideas in Nobel Prize winner Daniel Kahneman's best-selling book Thinking, Fast and Slow. The first idea was that there are two modes of thinking—System 1 versus System 2. The second was the distinction between our Experiencer Self and our Remembered Self. By putting these two ideas together, you get a nice framework for thinking about the difference between Brand Positioning and Brand Image.
A brand's "image" is an attempt to project a coherent identity, with a unique personality and distinctive style. A brand's "positioning" is an attempt to "own" one or two key words in the mind of the consumer that set your brand apart from your competitors. These are two sides of the same coin. While a picture may be worth a thousand words—a word is also worth a thousand pictures.