Perceiving the emotions of Pokémon

Ben J. Jennings1

1 Centre for Cognitive Neuroscience, Brunel University London, London, U.K. E-mail: ben.jennings (at) (dot) uk


The ability to reliably perceive the emotions of other people is vital for normal social functioning, and the human face is perhaps the strongest non-verbal cue that can be utilized when judging the emotional state of others (Ekman, 1965). The advantages of possessing this ability to recognise emotions, i.e., having emotional intelligence, include being able to respond to other people in an informed and appropriate manner, assisting in the accurate prediction of another individual’s future actions, and facilitating efficient interpersonal behavior (Ekman, 1982; Izard, 1972; McArthur & Baron, 1983). In the current experiment, the consistency with which emotions displayed by a human female face and by a Pokémon character are perceived was investigated.

General Methods

The current study employed 30 hand drawings of Pikachu, a first generation electric-type Pokémon character, depicting a range of emotions (images used with permission from the illustrator, bluekomadori; based on the video game characters belonging to The Pokémon Company); see Fig. 1a for examples. Also, 30 photo-quality stimuli displaying a range of emotions, expressed by the same female model, were taken from the McGill Face Database (Schmidtmann et al., 2016); see Fig. 1b for examples. Ratings of arousal (i.e., the excitement level, ranging from high to low) and valence (i.e., pleasantness or unpleasantness) were obtained for each image using a method similar to that of Jennings et al. (2017). This method involved the participants viewing each image in turn in a random order (60 in total: 30 of Pikachu and 30 of the human female from the McGill database). After each image was viewed (presentation time 500 ms) the participants’ task was to classify the emotion being displayed (i.e., not their internal emotional response elicited by the stimulus, but the emotion they perceived the figure to be displaying).

The classification was achieved by “pointing-and-clicking”, with a computer mouse, the corresponding location within the subsequently displayed 2-dimensional arousal-valence emotion space (Russell, 1980). The emotion space is depicted in Fig. 1c; note that the red words are for illustration only and were not visible during testing. They are supplied here to give the reader the gist of the types of emotion that different areas of the space represent. Data were collected for 20 observers (14 female), aged 23±5 years (mean±SD), using a MacBook Pro (Apple Inc.). Stimulus presentation and response collection were handled with the PsychToolbox software (Brainard, 1997).
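The mapping from a mouse click to an arousal-valence coordinate can be sketched as below. This is a minimal illustration only: the grid size and axis conventions are assumptions, not details taken from the study (and the study itself used PsychToolbox, which is MATLAB-based).

```python
def click_to_affect(x_px, y_px, grid_size=500):
    """Map a pixel click inside a square arousal-valence grid to
    normalised (valence, arousal) coordinates in [-1, 1].

    Conventions assumed here (illustrative, not from the study):
    origin at the top-left of the grid, as on most displays;
    left = unpleasant, right = pleasant; top = high arousal.
    """
    half = grid_size / 2.0
    valence = (x_px - half) / half   # horizontal axis
    arousal = (half - y_px) / half   # screen y grows downward, so flip
    return valence, arousal

# A click in the upper-right quadrant yields positive valence and
# high arousal (roughly the "excited/happy" region of the space).
v, a = click_to_affect(400, 100, grid_size=500)  # → (0.6, 0.6)
```

Each trial's response then reduces to one (valence, arousal) pair per observer per image, which is the raw material for the agreement analysis that follows.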

Figure 1.  Panels (a) and (b) illustrate three exemplars of the Pokémon and human stimuli, respectively. Panel (c) shows the response grid displayed on each trial for classifications to be made within (note: the red wording was not visible during testing). Panels (d) and (e) show locations of perceived emotion in the human and Pokémon stimuli, respectively. Error bars represent one standard error.


The calculated standard errors (SEs) serve as a measure of the classification agreement between observers for a given stimulus and were determined in both the arousal (vertical) and valence (horizontal) directions for both the Pokémon and human stimuli. These are presented as the error bars in Figs. 1d and 1e. The SEs were compared between the two stimulus types using independent t-tests for both the arousal and valence directions; no significant differences were revealed (Arousal: t(58) = -0.97, p = .34; Valence: t(58) = 1.46, p = .15).

Effect sizes, i.e., Cohen’s d, were also determined (Arousal: d = 0.06; Valence: d = 0.32), i.e., effect sizes were within the very small to small, and small to medium ranges, respectively (Cohen, 1988; Sawilowsky, 2009), again indicating a high degree of similarity in precision between the two stimulus classes. It is important to note that the analysis relied on comparing the variation (SEs) for each classified image (reflecting the agreement between participants) and not the absolute (x, y) coordinates within the space.
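The analysis described above can be sketched as follows. The ratings here are synthetic stand-ins (the study’s data are not reproduced), and SciPy’s `ttest_ind` is one standard way to run the independent t-test on the per-image SEs; Cohen’s d is computed with the pooled standard deviation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in data: one valence rating from each of 20
# observers for each of 30 images per stimulus class.
human   = rng.normal(0.0, 0.15, size=(30, 20))
pokemon = rng.normal(0.0, 0.17, size=(30, 20))

# One SE per image: between-observer SD / sqrt(n observers).
# This is the agreement measure compared in the text.
se_human   = human.std(axis=1, ddof=1) / np.sqrt(human.shape[1])
se_pokemon = pokemon.std(axis=1, ddof=1) / np.sqrt(pokemon.shape[1])

# Independent t-test on the two sets of 30 SEs (df = 58).
t, p = stats.ttest_ind(se_human, se_pokemon)

# Cohen's d via the pooled standard deviation (equal group sizes).
pooled_sd = np.sqrt((se_human.var(ddof=1) + se_pokemon.var(ddof=1)) / 2)
d = (se_human.mean() - se_pokemon.mean()) / pooled_sd
```

The same computation would be run separately for the arousal and valence directions, yielding the two t-tests and two effect sizes reported above.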


What could observers be utilizing in the images that produces such a high degree of agreement on the emotion expressed by each stimulus class? Is all the emotional information contained within the eyes? Levy et al. (2013) demonstrated that when observers view either a human with eyes located, as expected, within the face, or a non-human (i.e., a ‘monster’) with eyes located somewhere other than the face (for example, the mythical Japanese Tenome, which has its eyes located on the palms of its hands; Sekien, 1776), eye movements are nevertheless made towards the eyes in both cases; i.e., there is something special about the eyes that captures attention wherever they are positioned. Schmidtmann et al. (2016) additionally showed that accuracy for identifying an emotion was equal whether an entire face or a restricted stimulus showing just the eyes was employed. The eyes of the Pikachu stimuli are simply black circles with a white “pupil”; however, they can convey emotional information, for example, through the position of the pupil, the orientation of the eyelid, and how much the eye is closed. It is hence plausible that arousal-valence ratings are made on information extracted from the eyes alone.

However, for the Pokémon stimuli Pikachu’s entire body is displayed on each trial, and it has previously been shown that when emotional information from the face and body is simultaneously available, the two can interact. This has the result of intensifying the emotion expressed by the face (de Gelder et al., 2015), as perceived facial emotions are biased towards the emotion expressed by the body (Meeren et al., 2005). It is therefore likely that holistic processing of the facial expression, coupled with signals from Pikachu’s body language, i.e., posture, provides an additional input into the observers’ final arousal-valence rating.


Whatever the internal processes responsible for perceiving emotional content, the data points to a mechanism that allows the emotional states of human faces to be classified with a high precision across observers, consistent with previous emotion classification studies (e.g., Jennings et al., 2017). The data also reveals the possibility of a mechanism present in normal observers that can extract emotional information from the faces and/or bodies depicted in simple sketches, containing minimal fine detail, shading and colour variation, and use this information to facilitate the consistent classification of the emotional states expressed by characters from fantasy universes.



Brainard, D.H. (1997) The psychophysics toolbox. Spatial Vision 10: 433–436.

de Gelder, B.; de Borst, A.W.; Watson, R. (2015) The perception of emotion in body expressions. WIREs Cognitive Science 6: 149–158.

Ekman, P. (1965) Communication through nonverbal behavior: a source of information about an interpersonal relationship. In: Tomkins, S.S. & Izard, C.E. (Eds.) Affect, Cognition and Personality: Empirical Studies. Springer, Oxford. Pp. 390–442.

Ekman, P. (1982) Emotion in the Human Face. Second Edition. Cambridge University Press, Cambridge.

Izard, C.E. (1972) Patterns of Emotion: a new analysis of anxiety and depression. Academic Press, New York.

Jennings, B.J.; Yu, Y.; Kingdom, F.A.A. (2017) The role of spatial frequency in emotional face classification. Attention, Perception & Psychophysics 79(6): 1573–1577.

Levy, J.; Foulsham, T.; Kingstone, A. (2013) Monsters are people too. Biology Letters 9(1): 20120850.

McArthur, L.Z. & Baron, R.M. (1983) Toward an ecological theory of social perception. Psychological Review 90(3): 215–238.

Meeren, H.K.; van Heijnsbergen, C.C.; de Gelder, B. (2005) Rapid perceptual integration of facial expression and emotional body language. Proceedings of the National Academy of Sciences 102: 16518–16523.

Russell, J.A. (1980) A circumplex model of affect. Journal of Personality and Social Psychology 39(6): 1161–1178.

Schmidtmann, G.; Sleiman, D.; Pollack, J.; Gold, I. (2016) Reading the mind in the blink of an eye – a novel database for facial expressions. Perception 45: 238–239.

Sekien, T. (1776) 画図百鬼夜行 [Gazu Hyakki yagyō; The Illustrated Night Parade of a Hundred Demons]. Maekawa Yahei, Japan.

About the Author

Dr. Ben Jennings is a vision scientist. His research psychophysically and electrophysiologically investigates colour and spatial vision, object recognition, emotions, and brain injury. His favourite Pokémon is Beldum.



Why (and how) Superman hides behind glasses: the difficulties of face matching

Kay L. Ritchie1,2 & Robin S. S. Kramer1

1 Department of Psychology, University of York, York, UK.

2 School of Psychology, University of Lincoln, Lincoln, UK.

Emails: kritchie (at) lincoln (dot) ac (dot) uk; remarknibor (at) gmail (dot) com


As a mild-mannered reporter, Clark Kent is able to blend into human society without drawing much attention to himself. Although he utilises several methods of disguise (clothing, posture, hair style), perhaps his most famous is a simple pair of glasses (see Figure 1). We know that wearing glasses can make you look more educated and intelligent (e.g., Hellström & Tekle, 1994), but for Superman, the goal is primarily to hide his true identity. Of course, one of the cornerstones of enjoying superhero fiction is that we suspend our disbelief and try to ignore the obvious questions (for example, how useful or plausible is it that Squirrel Girl can communicate with and understand squirrels?!). However, the scientist inside us sometimes breaks through and we are given the opportunity to investigate. Here, we tackle the question that comic book fans have been asking for decades – could Superman really hide his identity using a pair of glasses?


Figure 1. Clark Kent’s transformation into Superman. [Image downloaded from Flickr; labelled CC BY 2.0.]

Photos of faces appear on almost all official forms of identification, from passports and driving licences to university staff and student cards. We have this intuition that our face is a good way to identify us, but a growing body of evidence suggests otherwise. Of course, if we consider the people we know personally (friends, family, partners), it’s almost impossible to find a picture of them that you wouldn’t recognise. Even in their passport photos, which could be up to ten years old in the UK, you would probably recognise them straight away. Studies have shown that we can even recognise people we know from very degraded images, such as CCTV footage (Burton et al., 1999). Therefore, it’s no surprise that the presence or absence of a pair of glasses wouldn’t stop you from being able to recognise your sister or husband. This amazing tolerance for the way a familiar person’s face can vary across different photos leads us to think we are good at recognising all faces. In fact, we are significantly worse when asked to consider unfamiliar people’s faces (e.g., Clutterbuck & Johnston, 2002, 2004), even when the photos are taken from real university ID cards (Bindemann & Sandford, 2011).

A common task used in psychology studies to examine photo-ID-style face identification is a face matching task. Typically, participants are shown two images side-by-side and asked whether the photos show the same person or not. Usually, only half of the image pairs show the same person in both photos, although depicted in different poses, lighting, expressions, etc. In the remaining image pairs, the two photos show two different but similar-looking people (e.g., two young, brunette women).

Participants do very well (often perfectly) at the task when they are familiar with the person (or one of the people) pictured, but are much worse when they are unfamiliar with the people (see Figure 2). When we see two photos of someone we know, we even seem to be blind to how difficult the task would be for people who don’t know that person, over-estimating other people’s performance with faces we recognise (Ritchie et al., 2015).

So why are we so bad at this task for people we are unfamiliar with? To answer this, we need to start with why we are so good at it for people we are familiar with.


Figure 2. Example face matching task images. Top: Two photos of the same familiar person. Despite changes in pose, lighting, and expression, it seems easy to tell that the two photos show the same person. [Images downloaded from Wikimedia Commons; labelled CC BY-SA 3.0 (left) and CC BY 2.0 (right).] Bottom: Two photos of the same unfamiliar person. It is more difficult to tell that the two images show the same person when we are not familiar with them. [The person pictured has given consent for her images to appear here.]

While we are getting to know someone’s face, we experience a lot of variation in their appearance. We see them from different angles, in different lighting, wearing their hair in different ways, etc. This variability seems to be important for learning new people (Murphy et al., 2015; Ritchie & Burton, 2016). But this same variability gets in the way when we are presented with two images of an unfamiliar person – the photographs can look very different and this might lead us to think they show two different people.

Why is any of this actually important? Coming back to the example of photo-ID, consider the task given to Jenny, a fictional passport controller. Jenny’s job is to decide whether the person standing in front of her is the same person as the one pictured in the passport they hand over. The passport photo may be up to ten years old, and more importantly, Jenny has never seen this person before. We know already that this unfamiliar face matching task is a hard one for regular people who do not do this as a routine part of their job, but researchers have also shown that even passport controllers do not outperform students on this sort of task (White et al., 2014b).

Now let’s get back to Superman and his glasses. In our new study (Kramer & Ritchie, 2016), we showed participants pairs of images in which both faces wore glasses, pairs in which neither face wore glasses, and ‘mixed’ pairs in which one face wore glasses and the other did not. Half of the pairs in each of these image conditions showed the same person, and half depicted two different (but similar-looking) people. Participants were simply asked to indicate whether they thought the images were of the same person or two different people. Importantly, we only used images of people who were unfamiliar to our participants (and we confirmed this at the end of the study). In addition, all our images were collected from Google Image searches and showed natural variation in pose, lighting, etc. (see Figure 3 for an example of face images that naturally vary).

Figure 3. Images of Brandon J. Routh with and without glasses. The image on the left shows him as Clark Kent, in the film Superman Returns (2006); the image on the right is more recent and familiar to fans of the TV series Arrow (2012–present) and DC’s Legends of Tomorrow (2016–present). Of course, in our study, we only used images of unfamiliar people. [Left image downloaded from Flickr; labelled CC BY-NC-SA 2.0. Right image downloaded from Wikimedia Commons; labelled CC BY 2.0.]

When neither face wore glasses, accuracy (percentage correct) was 80.9%, and when both faces wore glasses, accuracy was 79.6%. Statistically, performance in these two conditions did not differ, and these levels of accuracy are in line with those reported elsewhere (e.g., Burton et al., 2010). However, in the ‘mixed’ image condition, where one face wore glasses and the other did not, accuracy dropped to 74%. This drop in performance (although it sounds quite small) was statistically lower than in the ‘no glasses’ and ‘glasses’ conditions. This means that we can be confident that our ‘mixed’ condition really did make people worse at the task. For this reason, Superman may have hit upon a disguise that isn’t just easy but might actually work. By simply donning a pair of glasses, he may well make it that little bit harder for strangers to tell that he also doubles as a reporter living among them.

This effect of glasses might be hugely problematic for photo-ID in security settings. In the USA, people are allowed to wear glasses in their passport photos but may not be wearing glasses when they go through passport control. The 6% drop in accuracy found in our study, which could also be phrased as an increase in misidentifications, quickly scales up to thousands of potential mistakes when we consider the vast numbers of people going through passport control every day.
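The scaling argument above can be made concrete with a back-of-envelope calculation. The accuracy figures come from the study; the daily passenger count below is a purely hypothetical assumption for illustration.

```python
def extra_errors(n_travellers, acc_same=0.809, acc_mixed=0.74):
    """Extra misidentifications expected if every comparison were a
    'mixed' glasses trial rather than a matched ('no glasses') one.

    acc_same and acc_mixed are the accuracies reported in the study;
    n_travellers is a hypothetical checkpoint volume, not a figure
    from the article.
    """
    return round(n_travellers * (acc_same - acc_mixed))

# e.g. a hypothetical airport processing 100,000 passengers a day
extra = extra_errors(100_000)
```

Even under modest assumptions about passenger volume, a roughly 7-percentage-point accuracy gap translates into thousands of additional errors per day, which is the point the paragraph above is making.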

This all seems fairly bleak when it comes to photo-ID, so many researchers have been working on ways that we might improve the situation. One recent suggestion has been to provide multiple images (White et al., 2014a; Menon et al., 2015). By including several photographs as reference images for comparison, instead of just the one typically found on IDs, scientists have produced significant improvements in accuracy. This is an area of ongoing investigation, and other types of improvement to photo-ID will continue to be explored.


Bindemann, M. & Sandford, A. (2011) Me, myself, and I: Different recognition rates for three photo-IDs of the same person. Perception 40: 625–627.

Burton, A.M.; Wilson, S.; Cowan, M.; Bruce, V. (1999) Face recognition in poor quality video: Evidence from security surveillance. Psychological Science 10: 243–248.

Burton, A.M.; White, D.; McNeill, A. (2010) The Glasgow Face Matching Test. Behavior Research Methods 42: 286–291.

Clutterbuck, R. & Johnston, R.A. (2002) Exploring levels of face familiarity by using an indirect face-matching measure. Perception 31: 985–994.

Clutterbuck, R. & Johnston, R.A. (2004) Matching as an index of face familiarity. Visual Cognition 11(7): 857–869.

Hellström, A. & Tekle, J. (1994) Person perception through facial photographs: Effects of glasses, hair, and beard on judgments of occupation and personal qualities. European Journal of Social Psychology 24: 693–705.

Kramer, R.S.S. & Ritchie, K.L. (2016) Disguising Superman: How glasses affect unfamiliar face matching. Applied Cognitive Psychology: advance online publication (DOI: 10.1002/acp.3261). Available from: (Date of access: 14/Sep/2016).

Menon, N.; White, D.; Kemp, R.I. (2015) Variation in photos of the same face drives improvements in identity verification. Perception 44(11): 1332–1341.

Murphy, J.; Ipser, A.; Gaigg, S.B.; Cook, R. (2015) Exemplar variance supports robust learning of facial identity. Journal of Experimental Psychology: Human Perception and Performance 41: 577–581.

Ritchie, K.L. & Burton, A.M. (2016) Learning faces from variability. Quarterly Journal of Experimental Psychology: advance online publication (DOI: 10.1080/17470218.2015.1136656). Available from: http://www.tandfonline.com/doi/abs/10.1080/17470218.2015.1136656 (Date of access: 14/Sep/2016).

Ritchie, K.L.; Smith, F.G.; Jenkins, R.; Bindemann, M.; White, D.; Burton, A.M. (2015) Viewers base estimates of face matching accuracy on their own familiarity: Explaining the photo-ID paradox. Cognition 141: 161–169.

White, D.; Burton, A.M.; Jenkins, R.; Kemp, R.I. (2014a) Redesigning photo-ID to improve unfamiliar face matching performance. Journal of Experimental Psychology: Applied 20(2): 166–173.

White, D.; Kemp, R.I.; Jenkins, R.; Matheson, M.; Burton, A.M. (2014b) Passport Officers’ errors in face matching. PLoS ONE 9(8): e103510.


Dr. Kay Ritchie wears glasses on a daily basis but is adamant that she has no secret identity…

Dr. Robin Kramer frequently collaborates with Bruce Wayne in various crime-fighting adventures but states for the record that the current research is neither funded by Wayne Enterprises nor does it represent any ulterior motives of Batman.
