Emotion Recognition

Mathematically deducing emotions using facial recognition data.

Summer 2021

The following is a narrative description of the project and the story behind it. For a brief technical analysis, you may refer to the project's GitHub README.

It's late spring in a college apartment dimly lit by a TV. I'm trading the evening's homework for a crack at inspiration, watching Ex Machina. I was a film snob and a sophomore coder coming off the back of a failed project. I needed ideas, but tonight was just another movie, enjoying cinema that wasn't another cape flick.

Our protagonist Caleb enters the glass chamber for his third interview with Ava. His mannerisms and body language indicate a strange disbelief at the android on the other side of the cell and at his feelings towards her. Ava makes an attempt to reach him. "Are you attracted to me?" Caleb doesn't know how to respond. "You give me indications that you care. Microexpressions. The way your eyes fix on my eyes and lips. The way you hold my gaze."

In the LCD-blue light I opened myself to late-night thoughts on AI. Would it exhibit full autonomy? Does an AGI have rights? If it cannot feel in the same way that we do, through biochemical, hormonal processes, then could it at least attempt to understand emotions? If it could, would it change its output in the same way that our conversations evolve with the mood and atmosphere framing the discussion? I believed that AGI was capable of appreciating the full spectrum of human emotion, thought, and creativity.

I found Caleb's attachment to Ava unjustified. Why would someone share these feelings with an android? Even if Ava exhibited full autonomy, so much so that she is effectively an artificial human, why would he fall for someone he's only met a handful of times and knows so little of? Given that it is later revealed that Nathan, Ava's creator, designed Ava around Caleb's porn history and chose Caleb for his moral compass and perhaps his relationship situation, is it so far-fetched that Nathan exploited Caleb's emotions for his experiment? That Ava was, in fact, able to understand Caleb's emotions at such a high level that this understanding could be weaponized to her advantage?

It stuck with me. I enjoyed the dystopian thriller, the emotional manipulation of our characters, the artistic expression in color identity and theological representations. But that word, microexpressions, lingered like morbid curiosity. She was able to manipulate him because she had an understanding of his emotions, but how did she understand his emotions? Was it purely from microexpressions, or are there other ways in which we express how we feel? In the present sense, what tools do LLMs use today to read our emotions, and do such tools even exist? In the quiet dark of that late-night college apartment, I found my new project.

Skin-Deep

I went straight to the drawing board. Like any project I've pursued, I was nosediving into unfamiliar territory. I knew large language models were essentially vectorizing text as part of their training, but whether emotion could be represented as an additional input vector was unknown to me. What I could infer, however, was that before I could ever enhance a large language model in this capacity, I would first need the tools necessary to understand human emotion.

This was my first software project that I felt transcended simply coding a solution to a problem, one that dipped into subject matter I'd never studied before. I was combining coding with psychology and anatomy. It presented a vastly entertaining quest where I allowed myself to learn more about my skills while also discovering people in a way I'd never thought about previously.

I read several papers, some of which indicated that the spectrum of emotions could be determined via brain waves, skin conductivity, and heart rate. In particular, I was interested in facial expressions and their associated muscle activations.

Central Control of Muscles of Facial Expression, Fig. 1. J A Stephens, A A Root

If we were to apply mainstream technology to our emotion-recognition solution, then facial recognition seemed like the simplest approach given smartphone accessibility. Having prioritized facial recognition for analyzing emotion, the next step was to write a program to get us facial data.

To this end I ended up using Adam Geitgey's face_recognition library, which wraps dlib's models to identify facial landmarks. Once I could identify the key points described in the research papers I'd found, I could move toward mapping facial muscles onto these landmark locations.

DLIB Facial Landmarks

I took a shot at this on a hot weekend back home. I wrote a Python function using this library: it enabled the webcam on my laptop, set some color preferences, and ran faster than expected. When it finished, my project directory contained two new image files.
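A minimal sketch of what that function might have looked like, assuming face_recognition plus OpenCV for the webcam; the color choices and output file names here are my own placeholders, not the project's actual values.

```python
# Sketch: grab one webcam frame, find facial landmarks with the
# face_recognition library, and write a raw and an annotated image.
import cv2
import face_recognition

def capture_landmarks(raw_path="capture_raw.png", annotated_path="capture_landmarks.png"):
    cam = cv2.VideoCapture(0)          # default laptop webcam
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("Could not read from webcam")

    cv2.imwrite(raw_path, frame)

    # face_recognition expects RGB; OpenCV delivers BGR
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    faces = face_recognition.face_landmarks(rgb)    # one dict of features per face

    for face in faces:
        for points in face.values():                # 'chin', 'left_eye', 'top_lip', ...
            for (x, y) in points:
                cv2.circle(frame, (x, y), 2, (255, 255, 0), -1)  # cyan dots

    cv2.imwrite(annotated_path, frame)
    return faces

if __name__ == "__main__":
    capture_landmarks()
```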

Unless you're a front-end developer, it's not common to receive immediate visual feedback from your code. The code wasn't doing anything new, I had just modified the output, but it felt so invigorating to see my face get captured and these data points populate. I had performed facial recognition with dlib and I saw that it was possible to isolate these facial data points. I felt that this project might be something I could actually accomplish.

Every smile, wince, grimace, and beautifully manifested composure can be itemized and fragmented into small components to be sorted and algorithmically judged. The computer and I now sought the electric feeling of soft eyes and a bright smile meeting our own.

It Grows from Within

I got back to the books. My program would need to identify the seven universal expressions (anger, contempt, disgust, fear, happiness, sadness, and surprise) as well as an eighth, neutral expression. The easiest of these is happiness, which we can associate with activation of the zygomaticus major (ZMM).

Investigating the Contraction Pattern of the Zygomaticus Major, Fig. 4. Daniel J. Rams.

The ZMM is a thin muscle running from the corner of the upper lip to the zygomatic bone (also known as the cheekbone). It's among a group of muscles known as the laughter muscles for its contribution to smiling and laughter. For our case we will draw a line that runs roughly parallel to the ZMM rather than tracking the shifting 3D muscle in its entirety. To do so I took a shortcut, connecting each mouth-corner point (48 and 54) past the connecting point at the zygomatic bone to the upper-jaw point on the same side (0 and 16). Given a video input where we can measure the change in the ZMM over time, we can determine whether the ZMM is activated at a given frame and, if so, infer that the user is smiling and thus happy.

The image below is the result of applying our code, sketching the facial landmarks in cyan and the ZMM in violet over a medical diagram consisting of a portrait and a depiction of the ZMM muscle over the person's face. The program is able to draw the line through the ZMM. Contraction of the muscle would shorten the line, allowing us to determine whether the user is smiling. This supports our earlier assumption that it may in fact be possible for an AGI to understand human emotion via the mathematical deduction of facial structures. If the user is happy they will smile, and we can calculate that they are smiling. An AI may not be able to feel happiness in the same way that we do, but it's possible for it to understand that we are happy.

After drawing the line in our output we determine the size of the ZMM using the distance formula between the two endpoints, allowing us to track each ZMM activation separately. This, however, introduces a new problem: we're only determining the size of a muscle in pixel space. If the user moves closer to or farther from the camera, the pixel length of the ZMM will change, giving us a false flag of ZMM contraction. The pixel length will also differ between image resolutions when using different cameras or source image sizes. This leads us into an age-old problem of real space in computer vision: determining the metric size of facial components given only an image.
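A minimal sketch of how those two lines and their pixel lengths might be computed. It uses dlib's 68-point shape predictor directly, since it exposes the raw indexed points referenced above (the face_recognition library wraps the same model); the model file name is dlib's standard download, and the 48↔0 / 54↔16 pairing follows the description above.

```python
# Sketch: approximate each ZMM as a line from a mouth corner to the
# upper-jaw landmark on the same side, and return the pixel lengths.
import math
import dlib

detector = dlib.get_frontal_face_detector()
# Standard dlib 68-point model; must be downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def zmm_pixel_lengths(image):
    faces = detector(image)
    if not faces:
        return None
    shape = predictor(image, faces[0])
    p = [(shape.part(i).x, shape.part(i).y) for i in range(68)]

    left = math.dist(p[48], p[0])     # mouth corner -> jaw point, one side
    right = math.dist(p[54], p[16])   # mouth corner -> jaw point, other side
    return left, right
```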

Obtaining the size of an object on a 2D plane would normally require some static point of reference, say a quarter. Having such a guide would allow me to create a pixel-to-metric ratio. The question I then raised was: would it be possible to forgo the static reference in favor of something standard in any image of a face, without the use of additional technologies or metadata?
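The reference-object idea itself is just a ratio; a tiny sketch of the conversion, assuming a US quarter (24.26 mm across) as the reference:

```python
def pixels_to_mm(pixel_length, reference_pixel_length, reference_mm=24.26):
    """Convert a pixel measurement to millimetres given a reference object
    of known physical size in the same image (default: a US quarter)."""
    return pixel_length * (reference_mm / reference_pixel_length)
```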

Window to the Soul

The human eye is one of the least-growing organs in the human body, and its size is roughly the same across humans regardless of other genetic factors. Could we measure the pixel size of the user's eye, then cross-reference anatomy papers to determine a pixel-to-metric ratio that could be used to determine the size of other facial features?

The "size of the eye", or rather the size of the exposed eye, can be determined via the palpebral fissure length (PFL): the distance from the edge of the lacrimal caruncle (the inner corner) to the lateral commissure (the outer corner). Since facial-recognition models often key on changes in shade and color, our model does a good job of isolating these points, at 36/39 for the left eye and 42/45 for the right.

Appl. of Deep Learning for Investigation of Neurological Diseases, Anatomy of the Eye. Mohammed Hamoud.

A quick Google search on how long exactly the PFL is, however, yields signs of ethnic bias. For instance, the Wikipedia entry states that the PFL is "approximately 30mm horizontally", which appears to coincide with a study from the University of Washington of roughly 500 participants, all of whom were Caucasian. Gathering this data requires digital calipers for an accurate reading. That can prove troublesome when performing a study across hundreds of patients, as the calipers have a pointed metal tip that must be placed on the edges of the eye, which risks injury. Alternatively, you could use cadavers; however, the embalming process or prior experimentation on the body can damage the area of the eye needed for study. I read several papers that conducted this measurement across their respective ethnic and gender groups. The values I gathered were the following:

Chinese male 23.9 mm, Chinese female 23.2 mm
Indian male 29.1 mm, Indian female 27.4 mm
Black male 32.3 mm, Black female 31.5 mm
White male 29.5 mm, White female 29.4 mm

Utilizing this information allows us to more accurately estimate the size of particular facial muscles when using the PFL as a point of reference. We can then find the PFL in a similar manner to the ZMM:
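A hedged sketch of that step, reusing the 68-point landmark list from the dlib sketch above; the PFL table is the one gathered above, and the dictionary keys are just one possible way of indexing it.

```python
# Sketch: measure the PFL in pixels from the eye-corner landmarks and
# derive a pixel-to-metric ratio from published mean PFL values.
import math

PFL_MM = {  # mean palpebral fissure lengths (mm) from the papers above
    ("chinese", "male"): 23.9, ("chinese", "female"): 23.2,
    ("indian", "male"): 29.1,  ("indian", "female"): 27.4,
    ("black", "male"): 32.3,   ("black", "female"): 31.5,
    ("white", "male"): 29.5,   ("white", "female"): 29.4,
}

def pfl_pixel_length(points):
    """Average PFL in pixels: eye corners 36/39 (left) and 42/45 (right)."""
    left = math.dist(points[36], points[39])
    right = math.dist(points[42], points[45])
    return (left + right) / 2.0

def mm_per_pixel(points, ethnicity, sex):
    """Pixel-to-metric ratio using the user's own PFL as the in-image reference."""
    return PFL_MM[(ethnicity, sex)] / pfl_pixel_length(points)
```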

I could then begin tracking ZMM contraction, and by fall of 2021 I could compute our first emotion: happiness. We can now determine an estimated metric length simply by looking the user in the eye, transforming the digital into the real and, in this simulated realness, recognizing human feelings.
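Putting the sketches together, the happiness check could look something like the following; the neutral-face baseline and the 0.92 contraction threshold are my own placeholders, not calibrated values from the project.

```python
# Sketch: flag a smile when the metric ZMM length drops below a fraction
# of the user's neutral-face baseline.
import math

def is_smiling(points, mm_per_px, baseline_zmm_mm, threshold=0.92):
    """points: the 68 (x, y) landmarks; mm_per_px: ratio from the PFL sketch;
    baseline_zmm_mm: ZMM length captured on a neutral face."""
    left = math.dist(points[48], points[0]) * mm_per_px
    right = math.dist(points[54], points[16]) * mm_per_px
    current = (left + right) / 2.0
    return current < threshold * baseline_zmm_mm
```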

Onward

In the winter I conducted a study where I gathered data on 100 universities and their respective computer science programs, writing an algorithm to tell me where I should transfer after my sophomore year at Francis Marion University. In the summer of 2022 I packed my bags and moved across the country to Iowa State. Over the following year I applied to nearly a hundred research programs and landed an opportunity at Old Dominion University to build a large language model that could be used to deter Russian disinformation.

I haven't gotten back to this project since beginning my work with my research partner Iryna on the Russian disinformation model. I was given the opportunity to build something that has a chance of actually helping people, rather than designing a tool for measuring them. Hearing her stories, I could see that this project was more than just a PhD dissertation. As a Ukrainian migrant herself, it was about her, her family, and her country. Our likelihood of success was irrelevant to me. Through her stories I could see an infectious determination to help people, and I couldn't help but feel the same.

It's spring of 2025 as of writing this, and I want to finish this project once I'm done with the disinformation model. While the objective is the same, the purpose is entirely different. I want to design this program not for the betterment of an AGI, but to find applications where measuring emotion can help people or perhaps bring us closer together. Technologies have changed and developments have been made. The possibilities have broadened greatly in my time away from this project, and I'm all the more excited to return. When I do, I think I'll finally help a computer understand how we feel.
