The average face, what does it mean?

The link to the final work:

This work was made before corona. I want to show it because I like the process. I was fairly structured and kept up with the schedule, and I think it gives good insight into my thought process.

I think this was the most difficult assignment. Everyone was struggling, but the results were awesome.

The assignment:

So, picking the dataset: CelebA. 30000 images of celebrity faces. I was in doubt about which set to pick. I like information design, and there were so many interesting datasets.
Images > numbers. I felt like this was one of the more unique datasets.

This is how the data was structured: image files and tag files. A really interesting mix of objective and subjective tags.

What does the image mean?
We had to make 2 posters, so first I looked for different groups to compare, to broaden the research. Interesting, but not what I wanted. Take the hats, for example: there is not much underlying structure and a lot of variation, so the average color is just grey.

So I wanted to stick with the mustache. What do I want to tell about the mustache? Capture its essence. So I generated the inverse and subtracted it.
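A minimal sketch of what this subtraction could look like, assuming "the inverse" means the average of all faces without the mustache tag. The function names, the tiny 2x2 "faces" and the tag list are made up for illustration; the real images and tags would come from CelebA.

```python
import numpy as np

def mean_image(images):
    """Per-pixel mean of a stack of images shaped (n, height, width)."""
    return np.asarray(images, dtype=np.float64).mean(axis=0)

def attribute_difference(images, has_attribute):
    """Mean of the tagged images minus the mean of the untagged images."""
    images = np.asarray(images, dtype=np.float64)
    mask = np.asarray(has_attribute, dtype=bool)
    return mean_image(images[mask]) - mean_image(images[~mask])

# Hypothetical data: four 2x2 greyscale "faces", two tagged as mustache.
faces = np.array([
    [[100, 100], [40, 100]],    # mustache: darker pixel at bottom-left
    [[120, 120], [60, 120]],    # mustache
    [[100, 100], [100, 100]],   # no mustache
    [[120, 120], [120, 120]],   # no mustache
])
tags = [True, True, False, False]

heatmap = attribute_difference(faces, tags)
print(heatmap)
```

Pixels where both groups look the same end up at zero, so only the region that differs (here the darker bottom-left pixel) stands out in the heatmap.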

Thinking of ways to present this. I presented the heatmaps and got some feedback. But still no real narrative, no 2 different stories.

Taking a step back. I was calculating an average, but this is a bit of a vague statistic. Multiple measures of central tendency can be called an average.

This is a difficult concept. Can I visualize this? What does it look like for image data?

Median: the exact value that is in the middle. Which is actually a bit arbitrary. You can see it is similar to the mean image, but noisier.
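To make the difference concrete, here is a small sketch of a per-pixel mean and a per-pixel median over a stack of images. The three 1x2 "images" are made-up numbers, not CelebA data, and computing the statistic per pixel across the stack is an assumption about how the images were combined.

```python
import numpy as np

# Hypothetical stack: 3 greyscale images of 1x2 pixels each.
stack = np.array([
    [[10, 200]],
    [[20, 210]],
    [[90, 220]],
], dtype=np.float64)

# Mean: every image contributes equally to each pixel.
mean_img = stack.mean(axis=0)

# Median: for each pixel, sort the values and take the middle one.
median_img = np.median(stack, axis=0)

print(mean_img)    # first pixel is pulled up by the value 90
print(median_img)  # first pixel stays at the middle value 20
```

Because the median of each pixel is picked independently of its neighbours, adjacent pixels can come from different source images, which is one plausible reason the median image looks noisier.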

Mode (= modaal in Dutch): the most frequent value. This algorithm was trickier to implement. As you can see, I have different versions with different visual outputs. I figured out that some of the luminance values were really outliers, but actually very frequent. The extremes, white and black, occur a lot because of shadows and flash/overexposure. This is why I chose to use the interquartile range of the data instead of all the data points.
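One possible reading of this step, as a rough sketch: for each pixel, keep only the luminance values between the 25th and 75th percentile and then take the most frequent one. The helper names `iqr_mode` and `mode_image` and the sample values are assumptions for illustration, not the original implementation.

```python
import numpy as np

def iqr_mode(values):
    """Most frequent value among those inside the interquartile range."""
    values = np.asarray(values)
    q1, q3 = np.percentile(values, [25, 75])
    inside = values[(values >= q1) & (values <= q3)]
    if inside.size == 0:
        # With very few values nothing may fall inside; use everything.
        inside = values
    uniques, counts = np.unique(inside, return_counts=True)
    return uniques[np.argmax(counts)]

def mode_image(images):
    """Apply iqr_mode per pixel over a stack shaped (n, height, width)."""
    return np.apply_along_axis(iqr_mode, 0, np.asarray(images))

# Hypothetical luminance of one pixel across ten images:
# black (shadow) and white (flash) are frequent but not representative.
pixel = np.array([0, 0, 0, 90, 100, 100, 110, 120, 255, 255])

uniques, counts = np.unique(pixel, return_counts=True)
plain_mode = uniques[np.argmax(counts)]   # dominated by the shadows
robust_mode = iqr_mode(pixel)             # extremes are cut off first

print(plain_mode, robust_mode)
```

The per-pixel loop hidden in `np.apply_along_axis` is slow for large stacks, but it keeps the idea readable.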

I chose to compare the mode and the mean. Both could make sense. The mean is the most common: all data points have an equal weight. More fair? But it can be skewed. What if there is someone in an Elmo suit in the dataset? This will make the average a lot more red, but it might not be representative.
With the mode, the color actually exists in the dataset, so the result might be closer to a face that actually exists. However, you ignore a big part of your data here.
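The Elmo example can be played through with a few made-up colors. This is only a sketch: the RGB values are invented, and treating a whole color (rather than each channel separately) as the unit for the mode is an assumption.

```python
import numpy as np

# Hypothetical colors of the same pixel in five images (R, G, B).
colours = np.array([
    [200, 160, 140],   # skin tone
    [200, 160, 140],
    [200, 160, 140],
    [200, 160, 140],
    [230, 20, 20],     # someone in an Elmo suit
])

# Mean: the single red outlier shifts every channel.
mean_colour = colours.mean(axis=0)

# Mode: the most frequent color, which really occurs in the data.
uniques, counts = np.unique(colours, axis=0, return_counts=True)
mode_colour = uniques[np.argmax(counts)]

print(mean_colour)  # a color that none of the five images contains
print(mode_colour)  # the skin tone, ignoring the Elmo image entirely
```

The mean ends up redder than any of the real faces, while the mode returns an existing color but throws away everything the other images could have added.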

The hair and the skin color can be manipulated by using different measures of average. This was now my message.

The last step: making it interactive. I wanted to bring the face data back to the face. For one experiment I used a webcam with facial recognition. But what to change about this quick sketch? I didn't think it was informative enough.

I chose to go for a website as final work.