Hi! 👋 Welcome to The Big Y!
Emotions are fickle things, but there is always someone in the tech world trying to tame them.
Google has released PaliGemma 2, an updated open-source family of vision-language models (built on its Gemma models) that analyzes images and tells users what is in them. The captions PaliGemma can generate include descriptions of "actions, emotions, and the overall narrative of the scene."
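If you want to poke at PaliGemma 2 yourself, here's a minimal captioning sketch using Hugging Face's transformers library. The checkpoint name (google/paligemma2-3b-pt-224), the "<image>caption en" task prompt, and the scene.jpg image path are assumptions on my part, so check them against the model card before running.

```python
# A minimal image-captioning sketch for PaliGemma 2 via Hugging Face transformers.
# Assumptions: the checkpoint name and the "<image>caption en" task prompt follow
# the PaliGemma model cards; "scene.jpg" is a hypothetical local image.
from PIL import Image
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"  # assumed checkpoint name
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("scene.jpg")
prompt = "<image>caption en"  # task-prefix prompting, per the model card
inputs = processor(text=prompt, images=image, return_tensors="pt")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=50)

# Strip the prompt tokens so only the generated caption is printed.
caption = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(caption)
```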
On the audio side of things, a former OpenAI employee has launched a startup whose goal is "emotional general intelligence". (🥴) WaveForms plans to build an audio language model that analyzes voices in real time and captures the emotional nuance of a person's voice.
Like our parents said about the internet, just because it's AI doesn't necessarily mean it's accurate or true. Emotions vary from person to person and are expressed differently across cultures. Humans already have difficulty reading each other's emotions accurately; it's strange to think an AI trained on human data would be any better at it.
My point is that taking a fuzzy concept, like emotions, throwing it into a transformer, and training a large model doesn't turn the fuzzy concept into an accurate science. What is the ground truth in the training data for emotive applications? Who is to judge the actual emotional state of another human being?
The Tidbit: Models continue to pop out like crazy. Last week we saw a new generation of Gemini models come from Google, as well as a new compact model launched by Cohere (Yay!).
Know someone who might enjoy this newsletter? Share it with them and help spread the word!
Thanks for reading! Have a great week! 👋