Hi! 👋 Welcome to The Big Y!
Emotions are fickle things, but there is always someone in the tech world trying to tame them.
Google has released PaliGemma 2, an updated open-source family of vision-language models (built on its Gemma models) that analyzes images and tells users what is in them. The captions PaliGemma can generate include descriptions of "actions, emotions, and the overall narrative of the scene."
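If you want to poke at PaliGemma 2 yourself, here's a minimal captioning sketch using Hugging Face's transformers library. The checkpoint name (google/paligemma2-3b-pt-224), the "<image>caption en" task prompt, and the scene.jpg image path are assumptions on my part, so check them against the model card before running.

```python
# A minimal image-captioning sketch for PaliGemma 2 via Hugging Face transformers.
# Assumptions: the checkpoint name and the "<image>caption en" task prompt follow
# the PaliGemma model cards; "scene.jpg" is a hypothetical local image.
from PIL import Image
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"  # assumed checkpoint name
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("scene.jpg")
prompt = "<image>caption en"  # task-prefix prompting, per the model card
inputs = processor(text=prompt, images=image, return_tensors="pt")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=50)

# Strip the prompt tokens so only the generated caption is printed.
caption = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(caption)
```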
On the audio side of things, a former OpenAI employee has launched a startup whose goal is "emotional general intelligence". (🥴) WaveForms plans to build an audio language model that analyzes voices in real time and captures the emotional nuance of a person's voice.
Like our parents said about the internet, just because it's AI doesn't necessarily mean it's accurate or true. Emotions vary from person to person and are expressed differently across cultures. Humans already have difficulty reading each other's emotions accurately; it's strange to think an AI trained on human data would be any better at it.
My point is that taking a fuzzy concept, like emotions, throwing it into a transformer, and training a large model doesn't turn the fuzzy concept into an accurate science. What is the ground truth in the training data for emotive applications? Who is to judge the actual emotional state of another human being?
The Tidbit: Models continue to pop out like crazy. Last week we saw a new generation of Gemini models come from Google, as well as a new compact model launched by Cohere (Yay!).
Know someone who might enjoy this newsletter? Share it with them and help spread the word!
Thanks for reading! Have a great week! 👋