Hi! 👋 Welcome to The Big Y!
What would happen if you took the best pieces of two LLMs and created a new, improved, targeted LLM? We’re already seeing it happen. Startup Sakana AI has been developing what it calls Evolutionary Model Merge, a method for combining different foundation models to create an improved model.
Sakana combined a Japanese LLM (Shisa-Gamma) with two math-specific LLMs (WizardMath and Abel) to create a model that is specifically good at Japanese-language math problems. The combined model is the result of roughly 100-150 trial-and-error attempts at putting these models together, and it outperforms each of the originals on those Japanese-language math problems.
The benefits are twofold: you get a new model that is better at the narrower use case you wanted to tackle, and because you aren’t training a model from scratch, you save a lot of money. Merged models don’t need to be trained from zero, so you can leverage the compute already spent on the base models and add only a limited amount of compute on top.
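To make the idea concrete, here’s a minimal sketch of the simplest form of parameter-space merging in PyTorch: plain weight interpolation between two same-architecture checkpoints, plus a crude trial-and-error search over mixing ratios. This is an illustration of the general technique, not Sakana’s actual evolutionary method, and the `evaluate` callback is a hypothetical stand-in for scoring a merged model on your target task.

```python
import torch


def merge_state_dicts(sd_a, sd_b, alpha):
    """Linearly interpolate two compatible state dicts: alpha*A + (1 - alpha)*B.

    Sketch only: assumes all tensors are floating point and both models
    share the exact same architecture.
    """
    assert sd_a.keys() == sd_b.keys(), "models must share an architecture"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}


def search_merge(model: torch.nn.Module, sd_a, sd_b, evaluate,
                 alphas=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Crude trial-and-error search over mixing ratios.

    `evaluate` is a hypothetical callback that scores `model` on the
    target task (say, a Japanese math benchmark) and returns a float.
    """
    best_alpha, best_score = None, float("-inf")
    for alpha in alphas:
        model.load_state_dict(merge_state_dicts(sd_a, sd_b, alpha))
        score = evaluate(model)
        if score > best_score:
            best_alpha, best_score = alpha, score
    # Reload the best-scoring mix before returning.
    model.load_state_dict(merge_state_dicts(sd_a, sd_b, best_alpha))
    return model, best_alpha
```

Sakana’s approach evolves much richer merge recipes than a single mixing ratio, but even this toy search captures the appeal: each candidate model costs almost nothing to produce compared with training from scratch.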
History repeats itself, naturally, and this is a revisiting of ensemble learning, a machine learning technique where you combine two or more weaker models to create a better-performing one. It’s essentially the same approach we’re now seeing with LLMs.
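For a classic taste of the idea, here’s a short, self-contained example using scikit-learn’s VotingClassifier: three individually modest models, combined by majority vote, often beat any one of them alone. The dataset and model choices here are just illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three individually modest "weak" learners...
weak_learners = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
    ("nb", GaussianNB()),
    ("lr", LogisticRegression(max_iter=5000)),
]

# ...combined by majority vote into a single ensemble.
ensemble = VotingClassifier(estimators=weak_learners, voting="hard")

for name, clf in weak_learners + [("ensemble", ensemble)]:
    clf.fit(X_train, y_train)
    print(f"{name}: {clf.score(X_test, y_test):.3f}")
```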
As LLMs and foundation models continue to get more fleshed out, I think we’ll continue to see old techniques applied to them in new ways to make them more efficient and performant.
OpenAI has released an interesting new product, Voice Engine, which it is currently keeping in private beta. OpenAI says it can recreate a person’s voice from a 15-second clip, generating output that sounds like that person’s real voice.
Know someone who might enjoy this newsletter? Share it with them and help spread the word!
Thanks for reading! Have a great week! 😁
🎙 The Big Y Podcast: Listen on Spotify, Apple Podcasts, Stitcher, Substack