The Big Y #70
Hi! 👋 Welcome to The Big Y!
As we begin to understand the importance of knowing and keeping track of all the places our personal data is gathered, we should also be thinking about the machine learning models that are trained on our data. If you ask a company to delete your personal data, it may still be running models that were trained on that data; to honor the request properly, the company would need to completely retrain every model your data was used in.
As we begin to hold companies to higher standards with regard to privacy and data, they will need to figure out how to remove data, and its influence, from models more efficiently, without rebuilding the model from scratch every time. From a sustainability perspective, retraining full models is also a waste of resources, given the large amount of compute power that retraining requires.
Here is where the idea of machine unlearning comes into play. Researchers are working on techniques that would allow data to be deleted, and its influence on the model removed, without retraining the whole system and while maintaining performance. One such technique silos data into smaller batches, trains a sub-model on each batch, and then merges them at the end into one model (reminds me of federated learning in a way). Then if some data needs to be deleted, you only need to re-run the small batch that contained it.
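To make the shard-and-retrain idea concrete, here's a minimal toy sketch. Everything in it is illustrative: the ShardedModel class, the per-class centroid "sub-models," and the majority-vote aggregation are stand-ins I made up for this example, not the actual technique researchers use on real neural networks. The key property it demonstrates is that deleting a data point only retrains the one shard that held it.

```python
class ShardedModel:
    """Toy sketch of sharded training for machine unlearning.

    Data is split round-robin into shards; a tiny sub-model (here,
    a per-class centroid classifier) is trained on each shard, and
    predictions are aggregated by majority vote. Deleting a point
    only requires retraining the single shard that contained it.
    """

    def __init__(self, data, n_shards=4):
        # data: list of (features, label) pairs
        self.shards = [[] for _ in range(n_shards)]
        for i, point in enumerate(data):
            self.shards[i % n_shards].append(point)
        self.models = [self._train(s) for s in self.shards]

    def _train(self, shard):
        # "Train" a sub-model: compute a centroid per class label.
        sums, counts = {}, {}
        for x, y in shard:
            acc = sums.setdefault(y, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
            counts[y] = counts.get(y, 0) + 1
        return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def _predict_one(self, model, x):
        # Pick the label whose centroid is nearest (squared distance).
        return min(model, key=lambda y: sum((a - b) ** 2
                                            for a, b in zip(model[y], x)))

    def predict(self, x):
        # Aggregate: majority vote across the non-empty sub-models.
        votes = [self._predict_one(m, x) for m in self.models if m]
        return max(set(votes), key=votes.count)

    def delete(self, point):
        # Unlearn: remove the point and retrain ONLY its shard.
        for i, shard in enumerate(self.shards):
            if point in shard:
                shard.remove(point)
                self.models[i] = self._train(shard)
                return True
        return False
```

The retraining cost of `delete` scales with one shard's size rather than the whole dataset's, which is the efficiency win the paragraph above describes.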
At the moment, these techniques still have a way to go before they can be widely used, but they will be necessary to keep high-performing models running as data protection requirements increase.
Facebook continues to have deep problems with its AI. Most recently, under a video featuring Black men, its system suggested: “keep seeing videos about Primates.” This is likely a result of biased training data, and a continuation of the ongoing pattern of biased facial recognition technology.
Thanks for reading! Share this with a friend if you think they'd like it too. Have a great week! 😁
🎙 The Big Y Podcast: Listen on Spotify, Apple Podcasts, Stitcher, Substack