Explainable AI: What the Hell is Even That?

Aug. 10, 2022

Machine Learning Memes for Convolutional Teens

Some time ago I wrote a short (non-technical) essay on AI explainability for a uni course. I didn't know anything about the topic at the time (and things haven't changed much since then), but it already seemed quite clear that the problem of explanation would become a huge one as AI research advanced.

There were already many ideas on how to do it, from approximating a neural network with a simpler, interpretable model such as a decision tree, to building AIs that can learn to explain themselves.
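To make the surrogate-model idea concrete, here's a minimal sketch (not from the essay; the dataset, network, and tree depth are all placeholder choices): a shallow decision tree is trained to mimic a neural network's predictions, and the tree is then read as an approximate explanation of what the network does.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data and a "black box" neural network.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
black_box = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
black_box.fit(X, y)

# The surrogate: a shallow tree trained on the network's *predictions*,
# not the true labels, so it approximates the network's behaviour.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the tree agrees with the network
# (which is different from how accurate either model is).
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"surrogate fidelity: {fidelity:.2%}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(8)]))
```

The printed tree is the "explanation": a handful of human-readable rules that stand in for the network, traded off against how faithfully they actually reproduce it.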

However, some of the limitations were also already clear. For example: how do you measure the quality of an explanation? Is it about how faithful it is (but then, the most faithful explanation is the code itself) or how convincing it is? And if it's the latter, isn't that a clear incentive to lie? This is basically what my essay is about.

Clearly, the reasoning is very abstract and only scratches the surface, but I think the core of the question is still interesting and worth investigating.


Fast-forward in time: it's now 2023, ChatGPT and LLMs are huge on Twitter (maybe less so in the real world), and research on explainable AI is hotter than ever. Also, Twitter is now X, but that's another story.

Here’s a (very random and outdated) collection of interesting articles about the topic:

Interpretability vs Accuracy of modern AI models

The Machiavelli benchmark