Feb 6

It’s a while since I last wrote a Deep Dips post, so I’m going to broach another topic in the area of deep learning and LLMs that is becoming increasingly talked about — Mechanistic Interpretability, or MI to its friends.

5 Comments

Nick Taylor

Feb 9

This is also rather worrying https://www.theregister.com/2026/01/30/road_sign_hijack_ai/

Reply (1)

Michael Lones

Feb 9

Yes, the lack of distinction between data and instruction is a fundamental issue for LLMs!

Nick Taylor

Feb 7

Very interesting. Making "steering" easier worries me a bit though as it could be used to introduce subtle biases. Also, did you mean Google's Gemini, rather than Gemma?

Reply (1)

Michael Lones

Feb 7

Yes, good point. It could be used as a relatively accessible mechanism for developers to override trained behaviour, and perhaps is being used in this way? Certainly one owner of an AI company who might use it in this way springs to mind! Gemma is Google’s family of open weight models, derived from Gemini.

Reply (1)

Nick Taylor

Feb 7

Thanks for clarifying.

Fetch Decode Execute

Deep Dips #6: Mechanistic interpretability