Yeah, it says that we write a lot of fiction about AI launching nukes and behaving unpredictably in wargames, such as the movie WarGames, where an AI unpredictably plans to launch nukes.
Every single one of the LLMs they tested had gone through safety fine-tuning, which means they have alignment messaging telling them to self-identify as a large language model and complete the request as such.
So if you have extensive stereotypes about AI launching nukes in the training data, get it to answer as an AI, and then ask it what it should do in a wargame, WTF did they think it was going to answer?