I can’t wait until we find out AI trained on military secrets is leaking military secrets.
I can’t wait until people find out that you don’t even need to train it on secrets for it to “leak” secrets.
How so?
Large language models are all about identifying patterns in how humans use words and copying them. Thing is, that’s also how people tend to do things a lot of the time. If you give the LLM enough tertiary data, it may be capable of ‘accidentally’ (read: randomly) outputting things you don’t want people to see.
But how would you know when you have this data?
In order for this to happen, someone will have to utilize that AI to make a cheatbot for War Thunder.
I mean, even with ChatGPT Enterprise you can prevent that.
It’s only the consumer versions that train on your data and submissions.
Otherwise no legal team in the world would consider ChatGPT or Copilot.
I will say that they still store and use your data in some way. They just haven’t been caught yet.
Anything you have to send over the internet to a server you do not control will probably not work for an infosec-minded legal team.