Large language models are all about identifying patterns in how humans use words and copying them. Thing is, that’s also how people tend to do things a lot of the time. If you give the LLM enough tertiary data, it may ‘accidentally’ (read: randomly) output things you don’t want people to see.
How so?
But how would you know when you have this data?