Japan determines copyright doesn't apply to LLM/ML training data

ericjmorey@programming.dev · 1 year ago

LWD@lemm.ee · 1 year ago

Painting and selling an exact copy of a recent work, such as a Banksy, is a crime.

… however, making an exact copy of a Banksy for personal use, or to learn, or to teach other people, or copying the style… that’s all perfectly legal.

And that was the bait and switch of OpenAI! They sold themselves as being a non-profit simply doing research, for which it would be perfectly legal to consume and reproduce large quantities of data… And then, once they had the data, they started selling access to it.

I would say that, given that alone, along with the fact that they function as gatekeepers to the technology (one does not simply purchase the model from OpenAI, after all), they are hardly free of culpability… But it definitely depends on the person trying to use their black box too.

abhibeckert@lemmy.world · edit-2 1 year ago

Huh? What does being non-profit have to do with it? Private companies are allowed to learn from copyrighted work. Microsoft and Apple, for example, look at each other’s software and copy ideas (not code, just ideas) all the time. The fact that Linux is non-profit doesn’t give it any additional rights or protection.

iegod@lemm.ee · 1 year ago

They’re not gatekeeping LLMs though; there are publicly available models and data sets.

LWD@lemm.ee · edit-2 1 year ago

If it’s publicly available, why didn’t Microsoft just download and use it rather than paying them for a partnership?
(And where at?)

IIRC they only open-sourced some old stuff.

iegod@lemm.ee · 1 year ago

Stable Diffusion is open source. You can run local instances with provided and free training sets to query against and generate your own outputs.

https://stability.ai/

Taggart (@mttaggart)