The Data-Score is a data sourcing label and Copyright rating system designed to help companies and professionals make better choices when using machine learning (ML) models. Higher ratings mean lower legal risk, better protection, and more freedom of use.
Score | Description |
---|---|
🟩 🅐 | Full Ownership / Assignment |
🟩 🅑 | Exclusive Contracts |
🟨 🅒 | Non-Exclusive Agreement |
🟧 🅓 | Permissive Licenses |
🟥 🅔 | Fair Use & Exceptions |
⬛️ 🅕 | Illegal Content / #DarkData |
Each rating can be assessed according to the following categories:
- Outputs — What level of risk is associated with using generated outputs?
- Derivatives — Do you have the right to create derivatives of Copyrighted works in the training data?
- Collections — Are you allowed to make & store collections of the outputs?
- Licensing — Can you license any derivative works you create downstream?
- Protection — Would the works you create be granted Copyright protection?
- Remedies — Do you benefit from remedies in the legal system under Copyright?
- Overheads — What are the administrative overheads for this rating?
This rating system is based on my Copyright Traffic Light System. The purpose of this repository is to collect references to laws and regulations worldwide, as well as applicable case law, for each rating and each category.
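
For tooling that needs to record a Data-Score, the ratings and categories above could be modelled roughly as follows. This is only a minimal sketch and not part of the rating system itself: the names `DataScore`, `CATEGORIES`, `meets_minimum`, and `new_assessment` are hypothetical, and the numeric values are an assumption chosen only so that a higher rating compares as greater.

```python
from enum import IntEnum


class DataScore(IntEnum):
    """Data-Score ratings; numeric values are illustrative, higher is better."""
    F = 0  # Illegal Content / #DarkData
    E = 1  # Fair Use & Exceptions
    D = 2  # Permissive Licenses
    C = 3  # Non-Exclusive Agreement
    B = 4  # Exclusive Contracts
    A = 5  # Full Ownership / Assignment


# The seven assessment categories listed in this README.
CATEGORIES = (
    "Outputs",
    "Derivatives",
    "Collections",
    "Licensing",
    "Protection",
    "Remedies",
    "Overheads",
)


def meets_minimum(score: DataScore, minimum: DataScore) -> bool:
    """Return True if `score` is rated at least as high as `minimum`."""
    return score >= minimum


def new_assessment(score: DataScore) -> dict:
    """Create an empty record with one list of references per category."""
    return {"score": score, "references": {name: [] for name in CATEGORIES}}
```

A caller could then check a model's rating against a hypothetical internal policy, for example `meets_minimum(DataScore.D, DataScore.C)`, which is false because 🅓 ranks below 🅒, and collect laws or case law under `new_assessment(DataScore.D)["references"]["Licensing"]`.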