DataLicenses.org
A community-curated catalog of public initiatives shaping AI data access, licensing, controlled retrieval, and enforcement.
Rows emphasize public sources, dated evidence, and clear paths for updates.
- entries: 54
- categories: 8
- dated sources: 54
- latest evidence: May 14, 2026
Recently active
Entries with the newest dated public evidence.
Recent activity is based on dated sources, not an overall score.
| row | initiative | mechanism | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 01 | Stack Data Licensing Licensed access to Stack Overflow's developer knowledge corpus for AI training, fine-tuning, RAG, and agentic use cases. | Marketplace live | 83M+ human-verified questions and answers | stackoverflow.co May 14, 2026 | profile update |
| 02 | Creative Commons Signals Creative Commons framework for communicating expectations and building governance infrastructure around AI use of shared knowledge. | Preference signal in progress | not recorded | creativecommons.org May 13, 2026 | profile update |
| 03 | SPUR (Standards for Publisher Usage Rights) Publisher coalition building shared standards and licensing frameworks for responsible AI use of journalism. | Licensing collective in progress | 6 founding publisher members | mediahuis.com May 11, 2026 | profile update |
| 04 | SourceAudio AI Dataset Licensing Opt-in music dataset licensing program that pays rights holders for AI training use of tracks and catalogs. | Marketplace live | 3,000+ music catalogs | sourceaudio.com May 06, 2026 | profile update |
| 05 | Created by Humans Rights-cleared book licensing platform for AI training, reference, and transformative use. | Marketplace live | 100+ bestselling authors | createdbyhumans.ai May 02, 2026 | profile update |
| 06 | Publishers' Rights Organization Coalition pursuing licensing, compensation, and enforcement for publisher content used by AI systems. | Licensing collective live | not recorded | publishersrights.org May 02, 2026 | profile update |
| 07 | IETF AI Preferences (AIPref) IETF effort to standardize a preference signal for AI agents and crawlers: "building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use." | Preference signal in progress | not recorded | datatracker.ietf.org Apr 27, 2026 | profile update |
| 08 | Cloudflare AI Crawl Control A set of tools to block or charge for scraping; includes AI Audit dashboard, managed robots.txt, and pay-per-crawl marketplace. | Tollgate live | 3.8M+ domains on managed robots.txt | blog.cloudflare.com Apr 17, 2026 | profile update |
| 09 | Fastly AI Bot Management Edge bot-management layer for detecting and blocking AI crawlers and fetchers that scrape website content. | Technical blocking live | not recorded | fastly.com Apr 16, 2026 | profile update |
| 10 | Akamai Content Protector Enterprise anti-scraping product that detects and blocks persistent content scrapers, now positioned as part of broader AI and LLM bot management. | Technical blocking live | not recorded | akamai.com Apr 08, 2026 | profile update |
All current entries
Grouped by primary mechanism. Each entry links to a profile, latest evidence, and a pre-filled update issue.
Metric cells show recorded values only when backed by public evidence.
Preference signal 12 rows
Publishes machine-readable AI-use preferences.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 02 | Creative Commons Signals Creative Commons framework for communicating expectations and building governance infrastructure around AI use of shared knowledge. | in progress | not recorded | creativecommons.org May 13, 2026 | profile update |
| 07 | IETF AI Preferences (AIPref) IETF effort to standardize a preference signal for AI agents and crawlers: "building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use." | in progress | not recorded | datatracker.ietf.org Apr 27, 2026 | profile update |
| 12 | IPTC + PLUS Data Mining Metadata Embedded image and video metadata fields for expressing whether assets may be used in data-mining and generative-AI training datasets. | live | not recorded | iptc.org Mar 30, 2026 | profile update |
| 29 | TDM·AI Asset-level protocol for binding machine-readable TDM and AI-training preferences to digital works. | in progress | not recorded | docs.tdmai.org Nov 04, 2025 | profile update |
| 33 | TDMRep (W3C) W3C specification for expressing text and data mining permissions via a well-known JSON file, designed for EU DSM Directive compliance. | live | not recorded | w3.org Oct 01, 2025 | profile update |
| 35 | DIY robots handling (robots.txt++) robots.txt can be used to express AI crawler access preferences. See example from OpenAI. Additionally, the X-Robots-Tag response header allows servers to send crawler directives via HTTP response headers. | live | not recorded | developers.openai.com Sep 17, 2025 | profile update |
| 38 | Adobe Content Authenticity Content Credentials-based preference system for signaling that generative AI should not train on or use a creator's files. | live | not recorded | helpx.adobe.com Sep 02, 2025 | profile update |
| 39 | Spawning ai.txt A proposed convention for AI-specific crawler directives via an `ai.txt` file. | in progress | not recorded | spawning.ai Aug 28, 2025 | profile update |
| 47 | User Intents Proposed AT Protocol mechanism for users to declare data-reuse preferences such as generative-AI training. | in progress | not recorded | github.com Mar 08, 2025 | profile update |
| 48 | trust.txt Publisher-oriented trust file that can also declare whether AI training is allowed through a machine-readable `datatrainingallowed=` field. | live | about 3,000 participating publishers | rjionline.org Feb 21, 2025 | profile update |
| 53 | DeviantArt NoAI / NoImageAI Platform-level HTML and HTTP directives that tell external AI datasets and models not to use artists' work unless they opt in. | live | not recorded | deviantart.com Apr 17, 2024 | profile update |
| 54 | TK Labels Local Contexts labels that let Indigenous communities express culturally specific conditions for access and reuse of knowledge and data. | live | not recorded | localcontexts.org Jun 17, 2022 | profile update |
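
As a rough illustration of the robots.txt approach in row 35, the sketch below checks whether a given crawler may fetch a URL under a sample policy. It uses Python's standard `urllib.robotparser`. `GPTBot` is the user agent OpenAI documents for its crawler; the policy text, the `can_crawl` helper, and the `example.com` URL are assumptions made up for this example, not taken from any catalog entry.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block OpenAI's GPTBot everywhere, allow all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # illustrative URL
    print("GPTBot allowed:", can_crawl(ROBOTS_TXT, "GPTBot", page))
    print("Other agent allowed:", can_crawl(ROBOTS_TXT, "ExampleBrowser", page))
```

A preference expressed this way is advisory: it only has an effect when the crawler reads robots.txt and chooses to honor it, which is why the catalog lists technical blocking and tollgates as separate mechanisms.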
Formal license 3 rows
Uses explicit license terms for AI reuse.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 25 | Really Simple Licensing (RSL) A machine-readable licensing schema for clearly signaling reuse permissions and conditions (including payment or use restriction). | live | 1,500+ organizations | rslstandard.org Dec 10, 2025 | profile update |
| 30 | copyright.sh Machine-readable licensing layer that lets websites declare AI usage terms and pricing. | live | not recorded | blog.copyright.sh Oct 28, 2025 | profile update |
| 46 | AI-Ready Licenses Research-backed proposal for modular standard data licenses tailored to AI data sharing. | in progress | not recorded | mlcommons.org Mar 17, 2025 | profile update |
Licensing collective 5 rows
Coordinates licensing across many rights holders.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 03 | SPUR (Standards for Publisher Usage Rights) Publisher coalition building shared standards and licensing frameworks for responsible AI use of journalism. | in progress | 6 founding publisher members | mediahuis.com May 11, 2026 | profile update |
| 06 | Publishers' Rights Organization Coalition pursuing licensing, compensation, and enforcement for publisher content used by AI systems. | live | not recorded | publishersrights.org May 02, 2026 | profile update |
| 14 | CCC AI Licensing Suite Voluntary collective licensing program covering internal AI reuse, external AI training, and transactional AI uses for copyrighted works. | live | not recorded | copyright.com Mar 03, 2026 | profile update |
| 15 | CLA Generative AI Training Licence UK collective licensing offer for AI model training, fine-tuning, and RAG over published text content. | live | not recorded | pls.org.uk Mar 01, 2026 | profile update |
| 50 | Dataset Providers Alliance (DPA) Trade alliance of dataset licensors pushing for legal clarity, ethical sourcing, and scalable licensing markets for AI training data. | live | 12 announced members | thedpa.ai Dec 10, 2024 | profile update |
Marketplace 13 rows
Matches data suppliers with AI buyers.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 01 | Stack Data Licensing Licensed access to Stack Overflow's developer knowledge corpus for AI training, fine-tuning, RAG, and agentic use cases. | live | 83M+ human-verified questions and answers | stackoverflow.co May 14, 2026 | profile update |
| 04 | SourceAudio AI Dataset Licensing Opt-in music dataset licensing program that pays rights holders for AI training use of tracks and catalogs. | live | 3,000+ music catalogs | sourceaudio.com May 06, 2026 | profile update |
| 05 | Created by Humans Rights-cleared book licensing platform for AI training, reference, and transformative use. | live | 100+ bestselling authors | createdbyhumans.ai May 02, 2026 | profile update |
| 16 | Protege AI training data platform for compliant exchange of proprietary, real-world datasets across sectors. | live | hundreds of organizations | withprotege.ai Feb 12, 2026 | profile update |
| 17 | Microsoft Publisher Content Marketplace Paid marketplace routing premium publisher content into Microsoft Copilot, MSN, and Discover experiences. | live | 7 launch publisher partners | about.ads.microsoft.com Feb 03, 2026 | profile update |
| 18 | Defined.ai Marketplace for ethically sourced, annotated datasets used to train and fine-tune AI systems. | live | several partners generate $1M+/year | defined.ai Jan 27, 2026 | profile update |
| 20 | Human Native AI data marketplace for licensed multimedia datasets, now being integrated into Cloudflare's AI crawl and content-access stack. | live | not recorded | blog.cloudflare.com Jan 15, 2026 | profile update |
| 31 | DataSeeds.AI Rights-cleared image dataset marketplace that uses Zedge and GuruShots creator networks to supply AI training data. | live | approximately 30 million rights-cleared images | accessnewswire.com Oct 20, 2025 | profile update |
| 34 | Bria Artist Program / Licensed Training Catalog Contributor compensation and licensed visual training-data program tied to Bria's commercially safe generative AI stack. | live | 30+ data partners | bria.ai Sep 18, 2025 | profile update |
| 37 | ProRata / Gist A 50/50 revenue-share platform connecting publishers with AI companies, with 700+ publishers signed up including major news outlets. | live | 700+ publishers | businesswire.com Sep 05, 2025 | profile update |
| 40 | Dappier Rights-cleared content marketplace and monetization layer for RAG, assistants, and other AI applications. | live | not recorded | dappier.com Aug 18, 2025 | profile update |
| 43 | GCX (Global Copyright Exchange) Music licensing platform for rights-cleared AI training data and audio assets. | live | 4.4M+ hours of audio / 32B metadata text pairs / 3PB music data | rightsify.com May 15, 2025 | profile update |
| 49 | Gloo AI Licensing Licensed-content marketplace for the faith ecosystem, seeded with a pooled guarantee for AI assistants and search experiences. | live | $5M pooled guarantee | prnewswire.com Feb 20, 2025 | profile update |
Tollgate 2 rows
Makes AI access conditional on payment or metering.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 08 | Cloudflare AI Crawl Control A set of tools to block or charge for scraping; includes AI Audit dashboard, managed robots.txt, and pay-per-crawl marketplace. | live | 3.8M+ domains on managed robots.txt | blog.cloudflare.com Apr 17, 2026 | profile update |
| 23 | TollBit Publisher tool that adds subdomains to make content accessible to AI, with blocking and monetization. | live | 4,000+ premium publishers | tollbit.com Dec 16, 2025 | profile update |
Technical blocking 4 rows
Blocks or constrains automated access.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 09 | Fastly AI Bot Management Edge bot-management layer for detecting and blocking AI crawlers and fetchers that scrape website content. | live | not recorded | fastly.com Apr 16, 2026 | profile update |
| 10 | Akamai Content Protector Enterprise anti-scraping product that detects and blocks persistent content scrapers, now positioned as part of broader AI and LLM bot management. | live | not recorded | akamai.com Apr 08, 2026 | profile update |
| 22 | easy-dataset-share A simple anti-scraping tool intended to protect datasets from basic crawlers/scrapers. | live | not recorded | arxiv.org Jan 09, 2026 | profile update |
| 41 | Hugging Face Gated Datasets Hugging Face Hub access-control feature that requires users to request approval before downloading a dataset. | live | not recorded | github.com Aug 18, 2025 | profile update |
New infrastructure 14 rows
Builds new rails for governed data sharing.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 11 | IETF Web Bot Auth Working group standardizing cryptographic authentication for bots and AI agents on the web. | in progress | not recorded | datatracker.ietf.org Apr 01, 2026 | profile update |
| 13 | IAB Tech Lab CoMP Standards initiative for machine-readable commercial agreements, access policies, and monetization workflows before AI crawling or content use. | in progress | not recorded | iabtechlab.com Mar 10, 2026 | profile update |
| 19 | CommonsDB Registry for public-domain and openly licensed works using verifiable rights declarations and content-derived identifiers. | live | 300,000+ declarations | commonsdb.org Jan 20, 2026 | profile update |
| 21 | Wikimedia Enterprise Enterprise-grade APIs and structured dumps for Wikipedia and sister projects, designed for large-scale reuse in AI, search, and knowledge graphs. | live | 10+ announced partners | enterprise.wikimedia.com Jan 15, 2026 | profile update |
| 24 | Amlet AI content registry for publishers and authors that links ownership proof, TDM registration, and licensing rules for AI reuse. | live | not recorded | blog.amlet.ai Dec 15, 2025 | profile update |
| 26 | Mozilla Data Collective Community-centered dataset platform for sharing AI-relevant data under contributor-controlled licenses, access rules, and governance terms. | live | 470+ datasets | community.mozilladatacollective.com Nov 25, 2025 | profile update |
| 27 | European Books Data Commons Proposal for a commons-based infrastructure for large-scale access to digitized European books with conditional commercial access. | in progress | not recorded | openfuture.eu Nov 20, 2025 | profile update |
| 28 | SyftBox Open-source protocol for privacy-preserving AI and analytics across distributed datasets without centralizing the underlying data. | live | not recorded | github.com Nov 12, 2025 | profile update |
| 32 | Attribution-based control OpenMined architecture for permissioned data contribution and attribution-based access in AI systems. | in progress | not recorded | openmined.org Oct 06, 2025 | profile update |
| 36 | Spawning Do Not Train Registry Registry and opt-out workflow for marking works that should not be used in future AI training datasets. | in progress | not recorded | spawning.ai Sep 15, 2025 | profile update |
| 42 | FlexOlmo Distributed language-model training approach that lets data owners contribute experts without sharing raw data or giving up opt-out control. | in progress | not recorded | allenai.org Jul 09, 2025 | profile update |
| 44 | Social License for Data Reuse Participatory governance framework for communities to define conditions for data reuse, including AI training. | in progress | not recorded | blog.thegovlab.org May 13, 2025 | profile update |
| 45 | Credtent Independent creative registry for opting out of AI use, licensing content, and certifying human-created work. | live | thousands of creators | globenewswire.com Mar 31, 2025 | profile update |
| 51 | Spawning Data Diligence Python package and API helpers for checking whether works are opted out before model training. | live | not recorded | pypi.org Oct 09, 2024 | profile update |
Certification 1 row
Verifies or signals compliant sourcing practices.
| row | initiative | status | adoption signal | latest evidence | update |
|---|---|---|---|---|---|
| 52 | Fairly Trained Certification program for AI models and companies that meet stated consent and licensing criteria for training data. | live | 16+ announced certified entities | fairlytrained.org Aug 01, 2024 | profile update |