DataLicenses.org

A community-curated catalog of public initiatives shaping AI data access, licensing, controlled retrieval, and enforcement.

Rows emphasize public sources, dated evidence, and clear paths for updates.

entries: 54
categories: 8
dated sources: 54
latest evidence: May 14, 2026

Recently active

Entries with the newest dated public evidence.

Recent activity is based on dated sources, not an overall score.

Recently active initiatives
row initiative mechanism status adoption signal latest evidence update
01 Stack Data Licensing Licensed access to Stack Overflow's developer knowledge corpus for AI training, fine-tuning, RAG, and agentic use cases. Marketplace live 83M+ human-verified questions and answers stackoverflow.co May 14, 2026 profile update
02 Creative Commons Signals Creative Commons framework for communicating expectations and building governance infrastructure around AI use of shared knowledge. Preference signal in progress not recorded creativecommons.org May 13, 2026 profile update
03 SPUR (Standards for Publisher Usage Rights) Publisher coalition building shared standards and licensing frameworks for responsible AI use of journalism. Licensing collective in progress 6 founding publisher members mediahuis.com May 11, 2026 profile update
04 SourceAudio AI Dataset Licensing Opt-in music dataset licensing program that pays rights holders for AI training use of tracks and catalogs. Marketplace live 3,000+ music catalogs sourceaudio.com May 06, 2026 profile update
05 Created by Humans Rights-cleared book licensing platform for AI training, reference, and transformative use. Marketplace live 100+ bestselling authors createdbyhumans.ai May 02, 2026 profile update
06 Publishers' Rights Organization Coalition pursuing licensing, compensation, and enforcement for publisher content used by AI systems. Licensing collective live not recorded publishersrights.org May 02, 2026 profile update
07 IETF AI Preferences (AIPref) Internet Engineering Task Force effort to standardize a preference signal for AI agents and crawlers, described as "building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use." Preference signal in progress not recorded datatracker.ietf.org Apr 27, 2026 profile update
08 Cloudflare AI Crawl Control A set of tools to block or charge for scraping; includes an AI Audit dashboard, managed robots.txt, and a pay-per-crawl marketplace. Tollgate live 3.8M+ domains on managed robots.txt blog.cloudflare.com Apr 17, 2026 profile update
09 Fastly AI Bot Management Edge bot-management layer for detecting and blocking AI crawlers and fetchers that scrape website content. Technical blocking live not recorded fastly.com Apr 16, 2026 profile update
10 Akamai Content Protector Enterprise anti-scraping product that detects and blocks persistent content scrapers, now positioned as part of broader AI and LLM bot management. Technical blocking live not recorded akamai.com Apr 08, 2026 profile update
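
The note that recent activity rests on dated sources rather than an overall score can be illustrated with a short sketch. It assumes each entry is held as a simple record; the field names `initiative` and `latest_evidence` and the helper `recently_active` are hypothetical and are not taken from the catalog's own code. The sample values are copied from the table above.

```python
from datetime import datetime

# Hypothetical records; names and dates are copied from the table above.
entries = [
    {"initiative": "Fastly AI Bot Management", "latest_evidence": "Apr 16, 2026"},
    {"initiative": "Stack Data Licensing", "latest_evidence": "May 14, 2026"},
    {"initiative": "Creative Commons Signals", "latest_evidence": "May 13, 2026"},
    {"initiative": "Cloudflare AI Crawl Control", "latest_evidence": "Apr 17, 2026"},
]


def evidence_date(entry):
    """Parse the catalog's date format, e.g. 'May 14, 2026'."""
    return datetime.strptime(entry["latest_evidence"], "%b %d, %Y")


def recently_active(rows, limit=10):
    """Order rows by newest dated evidence only; no score is involved."""
    return sorted(rows, key=evidence_date, reverse=True)[:limit]


for entry in recently_active(entries, limit=3):
    print(entry["latest_evidence"], entry["initiative"])
```

Sorting on the parsed date, not on the raw string, matters here because "Apr" would otherwise sort before "May" only by alphabetical accident.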

All current entries

Grouped by primary mechanism. Each entry links to a profile, latest evidence, and a pre-filled update issue.

Metric cells show recorded values only when backed by public evidence.

Preference signal 12 rows

Publishes machine-readable AI-use preferences.

Preference signal initiatives
row initiative status adoption signal latest evidence update
02 Creative Commons Signals Creative Commons framework for communicating expectations and building governance infrastructure around AI use of shared knowledge. in progress not recorded creativecommons.org May 13, 2026 profile update
07 IETF AI Preferences (AIPref) Internet Engineering Task Force effort to standardize a preference signal for AI agents and crawlers, described as "building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use." in progress not recorded datatracker.ietf.org Apr 27, 2026 profile update
12 IPTC + PLUS Data Mining Metadata Embedded image and video metadata fields for expressing whether assets may be used in data-mining and generative-AI training datasets. live not recorded iptc.org Mar 30, 2026 profile update
29 TDM·AI Asset-level protocol for binding machine-readable TDM and AI-training preferences to digital works. in progress not recorded docs.tdmai.org Nov 04, 2025 profile update
33 TDMRep (W3C) W3C specification for expressing text and data mining permissions via a well-known JSON file, designed for EU DSM Directive compliance. live not recorded w3.org Oct 01, 2025 profile update
35 DIY robots handling (robots.txt++) Use of robots.txt to express AI crawler access preferences (OpenAI publishes an example); the X-Robots-Tag HTTP response header additionally lets servers send crawler directives. live not recorded developers.openai.com Sep 17, 2025 profile update
38 Adobe Content Authenticity Content Credentials-based preference system for signaling that generative AI should not train on or use a creator's files. live not recorded helpx.adobe.com Sep 02, 2025 profile update
39 Spawning ai.txt A proposed convention for AI-specific crawler directives via an `ai.txt` file. in progress not recorded spawning.ai Aug 28, 2025 profile update
47 User Intents Proposed AT Protocol mechanism for users to declare data-reuse preferences such as generative-AI training. in progress not recorded github.com Mar 08, 2025 profile update
48 trust.txt Publisher-oriented trust file that can also declare whether AI training is allowed through a machine-readable `datatrainingallowed=` field. live about 3,000 participating publishers rjionline.org Feb 21, 2025 profile update
53 DeviantArt NoAI / NoImageAI Platform-level HTML and HTTP directives that tell external AI datasets and models not to use artists' work unless they opt in. live not recorded deviantart.com Apr 17, 2024 profile update
54 TK Labels Local Contexts labels that let Indigenous communities express culturally specific conditions for access and reuse of knowledge and data. live not recorded localcontexts.org Jun 17, 2022 profile update
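
As a rough illustration of the robots.txt approach listed in this group, the sketch below checks whether a given crawler may fetch a URL under a sample policy. It uses Python's standard `urllib.robotparser`; the policy text, the `example.com` URL, the `ExampleBot` agent, and the helper name `crawler_allowed` are assumptions made for the example. `GPTBot` is the user agent OpenAI documents for its crawler. robots.txt is advisory, so this shows how a well-behaved crawler would read the signal, not how access is enforced.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block OpenAI's GPTBot, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/articles/1"
print("GPTBot allowed:", crawler_allowed(ROBOTS_TXT, "GPTBot", url))
print("ExampleBot allowed:", crawler_allowed(ROBOTS_TXT, "ExampleBot", url))
```

The same check could be run against a live site by calling `set_url` and `read` on the parser instead of `parse`, at the cost of a network request.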

Formal license 3 rows

Uses explicit license terms for AI reuse.

Formal license initiatives
row initiative status adoption signal latest evidence update
25 Really Simple Licensing (RSL) A machine-readable licensing schema for clearly signaling reuse permissions and conditions (including payment or use restriction). live 1,500+ organizations rslstandard.org Dec 10, 2025 profile update
30 copyright.sh Machine-readable licensing layer that lets websites declare AI usage terms and pricing. live not recorded blog.copyright.sh Oct 28, 2025 profile update
46 AI-Ready Licenses Research-backed proposal for modular standard data licenses tailored to AI data sharing. in progress not recorded mlcommons.org Mar 17, 2025 profile update

Licensing collective 5 rows

Coordinates licensing across many rights holders.

Licensing collective initiatives
row initiative status adoption signal latest evidence update
03 SPUR (Standards for Publisher Usage Rights) Publisher coalition building shared standards and licensing frameworks for responsible AI use of journalism. in progress 6 founding publisher members mediahuis.com May 11, 2026 profile update
06 Publishers' Rights Organization Coalition pursuing licensing, compensation, and enforcement for publisher content used by AI systems. live not recorded publishersrights.org May 02, 2026 profile update
14 CCC AI Licensing Suite Voluntary collective licensing program covering internal AI reuse, external AI training, and transactional AI uses for copyrighted works. live not recorded copyright.com Mar 03, 2026 profile update
15 CLA Generative AI Training Licence UK collective licensing offer for AI model training, fine-tuning, and RAG over published text content. live not recorded pls.org.uk Mar 01, 2026 profile update
50 Dataset Providers Alliance (DPA) Trade alliance of dataset licensors pushing for legal clarity, ethical sourcing, and scalable licensing markets for AI training data. live 12 announced members thedpa.ai Dec 10, 2024 profile update

Marketplace 13 rows

Matches data suppliers with AI buyers.

Marketplace initiatives
row initiative status adoption signal latest evidence update
01 Stack Data Licensing Licensed access to Stack Overflow's developer knowledge corpus for AI training, fine-tuning, RAG, and agentic use cases. live 83M+ human-verified questions and answers stackoverflow.co May 14, 2026 profile update
04 SourceAudio AI Dataset Licensing Opt-in music dataset licensing program that pays rights holders for AI training use of tracks and catalogs. live 3,000+ music catalogs sourceaudio.com May 06, 2026 profile update
05 Created by Humans Rights-cleared book licensing platform for AI training, reference, and transformative use. live 100+ bestselling authors createdbyhumans.ai May 02, 2026 profile update
16 Protege AI training data platform for compliant exchange of proprietary, real-world datasets across sectors. live hundreds of organizations withprotege.ai Feb 12, 2026 profile update
17 Microsoft Publisher Content Marketplace Paid marketplace routing premium publisher content into Microsoft Copilot, MSN, and Discover experiences. live 7 launch publisher partners about.ads.microsoft.com Feb 03, 2026 profile update
18 Defined.ai Marketplace for ethically sourced, annotated datasets used to train and fine-tune AI systems. live several partners generate $1M+/year defined.ai Jan 27, 2026 profile update
20 Human Native AI data marketplace for licensed multimedia datasets, now being integrated into Cloudflare's AI crawl and content-access stack. live not recorded blog.cloudflare.com Jan 15, 2026 profile update
31 DataSeeds.AI Rights-cleared image dataset marketplace that uses Zedge and GuruShots creator networks to supply AI training data. live approximately 30 million rights-cleared images accessnewswire.com Oct 20, 2025 profile update
34 Bria Artist Program / Licensed Training Catalog Contributor compensation and licensed visual training-data program tied to Bria's commercially safe generative AI stack. live 30+ data partners bria.ai Sep 18, 2025 profile update
37 ProRata / Gist A 50/50 revenue-share platform connecting publishers with AI companies, with 700+ publishers signed up including major news outlets. live 700+ publishers businesswire.com Sep 05, 2025 profile update
40 Dappier Rights-cleared content marketplace and monetization layer for RAG, assistants, and other AI applications. live not recorded dappier.com Aug 18, 2025 profile update
43 GCX (Global Copyright Exchange) Music licensing platform for rights-cleared AI training data and audio assets. live 4.4M+ hours of audio / 32B metadata text pairs / 3PB music data rightsify.com May 15, 2025 profile update
49 Gloo AI Licensing Licensed-content marketplace for the faith ecosystem, seeded with a pooled guarantee for AI assistants and search experiences. live $5M pooled guarantee prnewswire.com Feb 20, 2025 profile update

Tollgate 2 rows

Makes AI access conditional on payment or metering.

Tollgate initiatives
row initiative status adoption signal latest evidence update
08 Cloudflare AI Crawl Control A set of tools to block or charge for scraping; includes an AI Audit dashboard, managed robots.txt, and a pay-per-crawl marketplace. live 3.8M+ domains on managed robots.txt blog.cloudflare.com Apr 17, 2026 profile update
23 TollBit Lets publishers add subdomains that make content accessible to AI, with blocking and monetization. live 4,000+ premium publishers tollbit.com Dec 16, 2025 profile update

Technical blocking 4 rows

Blocks or constrains automated access.

Technical blocking initiatives
row initiative status adoption signal latest evidence update
09 Fastly AI Bot Management Edge bot-management layer for detecting and blocking AI crawlers and fetchers that scrape website content. live not recorded fastly.com Apr 16, 2026 profile update
10 Akamai Content Protector Enterprise anti-scraping product that detects and blocks persistent content scrapers, now positioned as part of broader AI and LLM bot management. live not recorded akamai.com Apr 08, 2026 profile update
22 easy-dataset-share A simple anti-scraping tool intended to protect datasets from basic crawlers/scrapers. live not recorded arxiv.org Jan 09, 2026 profile update
41 Hugging Face Gated Datasets Hugging Face Hub access-control feature that requires users to request approval before downloading a dataset. live not recorded github.com Aug 18, 2025 profile update

New infrastructure 14 rows

Builds new rails for governed data sharing.

New infrastructure initiatives
row initiative status adoption signal latest evidence update
11 IETF Web Bot Auth Working group standardizing cryptographic authentication for bots and AI agents on the web. in progress not recorded datatracker.ietf.org Apr 01, 2026 profile update
13 IAB Tech Lab CoMP Standards initiative for machine-readable commercial agreements, access policies, and monetization workflows before AI crawling or content use. in progress not recorded iabtechlab.com Mar 10, 2026 profile update
19 CommonsDB Registry for public-domain and openly licensed works using verifiable rights declarations and content-derived identifiers. live 300,000+ declarations commonsdb.org Jan 20, 2026 profile update
21 Wikimedia Enterprise Enterprise-grade APIs and structured dumps for Wikipedia and sister projects, designed for large-scale reuse in AI, search, and knowledge graphs. live 10+ announced partners enterprise.wikimedia.com Jan 15, 2026 profile update
24 Amlet AI content registry for publishers and authors that links ownership proof, TDM registration, and licensing rules for AI reuse. live not recorded blog.amlet.ai Dec 15, 2025 profile update
26 Mozilla Data Collective Community-centered dataset platform for sharing AI-relevant data under contributor-controlled licenses, access rules, and governance terms. live 470+ datasets community.mozilladatacollective.com Nov 25, 2025 profile update
27 European Books Data Commons Proposal for a commons-based infrastructure for large-scale access to digitized European books with conditional commercial access. in progress not recorded openfuture.eu Nov 20, 2025 profile update
28 SyftBox Open-source protocol for privacy-preserving AI and analytics across distributed datasets without centralizing the underlying data. live not recorded github.com Nov 12, 2025 profile update
32 Attribution-based control OpenMined architecture for permissioned data contribution and attribution-based access in AI systems. in progress not recorded openmined.org Oct 06, 2025 profile update
36 Spawning Do Not Train Registry Registry and opt-out workflow for marking works that should not be used in future AI training datasets. in progress not recorded spawning.ai Sep 15, 2025 profile update
42 FlexOlmo Distributed language-model training approach that lets data owners contribute experts without sharing raw data or giving up opt-out control. in progress not recorded allenai.org Jul 09, 2025 profile update
44 Social License for Data Reuse Participatory governance framework for communities to define conditions for data reuse, including AI training. in progress not recorded blog.thegovlab.org May 13, 2025 profile update
45 Credtent Independent creative registry for opting out of AI use, licensing content, and certifying human-created work. live thousands of creators globenewswire.com Mar 31, 2025 profile update
51 Spawning Data Diligence Python package and API helpers for checking whether works are opted out before model training. live not recorded pypi.org Oct 09, 2024 profile update

Certification 1 row

Verifies or signals compliant sourcing practices.

Certification initiatives
row initiative status adoption signal latest evidence update
52 Fairly Trained Certification program for AI models and companies that meet stated consent and licensing criteria for training data. live 16+ announced certified entities fairlytrained.org Aug 01, 2024 profile update