Approaches for controlling how data is used in AI pipelines.

DataLicenses.org tracks licenses, preference signals, markets, access controls, registries, standards, certification, and governed sharing for AI training, evaluation, retrieval, and generation.

The catalog focuses on permissions, terms, and controlled access. It does not cover complete withholding, data poisoning, litigation tracking, or general AI governance.


Current initiatives: 60
Live: 46
Approach categories: 10
Latest tracked evidence: Jul 16, 2026

Browse initiatives

Search 60 current initiatives or narrow the catalog by status, approach, data type, pipeline stage, and public adoption evidence.

“Live” means publicly available. It does not mean widely adopted, legally tested, or effective against non-compliant actors.


Cloudflare AI Crawl Control

Live

Live controls for classifying and blocking Search, Agent, and Training bots, alongside managed preference signals and monetization tools.

Approach: Tollgate (also: Technical blocking, Preference signal)
Pipeline stage: Collect · Retrieve
Data type: Web content

Access control: It can control the protected access path, but cannot guarantee control of downstream copies or alternate routes.

Evidence: Jul 01, 2026 (blog.cloudflare.com), checked Jul 15, 2026
Adoption: 3.8M+ domains on managed robots.txt · 1B+ HTTP 402 responses/day across Cloudflare customers
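A tollgate ultimately answers each crawler request with allow, block, or pay. The minimal Python sketch below illustrates that decision under loud assumptions: the `KNOWN_BOTS` table, the `classify` and `decide` helpers, and the policy mapping are all hypothetical, and real products classify bots with far more than a user-agent substring. Unpaid requests from a category the site chooses to charge receive HTTP 402 Payment Required, the status code counted in the adoption figure above.

```python
from enum import Enum
from http import HTTPStatus


class BotCategory(Enum):
    # Hypothetical categories mirroring the Search / Agent / Training split.
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"
    UNKNOWN = "unknown"


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    CHARGE = "charge"


# Hypothetical user-agent tokens; real products rely on many more signals.
KNOWN_BOTS = {
    "GPTBot": BotCategory.TRAINING,
    "ClaudeBot": BotCategory.TRAINING,
    "ChatGPT-User": BotCategory.AGENT,
    "Googlebot": BotCategory.SEARCH,
}


def classify(user_agent: str) -> BotCategory:
    """Map a User-Agent header to a bot category by substring match."""
    for token, category in KNOWN_BOTS.items():
        if token in user_agent:
            return category
    return BotCategory.UNKNOWN


def decide(user_agent: str, policy: dict, paid: bool) -> HTTPStatus:
    """Return the status a tollgate would send: 200, 402, or 403."""
    action = policy.get(classify(user_agent), Action.ALLOW)
    if action is Action.BLOCK:
        return HTTPStatus.FORBIDDEN
    if action is Action.CHARGE and not paid:
        return HTTPStatus.PAYMENT_REQUIRED
    return HTTPStatus.OK


# Example policy: let search and agents through, charge training crawlers.
POLICY = {
    BotCategory.SEARCH: Action.ALLOW,
    BotCategory.AGENT: Action.ALLOW,
    BotCategory.TRAINING: Action.CHARGE,
}
```

With this policy, an unpaid `GPTBot` request gets 402 and a paid one gets 200. User-agent strings are trivially spoofed, which is why the access-control caveat above still applies.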

Really Simple Licensing (RSL)

Live

A machine-readable licensing schema for signaling reuse permissions and conditions, such as payment terms or use restrictions.

Approach: Formal license (also: Preference signal)
Pipeline stage: Collect · Train · Retrieve
Data type: Web content

Legal terms: Practical effect depends on applicable law, rights ownership, notice, and contract formation.

Evidence: Dec 10, 2025 (rslstandard.org), checked Jul 15, 2026
Adoption: 1,500+ endorsing organizations · supporters represent billions of web pages
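RSL terms are meant to be discoverable by crawlers. Assuming the robots.txt `License:` directive that the RSL specification describes for pointing to a license file (verify the exact directive against rslstandard.org), a crawler could collect the referenced license URLs as sketched below. `license_urls` is a hypothetical helper; fetching and evaluating the license XML itself is out of scope here.

```python
def license_urls(robots_txt: str) -> list:
    """Collect values of License: lines from a robots.txt body (hypothetical helper)."""
    urls = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")  # split at the first colon only
        if sep and field.strip().lower() == "license" and value.strip():
            urls.append(value.strip())
    return urls


ROBOTS_TXT = """User-agent: *
Allow: /
License: https://example.com/license.xml
"""
```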

Creative Commons Signals

In progress

Creative Commons framework for communicating expectations and building governance infrastructure around AI use of shared knowledge.

Approach: Preference signal (also: Protocol or standard)
Pipeline stage: Collect · Train · Retrieve
Data type: Multimodal

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: May 13, 2026 (creativecommons.org), checked Jun 24, 2026
Adoption: No public adoption figure found

IETF AI Preferences (AIPref)

In progress

Internet Engineering Task Force (IETF) effort to standardize a preference signal for AI agents and crawlers, described as "building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use."

Approach: Preference signal
Pipeline stage: Collect · Train · Retrieve
Data type: Web content

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: May 01, 2026 (datatracker.ietf.org), checked Jul 15, 2026
Adoption: No public adoption figure found

TollBit

Live

Lets publishers add subdomains that make their content accessible to AI, combined with bot blocking and monetization.

Approach: Tollgate (also: Marketplace)
Pipeline stage: Collect · Retrieve
Data type: Text

Access control: It can control the protected access path, but cannot guarantee control of downstream copies or alternate routes.

Evidence: Dec 16, 2025 (tollbit.com), checked Jun 24, 2026
Adoption: 4,000+ premium publishers

Created by Humans

Live

Rights-cleared book licensing platform for AI training, reference, and transformative use.

Approach: Marketplace
Pipeline stage: Train · Retrieve
Data type: Text

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: May 02, 2026 (createdbyhumans.ai), checked Jul 15, 2026
Adoption: Supported by 100+ bestselling authors

Mozilla Data Collective

Live

Community-centered dataset platform for sharing AI-relevant data under contributor-controlled licenses, access rules, and governance terms.

Approach: Governed data sharing (also: Technical blocking)
Pipeline stage: Train · Fine-tune
Data type: Multimodal

Governed sharing: Practical force depends on participation, access rules, technical configuration, and governance after data is shared.

Evidence: May 19, 2026 (community.mozilladatacollective.com), checked Jul 15, 2026
Adoption: 896 datasets

CommonsDB

Live

Registry of signed, attributable rights declarations for public-domain and openly licensed works, linked through content-derived identifiers.

Approach: Rights registry
Pipeline stage: Collect · Train · Retrieve
Data type: Multimodal

Registry infrastructure: Practical force depends on reliable identity, accurate declarations, adoption, and downstream use of the registry.

Evidence: Jun 25, 2026 (commonsdb.org), checked Jul 15, 2026
Adoption: 3.5M+ declarations

Fairly Trained

Live

Certification program for AI models and companies that meet stated consent and licensing criteria for training data.

Approach: Certification
Pipeline stage: Train
Data type: Multimodal

Attestation: Certification provides an assessment or claim; it does not itself authorize, prevent, or enforce reuse.

Evidence: Jul 16, 2026 (fairlytrained.org), checked Jul 16, 2026
Adoption: 16+ announced certified entities

SyftBox

Live

Open-source protocol for privacy-preserving AI and analytics across distributed datasets without centralizing the underlying data.

Approach: Governed data sharing
Pipeline stage: Train · Retrieve
Data type: Multimodal

Governed sharing: Practical force depends on participation, access rules, technical configuration, and governance after data is shared.

Evidence: Mar 28, 2026 (github.com), checked Jul 15, 2026
Adoption: No public adoption figure found

Adobe Content Authenticity

Live

Content Credentials-based preference system for asking supported generative AI models not to train on or use a creator's files.

Approach: Preference signal
Pipeline stage: Train · Generate
Data type: Images · Video

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Sep 02, 2025 (helpx.adobe.com), checked Jul 15, 2026
Adoption: No public adoption figure found

AI-Ready Licenses

In progress

Research-backed proposal for modular standard data licenses tailored to AI data sharing.

Approach: Formal license
Pipeline stage: Collect · Train · Fine-tune · Retrieve
Data type: Multimodal

Legal terms: Practical effect depends on applicable law, rights ownership, notice, and contract formation.

Evidence: Mar 17, 2025 (mlcommons.org), checked Jun 24, 2026
Adoption: No public adoption figure found

Akamai Content Protector

Live

Enterprise anti-scraping product that detects and blocks persistent content scrapers, now positioned as part of broader AI and LLM bot management.

Approach: Technical blocking
Pipeline stage: Collect · Retrieve
Data type: Web content

Technical control: It can reduce access but may be bypassed and does not determine the legality of downstream use.

Evidence: Apr 08, 2026 (akamai.com), checked Jun 24, 2026
Adoption: No public adoption figure found

Amlet

Live

AI content registry that links content-derived identifiers and timestamps to declarer identity, TDM permissions, and licensing preferences.

Approach: Rights registry (also: Formal license)
Pipeline stage: Collect · Train · Retrieve
Data type: Text

Registry infrastructure: Practical force depends on reliable identity, accurate declarations, adoption, and downstream use of the registry.

Evidence: Dec 15, 2025 (blog.amlet.ai), checked Jul 15, 2026
Adoption: No public adoption figure found

Attribution-based control

In progress

OpenMined architecture for permissioned data contribution and attribution-based access in AI systems.

Approach: Protocol or standard
Pipeline stage: Retrieve
Data type: Multimodal

Protocol or standard: Practical force depends on implementation, interoperability, adoption, and any legal or technical controls built around it.

Evidence: Oct 06, 2025 (openmined.org), checked Jun 24, 2026
Adoption: No public adoption figure found

Bria Artist Program / Licensed Training Catalog

Live

Contributor compensation and licensed visual training-data program tied to Bria's commercially safe generative AI stack.

Approach: Marketplace
Pipeline stage: Train
Data type: Images

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Sep 18, 2025 (bria.ai), checked Jun 24, 2026
Adoption: 30+ data partners

CCC AI Licensing Suite

Live

Live collective licenses for internal AI reuse and external AI training, alongside announced transactional AI rights for copyrighted works.

Approach: Licensing collective (also: Formal license)
Pipeline stage: Train · Fine-tune · Retrieve · Generate
Data type: Text

Contractual licensing: Coverage and enforcement depend on participation, represented rights, and the resulting agreements.

Evidence: May 06, 2026 (copyright.com), checked Jul 15, 2026
Adoption: No public adoption figure found

CLA Generative AI Training Licence

Live

UK collective licensing offer for AI model training, fine-tuning, and RAG over published text content.

Approach: Licensing collective (also: Formal license)
Pipeline stage: Train · Fine-tune · Retrieve
Data type: Text

Contractual licensing: Coverage and enforcement depend on participation, represented rights, and the resulting agreements.

Evidence: Mar 01, 2026 (pls.org.uk), checked Jun 24, 2026
Adoption: No public adoption figure found

copyright.sh

Live

Machine-readable licensing layer that lets websites declare AI usage terms and pricing.

Approach: Formal license
Pipeline stage: Collect · Train · Retrieve
Data type: Web content

Legal terms: Practical effect depends on applicable law, rights ownership, notice, and contract formation.

Evidence: Oct 28, 2025 (blog.copyright.sh), checked Jun 24, 2026
Adoption: No public adoption figure found

Credtent

Live

Independent creative registry for recording AI opt-out requests, offering content licensing, and certifying human-created work.

Approach: Rights registry (also: Marketplace, Certification)
Pipeline stage: Train
Data type: Multimodal

Registry infrastructure: Practical force depends on reliable identity, accurate declarations, adoption, and downstream use of the registry.

Evidence: Jul 16, 2026 (credtent.org), checked Jul 16, 2026
Adoption: Thousands of creators

Dappier

Live

Rights-cleared content marketplace and monetization layer for RAG, assistants, and other AI applications.

Approach: Marketplace (also: Tollgate)
Pipeline stage: Retrieve · Generate
Data type: Text

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Aug 18, 2025 (dappier.com), checked Jun 24, 2026
Adoption: No public adoption figure found

DataSeeds.AI

Live

Multimodal dataset supplier offering off-the-shelf and custom image, video, and audio training data through Zedge and GuruShots creator networks.

Approach: Marketplace
Pipeline stage: Train · Fine-tune
Data type: Images · Video · Audio

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Oct 20, 2025 (accessnewswire.com), checked Jul 15, 2026
Adoption: Approximately 30 million rights-cleared images

Dataset Providers Alliance (DPA)

Live

Trade alliance of dataset licensors pushing for legal clarity, ethical sourcing, and scalable licensing markets for AI training data.

Approach: Governed data sharing
Pipeline stage: Train · Fine-tune · Evaluate
Data type: Multimodal

Governed sharing: Practical force depends on participation, access rules, technical configuration, and governance after data is shared.

Evidence: Jul 09, 2026 (thedpa.ai), checked Jul 15, 2026
Adoption: 12 listed members

Defined.ai

Live

Marketplace for ethically sourced, annotated datasets used to train and fine-tune AI systems.

Approach: Marketplace
Pipeline stage: Train · Fine-tune
Data type: Multimodal

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Jan 27, 2026 (defined.ai), checked Jun 24, 2026
Adoption: Several partners generate $1M+/year

DeviantArt NoAI / NoImageAI

Live

Platform NoAI label that emits HTML and HTTP signals stating that the artwork is not authorized for third-party AI-training datasets.

Approach: Preference signal
Pipeline stage: Collect · Train
Data type: Images

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Jul 09, 2026 (deviantartsupport.com), checked Jul 15, 2026
Adoption: No public adoption figure found
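DeviantArt's label is expressed as `noai` and `noimageai` directives, which can appear in a robots meta tag or an `X-Robots-Tag` response header. The sketch below, using only Python's standard `html.parser`, shows how a crawler that chooses to honor the label might detect it. `ai_opt_out` and `RobotsMetaParser` are hypothetical helpers, and the header handling deliberately ignores user-agent-scoped values such as `X-Robots-Tag: somebot: noai`.

```python
from html.parser import HTMLParser

OPT_OUT_DIRECTIVES = {"noai", "noimageai"}


def _split_directives(value: str) -> set:
    """Split a comma-separated directive list into lowercase tokens."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser already lowercases tag and attribute names.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            self.directives |= _split_directives(attributes.get("content", ""))


def ai_opt_out(html: str, x_robots_tag: str = "") -> set:
    """Return the NoAI-style directives found in the page or its X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = parser.directives | _split_directives(x_robots_tag)
    return found & OPT_OUT_DIRECTIVES
```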

DIY robots handling (robots.txt++)

Live

robots.txt and HTTP crawler directives can express AI crawler preferences, including disallow rules, per-response X-Robots-Tag headers, and newer Content Signals for search, AI input, and AI training, plus an experimental content-use extension.

Approach: Preference signal
Pipeline stage: Collect · Train · Retrieve
Data type: Web content

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Jul 01, 2026 (developers.cloudflare.com), checked Jul 15, 2026
Adoption: No public adoption figure found
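Python's standard `urllib.robotparser` is enough to test whether ordinary disallow rules block a given AI crawler. The sketch below pairs it with a hypothetical `content_signals` helper for the `Content-Signal:` line, assuming the `key=yes|no` syntax with `search`, `ai-input`, and `ai-train` keys used in Cloudflare's Content Signals policy. The helper flattens all groups and ignores which user agent a signal is scoped to, and `robotparser` itself skips `Content-Signal` lines, so they never affect `can_fetch`.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """User-agent: GPTBot
Disallow: /

User-agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
"""


def content_signals(robots_txt: str) -> dict:
    """Collect key=value pairs from Content-Signal lines (hypothetical helper).

    Flattens all user-agent groups; a real reader would scope signals per group.
    """
    signals = {}
    for raw in robots_txt.splitlines():
        field, sep, value = raw.split("#", 1)[0].partition(":")
        if sep and field.strip().lower() == "content-signal":
            for pair in value.split(","):
                key, eq, setting = pair.strip().partition("=")
                if eq:
                    signals[key.strip().lower()] = setting.strip().lower()
    return signals


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # unknown lines such as Content-Signal are ignored
```

Here `GPTBot` is refused by an explicit disallow rule, while other crawlers fall through to the `*` group and are allowed but see `ai-train=no` as a stated preference.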

Dow Jones Factiva AI Marketplace

Live

Licensed Factiva news and business-information sources made available for enterprise GenAI products, APIs, and research workflows.

Approach: Marketplace (also: Formal license)
Pipeline stage: Retrieve · Generate
Data type: Text

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Mar 16, 2026 (sec.gov), checked Jul 15, 2026
Adoption: 8,000+ licensed sources for GenAI use

easy-dataset-share

Live

A simple anti-scraping tool intended to protect datasets from basic crawlers and scrapers.

Approach: Technical blocking
Pipeline stage: Collect
Data type: Multimodal

Technical control: It can reduce access but may be bypassed and does not determine the legality of downstream use.

Evidence: Sep 15, 2025 (github.com), checked Jul 15, 2026
Adoption: No public adoption figure found

European Books Data Commons

In progress

Proposal for a commons-based infrastructure for large-scale access to digitized European books with conditional commercial access.

Approach: Governed data sharing
Pipeline stage: Collect · Train
Data type: Text

Governed sharing: Practical force depends on participation, access rules, technical configuration, and governance after data is shared.

Evidence: Nov 20, 2025 (openfuture.eu), checked Jun 24, 2026
Adoption: No public adoption figure found

Fastly AI Bot Management

Live

Edge bot-management layer for detecting and blocking AI crawlers and fetchers that scrape website content.

Approach: Technical blocking
Pipeline stage: Collect · Retrieve
Data type: Web content

Technical control: It can reduce access but may be bypassed and does not determine the legality of downstream use.

Evidence: Apr 16, 2026 (fastly.com), checked Jun 24, 2026
Adoption: No public adoption figure found

FlexOlmo

In progress

Distributed language-model training approach that lets data owners contribute experts without sharing raw data or giving up opt-out control.

Approach: Governed data sharing
Pipeline stage: Train
Data type: Text

Governed sharing: Practical force depends on participation, access rules, technical configuration, and governance after data is shared.

Evidence: Jul 09, 2025 (allenai.org), checked Jun 24, 2026
Adoption: No public adoption figure found

GCX (Global Copyright Exchange)

Live

Music licensing platform for rights-cleared AI training data and audio assets.

Approach: Marketplace
Pipeline stage: Train · Fine-tune
Data type: Music

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Jul 16, 2026 (gcx.co), checked Jul 16, 2026
Adoption: 4.4M+ hours of audio · 32B metadata text pairs · 3PB music data

Gloo AI Licensing

Live

Licensed-content marketplace for the faith ecosystem, seeded with a pooled guarantee for AI assistants and search experiences.

Approach: Marketplace
Pipeline stage: Train · Retrieve · Generate
Data type: Text · Web content

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Jul 16, 2026 (docs.gloo.com), checked Jul 16, 2026
Adoption: $5M pooled guarantee

Hugging Face Gated Datasets

Live

Hugging Face Hub access-control feature requiring authenticated users to submit an access request before downloading; approval can be automatic or manual.

Approach: Technical blocking (also: Governed data sharing)
Pipeline stage: Collect · Train · Fine-tune
Data type: Multimodal

Technical control: It can reduce access but may be bypassed and does not determine the legality of downstream use.

Evidence: Aug 18, 2025 (github.com), checked Jul 15, 2026
Adoption: No public adoption figure found

IAB Tech Lab CoMP

Live

Finalized v1 API and content-packaging framework through which already-licensed AI systems can request content; separate licensing and bot blocking are required.

Approach: Protocol or standard
Pipeline stage: Collect · Train · Retrieve
Data type: Web content · Text

Protocol or standard: Practical force depends on implementation, interoperability, adoption, and any legal or technical controls built around it.

Evidence: Apr 28, 2026 (iabtechlab.com), checked Jul 15, 2026
Adoption: No public adoption figure found

IETF Web Bot Auth

In progress

Working group standardizing cryptographic authentication for bots and AI agents on the web.

Approach: Protocol or standard
Pipeline stage: Collect · Retrieve
Data type: Web content

Protocol or standard: Practical force depends on implementation, interoperability, adoption, and any legal or technical controls built around it.

Evidence: Jul 01, 2026 (developers.cloudflare.com), checked Jul 15, 2026
Adoption: No public adoption figure found

IPTC + PLUS Data Mining Metadata

Live

Embedded image and video metadata fields for expressing data-mining preferences, including prohibitions on generative-AI training.

Approach: Preference signal
Pipeline stage: Collect · Train · Fine-tune
Data type: Images · Video

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Mar 30, 2026 (iptc.org), checked Jul 15, 2026
Adoption: No public adoption figure found
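In XMP, this field is stored as `plus:DataMining`, whose value is a URI from the PLUS Data Mining vocabulary. The vocabulary codes in the sketch below are recalled rather than quoted and should be checked against the current IPTC Photo Metadata Standard before use. The sketch uses a regular expression instead of a full XMP parser, so it handles only the attribute and simple element serializations, and it does not classify conditional codes (for example, "see constraint" variants) as prohibited, even though those still require reading the referenced terms. `data_mining_code` and `genai_training_prohibited` are hypothetical helpers.

```python
import re
from typing import Optional

PLUS_VOCAB = "http://ns.useplus.org/ldf/vocab/"

# Assumed subset of PLUS Data Mining codes that rule out generative-AI training.
PROHIBITS_GENAI_TRAINING = {
    "DMI-PROHIBITED",
    "DMI-PROHIBITED-AIMLTRAINING",
    "DMI-PROHIBITED-GENAIMLTRAINING",
    "DMI-PROHIBITED-EXCEPTSEARCHENGINEINDEXING",
}

# Matches plus:DataMining="..." (attribute form) or <plus:DataMining>...</plus:DataMining>.
_DATA_MINING = re.compile(
    r'plus:DataMining\s*=\s*"([^"]*)"|<plus:DataMining>\s*([^<]*?)\s*</plus:DataMining>'
)


def data_mining_code(xmp: str) -> Optional[str]:
    """Return the Data Mining code (URI prefix removed), or None if absent."""
    match = _DATA_MINING.search(xmp)
    if match is None:
        return None
    value = match.group(1) if match.group(1) is not None else match.group(2)
    return value.removeprefix(PLUS_VOCAB)


def genai_training_prohibited(xmp: str) -> bool:
    """True only for codes in the assumed prohibiting subset."""
    return data_mining_code(xmp) in PROHIBITS_GENAI_TRAINING


SAMPLE_XMP = (
    '<rdf:Description plus:DataMining='
    '"http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-GENAIMLTRAINING"/>'
)
```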

Microsoft Publisher Content Marketplace

In progress

Pilot marketplace for licensing premium publisher content to AI products, initially tested in enterprise and consumer Microsoft Copilot.

Approach: Marketplace
Pipeline stage: Retrieve · Generate
Data type: Text

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Feb 03, 2026 (about.ads.microsoft.com), checked Jul 15, 2026
Adoption: 7 named co-design publisher organizations

News/Media Alliance AI Licensing Program

Live

Opt-in program that gives NMA publishers access to AI content licensing opportunities with Bria and ProRata.

Approach: Licensing collective (also: Formal license)
Pipeline stage: Retrieve · Generate
Data type: Text · Web content

Contractual licensing: Coverage and enforcement depend on participation, represented rights, and the resulting agreements.

Evidence: Jul 01, 2026 (newsmediaalliance.org), checked Jul 15, 2026
Adoption: No public adoption figure found

ProRata / Gist

Live

A 50/50 revenue-share platform connecting publishers with AI companies, with 1,000+ publications and content creators listed as partners.

Approach: Marketplace (also: Licensing collective, Tollgate)
Pipeline stage: Retrieve · Generate
Data type: Text

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Sep 05, 2025 (businesswire.com), checked Jul 15, 2026
Adoption: 1,000+ publications and content creators

Protege

Live

AI training data platform for compliant exchange of proprietary, real-world datasets across sectors.

Approach: Marketplace (also: Governed data sharing)
Pipeline stage: Train · Fine-tune
Data type: Multimodal

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Feb 12, 2026 (withprotege.ai), checked Jul 15, 2026
Adoption: Hundreds of data-partner organizations

Publishers' Rights Organization

Live

Coalition pursuing licensing, compensation, and enforcement for publisher content used by AI systems.

Approach: Licensing collective
Pipeline stage: Train · Retrieve
Data type: Text

Contractual licensing: Coverage and enforcement depend on participation, represented rights, and the resulting agreements.

Evidence: May 02, 2026 (publishersrights.org), checked Jul 15, 2026
Adoption: No public adoption figure found

Shutterstock Data Licensing & AI Services

Live

Rights-cleared multimodal data licensing and model-support services for generative AI developers and enterprise partners.

Approach: Marketplace (also: Formal license)
Pipeline stage: Train · Fine-tune · Evaluate · Retrieve · Generate
Data type: Multimodal · Images · Video · Music

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Apr 01, 2026 (investor.shutterstock.com), checked Jun 24, 2026
Adoption: No public adoption figure found

Social License for Data Reuse

In progress

Participatory governance framework for communities to define conditions for data reuse, including AI training.

Approach: Governed data sharing
Pipeline stage: Collect · Train · Fine-tune · Retrieve
Data type: Multimodal

Governed sharing: Practical force depends on participation, access rules, technical configuration, and governance after data is shared.

Evidence: May 13, 2025 (blog.thegovlab.org), checked Jun 24, 2026
Adoption: No public adoption figure found

SourceAudio AI Dataset Licensing

Live

Opt-in music dataset licensing program that pays rights holders for AI training use of tracks and catalogs.

Approach: Marketplace
Pipeline stage: Train · Fine-tune
Data type: Music

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: May 06, 2026 (sourceaudio.com), checked Jun 24, 2026
Adoption: 3,000+ music catalogs · 14M+ opted-in songs · nearly $10M annual revenue from eight contracts

Spawning ai.txt

In progress

A proposed machine-readable opt-out convention for commercial AI training via an `ai.txt` file.

Approach: Preference signal
Pipeline stage: Collect · Train · Retrieve
Data type: Web content

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Aug 28, 2025 (spawning.ai), checked Jul 15, 2026
Adoption: No public adoption figure found

Spawning Data Diligence

Live

Python package and API helpers for checking whether works are opted out before model training.

Approach: Protocol or standard
Pipeline stage: Collect · Train · Fine-tune
Data type: Multimodal

Protocol or standard: Practical force depends on implementation, interoperability, adoption, and any legal or technical controls built around it.

Evidence: Jul 16, 2026 (pypi.org), checked Jul 16, 2026
Adoption: No public adoption figure found

Spawning Do Not Train Registry

In progress

Registry and opt-out workflow for marking works that should not be used in future AI training datasets.

Approach: Rights registry (also: Preference signal)
Pipeline stage: Collect · Train
Data type: Images

Registry infrastructure: Practical force depends on reliable identity, accurate declarations, adoption, and downstream use of the registry.

Evidence: Jul 16, 2026 (site.spawning.ai), checked Jul 16, 2026
Adoption: No public adoption figure found

SPUR (Standards for Publisher Usage Rights)

In progress

Publisher coalition building shared standards and licensing frameworks for responsible AI use of journalism.

Approach: Protocol or standard
Pipeline stage: Train · Retrieve
Data type: Text

Protocol or standard: Practical force depends on implementation, interoperability, adoption, and any legal or technical controls built around it.

Evidence: Jul 09, 2026 (spurcoalition.org), checked Jul 15, 2026
Adoption: 36+ publishers and affiliate organizations

Stack Data Licensing

Live

Licensed access to Stack Overflow's developer knowledge corpus for AI training, fine-tuning, RAG, and agentic use cases.

Approach: Marketplace (also: Formal license)
Pipeline stage: Train · Fine-tune · Retrieve
Data type: Text · Code

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: May 14, 2026 (stackoverflow.co), checked Jun 24, 2026
Adoption: 83M+ human-verified questions and answers

TDM·AI

In progress

Asset-level protocol for binding machine-readable TDM and AI-training preferences to digital works.

Approach: Preference signal (also: Protocol or standard)
Pipeline stage: Collect · Train · Fine-tune
Data type: Multimodal

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Nov 04, 2025 (docs.tdmai.org), checked Jun 24, 2026
Adoption: No public adoption figure found

TDMRep (W3C Community Group)

Live

W3C Community Group report for expressing text and data mining rights reservations and policy links, designed to support the EU DSM Directive's Article 4 mechanism.

Approach: Preference signal (also: Formal license)
Pipeline stage: Collect · Train
Data type: Web content

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Oct 01, 2025 (w3.org), checked Jul 15, 2026
Adoption: No public adoption figure found
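Among other channels, TDMRep expresses a reservation through `tdm-reservation` and `tdm-policy` HTTP response headers; the report also defines HTML meta tags and a `/.well-known/tdmrep.json` file, which this sketch does not read. `tdm_reservation` is a hypothetical helper. It returns `None` when the headers say nothing, which is not the same as permission being granted: the outcome then depends on other signals and on applicable law.

```python
from typing import Mapping, Optional, Tuple


def tdm_reservation(headers: Mapping[str, str]) -> Tuple[Optional[bool], Optional[str]]:
    """Read TDMRep response headers (hypothetical helper).

    Returns (reserved, policy_url). reserved is None when these headers express
    no reservation; that is not the same as a grant.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    raw = lowered.get("tdm-reservation")
    if raw == "1":
        reserved = True
    elif raw == "0":
        reserved = False
    else:
        reserved = None  # absent or unrecognized value
    return reserved, lowered.get("tdm-policy")
```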

TK Labels

Live

Local Contexts labels that let Indigenous communities express culturally specific conditions for access and reuse of knowledge and data.

Approach: Preference signal
Pipeline stage: Collect · Train · Retrieve
Data type: Multimodal

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: May 13, 2026 (localcontexts.org), checked Jul 16, 2026
Adoption: No public adoption figure found

Troveo

Live

Marketplace and sourcing network for licensed, non-public real-world datasets for AI model training.

Approach: Marketplace (also: Formal license)
Pipeline stage: Train · Fine-tune · Evaluate
Data type: Multimodal · Video · Audio · Text

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Apr 28, 2026 (streetinsider.com), checked Jul 15, 2026
Adoption: 8M+ licensed video hours; 4M audio hours; billions of words; plus gaming, enterprise workflow, and robotics data · $20M+ paid to content owners

trust.txt

Live

Publisher-oriented trust file that can also declare whether AI training is allowed through a machine-readable `datatrainingallowed=` field.

Approach: Preference signal (also: Protocol or standard)
Pipeline stage: Train
Data type: Web content

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Jul 16, 2026 (journallist.net), checked Jul 16, 2026
Adoption: About 3,000 participating publishers
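trust.txt is a plain file of `key=value` lines. Assuming the `datatrainingallowed=` field takes `yes` or `no` (check the JournalList specification for the authoritative values), a reader could look it up as sketched below. `data_training_allowed` is a hypothetical helper that returns `None` when the field is absent or carries an unrecognized value.

```python
from typing import Optional


def data_training_allowed(trust_txt: str) -> Optional[bool]:
    """Return True/False for datatrainingallowed=yes/no, else None (hypothetical helper)."""
    for raw in trust_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "datatrainingallowed":
            answer = value.strip().lower()
            if answer == "yes":
                return True
            if answer == "no":
                return False
    return None
```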

User Intents

In progress

Proposed AT Protocol mechanism for users to declare data-reuse preferences such as generative-AI training.

Approach: Preference signal
Pipeline stage: Collect · Train · Retrieve
Data type: Multimodal

Voluntary signal: It only affects actors that detect and honor the signal; it does not itself prevent reuse.

Evidence: Mar 08, 2025 (github.com), checked Jun 24, 2026
Adoption: No public adoption figure found

vAIsual

Live

Marketplace for rights-managed visual and biometric datasets tailored to AI training and evaluation.

Approach: Marketplace
Pipeline stage: Train · Fine-tune · Evaluate
Data type: Images · Video

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Jul 09, 2026 (datasetshop.com), checked Jul 15, 2026
Adoption: 400M+ dataset files

Veritone Data Marketplace

Live

Marketplace connecting rightsholders and accredited AI developers for governed, rights-cleared multimodal datasets.

Approach: Marketplace (also: Formal license)
Pipeline stage: Train · Fine-tune
Data type: Multimodal · Video · Audio · Images

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Mar 10, 2026 (nasdaq.com), checked Jul 15, 2026
Adoption: No public adoption figure found

Versos AI Video Training Data Marketplace

Live

Platform for structuring, licensing, delivering, and tracking professional video datasets for AI model training.

Approach: Marketplace (also: Formal license)
Pipeline stage: Train · Fine-tune
Data type: Video

Contractual access: Terms govern participating transactions; they do not control copies obtained elsewhere.

Evidence: Feb 23, 2026 (nasdaq.com), checked Jun 24, 2026
Adoption: 20+ studios and content owners · 1M+ hours of professional video content represented

Wikimedia Enterprise

Live

Enterprise-grade APIs and structured dumps for Wikipedia and sister projects, designed for large-scale reuse in AI, search, and knowledge graphs.

Approach: Governed data sharing
Pipeline stage: Train · Retrieve
Data type: Text · Structured data

Governed sharing: Practical force depends on participation, access rules, technical configuration, and governance after data is shared.

Evidence: Jul 09, 2026 (enterprise.wikimedia.com), checked Jul 15, 2026
Adoption: 10+ announced partners · 920+ datasets / 300M+ unique project pages

Looking for older projects? Browse archived initiatives.