The Core Challenge

AI systems are trained on vast datasets, often scraped without permission from, or compensation to, the creators of that content. This raises questions about legal exposure, fair compensation, and the sustainability of the creative ecosystem.

Key Concepts

Training data provenance: Documentation of where AI training data comes from and what usage rights exist.
Text and data mining (TDM): Automated processing of content to extract information, often for AI training.
Opt-out rights: Creators' ability to prevent their work from being used in AI training.
Model outputs and copyright: Questions about who owns AI-generated content and whether it infringes existing copyright.
Creative ecosystem sustainability: The concern that AI may undermine the economic viability of human creation.
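The provenance and rights concepts above can be kept as a structured record per data source, so that due diligence has something concrete to audit. A minimal sketch in Python (the field names and flag wording are illustrative assumptions, not a standard schema):

```python
from dataclasses import dataclass

@dataclass
class DataSourceRecord:
    """Provenance record for one training-data source (illustrative fields)."""
    source: str                      # e.g. a dataset name or crawl domain
    licence: str                     # usage rights as documented; "unknown" if unverified
    rights_verified: bool = False    # has due diligence confirmed the licence?
    opt_out_respected: bool = False  # were creator opt-out signals honoured?

def red_flags(record: DataSourceRecord) -> list[str]:
    """Return warning signs for a single source, mirroring the checks in this guide."""
    flags = []
    if record.licence.lower() in ("", "unknown", "the internet"):
        flags.append("provenance unknown or undocumented")
    if not record.rights_verified:
        flags.append("usage rights assumed or unverified")
    if not record.opt_out_respected:
        flags.append("creator opt-out preferences not addressed")
    return flags
```

For example, a record with `licence="unknown"` and both booleans left at their defaults would raise all three flags, matching the "Unknown or 'the internet'" red flag in the quick reference below.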

Warning Signs

Watch for these indicators of IP and sustainability risks:

  • Training data sources are unknown or undocumented
  • No due diligence has been done on data provenance for AI systems in use
  • AI-generated content is used without considering copyright implications
  • The organisation is dependent on AI systems with unclear data rights
  • There's no policy on respecting creator opt-out preferences
  • Procurement doesn't consider vendors' data sourcing practices

Questions to Ask in AI Project Reviews

  • "Where does the training data come from? What rights have been secured?"
  • "Has due diligence been done on the data provenance of this model?"
  • "What happens if the legal landscape on AI training data changes?"

Questions to Ask in Governance Discussions

  • "What's our position on using AI systems with unclear data provenance?"
  • "How are we ensuring AI-generated content doesn't create copyright exposure?"
  • "What's our policy on respecting creator opt-out preferences?"
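One concrete, partial signal of creator opt-out preferences is a robots.txt rule targeting AI crawlers (vendor user-agent names such as "GPTBot" vary, and robots.txt does not capture every opt-out mechanism, e.g. an EU TDM reservation can exist even where robots.txt is silent). A sketch using only Python's standard library:

```python
from urllib.robotparser import RobotFileParser

def crawl_permitted(robots_txt: str, user_agent: str, path: str) -> bool:
    """Check whether a site's robots.txt allows the given crawler to fetch path.

    This covers only one opt-out signal; a legal reservation of rights may
    apply even when robots.txt allows crawling.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# Example: a site that blocks one AI crawler entirely.
robots = """User-agent: GPTBot
Disallow: /
"""
print(crawl_permitted(robots, "GPTBot", "/article"))    # False: opted out
print(crawl_permitted(robots, "OtherBot", "/article"))  # True: rule targets GPTBot only
```

A policy on respecting opt-outs would pair a check like this with the other mechanisms a vendor or regulator recognises, rather than relying on robots.txt alone.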

Questions to Ask in Strategy Sessions

  • "How dependent is our AI strategy on data sources that may face legal challenges?"
  • "Are we building sustainable relationships with content creators or extracting value unsustainably?"
  • "What contingencies exist if training data rights become more restricted?"

Reflection Prompts

  1. Your awareness: For AI systems you use, do you know where the training data came from?
  2. Your organisation's exposure: What legal or reputational risks might exist from AI data sourcing practices?
  3. Your position: What do you believe is the right balance between AI innovation and creator rights?

Good Practice Checklist

  • Training data provenance is documented for AI systems in use
  • Due diligence on data rights is part of AI procurement
  • Policy exists on respecting creator opt-out preferences
  • AI-generated content use considers copyright implications
  • Relationships with content creators are sustainable, not extractive
  • Contingency planning addresses potential data rights restrictions

Quick Reference

Element         Question to Ask                       Red Flag
Provenance      Where does training data come from?   Unknown or "the internet"
Rights          What usage rights exist?              Assumed or unverified
Outputs         Who owns AI-generated content?        Never considered
Sustainability  Impact on creator ecosystem?          Not considered
Contingency     What if data rights change?           No plan exists

The Evolving Legal Landscape

UK position: Government has signalled support for copyright exceptions enabling AI training, while seeking balance with creator rights. Details remain contested.

EU position: More restrictive approach requiring respect for opt-out rights and greater transparency about training data.

Litigation: Major lawsuits are ongoing globally. The legal landscape may shift significantly.

Risk implication: Organisations should understand their exposure and have contingency plans for different legal outcomes.