
©️ Understand Copyright in AI Training Data

The public web is not automatically 'fair to train on,' and not every scrape is theft. Walk the four real threads (what copyright covers, how fair use is being argued, what licensing actually looks like, and which opt-out signals matter), then outline a sourcing policy you'd defend.

Foundations · 14 drops · ~2-week path · 5–8 min/day · technology

Phase 1: What copyright protects and what training does

4 drops
  1. Copyright protects expression, not ideas — and that gap matters for AI

    7 min


  2. What training mechanically does to a copyrighted work

    8 min


  3. The four factors of fair use — the lens every case uses

    8 min


  4. US fair use, EU TDM exceptions, Japan's explicit carve-out

    8 min


Phase 2: Map four lawsuits onto the fair-use factors

5 drops
  1. NYT v OpenAI — when memorization becomes evidence

    8 min


  2. Getty v Stability AI — when watermarks survive training

    7 min


  3. Andersen v Stability AI — when the artists sue, not the corporations

    7 min


  4. Authors Guild v OpenAI — when novelists frame training as licensing-bypass

    8 min


  5. Pattern across the four cases — what builders should actually take away

    9 min


Phase 3: Licensing, opt-outs, and provenance signals

4 drops
  1. Your CEO asks 'can we just license everything?'

    8 min


  2. Your engineer says 'robots.txt is enough — we respect it'

    8 min


  3. Provenance: C2PA, watermarking, and 'where did this come from?'

    8 min


  4. Synthetic data and 'clean' training corpora — the new escape hatch?

    8 min


Phase 4: Outline a defensible training-data sourcing policy

1 drop
  1. Outline a training-data sourcing policy for a hypothetical generative product

    10 min


Frequently asked questions

Is it legal to train an AI model on copyrighted material from the public web?
This is covered in the “Understand Copyright in AI Training Data” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
How does the four-factor fair-use test apply to training data?
This is covered in the “Understand Copyright in AI Training Data” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
What are the main AI training data lawsuits and what's actually being argued?
This is covered in the “Understand Copyright in AI Training Data” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
How do robots.txt, ai.txt, and C2PA 'do-not-train' signals differ?
This is covered in the “Understand Copyright in AI Training Data” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
What should a training-data sourcing policy actually contain?
This is covered in the “Understand Copyright in AI Training Data” learning path. Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.