<?xml version="1.0" encoding="UTF-8"?>
<rfc category="info" consensus="true" docName="draft-keller-aipref-vocab-01" ipr="trust200902" sortRefs="true" submissionType="IETF" symRefs="true" tocInclude="true" version="3" xmlns:xi="http://www.w3.org/2001/XInclude">

  <front>
    <title abbrev="Opt-Out Vocab">Proposal for an Opt-Out Vocabulary</title>
    <seriesInfo name="Internet-Draft" value="draft-keller-aipref-vocab-01"/>
    <author fullname="Paul Keller">
      <organization>Open Future</organization>
      <address>
        <email>paul@openfuture.eu</email>
      </address>
    </author>
    <date day="28" month="March" year="2025"/>
    <area>AREA</area>
    <workgroup>AI Preferences</workgroup>
    <keyword>AI Preferences</keyword>
    <keyword>Opt-Out</keyword>
    <keyword>Vocabulary</keyword>
    <abstract>
      <t>This document proposes a standardized vocabulary of use cases that can be targeted when expressing machine-readable opt-outs related to Text and Data Mining (TDM) and AI training. The vocabulary is agnostic to specific opt-out mechanisms and enables declaring parties to communicate restrictions or permissions regarding the use of their digital assets in a structured and interoperable manner. It defines three key use cases&#x2014;TDM, AI Training, and Generative AI Training&#x2014;which can be referenced by opt-out systems to ensure consistent interpretation across different implementations.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>The latest revision of this draft can be found at <eref target="https://paul2keller.github.io/opt-out-vocab-id/draft-keller-aipref-vocab.html"/>.
      Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-keller-aipref-vocab/"/>.</t>
      <t>Discussion of this document takes place on the
      AI Preferences Working Group mailing list (<eref target="mailto:ai-control@ietf.org"/>),
      which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/ai-control/"/>.
      Subscribe at <eref target="https://www.ietf.org/mailman/listinfo/ai-control/"/>.</t>
      <t>Source for this draft and an issue tracker can be found at
      <eref target="https://github.com/paul2keller/opt-out-vocab-id"/>.</t>
    </note>
  </front>

  <middle>

    <section anchor="introduction">
      <name>Introduction</name>
      <t>The purpose of this document is to provide a common vocabulary that can be used for machine-readable opt-outs by parties who wish to restrict the use of their assets for the purpose of AI training and other forms of Text and Data Mining (TDM).</t>
      <t>The elements of the vocabulary can be used to describe, in a standardized way, the types of uses that a declaring party may wish to restrict (or allow), thereby ensuring that opt-outs can be communicated, processed and stored in a consistent and interoperable manner.</t>
      <t>The vocabulary is agnostic to the technical implementation of opt-out systems and is designed to ensure that opt-out information can be exchanged effectively between different systems. It is intended to govern the use of works in the context of training AI models and other forms of TDM, but it does not concern itself with the collection of training data (crawling). In particular, the vocabulary is not intended for expressing instructions or restrictions related to crawling for the purpose of building a search index, as more specific standards and protocols already exist for this purpose, including but not limited to <xref target="RFC9309"/>.</t>
      <t>The vocabulary is intended to work both in contexts where opt-outs expressed by the declaring party give rise to legal obligations (such as rights reservations made by rightholders) and in contexts where this is not the case. It is without prejudice to applicable laws and to the applicability of exceptions and limitations.</t>
    </section>

    <section anchor="conventions-and-definitions">
      <name>Conventions and Definitions</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
      NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
      "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
      described in BCP&#xa0;14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.</t>
    </section>

    <section anchor="definitions">
      <name>Definitions</name>
      <ul spacing="normal">
        <li>
          <t>
            <strong>Asset:</strong>
 A digital file or stream of data, usually with associated metadata.</t>
        </li>
        <li>
          <t>
            <strong>Declaring party:</strong>
 The entity that expresses an opt-out with regard to an Asset.</t>
        </li>
      </ul>
    </section>

    <section anchor="vocabulary-structure">
      <name>Vocabulary Structure</name>
      <t>The vocabulary consists of the overarching TDM (Text and Data Mining) category and a number of specific use cases that can be addressed independently. The overarching category <tt>TDM</tt> is based on the definition of Text and Data Mining in Article 2(2) of <xref target="EUCD2019"/>.</t>
    </section>

    <section anchor="proposed-vocabulary">
      <name>Proposed Vocabulary</name>
      <t>The following categories are defined for use in the opt-out vocabulary:</t>
      <ul spacing="normal">
        <li>
          <t>
            <strong>TDM</strong>
: Text and Data Mining. The act of using one or more assets in the context of any automated analytical technique aimed at analyzing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations.</t>
        </li>
        <li>
          <t>
            <strong>AI Training</strong>
: The act of training AI models.</t>
        </li>
        <li>
          <t>
            <strong>Generative AI Training</strong>
: The act of training General Purpose AI models that have the capacity to generate text, images or other forms of synthetic content, or the act of training other types of AI models that have the purpose of generating text, images or other forms of synthetic content.</t>
        </li>
      </ul>
      <t>This list of specific use cases may be expanded in the future, should consensus emerge among stakeholders, to include categories that address additional use cases. In addition to the categories defined in this vocabulary, some systems implementing it are expected to extend the list with additional categories for their particular needs.</t>

      <section anchor="relationship-with-more-specific-instructions">
        <name>Relationship with more specific instructions</name>
        <t>The vocabulary does not preclude the use of other specific categories. Any opt-outs based on this vocabulary shall not be interpreted as restricting the use of the work(s) strictly for the purpose of search and discovery, as long as no restriction is declared through search-specific means such as <xref target="RFC9309"/>.</t>
        <t>When using this vocabulary, more specific instructions, either based on the vocabulary or derived from other protocols, should be given preference over less specific ones.</t>
      </section>

      <section anchor="relationship-between-categories">
        <name>Relationship between categories</name>
        <t>The TDM category is the overarching category that includes the AI Training category, and Generative AI Training is a subset of AI Training. Both AI Training and Generative AI Training are thus forms of TDM. As a result, when a declaring party opts out of TDM, it also opts out of AI Training and Generative AI Training. AI model developers processing opt-outs must therefore interpret an opt-out from TDM as also being an opt-out from AI Training and Generative AI Training.</t>
        <t>The figure below shows the relationship between the currently defined categories:</t>

        <figure>
          <name>Overview of proposed vocabulary</name>
          <artset>
            <artwork type="svg">
              <svg font-family="sans-serif" font-size="14" height="300" stroke-width="2" text-anchor="middle" version="1.1" viewBox="0 0 750 300" width="750" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
                <title>Opt-out vocabulary overview</title>
                <rect fill="white" height="220" rx="15" ry="15" stroke="black" width="630" x="50" y="50"/>
                <text x="365" y="80">Text and Data Mining</text>
                <rect fill="white" height="130" rx="10" ry="10" stroke="black" width="380" x="80" y="100"/>
                <text x="380" y="170">AI Training</text>
                <rect fill="white" height="70" rx="5" ry="5" stroke="black" width="200" x="110" y="130"/>
                <text x="210" y="170">Generative AI Training</text>
                <rect class="dotted" fill="white" height="130" rx="5" ry="5" stroke="black" stroke-dasharray="5,5" width="150" x="500" y="100"/>
                <text x="575" y="160">[possibly]:</text>
                <text x="575" y="180">additional use cases</text>
              </svg>
            </artwork>
            <artwork type="ascii-art"><![CDATA[
+--------------------------------------------------------------------------+
|                                                                          |
|                          Text and Data Mining (TDM)                      |
|                                                                          |
| +--------------------------------------------+  +- - - - - - - - - - -+  |
| |  +--------------------------+              |  |                     |  |
| |  |                          |              |                           |
| |  |                          |              |  |    [possibly]:      |  |
| |  | Generative AI Training   |  AI Training |                           |
| |  |                          |              |  |  Other use cases    |  |
| |  |                          |              |                           |
| |  +--------------------------+              |  |                     |  |
| +--------------------------------------------+  +- - - - - - - - - - -+  |
|                                                                          |
+--------------------------------------------------------------------------+
]]></artwork>
          </artset>
        </figure>
        <t>Systems referencing the vocabulary must not introduce additional categories that include existing categories defined in the vocabulary or otherwise include additional hierarchical relationships.</t>
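        <t>The containment rules in this section, together with the preference for more specific instructions described in <xref target="relationship-with-more-specific-instructions"/>, can be illustrated with a short, non-normative Python sketch. It assumes that declarations are held as a mapping from category name to a preference label. The labels <tt>opt-out</tt> and <tt>allow</tt>, the <tt>PARENT</tt> table and the function <tt>effective_preference</tt> are hypothetical names chosen for illustration and are not defined by this vocabulary.</t>
        <sourcecode type="python"><![CDATA[
# Parent of each category in the hierarchy described in this section.
# TDM is the overarching category and has no parent.
PARENT = {
    "TDM": None,
    "AI Training": "TDM",
    "Generative AI Training": "AI Training",
}


def effective_preference(use, declared):
    """Return the preference that applies to `use`, or None.

    `declared` maps category names to "opt-out" or "allow"
    (hypothetical labels). The walk starts at `use` and moves up
    towards TDM, so the most specific declaration wins.
    """
    if use not in PARENT:
        raise ValueError(f"unknown category: {use!r}")
    category = use
    while category is not None:
        if category in declared:
            return declared[category]
        category = PARENT[category]
    return None


# An opt-out from TDM alone also covers both training categories.
tdm_only = {"TDM": "opt-out"}
covered = [
    c for c in PARENT
    if effective_preference(c, tdm_only) == "opt-out"
]
]]></sourcecode>
        <t>Under these assumptions, a declaration that only opts out of <tt>TDM</tt> yields an opt-out for all three categories, whereas a declaration that only opts out of Generative AI Training leaves AI Training and TDM without a declared preference.</t>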
      </section>
    </section>

    <section anchor="usage">
      <name>Usage</name>
      <t>The vocabulary may be used by declaring that an opt-out system or entity expressing or processing opt-outs uses the terms defined in <xref target="proposed-vocabulary"/>, directly or via mappings, in accordance with how they are defined in this document.</t>
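      <t>As a non-normative illustration of use "via mappings", the following Python sketch translates the labels of an imaginary opt-out system into vocabulary terms. The labels <tt>no-mining</tt>, <tt>no-ml</tt> and <tt>no-genai</tt>, as well as the names <tt>LOCAL_TO_VOCAB</tt> and <tt>to_vocabulary</tt>, are assumptions made for this example and do not correspond to any existing protocol.</t>
      <sourcecode type="python"><![CDATA[
# Terms defined by the vocabulary in this document.
VOCABULARY = {"TDM", "AI Training", "Generative AI Training"}

# Hypothetical labels of an imaginary opt-out system, mapped to
# vocabulary terms. None of these labels come from a real protocol.
LOCAL_TO_VOCAB = {
    "no-mining": "TDM",
    "no-ml": "AI Training",
    "no-genai": "Generative AI Training",
}

# Every mapping target has to be a term the vocabulary defines.
assert set(LOCAL_TO_VOCAB.values()) <= VOCABULARY


def to_vocabulary(local_labels):
    """Translate a system's own labels into vocabulary terms."""
    unknown = sorted(set(local_labels) - set(LOCAL_TO_VOCAB))
    if unknown:
        raise ValueError(f"no mapping for: {unknown}")
    return {LOCAL_TO_VOCAB[label] for label in local_labels}
]]></sourcecode>
      <t>A system using such a table would reject labels it cannot map rather than guess at their meaning, so that every opt-out it exchanges with other systems is expressed in terms defined by this document.</t>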
    </section>

    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>TODO Security</t>
    </section>

    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
  </middle>

  <back>

    <references anchor="sec-combined-references">
      <name>References</name>

      <references anchor="sec-normative-references">
        <name>Normative References</name>
        <reference anchor="RFC9309">

          <front>
            <title>Robots Exclusion Protocol</title>
            <author fullname="M. Koster" initials="M." surname="Koster"/>
            <author fullname="G. Illyes" initials="G." surname="Illyes"/>
            <author fullname="H. Zeller" initials="H." surname="Zeller"/>
            <author fullname="L. Sassman" initials="L." surname="Sassman"/>
            <date month="September" year="2022"/>
            <abstract>
              <t>This document specifies and extends the "Robots Exclusion Protocol" method originally defined by Martijn Koster in 1994 for service owners to control how content served by their services may be accessed, if at all, by automatic clients known as crawlers. Specifically, it adds definition language for the protocol, instructions for handling errors, and instructions for caching.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="9309"/>
          <seriesInfo name="DOI" value="10.17487/RFC9309"/>
        </reference>
        <reference anchor="RFC2119">

          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner"/>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174">

          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba"/>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </references>

      <references anchor="sec-informative-references">
        <name>Informative References</name>
        <reference anchor="EUCD2019" target="https://eur-lex.europa.eu/eli/dir/2019/790/oj">

          <front>
            <title>Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market</title>
            <author>
              <organization>European Union</organization>
            </author>
            <date day="17" month="May" year="2019"/>
          </front>
        </reference>
      </references>
    </references>

    <section anchor="acknowledgments" numbered="false">
      <name>Acknowledgments</name>
      <t>The following individuals have been involved in the drafting of the proposal:</t>
      <ul spacing="normal">
        <li>
          <t>Cullen Miller, Spawning.ai</t>
        </li>
        <li>
          <t>Sebastian Posth, Liccium</t>
        </li>
        <li>
          <t>Leonard Rosenthol, Adobe</t>
        </li>
        <li>
          <t>Laurent Le Meur, EDRLab</t>
        </li>
      </ul>
    </section>
  </back>
</rfc>
