<?xml version="1.0" encoding="UTF-8"?>
<rfc category="exp" consensus="true" docName="draft-liao-aipref-autoctl-core-01" ipr="trust200902" sortRefs="true" submissionType="IETF" symRefs="true" tocInclude="true" version="3" xmlns:xi="http://www.w3.org/2001/XInclude">
  <front>
    <title abbrev="aipref-autoctl">Protocol for Automation Control</title>
    <seriesInfo name="Internet-Draft" value="draft-liao-aipref-autoctl-core-01"/>
    <author fullname="Liao Peiyuan">
      <organization>Condé Nast</organization>
      <address>
        <postal>
          <country>United States of America</country>
        </postal>
        <email>peiyuan_liao@condenast.com</email>
      </address>
    </author>
    <date year="2025" month="April" day="20"/>
    <area>Applications</area>
    <workgroup>AI Preferences</workgroup>
    <keyword>AI Preferences</keyword>
    <keyword>Automation Control</keyword>
    <keyword>Web Automation</keyword>
    <abstract>
      <t>
        This document specifies a machine-readable protocol for expressing server-side
        automation permissions in light of recent advances in AI-driven web automation.
        Building upon RFC 9309, this protocol addresses a broader range of state-changing
        activities that service owners may wish to control. It defines the file format,
        HTTP method restrictions, and purpose declarations.
      </t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>The latest revision of this draft can be found at <eref target="https://datatracker.ietf.org/doc/draft-liao-aipref-autoctl-core/"/>.
      Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-liao-aipref-autoctl-core/"/>.</t>
      <t>Discussion of this document takes place on the
      AI Preferences Working Group mailing list (<eref target="mailto:ai-control@ietf.org"/>),
      which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/ai-control/"/>.
      Subscribe at <eref target="https://www.ietf.org/mailman/listinfo/ai-control/"/>.</t>
    </note>
  </front>

  <middle>
    <section anchor="introduction">
      <name>Introduction</name>
      <t>
        Web automation has outpaced existing standards, which provide only for
        read-only crawler permissions. Sophisticated, AI-driven bots can now
        interact with web servers in complex, human-like ways, allowing them
        both to retrieve and to modify content. This document introduces a
        protocol that enables service owners to declare policies governing such
        interactions, notably the use of state-changing HTTP methods.
      </t>

      <section anchor="applicability">
        <name>Applicability</name>
        <t>
          The <tt>automation-preferences.txt</tt> file applies to automated systems that interact
          with web servers, especially those driven by AI models. Content owners may use this file
          to specify acceptable automation behaviors, and developers of automated systems may use
          its directives to ensure compliance.
        </t>
      </section>

      <section anchor="relationship-to-extension-specification">
        <name>Relationship to Extension Specification</name>
        <t>
          A separate document, "Protocol Extension for Automation Control,"
          extends this document with additional directives and capabilities, focusing on
          a wider range of state-changing web requests.
        </t>
        <t>
          Implementations conforming to only this specification are considered compliant
          with the protocol.
        </t>
        <t>
          This protocol <em>augments</em>, and does not relax, the directives of
          <xref target="RFC9309"/>.
          If a path is disallowed in <tt>robots.txt</tt>, an
          <tt>automation-preferences.txt</tt> directive <bcp14>MUST NOT</bcp14> be
          interpreted to expand access. Conversely, when
          <tt>robots.txt</tt> explicitly allows any HTTP method on a path,
          an <tt>automation-preferences.txt</tt> directive that disallows
          state-changing requests to that same path <bcp14>SHALL</bcp14> take
          precedence; the more restrictive rule <bcp14>MUST</bcp14> be enforced.
        </t>
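        <t>
          As a non-normative illustration, the precedence rule above can be
          sketched in Python as follows. The function name and its two
          boolean inputs are assumptions made for this example; they are
          not defined by this protocol.
        </t>
        <sourcecode type="python"><![CDATA[
def is_request_permitted(robots_allows, autoctl_allows):
    """Combine two verdicts so that the more restrictive one wins.

    Both arguments are hypothetical inputs: the verdict derived from
    robots.txt and the verdict derived from
    automation-preferences.txt for the same request.
    """
    # Access is granted only when neither file disallows the request,
    # so automation-preferences.txt can never expand robots.txt access.
    return robots_allows and autoctl_allows
]]></sourcecode>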
      </section>
    </section>

    <section anchor="conventions-and-definitions">
      <name>Conventions and Definitions</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
      NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
      "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
      described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.</t>
      <t>
        The following terms are used in this document:
      </t>
      <ul spacing="normal">
        <li>Automation: Programmatic interactions with a web server.</li>
        <li>State-changing requests: HTTP methods that alter server state
           (e.g., POST, PUT, DELETE, PATCH).</li>
        <li>Automation purpose: The declared intent or use case for which
           automation is being performed.</li>
      </ul>
      <t>
        All terminology defined in this core specification applies to the extension
        specification without redefinition. The extension specification may introduce
        additional terms for concepts not covered in this document.
      </t>
    </section>

    <section anchor="protocol-specification">
      <name>Protocol Specification</name>
      <section anchor="file-location-and-format">
        <name>File Location and Format</name>
        <t>
          The <tt>automation-preferences.txt</tt> file <bcp14>MUST</bcp14> be hosted at the root of the domain, in the
          same manner as <xref target="RFC9309" format="default" sectionFormat="of"/>. The file is structured as a series of key-value
          pairs that specify automation permissions.
        </t>
        <t>
          The file <bcp14>MUST</bcp14> be served with the media type
          <tt>text/plain; charset=utf-8</tt>.
          Lines beginning with the hash symbol (#) are considered comments and
          <bcp14>MUST</bcp14> be ignored by parsers. Each directive consists of a
          field name, followed by a colon, followed by a value. Multiple values
          <bcp14>MAY</bcp14> be separated by commas. Parsers <bcp14>MUST</bcp14> silently ignore
          any directives they do not recognize.
        </t>
        <t>
          Implementations <bcp14>SHOULD</bcp14> reject files that contain raw control
          bytes (code points &lt; U+0020) other than the horizontal tab (U+0009), which
          the formal syntax permits as whitespace, and the CR (U+000D) and
          LF (U+000A) line breaks.
        </t>
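        <t>
          The line format described above can be sketched, non-normatively,
          as a small Python parser. The name <tt>parse_directives</tt> is
          hypothetical. The sketch assumes that a "#" anywhere on a line
          starts a comment and that the horizontal tab is tolerated as
          whitespace; it returns every directive it finds and leaves the
          filtering of unrecognized field names to the caller.
        </t>
        <sourcecode type="python"><![CDATA[
import re

# Control characters other than HTAB, LF, and CR. Tolerating HTAB is
# an assumption: the formal syntax permits it as whitespace.
FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_directives(text):
    """Return (field-name, values) pairs in file order."""
    if FORBIDDEN.search(text):
        raise ValueError("raw control character in file")
    directives = []
    for line in re.split(r"\r\n|\n|\r", text):
        # Everything from the first "#" to the end of the line is
        # treated as a comment (an assumption of this sketch).
        line = line.split("#", 1)[0].strip(" \t")
        name, colon, value = line.partition(":")
        if not colon:
            continue  # blank, comment-only, or not a directive
        values = [v.strip(" \t") for v in value.split(",")]
        directives.append(
            (name.strip(" \t").lower(), [v for v in values if v])
        )
    return directives
]]></sourcecode>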
      </section>

      <section anchor="http-method-restrictions">
        <name>HTTP Method Restrictions</name>
        <t>
          Server owners <bcp14>MUST</bcp14> explicitly list the HTTP methods permitted
          for automated processing using the <tt>allowed-methods</tt> directive.
          Typically, GET and HEAD are permitted, while state-changing methods such as
          POST, PUT, DELETE, and PATCH are disallowed. Methods that are not listed
          are disallowed.
        </t>
        <t>
          Example:
        </t>
        <figure>
          <artwork><![CDATA[
allowed-methods: GET, HEAD
# Note: This example shows basic method allowance
          ]]></artwork>
        </figure>
        <t>
          If a group omits the <tt>allowed-methods</tt> directive, <em>all</em> HTTP
          methods are considered disallowed for automated processing within that scope.
        </t>
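        <t>
          The default-deny behavior described above might be checked as in
          the following non-normative Python sketch. Representing a group as
          a dictionary keyed by directive name, and the function name
          <tt>method_allowed</tt>, are assumptions made for this example.
        </t>
        <sourcecode type="python"><![CDATA[
def method_allowed(group, method):
    """Return True if the group lists the request method as allowed."""
    # A group without "allowed-methods" yields an empty list, so every
    # method is treated as disallowed.
    listed = group.get("allowed-methods", [])
    # File values are upper-cased because ABNF string literals are
    # case-insensitive; the request method is compared as-is because
    # HTTP method names are case-sensitive.
    return method in {value.upper() for value in listed}
]]></sourcecode>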
      </section>

      <section anchor="purpose-declaration">
        <name>Purpose Declaration</name>
        <t>
          Server owners can accept or reject automation based on its declared
          purpose using the <tt>allowed-purposes</tt> directive, which
          takes a comma-separated list of permitted purposes drawn from a
          standardized vocabulary.
        </t>
        <t>
          This document does not define the vocabulary terms for automation
          purposes. Instead, it provides a mechanism for expressing allowed
          and disallowed purposes that can be aligned with a vocabulary
          standard once one is adopted.
        </t>
        <t>
          Example:
        </t>
        <figure>
          <artwork><![CDATA[
allowed-purposes: PLACEHOLDER_PURPOSE1, PLACEHOLDER_PURPOSE2
# Note: Placeholder purposes are used here
          ]]></artwork>
        </figure>
      </section>

      <section anchor="scope-and-applicability">
        <name>Scope and Applicability</name>
        <t>
          The <tt>automation-preferences.txt</tt> file is divided into groups, each of which applies to a specific
          subset of content. Each group begins with one or more scope directives that define the target of the
          preferences. The following directives <bcp14>MAY</bcp14> be used within a group:
        </t>
        <ul spacing="normal">
          <li><tt>scope</tt>: Specifies the URL pattern (e.g., <tt>/admin/</tt>) to which the group
             applies. Wildcards <bcp14>MAY</bcp14> be used to indicate variable components of the URL.</li>
          <li><tt>host</tt>: Specifies a subdomain or host. If present, the group applies only
             to the indicated subdomain; if omitted, the group is assumed to apply to the
             entire host.</li>
          <li><tt>user-agent</tt>: Specifies one or more automation user-agent
             tokens to which the group applies. If omitted, the group applies
             to all user agents.</li>
        </ul>
        <t>
          Groups are processed in order of specificity. Specificity is determined by
          (1) an <em>exact <tt>host</tt> match</em> over a wildcard or absent host;
          (2) the <em>longest matching <tt>scope</tt> directive</em>;
          (3) the <em>most specific <tt>user-agent</tt> token</em> (an exact token
              outranks the wildcard <tt>*</tt>);
          then (4) appearance order, where the group appearing later in the file
          <bcp14>SHALL</bcp14> take precedence when all preceding criteria are equal.
        </t>
        <t>
          A group <bcp14>MUST</bcp14> contain at least one <tt>scope</tt> directive
          and <bcp14>MAY</bcp14> omit all other directives.
        </t>
        <t>
          Example:
        </t>
        <figure>
          <artwork><![CDATA[
# Group 1: Applies to the entire site
scope: /
host: example.com
user-agent: *
allowed-methods: GET, HEAD

# Group 2: Specific preferences for the /admin/ path
scope: /admin/
host: example.com
user-agent: *
allowed-methods: GET
          ]]></artwork>
        </figure>
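        <t>
          The specificity ordering above can be illustrated with the
          following non-normative Python sketch. It makes several
          simplifying assumptions that are not part of this protocol:
          groups are dictionaries, scopes are matched as plain path
          prefixes without wildcard support, a host is either an exact
          name or absent, and user-agent tokens are compared exactly.
          The name <tt>select_group</tt> is hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
def select_group(groups, host, path, agent):
    """Return the most specific group matching the request, or None."""
    best_key, best_group = None, None
    for index, group in enumerate(groups):
        group_host = group.get("host")  # None: applies to any host
        group_agent = group.get("user-agent", "*")
        scopes = group["scopes"]
        matching = [s for s in scopes if path.startswith(s)]
        if not matching:
            continue
        if group_host not in (None, host):
            continue
        if group_agent not in ("*", agent):
            continue
        key = (
            group_host == host,             # (1) exact host match
            max(len(s) for s in matching),  # (2) longest scope
            group_agent == agent,           # (3) exact agent token
            index,                          # (4) later group wins
        )
        if best_key is None or key > best_key:
            best_key, best_group = key, group
    return best_group
]]></sourcecode>
        <t>
          Comparing the criteria as a tuple keeps the four rules in
          priority order: a later criterion is consulted only when all
          earlier ones are equal.
        </t>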
      </section>
    </section>

    <section anchor="formal-syntax" numbered="true" toc="include">
      <name>Formal Syntax</name>
      <t>
        Below is an Augmented Backus-Naur Form (ABNF) description, as defined in
        <xref target="RFC5234" format="default" sectionFormat="of"/>, with references to
        <xref target="RFC3629" format="default" sectionFormat="bare"/>,
        <xref target="RFC3986" format="default" sectionFormat="bare"/>, and
        <xref target="RFC9309" format="default" sectionFormat="bare"/>.
      </t>
      <sourcecode type="abnf" name="automation-preferences">
automation-preferences = *( group / emptyline )

group            = 1*scope-directive           ; at least one &lt;scope&gt;
                   *( directive / emptyline )
                   1*emptyline                 ; blank line terminates group

directive        = scope-directive / host-directive /
                   user-agent-directive /
                   method-directive / purpose-directive

; --- individual directives ----------------------------------

scope-directive   = *WS "scope"            *WS ":" *WS url-pattern    EOL
host-directive    = *WS "host"             *WS ":" *WS host-pattern   EOL
method-directive  = *WS "allowed-methods"  *WS ":" *WS method-list    EOL
purpose-directive = *WS "allowed-purposes" *WS ":" *WS purpose-list   EOL
user-agent-directive  = *WS "user-agent" *WS ":" *WS product-token
                        *( *WS "," *WS product-token ) EOL

; --- directive value syntax ---------------------------------

; url-pattern: visible ASCII or UTF-8, implementers SHOULD interpret it
;              as an RFC 3986 path/authority matcher with wildcards
url-pattern     = 1*( VCHAR / UTF8-char-noctl )

; host-pattern: UTF-8 hostname or wildcard; servers SHOULD match IDNA U-labels
host-pattern    = 1*( ALPHA / DIGIT / "-" / "." / UTF8-char-noctl )

method-list     = method *( *WS "," *WS method )
method          = "GET" / "HEAD" / "POST" / "PUT" /
                  "DELETE" / "PATCH" / "OPTIONS" /
                  "TRACE" / "CONNECT"

purpose-list    = purpose-token *( *WS "," *WS purpose-token )
purpose-token   = 1*VCHAR   ; placeholder for future vocabulary

; product-token: derived from RFC 9309
product-token         = identifier / "*"
identifier            = 1*( %x2D / %x41-5A / %x5F / %x61-7A )

; --- lexical primitives -------------------------------------

comment         = "#" *( UTF8-char-noctl / WS / "#" )

emptyline       = *WS [comment] EOL
EOL             = *WS [comment] NL
NL              = CRLF / LF / CR
CRLF            = CR LF
CR              = %x0D
LF              = %x0A
WS              = SP / HTAB
SP              = %x20
HTAB            = %x09

; --- core ABNF terminals (RFC 5234) --------------------------

ALPHA           = %x41-5A / %x61-7A
DIGIT           = %x30-39
VCHAR           = %x21-7E

; --- UTF-8 (derived from RFC 3629) ---------------------------

UTF8-char-noctl = UTF8-1-noctl / UTF8-2 / UTF8-3 / UTF8-4
UTF8-1-noctl    = %x21 / %x22 / %x24-7F
UTF8-2          = %xC2-DF UTF8-tail
UTF8-3          = %xE0 %xA0-BF UTF8-tail
                / %xE1-EC UTF8-tail-2
                / %xED %x80-9F UTF8-tail
                / %xEE-EF UTF8-tail-2
UTF8-4          = %xF0 %x90-BF UTF8-tail-2
                / %xF1-F3 UTF8-tail-3
                / %xF4 %x80-8F UTF8-tail-2
UTF8-tail       = %x80-BF
UTF8-tail-2     = UTF8-tail UTF8-tail
UTF8-tail-3     = UTF8-tail UTF8-tail UTF8-tail

</sourcecode>
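      <t>
        As a non-normative aid, two of the value rules above
        (<tt>product-token</tt> and <tt>method</tt>) can be checked with
        the following Python sketch. The helper names are hypothetical,
        and the sketch covers only these two rules, not the full grammar.
      </t>
      <sourcecode type="python"><![CDATA[
import re

# identifier = 1*( %x2D / %x41-5A / %x5F / %x61-7A )
# product-token = identifier / "*"
PRODUCT_TOKEN = re.compile(r"[-A-Z_a-z]+|\*")

METHODS = {
    "GET", "HEAD", "POST", "PUT", "DELETE",
    "PATCH", "OPTIONS", "TRACE", "CONNECT",
}


def is_product_token(value):
    """Return True if value matches the product-token rule."""
    return PRODUCT_TOKEN.fullmatch(value) is not None


def is_method(value):
    """Return True if value matches the method rule."""
    # ABNF string literals are case-insensitive, hence the upper-casing.
    return value.upper() in METHODS
]]></sourcecode>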
    </section>


    <section anchor="extension-mechanism">
      <name>Extension Mechanism</name>
      <t>
        The <tt>automation-preferences.txt</tt> format is designed to be forward compatible. Implementations:
      </t>
      <ul spacing="normal">
        <li><bcp14>MUST</bcp14> process all recognized directives according to this specification</li>
        <li><bcp14>MUST</bcp14> silently ignore any unrecognized directives</li>
        <li><bcp14>MUST NOT</bcp14> fail or produce errors when encountering extended directives</li>
      </ul>
      <t>
        This enables future extensions to add new capabilities, which may include:
      </t>
      <ul spacing="normal">
        <li>Rate limiting controls</li>
        <li>Automation technology restrictions</li>
        <li>API and XHR permissions</li>
        <li>Session requirements</li>
        <li>Asset-level annotation methods</li>
      </ul>
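      <t>
        The forward-compatibility rules in this section can be sketched,
        non-normatively, as a filter over parsed directives. The
        (name, values) pair representation and the name
        <tt>recognized_only</tt> are assumptions made for this example.
      </t>
      <sourcecode type="python"><![CDATA[
# Field names defined by this document.
KNOWN_DIRECTIVES = {
    "scope", "host", "user-agent",
    "allowed-methods", "allowed-purposes",
}


def recognized_only(directives):
    """Keep directives defined here; silently drop all others."""
    return [
        (name, values)
        for name, values in directives
        # Unknown names are skipped without raising or logging an
        # error, so files using future extensions still parse.
        if name.lower() in KNOWN_DIRECTIVES
    ]
]]></sourcecode>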
    </section>

    <section anchor="implementation-and-enforcement">
      <name>Implementation and Enforcement</name>
      <t>
        Servers implementing this protocol <bcp14>SHOULD</bcp14>:
      </t>
      <ul spacing="normal">
        <li>Verify incoming requests against <tt>automation-preferences.txt</tt>.</li>
        <li>Respond with appropriate HTTP status codes (e.g., 403 Forbidden)
           <xref target="RFC9110"/> for non-compliant requests.</li>
      </ul>
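      <t>
        One way a server might apply the behaviors above is sketched below
        as a WSGI wrapper in Python. This is a non-normative example: the
        names <tt>enforce</tt> and <tt>is_automation</tt> are hypothetical,
        and how a server recognizes an automated request is outside the
        scope of this document, so that decision is left to a
        caller-supplied function.
      </t>
      <sourcecode type="python"><![CDATA[
from http import HTTPStatus


def enforce(app, allowed_methods, is_automation):
    """Wrap a WSGI app; answer 403 to disallowed automated requests."""

    def wrapped(environ, start_response):
        method = environ["REQUEST_METHOD"]
        if is_automation(environ) and method not in allowed_methods:
            status = HTTPStatus.FORBIDDEN
            body = b"Automated use of this method is not permitted.\n"
            start_response(
                f"{status.value} {status.phrase}",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Permitted or non-automated requests reach the wrapped app.
        return app(environ, start_response)

    return wrapped
]]></sourcecode>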
      <t>
        Clients consuming this protocol <bcp14>SHOULD</bcp14>:
      </t>
      <ul spacing="normal">
        <li>Fetch and parse <tt>automation-preferences.txt</tt> before performing automated
           operations.</li>
        <li>Honor the HTTP method restrictions specified.</li>
        <li>Declare their automation purpose.</li>
        <li>Respect the scope directives when performing operations on different paths.</li>
      </ul>
      <t>
        Implementations <bcp14>MAY</bcp14> cache
        <tt>automation-preferences.txt</tt> in accordance with
        <xref target="RFC9111"/>; all freshness calculations are governed
        solely by standard HTTP cache-control semantics.
      </t>
    </section>

    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>
        The use of <tt>automation-preferences.txt</tt> introduces security
        considerations that <bcp14>SHOULD</bcp14> be assessed by implementors:
      </t>
      <ul spacing="normal">
        <li>Parsing of <tt>automation-preferences.txt</tt> <bcp14>MUST</bcp14> be performed
           securely to prevent vulnerabilities such as buffer overruns and
           denial-of-service attacks.</li>
        <li>Care <bcp14>SHOULD</bcp14> be taken to avoid exposing sensitive scope details that
           could be exploited by adversaries.</li>
        <li>The protocol does not provide authentication or cryptographic
           verification mechanisms for the file content. Servers <bcp14>SHOULD</bcp14>
           ensure the file is served via secure connections to prevent tampering.</li>
        <li>The protocol does not enforce client compliance; it relies on
           good-faith adherence by automation providers. Servers <bcp14>SHOULD</bcp14>
           implement additional detection and enforcement mechanisms when needed.</li>
      </ul>
    </section>

    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <references anchor="normative">
        <name>Normative References</name>
        <reference anchor="RFC2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author initials="S." surname="Bradner" fullname="Scott Bradner">
              <organization>Harvard University</organization>
            </author>
            <date year="1997" month="March" />
          </front>
          <seriesInfo name="BCP" value="14" />
          <seriesInfo name="RFC" value="2119" />
          <seriesInfo name="DOI" value="10.17487/RFC2119" />
        </reference>
        <reference anchor="RFC8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author initials="B." surname="Leiba" fullname="Barry Leiba">
              <organization>Huawei Technologies</organization>
            </author>
            <date year="2017" month="May" />
          </front>
          <seriesInfo name="BCP" value="14" />
          <seriesInfo name="RFC" value="8174" />
          <seriesInfo name="DOI" value="10.17487/RFC8174" />
        </reference>
        <reference anchor="RFC5234" target="https://www.rfc-editor.org/info/rfc5234" quoteTitle="true" derivedAnchor="RFC5234">
          <front>
            <title>Augmented BNF for Syntax Specifications: ABNF</title>
            <author fullname="D. Crocker" initials="D." role="editor" surname="Crocker"/>
            <author fullname="P. Overell" initials="P." surname="Overell"/>
            <date month="January" year="2008"/>
          </front>
          <seriesInfo name="STD" value="68"/>
          <seriesInfo name="RFC" value="5234"/>
          <seriesInfo name="DOI" value="10.17487/RFC5234"/>
        </reference>
        <reference anchor="RFC3629" target="https://www.rfc-editor.org/info/rfc3629" quoteTitle="true" derivedAnchor="RFC3629">
          <front>
            <title>UTF-8, a transformation format of ISO 10646</title>
            <author fullname="F. Yergeau" initials="F." surname="Yergeau"/>
            <date month="November" year="2003"/>
          </front>
          <seriesInfo name="STD" value="63"/>
          <seriesInfo name="RFC" value="3629"/>
          <seriesInfo name="DOI" value="10.17487/RFC3629"/>
        </reference>
        <reference anchor="RFC9309" target="https://www.rfc-editor.org/info/rfc9309" quoteTitle="true" derivedAnchor="RFC9309">
          <front>
            <title>Robots Exclusion Protocol</title>
            <author initials="M." surname="Koster" fullname="Martijn Koster"/>
            <author initials="G." surname="Illyes" fullname="Gary Illyes"/>
            <author initials="H." surname="Zeller" fullname="Henner Zeller"/>
            <author initials="L." surname="Sassman" fullname="Lizzi Sassman"/>
            <date year="2022" month="September" />
          </front>
          <seriesInfo name="RFC" value="9309" />
          <seriesInfo name="DOI" value="10.17487/RFC9309" />
        </reference>
        <reference anchor="RFC9111" target="https://www.rfc-editor.org/info/rfc9111" quoteTitle="true" derivedAnchor="RFC9111">
          <front>
            <title>HTTP Caching</title>
            <author fullname="R. Fielding" initials="R." role="editor" surname="Fielding"/>
            <author fullname="M. Nottingham" initials="M." role="editor" surname="Nottingham"/>
            <author fullname="J. Reschke" initials="J." role="editor" surname="Reschke"/>
            <date month="June" year="2022"/>
          </front>
          <seriesInfo name="STD" value="98"/>
          <seriesInfo name="RFC" value="9111"/>
          <seriesInfo name="DOI" value="10.17487/RFC9111"/>
        </reference>
        <reference anchor="RFC9110" target="https://www.rfc-editor.org/info/rfc9110" quoteTitle="true" derivedAnchor="RFC9110">
          <front>
            <title>HTTP Semantics</title>
            <author fullname="R. Fielding" initials="R." role="editor" surname="Fielding"/>
            <author fullname="M. Nottingham" initials="M." role="editor" surname="Nottingham"/>
            <author fullname="J. Reschke" initials="J." role="editor" surname="Reschke"/>
            <date month="June" year="2022"/>
          </front>
          <seriesInfo name="STD" value="97"/>
          <seriesInfo name="RFC" value="9110"/>
          <seriesInfo name="DOI" value="10.17487/RFC9110"/>
        </reference>
        <reference anchor="RFC3986" target="https://www.rfc-editor.org/info/rfc3986" quoteTitle="true" derivedAnchor="RFC3986">
          <front>
            <title>Uniform Resource Identifier (URI): Generic Syntax</title>
            <author fullname="T. Berners-Lee" initials="T." surname="Berners-Lee"/>
            <author fullname="R. Fielding" initials="R." surname="Fielding"/>
            <author fullname="L. Masinter" initials="L." surname="Masinter"/>
            <date month="January" year="2005"/>
          </front>
          <seriesInfo name="STD" value="66"/>
          <seriesInfo name="RFC" value="3986"/>
          <seriesInfo name="DOI" value="10.17487/RFC3986"/>
        </reference>
      </references>
      <references anchor="informative">
        <name>Informative References</name>
      </references>
    </references>

    <section numbered="false" anchor="sample-automation-preferences-txt-file">
      <name>Sample automation-preferences.txt File</name>
      <t>
        The following is an example of an <tt>automation-preferences.txt</tt> file that
        conforms to this specification:
      </t>
      <figure>
        <artwork><![CDATA[
# Automation preferences for example.com
# Version: 1.0
# Last updated: 2025-04-20

# Group 1: Applies to the entire site
scope: /
user-agent: *
host: example.com
allowed-methods: GET, HEAD
allowed-purposes: PLACEHOLDER_PURPOSE1, PLACEHOLDER_PURPOSE2

# Group 2: Specific preferences for the /admin/ path
scope: /admin/
user-agent: ExampleBot
host: example.com
allowed-methods: GET
allowed-purposes: PLACEHOLDER_PURPOSE1

]]></artwork>
      </figure>
    </section>
  </back>
</rfc>