Agent Operation Authorization

Alibaba

max.ldp@alibaba-inc.com

Alibaba

hongru.zhr@alibaba-inc.com

Keywords: Security, OAuth

This document specifies the Agent Operation Authorization framework, a structured mechanism that enables verifiable delegation of actions from human principals to autonomous AI agents through fine-grained operation authorization. The framework introduces two distinct phases:

Agent Operation Authorization Request: A human-readable proposal of operations, derived from the user's natural-language input and encoded as a JSON Web Token (JWT).
Agent Operation Authorization Token: A JWT representing confirmed authorization for a specific agent operation, enforceable at runtime by agents and verifiers. It binds the operation to cryptographically verifiable evidence of user intent, helps prevent unauthorized or hallucinated actions, and makes each authorized operation auditable and traceable.

In agent-based systems, especially those with generative capabilities, it is essential to convey not only which actions are permitted but also the original intent behind them and the conditions under which an autonomous agent may act on behalf of a principal. This document specifies the Agent Operation Authorization framework, a mechanism that enables verifiable delegation of actions from human principals to autonomous AI agents through fine-grained operation authorization. The framework comprises two phases: the Agent Operation Authorization Proposal phase and the Agent Operation Authorization phase.

This specification defines new top-level JSON Web Token (JWT) claims, agent_operation_proposal and agent_operation_authorization. These claims carry fine-grained, structured operational parameters, including agent_operations, constraints, and conditions. The specification also supports inclusion of a user-provided prompt whose authenticity is protected by a W3C Verifiable Credential (VC).

During the interaction, the AI agent captures the user's natural-language instruction and constructs a structured agent_operation_proposal object. This object includes a prompt evidence subfield that carries the user's instruction as a JWT-based Verifiable Credential (JWT-VC). The agent then submits the resulting JWT to the Authorization Server (AS) via OAuth 2.0 Pushed Authorization Requests (PAR) [RFC9126].

This design lets downstream verifiers validate both the policy boundaries and the provenance of the initiating instruction without depending on Decentralized Identifiers (DIDs). The result is secure, auditable delegation for autonomous AI agents.

In the first phase, the user confirms the Authorization Proposal and authenticates. Upon success, the AS SHALL issue an Agent Operation Authorization Token. This token serves as the access token for subsequent interactions.
The agent MUST present this JWT access token when accessing protected resources at the Resource Server (RS). It does so using the mechanisms defined in OAuth 2.0 [RFC6749] and the bearer token usage rules in [RFC6750]. Together, these components ensure that AI systems act only within user-approved boundaries, mitigating risks such as hallucinated actions.

The framework is designed for autonomous AI agent systems and multi-agent orchestration. It also targets regulated domains such as finance, healthcare, and public services, particularly where accountability and auditability are important. It supports enterprise identity providers and zero-trust architectures.
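The sketch below shows the bearer presentation described above. The agent places the token in the Authorization request header field, as defined in RFC 6750, Section 2.1. The resource URL and token value are hypothetical placeholders. Python's urllib is used only to build the request, not to send it.

```python
import urllib.request


def build_resource_request(url: str, access_token: str) -> urllib.request.Request:
    """Build a request that carries the Agent Operation Authorization Token
    as an OAuth 2.0 bearer token (RFC 6750, Section 2.1)."""
    req = urllib.request.Request(url, method="GET")
    # "Authorization: Bearer <token>" is the header form recommended by RFC 6750
    req.add_header("Authorization", f"Bearer {access_token}")
    return req


# Hypothetical resource server URL and placeholder token; nothing is sent here.
req = build_resource_request("https://rs.example/api/orders", "eyJ...placeholder")
print(req.get_header("Authorization"))
```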

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The PAR-JWT (Pushed Authorization Request in JWT format) is used in the first phase. Its purpose is to deliver the user's original input and the agent-proposed operational strategy to the AS, enabling the generation of a high-quality consent UI and establishing an evidentiary starting point. Its format is defined as follows:

The evidence field is a JWT in JWT-VC (JSON Web Token-based Verifiable Credential) format. The agent client generates it and includes it in the agent operation proposal token. Its format is as follows:

JWT Header

alg: The signing algorithm. The RS256 asymmetric signing algorithm is recommended.
typ: Explicitly set to JWT to indicate the token type.
kid: The key identifier that references the public key used for verification. It enables the recipient to locate the corresponding public key (e.g., from a JWKS endpoint).
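The header described above can be produced as shown in this minimal sketch. The kid value is a hypothetical example. The JSON is serialized compactly and then base64url-encoded without padding, as JWS (RFC 7515) requires.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """base64url-encode without trailing '=' padding (RFC 7515, Section 2)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_evidence_header(kid: str) -> str:
    """Return the encoded JOSE header for the prompt-evidence JWT-VC."""
    header = {
        "alg": "RS256",  # recommended asymmetric signing algorithm
        "typ": "JWT",    # explicit token type
        "kid": kid,      # lets the verifier select the key from the JWKS
    }
    # Compact separators avoid insignificant whitespace in the encoded header
    return b64url(json.dumps(header, separators=(",", ":")).encode("utf-8"))


encoded = build_evidence_header("key-01")  # hypothetical key identifier
```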

JWT Payload

Public Key Discovery Mechanism (JWKS)

The client agent publishes its public keys in JSON Web Key Set (JWKS) format at the well-known endpoint /.well-known/jwks.json. To retrieve the public keys, a relying party sends an HTTPS GET request to this endpoint.
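This is a minimal sketch of JWKS discovery and key selection. It assumes the relying party matches the JWT header's kid exactly against the kid member of each JWK in the set (RFC 7517). The function names, the timeout, and the error handling are illustrative choices, not part of this specification.

```python
import json
import urllib.request


def fetch_jwks(issuer: str, timeout: float = 5.0) -> dict:
    """GET <issuer>/.well-known/jwks.json and parse the JSON Web Key Set."""
    if not issuer.startswith("https://"):
        # The key set must be retrieved over HTTPS
        raise ValueError("JWKS must be retrieved over HTTPS")
    url = issuer.rstrip("/") + "/.well-known/jwks.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def select_key(jwks: dict, kid: str) -> dict:
    """Return the single JWK whose "kid" equals the JWT header's kid."""
    matches = [k for k in jwks.get("keys", []) if k.get("kid") == kid]
    if len(matches) != 1:
        raise LookupError(f"expected one key with kid={kid!r}, found {len(matches)}")
    return matches[0]


# Usage (performs a network request):
# key = select_key(fetch_jwks("https://client.myassistant.example"), "key-01")
```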

Signature

The issuer (https://client.myassistant.example) generates the signature with its private key, using the RS256 (RSA Signature with SHA-256) algorithm. The signature is computed over the concatenated content: base64url(header) + '.' + base64url(payload).

Final Output as Standard JWT Tripartite String

The resulting JWT is a URL-safe string of three base64url-encoded parts separated by '.' characters. In the example below, line breaks are for display only, the payload is abbreviated, and SIGNATURE is a placeholder:

eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCIsImtpZCI6Imh0dHBzOi8vY2xpZW50Lm15YW5zd2VyLmV4YW1wbGUvLndlbGwta25vd24vandrLmpzb24ja2V5LTAxIn0.
eyJqdGkiOiJwdC0wMDEiLCJpc3MiOiJodHRwczovL2NsaWVudC5teWFuc3dlci5leGFtcGxlIiwic3ViIjoidXNlcl8xMjM0NSIsImlhdCI6MTczMTY2NDUwMCwiZXhwIjoxNzMxNjY4MTAwLCJ0eXBlIjoiVmVyaWZpYWJsZUNyZWRlbnRpYWwiLCJjcmVkZW50aWFsU3ViamVjdCI6eyJ0eXBlIjoiVXNlcklucHV0RXZpZGVuY2UiLCJwcm9tcHQiOiJCdXkgc29tZXRoaW5nIGNoZWFwIG9uIE5vdiAxMSBu
.SIGNATURE
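To make the signing input concrete, the sketch below splits a compact JWT into its three parts and decodes the header. It then reconstructs the exact bytes, base64url(header) + '.' + base64url(payload), over which RS256 is computed. It does not verify the signature itself; that step is assumed to be delegated to a vetted JOSE library. The helper names are hypothetical.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode base64url, restoring the '=' padding that JWS strips."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def split_compact_jwt(token: str):
    """Return (header, signing_input, signature) for a compact JWS/JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a compact JWT has exactly three '.'-separated parts")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(b64url_decode(header_b64))
    # The signature covers the encoded segments, not the decoded JSON
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    return header, signing_input, b64url_decode(signature_b64)
```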

Verification Process:

(1) Decode the JWT.
(2) Extract the kid from the header.
(3) Retrieve the corresponding public key from /.well-known/jwks.json.
(4) Validate the cryptographic signature.
(5) Check policy conditions such as iss, the time window (iat, exp), and the device fingerprint.

The Agent Client sends this PAR-JWT to the Authorization Server (AS) via the Pushed Authorization Request (PAR) mechanism defined in [RFC9126].
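The five verification steps can be sketched as a single function. This is a hedged illustration, not a reference implementation. Key resolution and the RS256 check are passed in as callables, for example a JWKS lookup and a JOSE library's verifier. The 60-second clock leeway is an arbitrary choice. The device-fingerprint check is omitted because the corresponding claim is not specified here.

```python
import base64
import json
import time
from typing import Callable, Optional


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_evidence_vc(
    token: str,
    resolve_key: Callable[[str], dict],
    verify_signature: Callable[[dict, bytes, bytes], bool],
    expected_iss: str,
    now: Optional[float] = None,
    leeway: int = 60,
) -> dict:
    """Apply verification steps (1)-(5) and return the payload if all pass."""
    # (1) Decode the JWT into header, payload, and signature
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed JWT")
    header = json.loads(_b64url_decode(parts[0]))
    payload = json.loads(_b64url_decode(parts[1]))
    if header.get("alg") != "RS256":
        # Refuse algorithm substitution (e.g. "none")
        raise ValueError("unexpected alg")
    # (2) Extract kid and (3) resolve the public key, e.g. via /.well-known/jwks.json
    jwk = resolve_key(header["kid"])
    # (4) Validate the signature over base64url(header) + '.' + base64url(payload)
    signing_input = f"{parts[0]}.{parts[1]}".encode("ascii")
    if not verify_signature(jwk, signing_input, _b64url_decode(parts[2])):
        raise ValueError("signature verification failed")
    # (5) Policy checks: issuer and iat/exp time window, allowing clock leeway
    if payload.get("iss") != expected_iss:
        raise ValueError("unexpected issuer")
    t = time.time() if now is None else now
    if payload["iat"] > t + leeway:
        raise ValueError("token issued in the future")
    if payload["exp"] <= t - leeway:
        raise ValueError("token expired")
    return payload
```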

Upon successful user authorization and authentication, the Authorization Server (AS) issues a Verifiable Agent Operation Credential in the form of a JWT. This credential serves as a digitally signed, independently verifiable "authorization letter" that enables the Personal Agent to perform authorized operations on behalf of the user. The issuer of the credential is the AS. The intended recipient is the Personal Agent, although the credential may be delivered via the client. The credential becomes effective immediately after the user clicks "Allow" or "Consent".
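This is a minimal sketch of the claims such a credential might carry. It borrows registered claim names from the JWT Profile for OAuth 2.0 Access Tokens (RFC 9068):

- iss is the AS.
- aud is the resource server that will enforce the token.
- client_id identifies the Personal Agent.
- nbf equals the consent time, so the credential is effective immediately.

The internal structure of agent_operation_authorization, the one-hour lifetime, and all identifiers are hypothetical.

```python
import time
import uuid
from typing import Optional


def build_authorization_claims(
    as_issuer: str,
    resource: str,
    agent_client_id: str,
    user_sub: str,
    approved_operation: dict,
    consent_time: Optional[float] = None,
    lifetime: int = 3600,
) -> dict:
    """Assemble a payload for an Agent Operation Authorization Token."""
    t = int(time.time() if consent_time is None else consent_time)
    return {
        "iss": as_issuer,              # the Authorization Server issues the credential
        "aud": resource,               # resource server that will enforce it
        "client_id": agent_client_id,  # the Personal Agent acting for the user
        "sub": user_sub,               # the user who gave consent
        "iat": t,
        "nbf": t,                      # effective as soon as the user consents
        "exp": t + lifetime,
        "jti": str(uuid.uuid4()),      # unique ID, useful for audit and revocation
        "agent_operation_authorization": approved_operation,
    }
```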

In AI agent scenarios, the auditTrail field establishes a complete, semantically traceable chain from the user's original intent to the system's final executed action. This mechanism is called a Semantic Audit Trail. Its specific purposes are described in the following table:

Purposes and Descriptions of the Semantic Audit Trail

Purpose	Description
1. Intent Provenance	Records what the user originally said (e.g., "Buy something cheap on Nov 11 night") to prevent disputes such as: "I didn’t say I wanted to buy anything!"
2. Action Interpretation	Documents how the system interpreted and rendered the input into a concrete operation (e.g., "Purchase under $50 during 00:00–06:00"), reflecting the AI’s reasoning process.
3. Semantic Transparency	Shows whether semantic expansions or default values were applied (e.g., mapping "cheap" to $50, defining "night" as 00:00–06:00).
4. User Confirmation Evidence	Includes timestamps indicating when the user reviewed and confirmed the interpreted action, serving as proof of authorization.
5. Accountability Support	Enables post-hoc analysis of erroneous transactions: was the issue caused by ambiguous user input, system misinterpretation, or misleading UI guidance?
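The five purposes in the table can be mapped onto a simple record type. This is an illustrative sketch only: the field names are assumptions, because this section does not define the members of auditTrail. The example values come from the table.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemanticAuditTrail:
    """One auditTrail record; field names are hypothetical."""
    original_prompt: str      # 1. Intent provenance: what the user said
    interpreted_action: str   # 2. Action interpretation: the concrete operation
    semantic_expansions: Dict[str, str] = field(default_factory=dict)  # 3. defaults applied
    user_confirmed_at: Optional[int] = None  # 4. Unix time of user confirmation

    def applied_expansions(self) -> List[str]:
        """5. Accountability support: list each term the system expanded."""
        return [f"{term!r} -> {value!r}" for term, value in self.semantic_expansions.items()]


trail = SemanticAuditTrail(
    original_prompt="Buy something cheap on Nov 11 night",
    interpreted_action="Purchase under $50 during 00:00-06:00",
    semantic_expansions={"cheap": "under $50", "night": "00:00-06:00"},
    user_confirmed_at=1731664800,  # hypothetical confirmation time
)
print(trail.applied_expansions())
```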

Authorization sequence:

(2) Agent: parse and structure the operation proposal.
(3) Agent: generate the user's prompt VC.
(4) Agent: build the operation proposal JWT.
(5) Agent to AS: POST /par with the JWT.
(6) AS to Agent: return the request_uri.
(7) Agent: redirect the user to /authorize?request_uri=...
(8) User: approve.
(9) AS: validate the JWT, extract the operation, and issue the access token to the agent.
(10) Agent: present the access token to the Resource API. The Resource API enforces the operation and executes or denies it, and the response is returned to the user.

Authorization flow with evidence verification:

(7) The AS validates the JWS, extracts agent_operation_proposal.prompt.signature, and validates it.
(8) The AS issues a request_uri to the agent.
(9) The agent redirects the user to /authorize?request_uri=...
(10) The user reviews the original prompt and the proposed agent operation, and approves.
(11) The AS issues the Agent Operation Authorization Token as the access token.
(12) The agent uses the token to access the Resource Server (RS).
(13) The AS verifies the evidence VC.
(14) The RS enforces the constraints and conditions.
(15) The action is executed or denied.
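Step (14), in which the resource server enforces the constraints and conditions, can be illustrated with the purchase example from the Semantic Audit Trail table ("Purchase under $50 during 00:00–06:00"). The sketch makes these assumptions:

- The field names action, amount_limit, and time_window are hypothetical.
- "Under" is read as strictly below the limit.
- Time windows that cross midnight are not handled.

```python
from datetime import datetime


def enforce(authorized: dict, requested: dict) -> bool:
    """Return True only if the requested action stays within the approved bounds."""
    if requested["action"] != authorized["action"]:
        return False  # this operation type was never approved
    constraints = authorized["constraints"]
    if requested["amount"] >= constraints["amount_limit"]:
        return False  # "under $50" means strictly less than the limit
    start, end = (datetime.strptime(s, "%H:%M").time() for s in constraints["time_window"])
    at = datetime.fromisoformat(requested["at"]).time()
    return start <= at < end  # half-open window [start, end)


approved = {"action": "purchase",
            "constraints": {"amount_limit": 50, "time_window": ["00:00", "06:00"]}}
print(enforce(approved, {"action": "purchase", "amount": 45, "at": "2024-11-11T01:30:00"}))
```

Returning a plain boolean keeps the execute-or-deny decision in step (15) explicit. A production verifier would also record the reason for a denial in the audit trail.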

The combination of JWS and VC provides dual-layer integrity: JWS protects the token, VC protects the prompt. Authorization Servers MUST validate the VC proof using the referenced issuerKey and associated public key material before accepting the request. Public keys referenced by issuerKey MUST be obtained through secure, trusted mechanisms (e.g., pre-registration, PKI). Expression evaluation (e.g., CEL) MUST occur in sandboxed environments. The use of PAR prevents leakage of sensitive operation data in URLs.
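To illustrate why PAR keeps operation data out of URLs, the sketch below builds both legs of the exchange. The signed proposal travels only in the form-encoded POST body. The browser-visible authorization URL carries only client_id and the opaque request_uri. The sketch makes these assumptions:

- The proposal is sent in the request parameter as a request object, which RFC 9126 permits.
- Client authentication at the PAR endpoint is omitted.
- The endpoint URLs and client_id are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_par_request(par_endpoint: str, client_id: str, proposal_jwt: str) -> urllib.request.Request:
    """Push the signed proposal to the AS in a form-encoded POST body (RFC 9126)."""
    body = urllib.parse.urlencode({"client_id": client_id, "request": proposal_jwt})
    return urllib.request.Request(
        par_endpoint,
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_authorize_url(authorize_endpoint: str, client_id: str, par_response: bytes) -> str:
    """Build the front-channel URL; it carries only client_id and the opaque request_uri."""
    request_uri = json.loads(par_response)["request_uri"]
    query = urllib.parse.urlencode({"client_id": client_id, "request_uri": request_uri})
    return f"{authorize_endpoint}?{query}"
```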

This document requests IANA to register the following two claims in the "JSON Web Token Claims" registry, following the procedure defined in RFC 8126.

Claim Name: agent_operation_proposal
Claim Description: A structured representation of an operation proposed by an autonomous agent on behalf of a user. It includes intended actions, constraints, conditions, and references to verifiable evidence (e.g., signed user input). It is used in delegation flows where user intent is expressed in natural language and converted into machine-executable proposals.
Change Controller: IETF
Specification Document: This document, Section X.Y ("Agent Operation Proposal")

Claim Name: agent_operation_authorization
Claim Description: A structured authorization decision issued by an Authorization Server in response to an operation proposal. It mirrors the structure of the proposal but represents a formally approved scope of execution, potentially with additional policy-enforced constraints. It enables auditable, revocable, and context-aware delegation for AI agents.
Change Controller: IETF
Specification Document: This document, Section X.Z ("Agent Operation Authorization")

Both claims are intended to be used within JWTs carrying structured permissions and operational intent in human-AI collaboration scenarios, particularly in regulated environments requiring traceability, non-repudiation, and alignment with EU AI Act principles such as transparency and accountability.

Implementers may choose to publish formal JSON Schemas for agent_operation_proposal and agent_operation_authorization. If standardized schemas are developed, they can be submitted to the IANA "JSON Schema Reserved Vocabulary" registry per RFC 9539.

References

Normative References

Acknowledgements

The authors thank contributors from the IETF community for their valuable feedback on agent authorization semantics.