Bundle Protocol Endpoint ID Patterns

The Johns Hopkins University Applied Physics Laboratory

11100 Johns Hopkins Rd. Laurel MD 20723 United States of America brian.sipos+ietf@gmail.com

Transport Delay-Tolerant Networking DTN PKIX

This document extends the Endpoint ID (EID) concept into an EID Pattern, which is used to categorize any EID as matching a specific pattern or not. EID Patterns are suitable for expressing agent configuration, for being used on-the-wire by DTN protocols, and for being easily understandable by a layperson. EID Patterns include scheme-specific optimizations for expressing set membership and each scheme pattern includes text and CBOR encoding forms; the pattern for the "ipn" EID scheme is designed to be highly compressible in its CBOR form. This document also defines a Public Key Infrastructure Using X.509 (PKIX) Other Name form to contain an EID Pattern and a handling rule to use a pattern to match an EID.

Introduction

The Bundle Protocol (BP) Version 7 specification defines text and CBOR encoding forms of an Endpoint ID (EID), which is used as both a source and a destination for individual bundles. BP Agent implementations have necessarily used methods of defining patterns for matching multiple EIDs in order to configure routing, forwarding, and delivery of bundles, but these have not yet been standardized and do not have a concise form suitable for on-the-wire messaging.

In much the same way that the Classless Inter-domain Routing (CIDR) mechanism can be used to aggregate a contiguous and bit-aligned block of IP addresses in a concise unit (encoded as text or otherwise), the EID Pattern is used to aggregate a set of EIDs into a single concise unit. This is especially valuable because an EID includes both an identifier of the node sending or receiving the bundle and an identifier for the specific service which generated or will process the bundle. Any EID Pattern can be used to aggregate EIDs based on node identifier, service identifier, or both.

A purely text-based pattern mechanism such as the W3C "URI Pattern Matching for Groups of Resources" could handle the general case of matching the text form of EIDs (as URIs) but would not be able to achieve the same level of encoding compression and would not be able to express exact numeric ranges like the scheme-specific mechanism defined in this document.

The certificate profile and NODE-ID definition of TCPCLv4 use the text form of EID to authenticate nodes based on EID. This document defines a Public Key Infrastructure Using X.509 (PKIX) Other Name Form to contain an EID Pattern and a handling rule to use a pattern to match an EID. This allows authenticating an individual EID based on an EID Pattern in much the same way as using a "wildcard" certificate to match a DNS name.
One other aspect of this patterning mechanism is that the text form of each scheme-specific pattern is intended to be, in a subjective sense, natural and understandable for the case of a human manually typing patterns into a text document or quick email message; the interpretation of the text pattern should "make sense" with minimal training.

Scope

This document defines a logical model of pattern matching BP Endpoint IDs and both text and CBOR encoding forms, as well as a PKIX extension to make use of an EID Pattern. This document does not define a method of disambiguating an EID from an EID Pattern (in either encoded form) without any other context. Given a pure text or CBOR encoding of an arbitrary value, there must be some external context to determine how to interpret it.

Although the same EID definitions apply to BP Version 6, this document does not provide any mechanism for integrating with that protocol. It is an implementation matter for a BP Agent to use EID Patterns with BP Version 6 bundles and their compressed bundle header encoding (CBHE).

Use of ABNF

This document defines text structure using the Augmented Backus-Naur Form (ABNF) of . The entire ABNF structure can be extracted from the XML version of this document using the XPath expression:

    '//sourcecode[@type="abnf"]'

The following initial fragment defines the top-level rules of this document's ABNF.

    eid-pattern = ipn-pattern / dtn-pattern

    ; Shared wildcard rules
    wildcard = "*"
    multi-wildcard = "**"

From the document the definition is taken for pchar. From the document the definition is taken for digit. From the document the definition is taken for nbr-delim.

Use of CDDL

This document defines CBOR structure using the Concise Data Definition Language (CDDL) of . The entire CDDL structure can be extracted from the XML version of this document using the XPath expression:

    '//sourcecode[@type="cddl"]'

The following initial fragment defines the top-level symbols of this document's CDDL, which includes the example CBOR content.

    start = eid-pattern

    eid-pattern = $eid-pattern .within eid-structure

From the document the definition is taken for eid-structure.

Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Patterns for Endpoint IDs

This document does not define a universal form of EID Pattern, though text forms of EID Patterns do share concepts and rules for wildcard matching. Instead, in order to achieve efficiencies in non-text encoding, each EID scheme uses a different form of complex pattern matching.

The text form of an EID Pattern is not a URI and is not bound by the character set restrictions imposed in . This is much the same as how a URI template is not itself a URI. Although some forms of EID Pattern can contain reserved URI characters, it is not guaranteed that any particular EID Pattern will be intrinsically differentiable from an EID. See the Security Considerations for details on handling concerns.

For the pattern forms defined in this document, the exact-match pattern's text form is identical to its matching EID. This behavior is not required or necessary but is a convenient side effect of the text definitions and makes the EID Pattern a proper superset of EID. The IPN pattern has an exact-match CBOR form which is identical to its matching EID, while the DTN pattern CBOR form is always encoded as a component pattern array.

DTN Scheme Pattern As defined in , DTN scheme EIDs have an authority (node name) part and a sequence of path (service demux) segment components. Combining these components together, the whole EID SSP is treated as a sequence of these unstructured text components. Because of the lack of more specific structure, outside of match-all wildcards only a generic pattern matching mechanism like a regular expression can be used. The conceptual model of the DTN pattern is that the node name and the sequence of path segments can be matched as one of:

Specific value:: This will match only a single value (as decoded text).
Regular expression:: This will match a decoded text value based on a (possibly anchored) regular expression.
Single-segment wildcard:: This will match an individual path segment.
Multi-segment wildcard:: For the node name this will match any valid value. For the path segment this will match any number of segments of any value.

A DTN pattern SHALL contain at least two components: the first for the node name and the others for the service demux. A DTN pattern SHALL contain no more than one multi-segment wildcard component. If present, a DTN pattern SHALL only contain a multi-segment wildcard in its last (demux path segment) component.

EID Matching

When matching a DTN pattern any query or fragment parts of an EID SHALL be ignored and not treated as comparison components. A DTN pattern SHALL be considered to match a specific EID when both have the same scheme, the pattern has the same number of components as the EID, and each component of the pattern matches the corresponding component of the EID SSP. If the number of components differs or if any component does not match, the whole pattern does not match. Each pattern component SHALL be considered to match according to the following rules:

Specific value:: The pattern component SHALL be compared with the EID component after both are percent-decoded in accordance with and UTF-8 decoded in accordance with .
Regular expression:: The pattern component SHALL be percent-decoded and UTF-8 decoded then interpreted as a regular expression in accordance with . The EID component SHALL be percent-decoded and UTF-8 decoded. The regular expression SHALL then be compared with the decoded EID component.
Single-segment wildcard:: The pattern component SHALL be considered to match with any EID component, if present, including an empty component.
Multi-segment wildcard:: The pattern component SHALL be considered to match with any number of EID components, including zero EID components.

Because these are dealing with text values in an information model, the matching occurs in the percent-encoding normalized or percent-decoded domain (i.e. it's not a pattern for the encoded URI, the matching is performed within the information model of the SSP).
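The matching rules above can be sketched in Python as follows. This is an illustrative sketch only, under several assumptions that are not part of this document: a pattern is modeled as a list of already-decoded components, the names WILDCARD, MULTI, eid_components, and dtn_match are hypothetical, Python's re module stands in for the regular expression dialect referenced by this document, and a regular expression is applied as an unanchored search.

```python
import re
from urllib.parse import unquote

# Hypothetical sentinels for the single-segment and multi-segment wildcards.
WILDCARD = object()
MULTI = object()


def eid_components(eid: str) -> list[str]:
    """Split a dtn EID into decoded node name and demux path segments."""
    scheme, _, ssp = eid.partition(":")
    if scheme != "dtn" or not ssp.startswith("//"):
        raise ValueError("not a dtn EID with an authority")
    # Query and fragment parts are ignored for matching.
    ssp = ssp.split("#", 1)[0].split("?", 1)[0]
    # unquote() percent-decodes and then UTF-8 decodes each component.
    return [unquote(part) for part in ssp[2:].split("/")]


def component_matches(pat, value: str) -> bool:
    if pat is WILDCARD:
        return True
    if isinstance(pat, re.Pattern):
        # Assumption: unanchored search, so "^" or "$" anchor explicitly.
        return pat.search(value) is not None
    return pat == value  # specific value, compared in decoded form


def dtn_match(pattern: list, eid: str) -> bool:
    comps = eid_components(eid)
    node_pat, path_pats = pattern[0], list(pattern[1:])
    node, segments = comps[0], comps[1:]
    # A multi-wildcard in the node position matches any node name.
    if node_pat is not MULTI and not component_matches(node_pat, node):
        return False
    if path_pats and path_pats[-1] is MULTI:
        # A trailing multi-wildcard absorbs zero or more remaining segments.
        path_pats.pop()
        if len(segments) < len(path_pats):
            return False
        segments = segments[: len(path_pats)]
    elif len(segments) != len(path_pats):
        return False
    return all(component_matches(p, s) for p, s in zip(path_pats, segments))
```

Under these assumptions, dtn_match(["node", "service"], "dtn://node/ser%76ice") is a match because the comparison happens after percent-decoding, while dtn_match(["node", WILDCARD], "dtn://node/long/name") is not because the component counts differ.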

Pattern Set Logic Because of the arbitrarily complex nesting rules allowed by regular expressions, and the multiple techniques available for different expressions to match the same subsets of text, DTN pattern sets can only be consistently computed when the node-name or demux path segments are either exact-text matches or one of the match-all wildcards. Users of the DTN pattern SHALL have a mechanism to perform set logic with specific value and wildcard components. EID Pattern processors MAY, but cannot be assumed to, have a mechanism to perform set logic on regular expression components.

Text Form The text form of the DTN pattern conforms to the ABNF in . The authority begins with the same string "//" and authority and demux components are separated by the same character "/" as in the DTN URI scheme. This pattern uses reserved URI characters of "[" and "]" (see ) to indicate the presence of a regular expression for a component. This allows completely disambiguating a DTN pattern from a specific DTN EID when a regular expression or wildcard is present. Because neither of those are required to be present in a DTN pattern and the asterisk "*" is a valid path segment character, the considerations of still always apply to decoding text as EID Pattern versus an EID.

DTN Pattern ABNF Schema

    dtn-pattern = "dtn:" dtn-ssp
    dtn-ssp = dtn-wkssp-exact / dtn-fullssp

    ; A node-name authority with some number of demux path segments
    dtn-fullssp = "//" dtn-authority-pat "/" dtn-path-pat
    dtn-authority-pat = exact / regexp / multi-wildcard
    ; Only the last path segment is allowed a multi-wildcard
    dtn-path-pat = *( dtn-single-pat "/" ) dtn-last-pat
    dtn-single-pat = exact / regexp / wildcard
    dtn-last-pat = dtn-single-pat / multi-wildcard

    ; Exact-match text, which excludes gen-delims characters
    exact = *pchar

    ; Regular expression for the whole SSP within the gen-delims brackets
    ; with an allowance for more regexp characters
    regexp = "[" *( pchar / "^" ) "]"

    ; Exact match for well-known SSP
    dtn-wkssp-exact = "none"

A concrete use of this text form is illustrated in this example:

    dtn://node/[%5Eanchored]/other%20part/**
     <-- P -->  <--- P --->  <--- P ---->

Where the "P" sections are percent-encoded (with no reserved characters) and square brackets unambiguously delimit the expression component. The actual components in this example are the specific value "node", the regular expression "^anchored", and the specific value "other part" and all are UTF-8 and percent-encoded. Further examples are given in .
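As an illustration of the text form, the following Python sketch splits a DTN pattern into decoded components. It is a sketch under assumptions rather than a conforming parser: the function names and the tuple representation are hypothetical, and because "*" is itself a valid exact-match character, the precedence of the wildcard interpretations over exact text is an assumption made here. Splitting on "/" relies on the ABNF above not allowing "/" inside a regexp component.

```python
from urllib.parse import unquote


def parse_component(part: str, is_node: bool, is_last: bool) -> tuple:
    """Classify one "/"-separated component of a DTN pattern text form."""
    if part == "**" and (is_node or is_last):
        return ("multi",)
    if part == "*" and not is_node:
        return ("wildcard",)
    if len(part) >= 2 and part.startswith("[") and part.endswith("]"):
        # Regular expression text is carried percent-encoded in the brackets.
        return ("regexp", unquote(part[1:-1], errors="strict"))
    return ("exact", unquote(part, errors="strict"))


def parse_dtn_pattern(text: str) -> list:
    """Parse a DTN pattern text form into a list of component tuples."""
    if text == "dtn:none":
        return [("none",)]  # the well-known SSP
    prefix = "dtn://"
    if not text.startswith(prefix):
        raise ValueError("not a DTN pattern")
    parts = text[len(prefix):].split("/")
    if len(parts) < 2:
        raise ValueError("a DTN pattern needs a node name and a demux")
    last = len(parts) - 1
    return [
        parse_component(part, is_node=(idx == 0), is_last=(idx == last))
        for idx, part in enumerate(parts)
    ]
```

Applied to the example above, this sketch yields the specific value "node", the regular expression "^anchored", the specific value "other part", and a trailing multi-segment wildcard.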

CBOR Form The CBOR form of the DTN pattern conforms to the CDDL in . Just as in the DTN URI scheme the pattern scheme identifier is 1, the first component of the SSP identifies the node and the last components identify the service path segments. The well-known SSP SHALL be encoded using the same uint value specified for the DTN URI scheme. Each of the DTN pattern components SHALL be CBOR encoded as follows:

Specific value:: A text item (not otherwise UTF-8 or percent-encoded) corresponding to the dtn-exact symbol.
Regular expression:: A tagged regular expression item corresponding to the regexp symbol.
Single-segment wildcard:: The true item.
Multi-segment wildcard:: The false item.

The wildcard sentinel values have no intrinsic meaning and were simply chosen to be one-octet-encoded special items. The CBOR form of the DTN pattern is not as compressible as the IPN pattern, but the exact text is not percent encoded and the regular expression tag "regexp" does save one octet per instance.

DTN Pattern CDDL Schema

    $eid-pattern /= [
        uri-code: 1,
        SSP: dtn-ssp
    ]
    dtn-ssp = dtn-wkssp-exact / dtn-fullssp-parts
    dtn-fullssp-parts = [
        dtn-authority-pat,
        dtn-path-pat,
    ]
    dtn-authority-pat = dtn-exact / regexp / multi-wildcard
    ; Only the last path segment is allowed a multi-wildcard
    dtn-path-pat = (
        * dtn-single-pat,
        ? multi-wildcard
    )
    dtn-single-pat = dtn-exact / regexp / wildcard

    dtn-exact = tstr

    wildcard = true
    multi-wildcard = false

    ; Exact match for well-known SSP
    dtn-wkssp-exact = $dtn-wkssp .within uint
    $dtn-wkssp /= 0 ; For "none"

IPN Scheme Pattern

As defined in and updated in , IPN scheme EIDs have an SSP which is divided into a bounded number of integer numeric components. Because of this, the pattern for IPN scheme EIDs is based on matching a numeric value or range for each component. The conceptual model of the IPN pattern is that each of the components of the SSP can be matched as one of:

Specific value:: This will match only a single value (as decoded number).
Range:: This will match any value contained in a disjoint set of numeric intervals.
Wildcard:: This will match any valid value, but not the absence of a value.

An IPN pattern SHALL contain between two and four components, inclusive, corresponding to the IPN scheme EID components. Within a single component of the IPN pattern, the range intervals SHALL be disjoint and non-contiguous. Any overlapping or contiguous intervals within a set can be coalesced into a single covering interval with the same meaning. The text form of a range can, but SHOULD NOT, contain overlapping or contiguous intervals. The CBOR form of a range does not allow overlapping intervals because of its compressed form, but does allow contiguous intervals. The decoder for any form of an IPN pattern SHALL normalize all interval sets to satisfy information model requirements. The decoder for any form of an IPN pattern SHOULD treat the failure to decode any part of a pattern as a failure to decode the whole pattern. A limitation of this mechanism is that there is no intermediate component pattern between a specific set of finite intervals and the match-all (unbounded) wildcard. There is no capability of including a non-finite bound within any interval.
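The normalization required of a decoder can be sketched as a sort followed by a single merging pass. The following Python is a minimal sketch, assuming each interval is represented as an inclusive (first, last) tuple of non-negative integers; the function name normalize_intervals is hypothetical.

```python
def normalize_intervals(intervals):
    """Coalesce overlapping or contiguous inclusive intervals.

    Returns the intervals in ascending order, disjoint and non-contiguous.
    """
    merged = []
    for first, last in sorted(intervals):
        if first > last:
            raise ValueError("interval bounds are out of order")
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or directly abuts the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged
```

With this sketch, the interval sets [(0, 9), (10, 19)] and [(0, 15), (10, 19)] both normalize to the single interval (0, 19), while [(10, 19), (0, 4)] is only reordered.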

EID Matching

An IPN pattern SHALL be considered to match a specific EID when both have the same scheme, the pattern has the same number of components as the EID, and each component of the pattern matches the corresponding component of the EID SSP. If the number of components differs or if any component does not match, the whole pattern does not match. Each pattern component SHALL be considered to match according to the following rules:

Specific value:: The pattern component SHALL be compared to the EID component as an exact match of decoded numeric value.
Range:: The pattern component SHALL be considered to match with any EID component value that is contained in any of the finite intervals of the range.
Wildcard:: The pattern component SHALL be considered to match with any EID component, if present.

Because these are dealing with numeric values in an information model, the matching occurs after any encoding-specific normalization (i.e. it's not a text pattern for the text encoding, the matching is performed within the information model of the SSP).
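A minimal Python sketch of these rules follows. It assumes a pattern is already decoded into a list whose items are an integer (specific value), a list of inclusive (first, last) tuples (range), or a sentinel object (wildcard); the names WILDCARD, parse_ipn_eid, and ipn_match are hypothetical and not defined by this document.

```python
# Hypothetical sentinel for the wildcard component.
WILDCARD = object()


def parse_ipn_eid(eid: str) -> tuple:
    """Decode the text form of an ipn EID into its numeric components."""
    if not eid.startswith("ipn:"):
        raise ValueError("not an ipn EID")
    return tuple(int(part) for part in eid[4:].split("."))


def component_matches(pat, value: int) -> bool:
    if pat is WILDCARD:
        return True
    if isinstance(pat, int):
        return pat == value  # specific value: exact numeric match
    # Range: the value must fall inside one of the finite intervals.
    return any(first <= value <= last for first, last in pat)


def ipn_match(pattern: list, eid: str) -> bool:
    values = parse_ipn_eid(eid)
    if len(pattern) != len(values):
        return False  # component counts must agree
    return all(component_matches(p, v) for p, v in zip(pattern, values))
```

Because the comparison is numeric, text variations such as leading zeros in the EID do not affect the result in this sketch.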

Pattern Set Logic

One benefit of using an EID pattern with an information model of a sequence of numbers or ranges is that performing set logic such as intersection or containment is straightforward. For set logical behavior, the specific value case is treated as a singleton set and the wildcard case is treated as the unbounded interval. Two IPN patterns intersect if all of their corresponding components intersect, and the intersection of each component range can be readily computed using multi-interval set logic. Likewise, one IPN pattern is a subset (or proper subset) of another pattern if each of its components is a subset (or proper subset) of the other's corresponding component.
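The set logic described above can be sketched in Python with a two-pointer sweep over sorted interval lists. This is a sketch under assumptions: components use the same hypothetical representation of an integer, a normalized list of inclusive (first, last) tuples, or a WILDCARD sentinel, and the unbounded interval is modeled with math.inf purely for computation.

```python
import math

WILDCARD = object()  # hypothetical sentinel for the wildcard component


def as_intervals(comp) -> list:
    """View any component as a sorted list of inclusive intervals."""
    if comp is WILDCARD:
        return [(0, math.inf)]  # the unbounded interval
    if isinstance(comp, int):
        return [(comp, comp)]  # a singleton set
    return list(comp)  # assumed already normalized and ascending


def intersect(a: list, b: list) -> list:
    """Intersect two normalized interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def patterns_intersect(p: list, q: list) -> bool:
    return len(p) == len(q) and all(
        bool(intersect(as_intervals(x), as_intervals(y))) for x, y in zip(p, q)
    )


def pattern_is_subset(p: list, q: list) -> bool:
    # For normalized sets, a is a subset of b exactly when a & b == a.
    return len(p) == len(q) and all(
        intersect(as_intervals(x), as_intervals(y)) == as_intervals(x)
        for x, y in zip(p, q)
    )
```

The subset test relies on both inputs being normalized, since only then does the intersection reproduce the first operand interval for interval.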

Text Form The text form of the IPN pattern conforms to the ABNF in . Each component is separated by the same character "." as in the IPN URI scheme. This pattern uses reserved URI characters of "[" and "]" (see ) to indicate the presence of a range set for a component, the character "," to separate each range, and the character "-" to indicate the inclusive range within the set. Each of the numeric values within the range is inclusive. If the range does not contain two values it is a length-one range. The canonical text form of an IPN pattern SHALL order all range sets in ascending numeric order.

IPN Pattern ABNF Schema

    ipn-pattern = "ipn:" ipn-ssp

    ; Up to three preceding components with a service number
    ipn-ssp = 1*3( ipn-part-pat nbr-delim ) ipn-part-pat
    ipn-part-pat = ipn-number / ipn-range / wildcard
    ipn-number = 1*DIGIT
    ipn-range = "[" ipn-interval *( "," ipn-interval ) "]"
    ipn-interval = ipn-number [ "-" ipn-number ]
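A text-form decoder for this ABNF can be sketched in Python as shown here. The sketch assumes that nbr-delim is the "." character described in the text above, represents the wildcard as the string "*", and uses hypothetical function names; it does not perform the interval normalization that a complete decoder would apply afterwards.

```python
import re

WILDCARD = "*"  # hypothetical in-memory marker for the wildcard component

_NUMBER = re.compile(r"[0-9]+")
_INTERVAL = r"[0-9]+(?:-[0-9]+)?"
_RANGE = re.compile(r"\[(" + _INTERVAL + r"(?:," + _INTERVAL + r")*)\]")


def parse_part(part: str):
    """Decode one component: a number, a range set, or the wildcard."""
    if part == "*":
        return WILDCARD
    if _NUMBER.fullmatch(part):
        return int(part)
    found = _RANGE.fullmatch(part)
    if not found:
        raise ValueError(f"invalid IPN pattern component: {part!r}")
    intervals = []
    for item in found.group(1).split(","):
        first, _, last = item.partition("-")
        # A single number is a length-one range.
        intervals.append((int(first), int(last or first)))
    return intervals


def parse_ipn_pattern(text: str) -> list:
    if not text.startswith("ipn:"):
        raise ValueError("not an IPN pattern")
    parts = text[4:].split(".")
    if not 2 <= len(parts) <= 4:
        raise ValueError("an IPN pattern has two to four components")
    return [parse_part(part) for part in parts]
```

Any component that fails to parse raises an error for the whole pattern, in line with the guidance that a partial failure be treated as a failure to decode the whole pattern.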

CBOR Form The CBOR form of the IPN pattern conforms to the CDDL in . Just as in the IPN URI scheme the pattern scheme identifier is 2, the first components of the SSP identify the node and the last component identifies the service. Each of the IPN pattern components SHALL be CBOR encoded as follows:

Specific value:: A number corresponding to the uint symbol.
Range:: An array item corresponding to the ipn-range symbol.
Wildcard:: The true item.

The wildcard sentinel values have no intrinsic meaning and were simply chosen to be one-octet-encoded special items. The encoding of ranges is a compressed form in which each pair of values in the range indicates:

The non-zero offset from the previous one-past-end-of-range, or the offset from zero if there is no preceding range
The length of this range, which is inclusive of the first and last contained value so should always be non-zero

Another way to interpret these pairs is that each number indicates the length of alternating "excluded" and "included" intervals for the range.
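The pair encoding can be illustrated with a short Python sketch that converts between inclusive intervals and the flat offset and length sequence. The function names are hypothetical, the input to encode_range is assumed to be already normalized and ascending, and the actual CBOR serialization of the resulting array is out of scope for the sketch.

```python
def encode_range(intervals: list) -> list:
    """Turn ascending inclusive intervals into flat [offset, length, ...]."""
    flat = []
    cursor = 0  # one past the end of the previous interval, or zero
    for first, last in intervals:
        flat += [first - cursor, last - first + 1]
        cursor = last + 1
    return flat


def decode_range(flat: list) -> list:
    """Turn flat [offset, length, ...] pairs back into inclusive intervals."""
    if not flat or len(flat) % 2:
        raise ValueError("a range needs one or more complete pairs")
    intervals = []
    cursor = 0
    for offset, length in zip(flat[0::2], flat[1::2]):
        if offset < 0 or length < 1:
            raise ValueError("invalid offset or length")
        first = cursor + offset
        intervals.append((first, first + length - 1))
        cursor = first + length
    return intervals
```

A decoded sequence such as [0, 10, 0, 10] produces the contiguous intervals (0, 9) and (10, 19), which a decoder would then coalesce during normalization.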

IPN Pattern CDDL Schema

    $eid-pattern /= [
        uri-code: 2,
        SSP: ipn-ssp
    ]
    ipn-ssp = [
        2*4 ipn-part-pat,
    ]
    ipn-part-pat = uint / ipn-range / true
    ipn-range = [
        1* ipn-interval-pair
    ]
    ipn-interval-pair = (
        offset: uint,
        length: uint .gt 0,
    )

PKIX Certificate Profile Update

This document expands upon the PKIX profile of TCPCLv4 to allow an EID Pattern in any certificate where a Node ID is required or allowed.

New Other Name Form

This document defines a PKIX Other Name Form identifier, id-on-bundleEIDPattern in ; this identifier can be used as the type-id in a Subject Alternative Name entry of type otherName. The BundleEIDPattern value associated with the otherName type-id id-on-bundleEIDPattern SHALL be an EID Pattern text form, encoded as a UTF8String, with a scheme that is present in the IANA "Bundle Protocol URI Scheme Types" registry .

New Identifier Type This specification defines an EID-PATTERN-ID of a certificate as being the Subject Alternative Name entry of type otherName with a name form of BundleEIDPattern and a value limited to an EID Pattern text form. An entity SHALL ignore any entry of type otherName with a name form of BundleEIDPattern and a value that is some text other than an EID Pattern. The EID-PATTERN-ID is similar to the NODE-ID as defined in but can match many different and distinct Node IDs. URI matching of an EID-PATTERN-ID SHALL use the scheme-specific matching logic defined in this specification. An EID Pattern scheme can refine this matching logic with rules regarding how node IDs within that scheme are to be compared with the issued EID-PATTERN-ID. As an augmentation of : Unless prohibited by CA policy, a TCPCL end-entity certificate SHALL contain either a NODE-ID or an EID-PATTERN-ID that authenticates the node ID of the peer. All other requirements of that certificate profile are unchanged by this document.

Security Considerations It is critical for applications handling EIDs and EID Patterns to positively distinguish between the two based on the context in which the value is being used. For PKIX Subject Alternative Name this is distinguished by the different Other Name forms. An EID which is inappropriately interpreted as an EID Pattern could allow an attacker to elevate access depending upon other aspects of the system being accessed. CAs which issue certificates containing EID Patterns need to consider the implications of an overly-broad pattern in the same way that current Web PKI CAs must manage certificates with wildcard DNS-IDs. Although the reserved characters "[" and "]" are disallowed within the URI authority and path segments by there are still URI processors which could be lax about enforcing that restriction and could allow an EID pattern to be decoded in a place where an actual EID is expected. This could allow unwanted side-effects when the EID is handled by a BP Agent. Both URI authority and path segments are percent-encoded text and need to be handled by EID processors as such for both pattern matching and equality comparison. Additionally, for the IPN scheme there are numeric values that must be handled as such for pattern matching and comparison.

IANA Considerations

Bundle Protocol URI Scheme Types This specification re-uses the "Bundle Protocol URI Scheme Types" sub-registry within the "Bundle Protocol" registry for the CBOR encoding of EID Patterns and adds an informative column "EID Pattern Reference" as in the following table.

Value	Description	...	EID Pattern Reference
1	dtn		[This specification]
2	ipn		[This specification]

Object Identifier for PKIX Other Name Forms IANA has created, under the "Structure of Management Information (SMI) Numbers" registry , a sub-registry titled "SMI Security for PKIX Other Name Forms". The other name forms table is updated to include a row "id-on-bundleEIDPattern" for containing an Endpoint ID Pattern as in the following table.

Decimal	Description	References
ON-TBD	id-on-bundleEIDPattern	[This specification]

The formal structure of the associated other name form is in . The use of this OID is defined in .

References

Normative References

ECMAScript Language Specification 5.1 Edition, European Computer Manufacturers Association, ECMA Standard ECMA-262.
Bundle Protocol, IANA.
Structure of Management Information (SMI) Numbers, IANA.
Information technology -- Abstract Syntax Notation One (ASN.1): Specification of basic notation, ITU-T, ITU-T Recommendation X.680, ISO/IEC 8824-1:2015.

Informative References

URI Pattern Matching for Groups of Resources, W3C.

ASN.1 Module

The following ASN.1 module formally specifies the BundleEIDPattern structure and its Other Name form in the syntax of . This specification uses the ASN.1 definitions from with the 2002 ASN.1 notation used in that document.

    DTN-EIDPATTERN-2023
      { iso(1) identified-organization(3) dod(6)
        internet(1) security(5) mechanisms(5) pkix(7)
        id-mod(0) id-mod-dtn-eidpattern-2023(MOD-TBD) }

    DEFINITIONS IMPLICIT TAGS ::=
    BEGIN

    IMPORTS
      OTHER-NAME
      FROM PKIX1Implicit-2009 -- [RFC5912]
        { iso(1) identified-organization(3) dod(6) internet(1) security(5)
          mechanisms(5) pkix(7) id-mod(0) id-mod-pkix1-implicit-02(59) }

      id-pkix
      FROM PKIX1Explicit-2009 -- [RFC5912]
        { iso(1) identified-organization(3) dod(6) internet(1) security(5)
          mechanisms(5) pkix(7) id-mod(0) id-mod-pkix1-explicit-02(51) } ;

    id-on OBJECT IDENTIFIER ::= { id-pkix 8 }

    DTNOtherNames OTHER-NAME ::= { on-bundleEIDPattern, ... }

    -- The otherName definition for Bundle EID Pattern
    on-bundleEIDPattern OTHER-NAME ::= {
        BundleEIDPattern IDENTIFIED BY { id-on-bundleEIDPattern }
    }

    id-on-bundleEIDPattern OBJECT IDENTIFIER ::= { id-on ON-TBD }

    -- Same encoding as BundleEID, which allows URI reserved characters
    BundleEIDPattern ::= IA5String

    END

Examples

DTN Patterns

Exact Match

This trivial example matches only one EID (which itself has the same text form)

    dtn://node/service

which has a CBOR form of:

    [1, ["node", "service"]]

An example of normalized matching is that the pattern dtn://node/service will still match the EIDs dtn://node/ser%76ice and dtn://no%64e/service because each component match is performed in percent-decoded and UTF-8 decoded form.

Wildcards

This example matches a single-segment service demux on a single node

    dtn://node/*

which has a CBOR form of:

    [1, ["node", true]]

That single wildcard will match the empty demux dtn://node/ but will not match demux paths such as dtn://node/long/name or any more segments.

This example matches all service demux on a single node with a multi-wildcard

    dtn://node/**

which has a CBOR form of:

    [1, ["node", false]]

This example matches a service demux with a prefix segment "pre"

    dtn://node/pre/**

which has a CBOR form of:

    [1, ["node", "pre", false]]

This example matches all node names having the same service demux

    dtn://**/some/serv

which has a CBOR form of:

    [1, [false, "some", "serv"]]

Regular Expression Match

This example includes a single regular expression for a single-segment service demux that starts with the letter "a" in the text form of

    dtn://**/[^a]

which has a CBOR form of:

    [1, [false, 35("^a")]]

IPN Patterns

Exact Match

This trivial example matches only one EID (which itself has the same text and CBOR forms)

    ipn:0.3.4

which has a CBOR form of:

    [2, [0, 3, 4]]

Single Wildcard Match

This example matches all service numbers on a single node

    ipn:0.3.*

which has a CBOR form of:

    [2, [0, 3, true]]

This example matches all no-authority nodes with the same service number

    ipn:*.4

which has a CBOR form of:

    [2, [true, 4]]

Range Match

This example includes a single range over the service numbers ipn:0.3.0 to ipn:0.3.19 inclusive as

    ipn:0.3.[0-19]

which has a CBOR form of:

    [2, [0, 3, [0, 20]]]

This example includes an offset range over the service numbers ipn:0.3.10 to ipn:0.3.19 inclusive as

    ipn:0.3.[10-19]

which has a CBOR form of:

    [2, [0, 3, [10, 10]]]

This example includes multiple ranges of service numbers ipn:0.3.0 to ipn:0.3.4 and ipn:0.3.10 to ipn:0.3.19 inclusive as

    ipn:0.3.[0-4,10-19]

which has a CBOR form of:

    [2, [0, 3, [0, 5, 5, 10]]]

An overlapping or contiguous pattern such as ipn:0.3.[0-9,10-19] or ipn:0.3.[0-15,10-19] or ipn:0.3.[10-19,0-9] would be normalized to ipn:0.3.[0-19]. An unordered pattern such as ipn:0.3.[10-19,0-4] would be normalized to ipn:0.3.[0-4,10-19].

Acknowledgments The DTN pattern expressiveness is based on use case examples provided by Carlo Caini and Lucien Loiseau.