<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.6.14 (Ruby 3.1.2) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-ietf-jsonpath-iregexp-01" category="std" consensus="true" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.13.0 -->
  <front>
    <title abbrev="I-Regexp">I-Regexp: An Interoperable Regexp Format</title>
    <seriesInfo name="Internet-Draft" value="draft-ietf-jsonpath-iregexp-01"/>
    <author initials="C." surname="Bormann" fullname="Carsten Bormann">
      <organization>Universität Bremen TZI</organization>
      <address>
        <postal>
          <street>Postfach 330440</street>
          <city>Bremen</city>
          <code>D-28359</code>
          <country>Germany</country>
        </postal>
        <phone>+49-421-218-63921</phone>
        <email>cabo@tzi.org</email>
      </address>
    </author>
    <author initials="T." surname="Bray" fullname="Tim Bray">
      <organization>Textuality</organization>
      <address>
        <email>tbray@textuality.com</email>
      </address>
    </author>
    <date year="2022" month="July" day="11"/>
    <keyword>Internet-Draft</keyword>
    <abstract>
      <t>This document specifies I-Regexp, a flavor of regular expressions that is
limited in scope with the goal of interoperation across many different
regular-expression libraries.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>
        Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-ietf-jsonpath-iregexp/"/>.
      </t>
      <t>
        Discussion of this document takes place on the
        JSONPath Working Group mailing list (<eref target="mailto:JSONPath@ietf.org"/>),
        which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/JSONPath/"/>.
      </t>
      <t>Source for this draft and an issue tracker can be found at
        <eref target="https://github.com/ietf-wg-jsonpath/iregexp"/>.</t>
    </note>
  </front>
  <middle>
    <section anchor="intro">
      <name>Introduction</name>
      <t>This specification describes an interoperable regular expression flavor, I-Regexp.</t>
      <t>This document uses the abbreviation "regexp" for what are usually
called regular expressions in programming.
"I-Regexp" is used as a noun meaning a character string which conforms to the requirements
in this specification; the plural is "I-Regexps".</t>
      <t>I-Regexp does not provide advanced regexp features such as capture groups, lookahead, or backreferences.
It supports only a Boolean matching capability, i.e., testing whether a given regexp matches a given piece of text.</t>
      <t>I-Regexp supports the entire repertoire of Unicode characters.</t>
      <t>I-Regexp is a subset of XSD regexps <xref target="XSD-2"/>.</t>
      <t>This document includes rules for converting I-Regexps for use with several well-known regexp libraries.</t>
      <section anchor="terminology">
        <name>Terminology</name>
        <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
        <t>The grammatical rules in this document are to be interpreted as ABNF,
as described in <xref target="RFC5234"/> and <xref target="RFC7405"/>.</t>
      </section>
    </section>
    <section anchor="requirements">
      <name>Requirements</name>
      <t>I-Regexps should handle the vast majority of practical cases where a
matching regexp is needed in a data model specification or a query
language expression.</t>
      <t>A brief survey of published RFCs yielded the regexp patterns in
<xref target="rfcs"/> (with no attempt at completeness).
With certain exceptions discussed there,
these patterns should be covered by I-Regexps, both syntactically and with
their intended semantics.</t>
    </section>
    <section anchor="defn">
      <name>I-Regexp Syntax</name>
      <t>An I-Regexp <bcp14>MUST</bcp14> conform to the ABNF specification in
<xref target="iregexp-abnf"/>.</t>
      <figure anchor="iregexp-abnf">
        <name>I-Regexp Syntax in ABNF</name>
        <sourcecode type="abnf"><![CDATA[
i-regexp = branch *( "|" branch )
branch = *piece
piece = atom [ quantifier ]
quantifier = ( %x2A-2B ; '*'-'+'
 / "?" ) / ( "{" quantity "}" )
quantity = QuantExact [ "," [ QuantExact ] ]
QuantExact = 1*%x30-39 ; '0'-'9'

atom = NormalChar / charClass / ( "(" i-regexp ")" )
NormalChar = ( %x00-27 / %x2C-2D ; ','-'-'
 / %x2F-3E ; '/'-'>'
 / %x40-5A ; '@'-'Z'
 / %x5E-7A ; '^'-'z'
 / %x7E-10FFFF )
charClass = "." / SingleCharEsc / charClassEsc / charClassExpr
SingleCharEsc = "\" ( %x28-2B ; '('-'+'
 / %x2D-2E ; '-'-'.'
 / "?" / %x5B-5E ; '['-'^'
 / %s"n" / %s"r" / %s"t" / %x7B-7D ; '{'-'}'
 )
charClassEsc = catEsc / complEsc
charClassExpr = "[" [ "^" ] ( "-" / CCE1 ) *CCE1 [ "-" ] "]"
CCE1 = ( CCchar [ "-" CCchar ] ) / charClassEsc
CCchar = ( %x00-2C / %x2E-5A ; '.'-'Z'
 / %x5E-10FFFF ) / SingleCharEsc
catEsc = %s"\p{" charProp "}"
complEsc = %s"\P{" charProp "}"
charProp = IsCategory
IsCategory = Letters / Marks / Numbers / Punctuation / Separators /
    Symbols / Others
Letters = %s"L" [ ( %x6C-6D ; 'l'-'m'
 / %s"o" / %x74-75 ; 't'-'u'
 ) ]
Marks = %s"M" [ ( %s"c" / %s"e" / %s"n" ) ]
Numbers = %s"N" [ ( %s"d" / %s"l" / %s"o" ) ]
Punctuation = %s"P" [ ( %x63-66 ; 'c'-'f'
 / %s"i" / %s"o" / %s"s" ) ]
Separators = %s"Z" [ ( %s"l" / %s"p" / %s"s" ) ]
Symbols = %s"S" [ ( %s"c" / %s"k" / %s"m" / %s"o" ) ]
Others = %s"C" [ ( %s"c" / %s"f" / %x6E-6F ; 'n'-'o'
 ) ]
]]></sourcecode>
      </figure>
      <t>As an additional restriction, <tt>charClassExpr</tt> is not allowed to
match <tt>[^]</tt>, which according to this grammar would parse as a
positive character class containing the single character <tt>^</tt>.</t>
      <t>This is essentially the XSD regexp syntax without character class
subtraction, without multi-character escapes such as <tt>\s</tt>,
<tt>\S</tt>, and <tt>\w</tt>, and without Unicode blocks.</t>
      <t>An I-Regexp implementation <bcp14>MUST</bcp14> be a complete implementation of this
limited subset.
In particular, full Unicode support is <bcp14>REQUIRED</bcp14>; the implementation
<bcp14>MUST NOT</bcp14> limit itself to 7- or 8-bit character sets such as ASCII and
<bcp14>MUST</bcp14> support the Unicode character property set in character classes.</t>
      <section anchor="checking">
        <name>Checking Implementations</name>
        <t>A <em>checking</em> I-Regexp implementation is one that checks a supplied
regexp for compliance with this specification and reports any problems.
Checking implementations give their users confidence that they didn't
accidentally insert non-interoperable syntax, so checking is <bcp14>RECOMMENDED</bcp14>.
Exceptions to this rule may be made for low-effort implementations
that map I-Regexp to another regexp library by simple steps such as
performing the mapping operations discussed in <xref target="mapping"/>; here, the
effort needed to do full checking may dwarf the rest of the
implementation effort.
Implementations <bcp14>SHOULD</bcp14> document whether they are checking or not.</t>
        <t>Specifications that employ I-Regexp may want to define in which
cases their implementations can work with a non-checking I-Regexp
implementation and when full checking is needed, possibly in the
process of defining their own implementation classes.</t>
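        <t>As a non-normative illustration, the syntax check can be sketched
in Python. Because a <tt>branch</tt> may be empty, a string conforms to the
grammar in <xref target="iregexp-abnf"/> exactly when it is a sequence of
atoms (each optionally followed by a quantifier), <tt>|</tt> characters, and
balanced parentheses, so a tokenizer plus a depth counter is enough.
The names <tt>TOKEN</tt> and <tt>first_error</tt> are inventions of this
sketch, which checks only the ABNF and the <tt>[^]</tt> restriction; any
further constraints inherited from <xref target="XSD-2"/> are assumed to be
checked elsewhere.</t>
        <sourcecode type="python"><![CDATA[
import re

# \p{..} / \P{..} with the general categories of the IsCategory rule
CAT_ESC = (r"\\[pP]\{(?:L[lmotu]?|M[cen]?|N[dlo]?|P[cdefios]?"
           r"|Z[lps]?|S[ckmo]?|C[cfno]?)\}")
# SingleCharEsc: a backslash followed by one escapable character
SINGLE_ESC = r"\\[()*+\-.?\[\\\]^nrt{|}]"
# NormalChar: anything except ( ) * + . ? [ \ ] { | }
NORMAL = r"[^()*+.?\[\\\]{|}]"
# CCchar: anything except - [ \ ], or a SingleCharEsc
CC_CHAR = r"(?:[^\-\[\\\]]|" + SINGLE_ESC + ")"
CCE1 = "(?:" + CC_CHAR + "(?:-" + CC_CHAR + ")?|" + CAT_ESC + ")"
# The lookahead implements the extra rule that "[^]" is not allowed.
CLASS = r"\[(?!\^\])\^?(?:-|" + CCE1 + ")" + CCE1 + r"*-?\]"
QUANT = r"(?:[*+?]|\{[0-9]+(?:,[0-9]*)?\})"
# One token: "(", "|", or an atom / ")" with an optional quantifier.
TOKEN = re.compile(
    r"(?P<open>\()|\||(?:(?P<close>\))|\.|" + NORMAL + "|" + SINGLE_ESC
    + "|" + CAT_ESC + "|" + CLASS + ")" + QUANT + "?")


def first_error(text):
    """Return the offset of the first syntax problem, or None."""
    depth, pos = 0, 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            return pos          # no atom can start here
        if m.group("open"):
            depth += 1
        elif m.group("close"):
            if depth == 0:
                return pos      # ")" without a matching "("
            depth -= 1
        pos = m.end()
    # An unclosed "(" is reported at the end of the input.
    return None if depth == 0 else pos


print(first_error("[0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*"))  # None
print(first_error(r"\d{4}"))                            # 0
]]></sourcecode>
        <t>Returning an offset rather than a Boolean lets a caller report where
a non-interoperable construct, such as the multi-character escape in the
second call, was found.</t>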
      </section>
    </section>
    <section anchor="i-regexp-semantics">
      <name>I-Regexp Semantics</name>
      <t>This syntax is a subset of that of <xref target="XSD-2"/>.
Implementations which interpret I-Regexps <bcp14>MUST</bcp14>
yield Boolean results as specified in <xref target="XSD-2"/>.
(See also <xref target="xsd-regexps"/>.)</t>
    </section>
    <section anchor="mapping">
      <name>Mapping I-Regexp to Regexp Dialects</name>
      <t>(TBD; these mappings need to be further verified in implementation work.)</t>
      <section anchor="xsd-regexps">
        <name>XSD Regexps</name>
        <t>Any I-Regexp also is an XSD Regexp <xref target="XSD-2"/>, so the mapping is an identity
function.</t>
        <t>Note that a few errata for <xref target="XSD-2"/> have been fixed in <xref target="XSD11-2"/>, which
is therefore also included as a normative reference.
XSD 1.1 is less widely implemented than XSD 1.0, and implementations
of XSD 1.0 are likely to include these bugfixes, so for the intents
and purposes of this specification an implementation of XSD 1.0
regexps is equivalent to an implementation of XSD 1.1 regexps.</t>
      </section>
      <section anchor="toESreg">
        <name>ECMAScript Regexps</name>
        <t>Perform the following steps on an I-Regexp to obtain an ECMAScript
regexp <xref target="ECMA-262"/>:</t>
        <ul spacing="normal">
          <li>For any dots (<tt>.</tt>) outside character classes (first alternative
of <tt>charClass</tt> production): replace dot by <tt>[^\n\r]</tt>.</li>
          <li>Enclose the result in <tt>^(?:</tt> and <tt>)$</tt>, so that the anchors apply to every branch of a top-level alternation.</li>
        </ul>
        <t>Note that where a regexp literal is required,
the actual regexp needs to be enclosed in <tt>/</tt>.</t>
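        <t>As a non-normative illustration, the steps above might be
implemented as follows in Python. The function name <tt>to_ecmascript</tt>
is an invention of this sketch, and the input is assumed to have been
checked against <xref target="iregexp-abnf"/> already. The sketch places a
non-capturing group, <tt>(?:</tt> ... <tt>)</tt>, between the anchors so
that they apply to every branch of a top-level alternation such as
<tt>a|b</tt>.</t>
        <sourcecode type="python"><![CDATA[
def to_ecmascript(iregexp):
    """Map a syntactically valid I-Regexp to an ECMAScript pattern."""
    out = []
    in_class = False
    i = 0
    while i < len(iregexp):
        ch = iregexp[i]
        if ch == "\\":
            # Copy an escape such as "\." or "\p" unchanged.
            out.append(iregexp[i:i + 2])
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        if ch == "." and not in_class:
            # A dot outside a character class excludes line terminators.
            out.append("[^\\n\\r]")
        else:
            out.append(ch)
        i += 1
    # Group the pattern so the anchors cover every top-level branch.
    return "^(?:" + "".join(out) + ")$"


print(to_ecmascript("a.b|[.]"))  # ^(?:a[^\n\r]b|[.])$
]]></sourcecode>
        <t>In ECMAScript, property escapes such as <tt>\p{Nd}</tt> are
recognized only in Unicode mode, so the resulting pattern would be compiled
with the <tt>u</tt> flag.</t>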
      </section>
      <section anchor="pcre-re2-ruby-regexps">
        <name>PCRE, RE2, Ruby Regexps</name>
        <t>Perform the same steps as in <xref target="toESreg"/> to obtain a valid regexp in
PCRE <xref target="PCRE2"/>, the Go programming language <xref target="RE2"/>, and the Ruby
programming language, except that the last step is:</t>
        <ul spacing="normal">
          <li>Enclose the regexp in <tt>\A(?:</tt> and <tt>)\z</tt>.</li>
        </ul>
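        <t>The reason for the different anchors can be illustrated,
non-normatively, with Python's <tt>re</tt> module. It is used here only as
a stand-in (an assumption of this sketch) because its <tt>$</tt> behaves
like the PCRE default and may match just before a trailing newline. Python
spells the absolute end-of-string anchor <tt>\Z</tt>, where PCRE, RE2, and
Ruby use <tt>\z</tt>.</t>
        <sourcecode type="python"><![CDATA[
import re

body = "abc"  # a trivial I-Regexp used for illustration

# "$" may match just before a trailing newline.
dollar_anchored = re.compile("^(?:" + body + ")$")
# "\A" and "\Z" match only at the very start and end of the string.
strictly_anchored = re.compile(r"\A(?:" + body + r")\Z")

print(dollar_anchored.search("abc\n") is not None)    # True
print(strictly_anchored.search("abc\n") is not None)  # False
print(strictly_anchored.search("abc") is not None)    # True
]]></sourcecode>
        <t>With <tt>$</tt>, the input <tt>abc</tt> followed by a newline is
accepted, although the I-Regexp <tt>abc</tt> does not match it.</t>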
      </section>
    </section>
    <section anchor="background">
      <name>Motivation and Background</name>
      <t>While regular expressions originally were intended to describe a
formal language to support a Boolean matching function, they
have been enhanced with parsing functions that support the extraction
and replacement of arbitrary portions of the matched text. With this
accretion of features, parsing regexp libraries have become
more susceptible to bugs and surprising performance degradations which
can be exploited in Denial of Service attacks by
an attacker who controls the regexp submitted for
processing. I-Regexp is designed to offer interoperability, and to be
less vulnerable to such attacks, with the trade-off that its only
function is to offer a Boolean response as to whether a character
sequence is matched by a regexp.</t>
      <section anchor="subsetting">
        <name>Implementing I-Regexp</name>
        <t>XSD regexps are relatively easy to implement or map to widely
implemented parsing regexp dialects, with these notable
exceptions:</t>
        <ul spacing="normal">
          <li>Character class subtraction.  This is a very useful feature in many
specifications, but it is unfortunately mostly absent from parsing
regexp dialects. Thus, it is omitted from I-Regexp.</li>
          <li>Multi-character escapes.  <tt>\d</tt>, <tt>\w</tt>, <tt>\s</tt> and their uppercase
complement classes exhibit a
large amount of variation between regexp flavors.  Thus, they are
omitted from I-Regexp.</li>
          <li>Unicode character properties.  Not all regexp implementations
have access to the Unicode tables needed to
execute constructs such as <tt>\p{Nd}</tt>,
although the <tt>\p</tt>/<tt>\P</tt> feature in general is now quite
widely available. While it is possible in principle to
translate these constructs into character-class matches, the translation
itself also requires access to those tables. Thus, regexp libraries in
severely constrained environments may not be able to support I-Regexp
conformance.</li>
        </ul>
      </section>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This document makes no requests of IANA.</t>
    </section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>As discussed in <xref target="background"/>, more complex regexp libraries may
contain exploitable bugs leading to crashes and remote code
execution.  There is also the problem that such libraries often have
hard-to-predict performance characteristics, leading to attacks
that overload an implementation by matching against an expensive
attacker-controlled regexp.</t>
      <t>I-Regexps have been designed to allow implementation in a way that is
resilient to both threats; this objective needs to be addressed
throughout the implementation effort.
Non-checking implementations (see <xref target="checking"/>) are likely to expose
security limitations of any regexp engine they use, which may be less
problematic if that engine has been built with security considerations
in mind (e.g., <xref target="RE2"/>); a checking implementation is still <bcp14>RECOMMENDED</bcp14>.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="XSD-2" target="https://www.w3.org/TR/2004/REC-xmlschema-2-20041028/">
          <front>
            <title>XML Schema Part 2: Datatypes Second Edition</title>
            <author fullname="Ashok Malhotra" role="editor"/>
            <author fullname="Paul V. Biron" role="editor"/>
            <date day="28" month="October" year="2004"/>
          </front>
          <seriesInfo name="W3C REC" value="REC-xmlschema-2-20041028"/>
          <seriesInfo name="W3C" value="REC-xmlschema-2-20041028"/>
        </reference>
        <reference anchor="XSD11-2" target="https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/">
          <front>
            <title>W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes</title>
            <author fullname="Ashok Malhotra" role="editor"/>
            <author fullname="David Peterson" role="editor"/>
            <author fullname="Henry Thompson" role="editor"/>
            <author fullname="Michael Sperberg-McQueen" role="editor"/>
            <author fullname="Paul V. Biron" role="editor"/>
            <author fullname="Sandy Gao" role="editor"/>
            <date day="5" month="April" year="2012"/>
          </front>
          <seriesInfo name="W3C" value="REC-xmlschema11-2-20120405"/>
          <seriesInfo name="W3C REC" value="REC-xmlschema11-2-20120405"/>
        </reference>
        <reference anchor="RFC5234" target="https://www.rfc-editor.org/info/rfc5234">
          <front>
            <title>Augmented BNF for Syntax Specifications: ABNF</title>
            <author fullname="D. Crocker" initials="D." role="editor" surname="Crocker">
              <organization/>
            </author>
            <author fullname="P. Overell" initials="P." surname="Overell">
              <organization/>
            </author>
            <date month="January" year="2008"/>
            <abstract>
              <t>Internet technical specifications often need to define a formal syntax.  Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications.  The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power.  The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges.  This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications.  [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="STD" value="68"/>
          <seriesInfo name="RFC" value="5234"/>
          <seriesInfo name="DOI" value="10.17487/RFC5234"/>
        </reference>
        <reference anchor="RFC7405" target="https://www.rfc-editor.org/info/rfc7405">
          <front>
            <title>Case-Sensitive String Support in ABNF</title>
            <author fullname="P. Kyzivat" initials="P." surname="Kyzivat">
              <organization/>
            </author>
            <date month="December" year="2014"/>
            <abstract>
              <t>This document extends the base definition of ABNF (Augmented Backus-Naur Form) to include a way to specify US-ASCII string literals that are matched in a case-sensitive manner.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7405"/>
          <seriesInfo name="DOI" value="10.17487/RFC7405"/>
        </reference>
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner">
              <organization/>
            </author>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba">
              <organization/>
            </author>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol  specifications.  This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the  defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <reference anchor="RE2" target="https://github.com/google/re2">
          <front>
            <title>RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.</title>
            <author>
              <organization/>
            </author>
            <date>n.d.</date>
          </front>
        </reference>
        <reference anchor="PCRE2" target="http://pcre.org/current/doc/html/">
          <front>
            <title>Perl-compatible Regular Expressions (revised API: PCRE2)</title>
            <author>
              <organization/>
            </author>
            <date>n.d.</date>
          </front>
        </reference>
        <reference anchor="ECMA-262" target="https://www.ecma-international.org/wp-content/uploads/ECMA-262.pdf">
          <front>
            <title>ECMAScript 2020 Language Specification</title>
            <author>
              <organization>Ecma International</organization>
            </author>
            <date year="2020" month="June"/>
          </front>
          <seriesInfo name="ECMA" value="Standard ECMA-262, 11th Edition"/>
        </reference>
        <reference anchor="RFC7493" target="https://www.rfc-editor.org/info/rfc7493">
          <front>
            <title>The I-JSON Message Format</title>
            <author fullname="T. Bray" initials="T." role="editor" surname="Bray">
              <organization/>
            </author>
            <date month="March" year="2015"/>
            <abstract>
              <t>I-JSON (short for "Internet JSON") is a restricted profile of JSON designed to maximize interoperability and increase confidence that software can process it successfully with predictable results.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="7493"/>
          <seriesInfo name="DOI" value="10.17487/RFC7493"/>
        </reference>
      </references>
    </references>
    <section anchor="rfcs">
      <name>Regexps and Similar Constructs in Recent Published RFCs</name>
      <t>This appendix contains a number of regular expressions that have been
extracted from recently published RFCs using ad hoc matching.
Multi-line constructions were not included.
With the exception of some (often surprisingly dubious) uses of multi-character
escapes and a reference to the <tt>IsBasicLatin</tt> Unicode block, all
regular expressions validate against the ABNF in <xref target="iregexp-abnf"/>.</t>
      <figure anchor="iregexp-examples">
        <name>Example regular expressions extracted from RFCs</name>
        <artwork><![CDATA[
rfc6021.txt  459 (([0-1](\.[1-3]?[0-9]))|(2\.(0|([1-9]\d*))))
rfc6021.txt  513 \d*(\.\d*){1,127}
rfc6021.txt  529 \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?
rfc6021.txt  631 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6021.txt  647 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}
rfc6021.txt  933 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6021.txt  938 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6021.txt 1026 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6021.txt 1031 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6020.txt 6647 [0-9a-fA-F]*
rfc6095.txt 2544 \S(.*\S)?
rfc6110.txt 1583 [aeiouy]*
rfc6110.txt 3222 [A-Z][a-z]*
rfc6536.txt 1583 \*
rfc6536.txt 1632 [^\*].*
rfc6643.txt  524 \p{IsBasicLatin}{0,255}
rfc6728.txt 3480 \S+
rfc6728.txt 3500 \S(.*\S)?
rfc6991.txt  477 (([0-1](\.[1-3]?[0-9]))|(2\.(0|([1-9]\d*))))
rfc6991.txt  525 \d*(\.\d*){1,127}
rfc6991.txt  541 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc6991.txt  542 .|..|[^xX].*|.[^mM].*|..[^lL].*
rfc6991.txt  571 \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?
rfc6991.txt  665 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6991.txt  693 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}
rfc6991.txt  725 ([0-9a-fA-F]{2}(:[0-9a-fA-F]{2})*)?
rfc6991.txt  743 [0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-
rfc6991.txt 1041 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6991.txt 1046 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc6991.txt 1099 [0-9\.]*
rfc6991.txt 1109 [0-9a-fA-F:\.]*
rfc6991.txt 1164 ((:|[0-9a-fA-F]{0,4}):)([0-9a-fA-F]{0,4}:){0,5}
rfc6991.txt 1169 (([^:]+:){6}(([^:]+:[^:]+)|(.*\..*)))|
rfc7407.txt  933 ([0-9a-fA-F]){2}(:([0-9a-fA-F]){2}){0,254}
rfc7407.txt 1494 ([0-9a-fA-F]){2}(:([0-9a-fA-F]){2}){4,31}
rfc7758.txt  703 \d{2}:\d{2}:\d{2}(\.\d+)?
rfc7758.txt 1358 \d{2}:\d{2}:\d{2}(\.\d+)?
rfc7895.txt  349 \d{4}-\d{2}-\d{2}
rfc7950.txt 8323 [0-9a-fA-F]*
rfc7950.txt 8355 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc7950.txt 8356 [xX][mM][lL].*
rfc8040.txt 4713 \d{4}-\d{2}-\d{2}
rfc8049.txt 6704 [A-Z]{2}
rfc8194.txt  629 \*
rfc8194.txt  637 [0-9]{8}\.[0-9]{6}
rfc8194.txt  905 Z|[\+\-]\d{2}:\d{2}
rfc8194.txt  963 (2((2[4-9])|(3[0-9]))\.).*
rfc8194.txt  974 (([fF]{2}[0-9a-fA-F]{2}):).*
rfc8299.txt 7986 [A-Z]{2}
rfc8341.txt 1878 \*
rfc8341.txt 1927 [^\*].*
rfc8407.txt 1723 [0-9\.]*
rfc8407.txt 1749 [a-zA-Z_][a-zA-Z0-9\-_.]*
rfc8407.txt 1750 .|..|[^xX].*|.[^mM].*|..[^lL].*
rfc8525.txt  550 \d{4}-\d{2}-\d{2}
rfc8776.txt  838 /?([a-zA-Z0-9\-_.]+)(/[a-zA-Z0-9\-_.]+)*
rfc8776.txt  874 ([a-zA-Z0-9\-_.]+:)*
rfc8819.txt  311 [\S ]+
rfc8944.txt  596 [0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){7}
]]></artwork>
      </figure>
      <t>The multi-character escapes (MCE) or the character classes built
around them used here can be substituted as shown in <xref target="tbl-sub"/>.</t>
      <table anchor="tbl-sub">
        <name>Substitutes for multi-character escapes in examples</name>
        <thead>
          <tr>
            <th align="left">MCE/class</th>
            <th align="left">Substitute class</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">
              <tt>\S</tt></td>
            <td align="left">
              <tt>[^ \t\n\r]</tt></td>
          </tr>
          <tr>
            <td align="left">
              <tt>[\S ]</tt></td>
            <td align="left">
              <tt>[^\t\n\r]</tt></td>
          </tr>
          <tr>
            <td align="left">
              <tt>\d</tt></td>
            <td align="left">
              <tt>[0-9]</tt></td>
          </tr>
        </tbody>
      </table>
      <t>Note that the semantics of <tt>\d</tt> in XSD regular expressions is that of
<tt>\p{Nd}</tt>; however, this would include all Unicode characters that are
digits in various writing systems, which is almost certainly not what
the RFCs listed intend.</t>
      <t>The construct <tt>\p{IsBasicLatin}</tt> is essentially a reference to legacy
ASCII; it can be replaced by the character
class <tt>[\u0000-\u007f]</tt>.</t>
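      <t>The substitutions in <xref target="tbl-sub"/> can be applied
mechanically. The following Python sketch is non-normative; the names
<tt>SUBSTITUTES</tt> and <tt>substitute_mce</tt> are inventions of this
sketch, and it is assumed that the input uses only the forms that occur in
the table (<tt>\d</tt> and <tt>\S</tt> outside character classes, and the
class <tt>[\S ]</tt>).</t>
      <sourcecode type="python"><![CDATA[
# Substitutes for multi-character escapes used outside character classes.
SUBSTITUTES = {"\\S": "[^ \\t\\n\\r]", "\\d": "[0-9]"}


def substitute_mce(regexp):
    """Rewrite the escapes listed in the substitution table."""
    # Handle the one bracketed form from the table first.
    regexp = regexp.replace("[\\S ]", "[^\\t\\n\\r]")
    out = []
    i = 0
    while i < len(regexp):
        if regexp[i] == "\\":
            # Treat a backslash and the following character as one token,
            # so that an escaped backslash is never misread as "\d".
            token = regexp[i:i + 2]
            out.append(SUBSTITUTES.get(token, token))
            i += 2
        else:
            out.append(regexp[i])
            i += 1
    return "".join(out)


print(substitute_mce(r"\d{4}-\d{2}-\d{2}"))  # [0-9]{4}-[0-9]{2}-[0-9]{2}
]]></sourcecode>
      <t>Applied to the example <tt>\S(.*\S)?</tt>, the same function yields
<tt>[^ \t\n\r](.*[^ \t\n\r])?</tt>.</t>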
    </section>
    <section numbered="false" anchor="acknowledgements">
      <name>Acknowledgements</name>
      <t>This draft has been motivated by the discussion in the IETF JSONPATH
WG about whether to include a regexp mechanism into the JSONPath query
expression specification, as well as by previous discussions about the
YANG <tt>pattern</tt> and CDDL <tt>.regexp</tt> features.</t>
      <t>The basic approach for this draft was inspired by <xref target="RFC7493">The
I-JSON Message Format</xref>.</t>
    </section>
  </back>

</rfc>
