<?xml version="1.0" encoding="UTF-8"?>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
     version="3"
     ipr="trust200902"
     category="info"
     submissionType="independent"
     docName="draft-rayner-proquint-03">

  <front>
    <title abbrev="Proquint">Proquints: Readable, Spellable, and Pronounceable Identifiers</title>
    <seriesInfo name="Internet-Draft" value="draft-rayner-proquint-03"/>

    <author fullname="Thomas Rayner" initials="T." surname="Rayner">
      <organization>Independent</organization>
      <address>
        <email>thmsrynr@outlook.com</email>
      </address>
    </author>

    <date year="2025" month="August" day="11"/>

    <abstract>
      <t>This document specifies "proquints" (PRO-nounceable QUINT-uplets), a
      human-friendly encoding that maps binary data to pronounceable identifiers
      using fixed consonant-vowel patterns. The concept was originally described
      by Daniel Shawcross Wilkerson in 2009.
      This document formalizes the format for archival and reference.</t>
    </abstract>

  </front>

  <middle>
    <section anchor="intro" numbered="true">
      <name>Introduction</name>
      <t>Proquints encode binary data as alternating consonant-vowel letters grouped
      into five-letter syllables, yielding identifiers that are readable, spellable,
      and pronounceable. The idea and specific letter tables were first described
      by Daniel Shawcross Wilkerson in 2009 (<xref target="WILKERSON2009"/>).
      This document does not claim originality for the concept; it reformulates and
      formalizes the description for archival purposes.</t>
      <t>Several schemes exist for representing network addresses and other
      binary data as text. Proquints aim to combine human readability,
      accessibility, and long-term usability: because every syllable alternates
      consonants and vowels drawn from small fixed tables, a proquint can be read
      aloud, spelled, and dictated with little ambiguity. This can reduce
      transcription errors, make identifiers friendlier for non-technical users,
      and provide mnemonic qualities that help in educational or operational
      contexts. For example, the IPv4 address 127.0.0.1 (octets 0x7F 0x00 0x00
      0x01) encodes as "lusab-babad". Proquints are not intended to replace
      existing representations; they can serve as a complementary format that
      improves clarity in documentation, user interfaces, and spoken
      communication, particularly where accuracy and inclusivity matter.</t>
    </section>

    <section anchor="req" numbered="true">
      <name>Requirements Language</name>
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
      "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in BCP 14
      <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.</t>
    </section>

    <section anchor="format" numbered="true">
      <name>Format</name>
      <t>A proquint encodes data in 16-bit blocks. Each block maps to a five-letter
      syllable of the form CVCVC (Consonant-Vowel-Consonant-Vowel-Consonant).</t>
      <t>The mapping tables are fixed:</t>
      <t>Consonants (indices 0..15):</t>
      <ul>
        <li><t>b d f g h j k l m n p r s t v z</t></li>
      </ul>
      <t>Vowels (indices 0..3):</t>
      <ul>
        <li><t>a i o u</t></li>
      </ul>
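      <t>The table sizes are what make one syllable fit one word exactly: 16
      consonants carry 4 bits each and 4 vowels carry 2 bits each, so a CVCVC
      syllable carries 4 + 2 + 4 + 2 + 4 = 16 bits. The following Python sketch
      checks that arithmetic; the names CONSONANTS, VOWELS, syllable_count,
      bits_per_syllable, and tables_are_disjoint are illustrative and are not
      defined by this document.</t>
      <sourcecode type="python"><![CDATA[
# Illustrative names; the letters match the tables in this section.
CONSONANTS = "bdfghjklmnprstvz"  # indices 0..15 -> 4 bits per consonant
VOWELS = "aiou"                  # indices 0..3  -> 2 bits per vowel


def syllable_count() -> int:
    """Number of distinct CVCVC syllables: 16 * 4 * 16 * 4 * 16."""
    c, v = len(CONSONANTS), len(VOWELS)
    return c * v * c * v * c


def bits_per_syllable() -> int:
    """Bits carried by one syllable (exact, as both table sizes are powers of two)."""
    return (syllable_count() - 1).bit_length()


def tables_are_disjoint() -> bool:
    """True if no letter appears in both tables."""
    return set(CONSONANTS).isdisjoint(VOWELS)
]]></sourcecode>
      <t>With these tables, syllable_count() returns 65536 (2^16) and
      bits_per_syllable() returns 16. Because the two tables share no letters, a
      decoder can also tell from any single letter whether it is a consonant or a
      vowel.</t>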
    </section>

    <section anchor="encoding" numbered="true">
      <name>Encoding</name>
      <ol>
        <li><t>Split the input byte string into 16-bit words (big-endian). If the number
        of bytes is odd, an implementation MAY pad a single zero byte to complete
        the final word; if padding is used, applications MUST define how the
        original length is recovered.</t></li>
        <li><t>For each 16-bit word, map bits 15-12 to the first consonant, bits 11-10 to
        the first vowel, bits 9-6 to the second consonant, bits 5-4 to the second
        vowel, and bits 3-0 to the final consonant.</t></li>
        <li><t>Concatenate syllables. Hyphens MAY be inserted between syllables for
        readability; decoders MUST ignore hyphens.</t></li>
      </ol>
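      <t>As a minimal sketch of the per-word step above, the following Python
      function maps one 16-bit word to a syllable using the listed bit ranges. The
      name encode_word is hypothetical, and the range check is an implementation
      choice rather than a requirement of this document.</t>
      <sourcecode type="python"><![CDATA[
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def encode_word(w: int) -> str:
    """Encode one 16-bit word as a CVCVC syllable, most significant bits first."""
    if not 0 <= w <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    return (
        CONSONANTS[(w >> 12) & 0xF]   # bits 15..12 -> first consonant
        + VOWELS[(w >> 10) & 0x3]     # bits 11..10 -> first vowel
        + CONSONANTS[(w >> 6) & 0xF]  # bits 9..6   -> second consonant
        + VOWELS[(w >> 4) & 0x3]      # bits 5..4   -> second vowel
        + CONSONANTS[w & 0xF]         # bits 3..0   -> final consonant
    )
]]></sourcecode>
      <t>For example, encode_word(0x1234) returns "damuh" and encode_word(0xBEEF)
      returns "ruroz".</t>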
    </section>

    <section anchor="decoding" numbered="true">
      <name>Decoding</name>
      <t>Decoders MUST reverse the mapping in <xref target="encoding"/>. Each five-letter
      syllable maps to one 16-bit value using the same tables and bit ordering.
      Hyphens, if present, MUST be ignored.</t>
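      <t>The reverse step can be sketched in Python as follows, assuming the caller
      has already removed hyphens and split the input into five-letter syllables.
      The helper name decode_syllable is hypothetical. Rather than shifting each
      field by a fixed amount, it appends fields left to right, shifting the
      accumulated value by the width of each new field (4, 2, 4, 2, 4 bits), which
      yields the same bit ordering.</t>
      <sourcecode type="python"><![CDATA[
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def decode_syllable(syl: str) -> int:
    """Decode one five-letter CVCVC syllable (either case) to its 16-bit value."""
    s = syl.lower()
    if len(s) != 5:
        raise ValueError("syllable must have exactly five letters")
    tables = (CONSONANTS, VOWELS, CONSONANTS, VOWELS, CONSONANTS)
    widths = (4, 2, 4, 2, 4)  # bits contributed by each letter position
    w = 0
    for ch, table, width in zip(s, tables, widths):
        idx = table.find(ch)
        if idx < 0:
            raise ValueError(f"invalid character {ch!r}")
        w = (w << width) | idx  # make room for this field, then append it
    return w
]]></sourcecode>
      <t>For example, decode_syllable("damuh") returns 0x1234, while a letter in
      the wrong position, such as the "v" in "dvmuh", raises ValueError.</t>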
    </section>

    <section anchor="spec" numbered="true">
      <name>Encoding and Decoding Specification</name>

      <section anchor="tables" numbered="true">
        <name>Letter Tables and Indices</name>
        <t>Proquint encodes each 16-bit word as five letters in the pattern CVCVC
    (Consonant–Vowel–Consonant–Vowel–Consonant). The mapping tables and
    indices are fixed and normative.</t>
        <t>Consonant table (index 0..15):</t>
        <artwork><![CDATA[
Index  Hex  Bits  Consonant
-----  ---  ----  ---------
  0     0   0000     b
  1     1   0001     d
  2     2   0010     f
  3     3   0011     g
  4     4   0100     h
  5     5   0101     j
  6     6   0110     k
  7     7   0111     l
  8     8   1000     m
  9     9   1001     n
 10     A   1010     p
 11     B   1011     r
 12     C   1100     s
 13     D   1101     t
 14     E   1110     v
 15     F   1111     z
]]></artwork>
      <t>Vowel table (index 0..3):</t>
      <artwork><![CDATA[
Index  Bits  Vowel
-----  ----  -----
  0    00      a
  1    01      i
  2    10      o
  3    11      u
]]></artwork>
    </section>

    <section anchor="bitlayout" numbered="true">
      <name>Bit Layout</name>
      <t>Each 16-bit input value (bits 15..0, most significant bit first) MUST be
    mapped to letters in this order:</t>
      <artwork><![CDATA[
bits 15..12 -> first consonant  (C1)
bits 11..10 -> first vowel      (V1)
bits  9.. 6 -> second consonant (C2)
bits  5.. 4 -> second vowel     (V2)
bits  3.. 0 -> third consonant  (C3)
]]></artwork>
      <t>Encoders MUST process input as an ordered sequence of 16-bit words formed
      from the input octet string in network byte order (big-endian): for each
      even offset i, octet[i] contributes bits 15..8 and octet[i+1] contributes
      bits 7..0 of the word.
      If the input contains an odd number of octets, encoders MAY pad a single
      zero octet to complete the final 16-bit word; applications using padding
      MUST specify how the original length is recovered.</t>
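      <t>Word formation from octets can be sketched in Python as below. The name
      words_from_octets and its pad flag are illustrative; raising an error on
      odd-length input when padding is not requested is an assumption, since this
      document only states that encoders MAY pad.</t>
      <sourcecode type="python"><![CDATA[
from typing import List


def words_from_octets(data: bytes, pad: bool = False) -> List[int]:
    """Group an octet string into big-endian (network byte order) 16-bit words."""
    if len(data) % 2:
        if not pad:
            # Assumption: refuse odd-length input unless the caller opts in to padding.
            raise ValueError("odd number of octets and padding not enabled")
        data = data + b"\x00"  # single zero pad octet (OPTIONAL per this document)
    # For each even offset i: octet[i] -> bits 15..8, octet[i+1] -> bits 7..0.
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]
]]></sourcecode>
      <t>For example, words_from_octets(bytes([0x12, 0x34, 0xF0, 0x0D])) returns
      [0x1234, 0xF00D]. An equivalent formulation is int.from_bytes(data[i:i + 2],
      "big") for each pair of octets.</t>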
      <t>Encoders MAY insert ASCII hyphens (0x2D) between syllables for readability.
      Decoders MUST ignore hyphens.</t>
    </section>

    <section anchor="encode-alg" numbered="true">
      <name>Encoding Algorithm (Pseudocode)</name>
      <artwork><![CDATA[
Input: bytes[]  // octet string
Output: string  // proquint

consonants = "bdfghjklmnprstvz"   // length 16, index 0..15
vowels     = "aiou"               // length 4,  index 0..3

function encode(bytes):
  out = ""
  i = 0
  while i < len(bytes):
    hi = bytes[i]; i += 1
    if i < len(bytes): lo = bytes[i]; i += 1
    else:              lo = 0  // optional pad
    w = (hi << 8) | lo        // 16-bit word

    c1 = consonants[(w >> 12) & 0xF]
    v1 = vowels    [(w >> 10) & 0x3]
    c2 = consonants[(w >>  6) & 0xF]
    v2 = vowels    [(w >>  4) & 0x3]
    c3 = consonants[(w      ) & 0xF]

    syllable = c1 + v1 + c2 + v2 + c3
    if out != "": out += "-"  // OPTIONAL hyphen between syllables
    out += syllable

  return out
]]></artwork>
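      <t>A runnable Python rendering of the encoding pseudocode above is sketched
      below. The sep parameter is a hypothetical convenience for the OPTIONAL
      hyphens (pass an empty string to omit them). Like the pseudocode, it
      zero-pads an odd final octet without recording that it did so, so
      applications that use padding still need their own way to recover the
      original length.</t>
      <sourcecode type="python"><![CDATA[
CONSONANTS = "bdfghjklmnprstvz"  # index 0..15
VOWELS = "aiou"                  # index 0..3


def encode(data: bytes, sep: str = "-") -> str:
    """Encode an octet string as a proquint, following the pseudocode."""
    syllables = []
    i = 0
    while i < len(data):
        hi = data[i]
        lo = data[i + 1] if i + 1 < len(data) else 0  # optional zero pad
        i += 2
        w = (hi << 8) | lo  # 16-bit word, big-endian
        syllables.append(
            CONSONANTS[(w >> 12) & 0xF]
            + VOWELS[(w >> 10) & 0x3]
            + CONSONANTS[(w >> 6) & 0xF]
            + VOWELS[(w >> 4) & 0x3]
            + CONSONANTS[w & 0xF]
        )
    # sep="-" puts one hyphen between syllables; sep="" omits hyphens.
    return sep.join(syllables)
]]></sourcecode>
      <t>For example, encode(bytes([0x12, 0x34, 0xF0, 0x0D])) returns
      "damuh-zabat", and encode(bytes([127, 0, 0, 1])) returns "lusab-babad".</t>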
      </section>

      <section anchor="decode-alg" numbered="true">
        <name>Decoding Algorithm (Pseudocode)</name>
        <artwork><![CDATA[
Input: string pq  // CVCVC syllables, hyphens optional
Output: bytes[]   // octet string

consonants = "bdfghjklmnprstvz"
vowels     = "aiou"

function indexOf(ch, table):
  pos = table.find(ch)
  if pos < 0: error("invalid character")
  return pos

function decode(pq):
  // remove hyphens; decoders MUST accept upper or lower case
  s = toLowercase(removeAll(pq, '-'))
  if len(s) % 5 != 0: error("length not multiple of 5")

  out = []
  for j in range(0, len(s), 5):
    c1 = indexOf(s[j+0], consonants)
    v1 = indexOf(s[j+1], vowels)
    c2 = indexOf(s[j+2], consonants)
    v2 = indexOf(s[j+3], vowels)
    c3 = indexOf(s[j+4], consonants)

    w = (c1 << 12) | (v1 << 10) | (c2 << 6) | (v2 << 4) | c3
    out.append( (w >> 8) & 0xFF )
    out.append(  w       & 0xFF )

  return out
]]></artwork>
      <t>Decoders MUST accept input in either case (upper/lower) and MUST reject any
      character not in the defined consonant/vowel sets (after stripping hyphens).
      If applications use padding on encode, they MUST specify how to remove any
      trailing zero octet introduced solely for padding.</t>
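        <t>The decoding pseudocode can likewise be sketched in Python. The function
        name decode is illustrative. One assumption goes beyond the pseudocode:
        input is required to be ASCII before lowercasing, because Python's
        str.lower() maps some non-ASCII characters (for example, U+212A KELVIN
        SIGN) to ASCII letters, which would otherwise let characters outside the
        tables slip through.</t>
        <sourcecode type="python"><![CDATA[
from typing import List

CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"
PATTERN = (CONSONANTS, VOWELS, CONSONANTS, VOWELS, CONSONANTS)


def decode(pq: str) -> bytes:
    """Decode a proquint (either case, hyphens optional) to an octet string."""
    if not pq.isascii():
        raise ValueError("invalid character")  # nothing outside ASCII is in the tables
    s = pq.replace("-", "").lower()
    if len(s) % 5 != 0:
        raise ValueError("length not multiple of 5")
    out: List[int] = []
    for j in range(0, len(s), 5):
        idx = []
        for ch, table in zip(s[j:j + 5], PATTERN):
            pos = table.find(ch)
            if pos < 0:
                raise ValueError(f"invalid character {ch!r}")
            idx.append(pos)
        c1, v1, c2, v2, c3 = idx
        w = (c1 << 12) | (v1 << 10) | (c2 << 6) | (v2 << 4) | c3
        out += [(w >> 8) & 0xFF, w & 0xFF]  # high octet first
    # Any padding octet is returned as-is; removing it is application-defined.
    return bytes(out)
]]></sourcecode>
        <t>For example, decode("damuh-zabat") and decode("DAMUHZABAT") both return
        the octets 0x12 0x34 0xF0 0x0D, and decode("lusab-babad") returns 0x7F 0x00
        0x00 0x01.</t>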
      </section>

      <section anchor="norms" numbered="true">
        <name>Normalization</name>
        <t>Encoders SHOULD produce lowercase output. Decoders MUST treat input as
        case-insensitive and MUST ignore ASCII hyphens (0x2D).</t>
        <t>Encoders and decoders MUST use the tables and ordering defined in
        <xref target="tables"/> and <xref target="bitlayout"/>. An encoding that
        substitutes letters or reorders bits is not a proquint and will not
        interoperate with conforming implementations.</t>
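        <t>A sketch of these normalization rules in Python follows. The function
        normalize is hypothetical, and re-inserting a hyphen after every five
        letters is only one possible display convention, since hyphens are
        OPTIONAL. It does not validate letters; that is left to the decoder.</t>
        <sourcecode type="python"><![CDATA[
def normalize(pq: str, hyphenate: bool = True) -> str:
    """Lowercase a proquint and optionally re-hyphenate it every five letters."""
    if not pq.isascii():
        raise ValueError("proquints contain only ASCII letters and hyphens")
    letters = pq.replace("-", "").lower()  # hyphens are ignored, case is folded
    if not hyphenate:
        return letters
    # Assumed display convention: one hyphen between consecutive syllables.
    return "-".join(letters[i:i + 5] for i in range(0, len(letters), 5))
]]></sourcecode>
        <t>Because inputs that differ only in case or hyphen placement normalize to
        the same string (for example, normalize("DaMuH-ZABAT") and
        normalize("damuhzabat") both return "damuh-zabat"), a normalized form is a
        reasonable basis for comparing proquints as strings.</t>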
      </section>

      <section anchor="vectors" numbered="true">
        <name>Test Vectors</name>
        <t>The following vectors are derived directly from this specification and can
        be used to verify independent implementations.</t>
        <artwork><![CDATA[
# Single-word (16-bit) values:
0x0000 -> babab
0xFFFF -> zuzuz
0x1234 -> damuh
0xF00D -> zabat
0xBEEF -> ruroz

# Two words (32-bit), big-endian byte order:
bytes:  0x12 0x34 0xF0 0x0D
words:  0x1234, 0xF00D
pq:     damuh-zabat      (with hyphen)  or  damuhzabat (without)

# ASCII string example ("F3r41OutL4w", 11 octets as UTF-8),
# zero-padded to an even length:
Octets: 46 33 72 34 31 4F 75 74 4C 34 77
Pad:                                     00
Words:  0x4633 0x7234 0x314F 0x7574 0x4C34 0x7700
PQ:     himug-lamuh-gajaz-lijuh-hubuh-lisab  (hyphens optional)
]]></artwork>
        <t>Implementations MUST reproduce these outputs exactly.</t>
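        <t>Because these vectors are meant for cross-checking independent
        implementations, the Python sketch below deliberately uses a different
        technique from the shift-and-mask pseudocode: it peels fields off the low
        end of the word with divmod, relying on the CVCVC pattern reading the same
        in both directions. The names encode_word_divmod and check_vectors are
        illustrative, and the expected strings were computed from the tables and
        bit layout of this document.</t>
        <sourcecode type="python"><![CDATA[
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"

SINGLE_WORD = {
    0x0000: "babab",
    0xFFFF: "zuzuz",
    0x1234: "damuh",
    0xF00D: "zabat",
    0xBEEF: "ruroz",
}
ASCII_WORDS = (0x4633, 0x7234, 0x314F, 0x7574, 0x4C34, 0x7700)  # "F3r41OutL4w" + pad
ASCII_PQ = "himug-lamuh-gajaz-lijuh-hubuh-lisab"


def encode_word_divmod(w: int) -> str:
    """Encode a 16-bit word by removing fields from the least significant end."""
    letters = []
    # Low end first: C3, V2, C2, V1, C1. CVCVC is a palindrome, so the table
    # sequence is the same as reading from the high end.
    for table in (CONSONANTS, VOWELS, CONSONANTS, VOWELS, CONSONANTS):
        w, idx = divmod(w, len(table))
        letters.append(table[idx])
    return "".join(reversed(letters))


def check_vectors() -> list:
    """Return (vector, expected, got) for every vector that does not match."""
    failures = []
    for w, expected in SINGLE_WORD.items():
        got = encode_word_divmod(w)
        if got != expected:
            failures.append((hex(w), expected, got))
    got = "-".join(encode_word_divmod(w) for w in ASCII_WORDS)
    if got != ASCII_PQ:
        failures.append(("F3r41OutL4w", ASCII_PQ, got))
    return failures
]]></sourcecode>
        <t>check_vectors() returns an empty list when every vector matches; any
        entry it returns names the vector, the expected string, and the string
        actually produced.</t>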
      </section>

      <section anchor="errors" numbered="true">
        <name>Error Handling</name>
        <t>Decoders MUST reject input that (1) contains characters outside the
      defined tables (after hyphen removal), (2) has a length that is not a
      multiple of five letters, or (3) violates the CVCVC pattern. The way errors
      are signaled is application-specific, but decoders MUST reject invalid
      input rather than attempt to guess the intended value.</t>
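        <t>The three rejection conditions can be sketched in Python as a function
        that returns a reason string, or None for well-formed input. The name
        validation_error, the reason strings, and the order in which the
        conditions are checked are all assumptions; this document leaves error
        signaling to the application.</t>
        <sourcecode type="python"><![CDATA[
from typing import Optional

CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"
PATTERN = (CONSONANTS, VOWELS, CONSONANTS, VOWELS, CONSONANTS)


def validation_error(pq: str) -> Optional[str]:
    """Return why a proquint is invalid, or None if it is well formed."""
    s = pq.replace("-", "")
    # Check ASCII before lowercasing so non-ASCII look-alikes cannot fold
    # into table letters; no non-ASCII character is in either table.
    if not s.isascii():
        return "character outside the defined tables"
    s = s.lower()
    # (1) every letter must belong to one of the two tables
    if any(ch not in CONSONANTS and ch not in VOWELS for ch in s):
        return "character outside the defined tables"
    # (2) the letter count must be a whole number of syllables
    if len(s) % 5 != 0:
        return "length not divisible by 5"
    # (3) each position must match C, V, C, V, C within its syllable
    for k, ch in enumerate(s):
        if ch not in PATTERN[k % 5]:
            return "violates the CVCVC pattern"
    return None
]]></sourcecode>
        <t>For example, validation_error("damux") reports a character outside the
        tables, validation_error("damu") reports a bad length,
        validation_error("admuh") reports a pattern violation, and
        validation_error("damuh-zabat") returns None.</t>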
      </section>
    </section>

    <section anchor="security" numbered="true">
      <name>Security Considerations</name>
      <t>Proquint is a presentation encoding. It provides no confidentiality,
      integrity, or authentication services. It does not add or remove entropy,
      and it MUST NOT be used as a cryptographic transform.</t>
    </section>

    <section anchor="iana" numbered="true">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>

    <section anchor="ack" numbered="true">
      <name>Acknowledgments</name>
      <t>The author thanks Daniel Shawcross Wilkerson for originating the proquint
      concept and publishing the initial specification in 2009 (<xref target="WILKERSON2009"/>).</t>
    </section>
  </middle>

<back>
  <references>
    <name>References</name>

    <references anchor="normative">
      <name>Normative References</name>

      <referencegroup anchor="BCP14" target="https://www.rfc-editor.org/info/bcp14">
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author initials="S." surname="Bradner" fullname="Scott Bradner"/>
            <date month="March" year="1997"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>

        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author initials="B." surname="Leiba" fullname="Barry Leiba"/>
            <date month="May" year="2017"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </referencegroup>
    </references>

    <references anchor="informative">
      <name>Informative References</name>
      <reference anchor="WILKERSON2009" target="https://arxiv.org/abs/0901.4016">
        <front>
          <title>Proquints: Identifiers that are Readable, Spellable, and Pronounceable</title>
          <author initials="D.S." surname="Wilkerson" fullname="Daniel Shawcross Wilkerson"/>
          <date month="January" year="2009"/>
        </front>
        <seriesInfo name="arXiv" value="0901.4016"/>
      </reference>
    </references>

  </references>
</back>
</rfc>
