<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE RFC SYSTEM "rfc2629.dtd" [
<!ENTITY RFC4648 SYSTEM
"http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.4648.xml">
<!ENTITY RFC8174 SYSTEM
"http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
<!ENTITY RFC2119 SYSTEM
"http://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
]>
<?xml-stylesheet type='text/xsl' href='RFC2629.xslt' ?>
<?rfc compact="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<!-- Expand crefs and put them inline -->
<?rfc comments='yes' ?>
<?rfc inline='yes' ?>  

<rfc
    docName="draft-faltstrom-base45-12"
    ipr="trust200902"
    submissionType="IETF"
    category="info">
  <front>
    <title abbrev="Base45">
      The Base45 Data Encoding
    </title>
    <author fullname="Patrik Faltstrom" initials="P." surname="Faltstrom">
      <organization abbrev="Netnod">Netnod</organization>
      <address>
	<email>paf@netnod.se</email>
      </address>
    </author>
    <author fullname="Fredrik Ljunggren" initials="F." surname="Ljunggren">
      <organization abbrev="Kirei">Kirei</organization>
      <address>
	<email>fredrik@kirei.se</email>
      </address>
    </author>
    <author fullname="Dirk-Willem van Gulik" initials="D." surname="van Gulik">
      <organization abbrev="Webweaving">Webweaving</organization>
      <address>
	<email>dirkx@webweaving.org</email>
      </address>
    </author>
    <date month="June" year="2022" day="16"/>
    <area>Operations</area>
    <keyword>BASE45</keyword>
    <abstract>
      <t>
	This document describes the Base45 encoding scheme, which is
	built upon the Base64, Base32, and Base16 encoding schemes.
      </t>
    </abstract>
  </front>
  <middle>
    <section anchor="intro" title="Introduction">
      <t>
	A QR code is used to encode text as a graphical
	image. Depending on the characters used in the text, various
	encoding options for a QR code exist, e.g., Numeric,
	Alphanumeric, and Byte mode. Even in Byte mode, a typical
	QR code reader tries to interpret a byte sequence as text
	encoded in UTF-8 or ISO/IEC 8859-1. Thus, QR codes cannot be
	used to encode arbitrary binary data directly. Such data has to
	be converted into an appropriate text before that text can be
	encoded as a QR code.  Compared to the already established
	Base64, Base32, and Base16 encoding schemes, which are described
	in <xref target="RFC4648">RFC 4648</xref>, the Base45 scheme
	described in this document offers a more compact QR code
	encoding.
      </t>
      <t>
	Base45 differs from those encodings in two important ways: it
	uses a different key table (alphabet), and padding with '=' is
	not required.
      </t>
    </section>
    <section title="Conventions Used in This Document">
      <t>
	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
	NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
	RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
	interpreted as described in BCP 14 <xref target="RFC2119" />
	<xref target="RFC8174" /> when, and only when, they appear in
	all capitals, as shown here.
      </t>
    </section>
    <section title="Interpretation of Encoded Data">
      <t>
	Encoded data is to be interpreted as described in <xref
	target="RFC4648">RFC 4648</xref> with the exception that a
	different alphabet is selected.
      </t>
    </section>
    <section title="The Base45 Encoding">
      <t>
	QR codes have a limited ability to store binary data. In
	practice, binary data has to be encoded in characters
	according to one of the modes already defined in the standard
	for QR codes. The easiest mode to use is called Alphanumeric
	mode (see section 7.3.4 and Table 2 of <xref
	target="ISO18004">ISO/IEC 18004:2015</xref>).  Unfortunately,
	Alphanumeric mode uses 45 different characters, which implies
	that neither Base32 nor Base64 is a very effective encoding.
      </t>
      <t>
	A 45-character subset of US-ASCII is used, namely the 45
	characters usable in a QR code in Alphanumeric mode (see
	section 7.3.4
	and Table 2 of <xref target="ISO18004">ISO/IEC
	18004:2015</xref>). Base45 encodes 2 bytes in 3 characters,
	compared to Base64, which encodes 3 bytes in 4 characters.
      </t>
      <t>
	For encoding, two bytes [a, b] MUST be interpreted as a number
	n in base 256, i.e. as an unsigned integer over 16 bits so
	that the number n = (a*256) + b.
      </t>
      <t>
	This number n is converted to base 45 [c, d, e] so that n = c
	+ (d*45) + (e*45*45). Note the order of c, d, and e, which is
	chosen so that the leftmost value [c] is the least significant.
      </t>
      <t>
	The values c, d, and e are then looked up in Table 1 to produce
	a three-character string. The process is reversed when
	decoding.
      </t>
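      <t>
	As an illustration only, the two-byte step described above
	might be sketched in Python as follows. The names ALPHABET and
	encode_pair are hypothetical helpers chosen for this sketch and
	are not defined by this document; the alphabet string is
	Table 1 written in value order.
      </t>
      <figure><artwork><![CDATA[
```python
# Table 1 in value order: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_pair(a: int, b: int) -> str:
    """Encode two byte values as three Base45 characters."""
    n = a * 256 + b                 # n in base 256, 0..65535
    e, rest = divmod(n, 45 * 45)    # e: most significant digit
    d, c = divmod(rest, 45)         # c: least significant digit
    # The least significant digit c is emitted first (leftmost).
    return ALPHABET[c] + ALPHABET[d] + ALPHABET[e]
```
]]></artwork></figure>
      <t>
	With this sketch, encode_pair(65, 66) yields "BB8".
      </t>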
      <t>
	For encoding a single byte [a], it MUST be interpreted as a
	base 256 number, i.e. as an unsigned integer over 8 bits. That
	integer MUST be converted to base 45 [c, d] so that a = c +
	(45*d). The values c and d are then looked up in Table 1 to
	produce a two-character string.
      </t>
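      <t>
	The single-byte case can be sketched in the same way. This is
	a minimal illustration, not a normative implementation; the
	name encode_single is hypothetical.
      </t>
      <figure><artwork><![CDATA[
```python
# Table 1 in value order: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_single(a: int) -> str:
    """Encode one byte value (0..255) as two Base45 characters."""
    d, c = divmod(a, 45)            # a = c + 45*d
    return ALPHABET[c] + ALPHABET[d]
```
]]></artwork></figure>
      <t>
	For example, the byte value 33 gives c = 33 and d = 0, and
	therefore the string "X0".
      </t>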
      <t>
	A byte string [a b c d ... x y z] with arbitrary content and
	arbitrary length MUST be encoded as follows: From left to
	right, pairs of bytes MUST be encoded as described above. If
	the number of bytes is even, then the encoded form is a string
	with a length which is evenly divisible by 3. If the number of
	bytes is odd, then the last (rightmost) byte MUST be encoded
	as two characters as described above.
      </t>
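      <t>
	Putting the two cases together, a complete encoder might look
	like the following sketch. It is one possible reading of the
	rules above under the assumption that the input is a Python
	bytes object; the name b45encode is hypothetical.
      </t>
      <figure><artwork><![CDATA[
```python
# Table 1 in value order: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode(data: bytes) -> str:
    """Encode a byte string of any length as Base45 text."""
    out = []
    # Walk over complete pairs of bytes from left to right.
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        e, rest = divmod(n, 45 * 45)
        d, c = divmod(rest, 45)
        out.append(ALPHABET[c] + ALPHABET[d] + ALPHABET[e])
    # An odd length leaves one byte, encoded as two characters.
    if len(data) % 2:
        d, c = divmod(data[-1], 45)
        out.append(ALPHABET[c] + ALPHABET[d])
    return "".join(out)
```
]]></artwork></figure>
      <t>
	An input of 7 bytes thus produces 3 * 3 + 2 = 11 characters.
      </t>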
      <t>
	To decode a Base45-encoded string, the inverse operations
	are performed.
      </t>
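      <t>
	A decoder that inverts these steps could be sketched as
	follows. The names VALUE_OF and b45decode are hypothetical.
	The choice to raise ValueError for any invalid input is an
	assumption of this sketch, made in the spirit of the Security
	Considerations section; it also treats a two-character chunk
	whose value exceeds 255 as invalid.
      </t>
      <figure><artwork><![CDATA[
```python
# Table 1 in value order: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
VALUE_OF = {ch: value for value, ch in enumerate(ALPHABET)}


def b45decode(text: str) -> bytes:
    """Decode Base45 text; raise ValueError on invalid input."""
    out = bytearray()
    for i in range(0, len(text), 3):
        chunk = text[i:i + 3]
        # A lone trailing character cannot encode a whole byte.
        if len(chunk) == 1:
            raise ValueError("dangling single character")
        if any(ch not in VALUE_OF for ch in chunk):
            raise ValueError("character outside the alphabet")
        # The leftmost character is the least significant digit.
        n = sum(VALUE_OF[ch] * 45 ** k for k, ch in enumerate(chunk))
        size = len(chunk) - 1       # 3 chars -> 2 bytes, 2 -> 1
        if n >= 256 ** size:
            raise ValueError("chunk value out of range")
        out += n.to_bytes(size, "big")
    return bytes(out)
```
]]></artwork></figure>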
      <section title="When to, and not to, use Base45">
	<t>
	  If binary data is to be stored in a QR code, the suggested
	  mechanism is to use the Alphanumeric mode, which uses 11 bits
	  for 2 characters, as defined in section 7.3.4 of <xref
	  target="ISO18004">ISO/IEC 18004:2015</xref>. The ECI mode
	  indicator for this encoding is 0010.
	</t>
	<t>
	  On the other hand, if the data is to be sent via some other
	  transport, a transport encoding suitable for that transport
	  should be used instead of Base45. For example, it is not
	  recommended to first encode data in Base45 and then encode
	  the resulting string in Base64 if the data is to be sent via
	  email. Instead, the Base45 encoding should be removed, and
	  the data itself should be encoded in Base64.
	</t>
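	<t>
	  To make the size argument for Alphanumeric mode concrete,
	  the following rough sketch counts QR data bits for a payload
	  of a given length. It relies on assumptions that are not
	  stated in this document: a trailing single Alphanumeric
	  character costs 6 bits, Base64 text (padded, as in RFC 4648)
	  has to be stored in Byte mode at 8 bits per character
	  because its alphabet contains lowercase letters, and the
	  mode indicator and character count fields are ignored. The
	  function names are hypothetical.
	</t>
	<figure><artwork><![CDATA[
```python
import math


def base45_alnum_bits(num_bytes: int) -> int:
    """Data bits for Base45 text stored in Alphanumeric mode."""
    chars = 3 * (num_bytes // 2) + 2 * (num_bytes % 2)
    # 11 bits per pair of characters; assumed 6 bits for a
    # single leftover character.
    return 11 * (chars // 2) + 6 * (chars % 2)


def base64_byte_bits(num_bytes: int) -> int:
    """Data bits for padded Base64 text stored in Byte mode."""
    chars = 4 * math.ceil(num_bytes / 3)
    return 8 * chars                # assumed 8 bits per character
```
]]></artwork></figure>
	<t>
	  Under these assumptions, 100 bytes (800 bits) of binary data
	  need 825 data bits as Base45 and 1088 data bits as Base64.
	</t>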
      </section>
      <section title="The alphabet used in Base45">
	<t>
	  The Alphanumeric mode is defined to use 45 characters as specified
	  in this alphabet.
	</t>
	<t>
	  <figure><artwork>
               Table 1: The Base45 Alphabet

Value Encoding  Value Encoding  Value Encoding  Value Encoding
   00 0            12 C            24 O            36 Space
   01 1            13 D            25 P            37 $
   02 2            14 E            26 Q            38 %
   03 3            15 F            27 R            39 *
   04 4            16 G            28 S            40 +
   05 5            17 H            29 T            41 -
   06 6            18 I            30 U            42 .
   07 7            19 J            31 V            43 /
   08 8            20 K            32 W            44 :
   09 9            21 L            33 X
   10 A            22 M            34 Y
   11 B            23 N            35 Z
          </artwork></figure>
	</t>
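	<t>
	  In code, Table 1 is often convenient to hold as a single
	  string in value order, so that a character's index is its
	  value. The following is a minimal sketch; the names ALPHABET
	  and VALUE_OF are hypothetical.
	</t>
	<figure><artwork><![CDATA[
```python
# Table 1 in value order; value 36 is the space character.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

# Reverse mapping from character to value, used when decoding.
VALUE_OF = {ch: value for value, ch in enumerate(ALPHABET)}
```
]]></artwork></figure>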
      </section>
      <section title="Encoding examples">
	<t>
	  It should be noted that although the examples are all text,
	  Base45 is an encoding for binary data where each octet can
	  have any value 0-255.
	</t>
	<t>
	  Encoding example 1: The string "AB" is the byte sequence [65
	  66]. The 16-bit value is 65 * 256 + 66 = 16706. 16706 equals
	  11 + 45 * 11 + 45 * 45 * 8, so the sequence in base 45 is [11
	  11 8]. By looking up these values in Table 1, we get the
	  encoded string "BB8".
	</t>
	<t>
	  Encoding example 2: The string "Hello!!" as ASCII is the
	  byte sequence [72 101 108 108 111 33 33]. If we look at each
	  16-bit value, it is [18533 27756 28449 33]. Note the 33 for
	  the last byte. When looking at the values in base 45, we get
	  [[38 6 9] [36 31 13] [9 2 14] [33 0]], where the last byte is
	  represented by two values.  The resulting string
	  "%69 VD92EX0" is created by looking up these values in
	  Table 1. Note that it includes a space.
	</t>
	<t>
	  Encoding example 3: The string "base-45" as ASCII is the
	  byte sequence [98 97 115 101 45 52 53]. If we look at each
	  16-bit value, it is [25185 29541 11572 53]. Note the 53 for
	  the last byte. When looking at the values in base 45, we get
	  [[30 19 12] [21 26 14] [7 32 5] [8 1]], where the last byte
	  is represented by two values.  By looking up these values in
	  Table 1, we get the encoded string "UJCLQE7W581".
	</t>
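	<t>
	  The intermediate values quoted in these examples can be
	  reproduced with a small tracing helper such as the sketch
	  below. The name trace_encoding is hypothetical, and the
	  helper only returns the numbers; it does not perform the
	  Table 1 lookup.
	</t>
	<figure><artwork><![CDATA[
```python
def trace_encoding(data: bytes):
    """Return (numbers, digit_groups) for each 1- or 2-byte chunk."""
    numbers, groups = [], []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        n = int.from_bytes(chunk, "big")
        numbers.append(n)
        digits = []
        # 2 bytes give 3 base-45 digits, a single byte gives 2.
        for _ in range(len(chunk) + 1):
            n, digit = divmod(n, 45)
            digits.append(digit)    # least significant first
        groups.append(digits)
    return numbers, groups
```
]]></artwork></figure>
	<t>
	  For b"Hello!!", the first returned list is [18533, 27756,
	  28449, 33], matching example 2.
	</t>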
      </section>
      <section title="Decoding examples">
	<t>
	  Decoding example 1: The string "QED8WEX0" represents, when
	  looked up in Table 1, the values [26 14 13 8 32 14 33 0]. We
	  arrange the numbers in chunks of three, except for the last
	  one which can be two, and get [[26 14 13] [8 32 14] [33 0]].
	  Interpreting each chunk as a base 45 number, we get [26981
	  29798 33], where the bytes are [[105 101] [116 102] [33]].
	  If we look at the ASCII values, we get the string "ietf!".
	</t>
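	<t>
	  The steps of this decoding example can be traced with a
	  helper like the following sketch. It assumes the input is
	  already known to be a valid encoding and performs no error
	  checking; the name trace_decoding is hypothetical.
	</t>
	<figure><artwork><![CDATA[
```python
# Table 1 in value order: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def trace_decoding(text: str):
    """Return (values, chunks, numbers) for valid Base45 text."""
    values = [ALPHABET.index(ch) for ch in text]
    chunks = [values[i:i + 3] for i in range(0, len(values), 3)]
    # The first value in each chunk is the least significant.
    numbers = [sum(v * 45 ** k for k, v in enumerate(chunk))
               for chunk in chunks]
    return values, chunks, numbers
```
]]></artwork></figure>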
      </section>
    </section>
    <section title="IANA Considerations">
      <t>
	There are no considerations for IANA in this document.
      </t>
    </section>
    <section title="Security Considerations">
      <t>
	When implementing encoding and decoding, it is important to be
	very careful so that buffer overflows or similar errors do not
	occur.  This of course includes the calculations in base 45
	and lookup in the table of characters (Table 1). A decoder
	must also be robust regarding input, including proper handling
	of any octet value 0-255, including the NUL character (ASCII
	0).
      </t>
      <t>
	It should be noted that Base64 and some other encodings pad
	the string so that the encoding starts with an aligned number
	of characters, while Base45 specifically avoids padding.
	Because of this, special care has to be taken when an odd
	number of octets is to be encoded. Similarly, care must be
	taken if the number of characters to decode is not evenly
	divisible by 3.
      </t>
      <t>
	Base encodings use a specific, reduced alphabet to encode
	binary data. Non-alphabet characters could exist within
	base-encoded data, caused by data corruption or by
	design. Non-alphabet characters may be exploited as a "covert
	channel", where non-protocol data can be sent for nefarious
	purposes. Non-alphabet characters might also be sent in order
	to exploit implementation errors leading to, e.g., buffer
	overflow attacks.
      </t>
      <t>
	Implementations MUST reject any input that is not a valid
	encoding. For example, it MUST reject the input (encoded data)
	if it contains characters outside the base alphabet (in Table
	1) when interpreting base-encoded data.
      </t>
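      <t>
	One way to perform the alphabet check is sketched below. The
	names ALLOWED and invalid_positions are hypothetical; reporting
	the offending positions rather than just failing is a design
	choice of this sketch.
      </t>
      <figure><artwork><![CDATA[
```python
# Table 1 in value order: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"
ALLOWED = frozenset(ALPHABET)


def invalid_positions(text: str) -> list:
    """Return the indexes of characters that are not in Table 1."""
    return [i for i, ch in enumerate(text) if ch not in ALLOWED]
```
]]></artwork></figure>
      <t>
	A decoder following the rule above would reject the input
	whenever the returned list is not empty. Note that lowercase
	letters and '=' are outside the alphabet.
      </t>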
      <t>
	Even though a Base45 encoded string contains only characters
	from the alphabet in Table 1, cases like the following have to
	be considered: The string "FGW" represents 65535 (FFFF in base
	16), which is a valid encoding of 16 bits. A slightly
	different encoded string of the same length, "GGW", would
	represent 65536 (10000 in base 16), which is represented by
	more than 16 bits. Implementations MUST also reject the
	encoded data if it contains a triplet of characters which,
	when decoded, results in an unsigned integer which is greater
	than 65535 (FFFF in base 16).
      </t>
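      <t>
	The range check on a triplet can be illustrated with the
	following sketch. The names triplet_value and triplet_is_valid
	are hypothetical, and the sketch assumes the three characters
	have already passed the alphabet check.
      </t>
      <figure><artwork><![CDATA[
```python
# Table 1 in value order: the index of a character is its value.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def triplet_value(triplet: str) -> int:
    """Return c + d*45 + e*45*45 for a three-character chunk."""
    c, d, e = (ALPHABET.index(ch) for ch in triplet)
    return c + d * 45 + e * 45 * 45


def triplet_is_valid(triplet: str) -> bool:
    """True if the triplet fits in 16 bits."""
    return triplet_value(triplet) <= 0xFFFF
```
]]></artwork></figure>
      <t>
	Three characters can represent values up to 45*45*45 - 1 =
	91124 (the string ":::"), so every value from 65536 upward
	has to be rejected explicitly.
      </t>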
      <t>
	It should be noted that the resulting string after encoding to
	Base45 might include non-URL-safe characters, so if a URL
	that includes Base45-encoded data has to be URL safe,
	%-encoding has to be used.
      </t>
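      <t>
	As a hedged illustration, %-encoding of a Base45 string can be
	done with Python's standard urllib.parse module. The wrapper
	name url_safe is hypothetical, and passing safe="" (so that
	'/' is escaped as well) is a choice made for this sketch.
      </t>
      <figure><artwork><![CDATA[
```python
from urllib.parse import quote, unquote


def url_safe(encoded: str) -> str:
    """Percent-encode Base45 text for use inside a URL."""
    # safe="" also escapes '/', which quote() leaves alone by
    # default.
    return quote(encoded, safe="")


example = url_safe("%69 VD92EX0")   # Base45 of "Hello!!"
restored = unquote(example)         # undo the %-encoding
```
]]></artwork></figure>
      <t>
	Here the '%' and space characters of the Base45 string become
	"%25" and "%20", giving "%2569%20VD92EX0".
      </t>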
    </section>
    <section title="Acknowledgements">
      <t>
	The authors thank Mark Adler, Anders Ahl, Alan Barrett, Sam
	Spens Clason, Alfred Fiedler, Tomas Harreveld, Erik Hellman,
	Joakim Jardenberg, Michael Joost, Erik Kline, Christian
	Landgren, Anders Lowinger, Mans Nilsson, Jakob Schlyter, Peter
	Teufl, and Gaby Whitehead for their feedback. The authors also
	thank everyone who has worked with Base64 over many years and
	has proven that the implementations are stable.
      </t>
    </section>
  </middle>
  <back>
    <references title='Normative References'>
      &RFC4648;
      &RFC2119;
      &RFC8174;
      <reference anchor="ISO18004">
	<front>
          <title>
	    ISO/IEC 18004:2015 Information technology - Automatic
	    identification and data capture techniques - QR Code bar
	    code symbology specification
	  </title>
          <author>
            <organization>ISO/IEC JTC 1/SC 31</organization>
          </author>
          <date month="February" year="2015" />
        </front>
        <seriesInfo name="ISO/IEC 18004:2015"
                    value="https://www.iso.org/standard/62021.html" />
      </reference>
    </references>
  </back>
</rfc>
