<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd' [
  <!ENTITY rfc.2119 PUBLIC '' 'reference.RFC.2119.xml'>
  <!ENTITY rfc.2629 PUBLIC '' 'reference.RFC.2629.xml'>
  <!ENTITY rfc.2046 PUBLIC '' 'reference.RFC.2046.xml'>
  <!ENTITY rfc.2047 PUBLIC '' 'reference.RFC.2047.xml'>
  <!ENTITY rfc.2231 PUBLIC '' 'reference.RFC.2231.xml'>
  <!ENTITY rfc.1867 PUBLIC '' 'reference.RFC.1867.xml'>
  <!ENTITY rfc.2183 PUBLIC '' 'reference.RFC.2183.xml'>
  <!ENTITY rfc.2388 PUBLIC '' 'reference.RFC.2388.xml'>
  <!ENTITY rfc.3986 PUBLIC '' 'reference.RFC.3986.xml'>
  <!ENTITY rfc.5987 PUBLIC '' 'reference.RFC.5987.xml'>
  <!ENTITY rfc.6838 PUBLIC '' 'reference.RFC.6838.xml'>
  <!ENTITY I-D.file PUBLIC '' 'reference.I-D.draft-ietf-appsawg-file-scheme-00.xml'>
  <!ENTITY w3c.html32 PUBLIC '' 'reference.W3C.REC-html32-19970114.xml'>
  <!ENTITY w3c.html5  PUBLIC '' 'reference.W3C.REC-html5-20141028.xml'>

  ]>
  
<rfc ipr='trust200902' docName='draft-ietf-appsawg-multipart-form-data-09'
     obsoletes='2388' category='std'
     >
  <?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
  <?rfc toc='yes' ?>
  <?rfc symrefs='yes' ?>
  <?rfc sortrefs='yes'?>
  <?rfc iprnotified='no' ?>
  <?rfc strict='yes' ?>
  <front>
    <title abbrev='multipart/form-data'>Returning Values from Forms:
    multipart/form-data</title>
    <author initials='L.'
	    surname='Masinter'
	    fullname='Larry Masinter'>
      <organization>Adobe</organization>
      <address>
	<email>masinter@adobe.com</email>
	<uri>http://larry.masinter.net</uri>
      </address>
    </author>
    <date/>
    
    <area>Applications</area>
    <workgroup>APPSAWG</workgroup>
    
    <abstract>
      <t>
	This specification (re)defines the multipart/form-data
	Internet Media Type, which can be used by a wide variety of
	applications and transported by a wide variety of protocols as
	a way of returning a set of values as the result of a user
	filling out a form.  It obsoletes RFC 2388.
      </t>
    </abstract>
  </front>
  <middle>
    <section title='Introduction'>
      <t>
	In many applications, it is possible for a user to be
	presented with a form. The user will fill out the form,
	including information that is typed, generated by user input,
	or included from files that the user has selected. When the
	form is filled out, the data from the form is sent from the
	user to the receiving application.
      </t>
      <t>
	The definition of <spanx
	style='verb'>multipart/form-data</spanx> is derived from one
	of those applications, originally set out in <xref
	target='RFC1867'/> and subsequently incorporated into <xref
	target='W3C.REC-html32-19970114'>HTML 3.2</xref>, where forms
	are expressed in HTML, and in which the form data is sent
	via HTTP or electronic mail. This representation is widely
	implemented in numerous web browsers and web servers.
      </t>
      <t>
	However, <spanx style='verb'>multipart/form-data</spanx> is
	also used for forms that are presented using representations
	other than HTML (spreadsheets, PDF, etc.), and for transport
	using means other than electronic mail or HTTP; it is used in
	distributed applications which do not involve forms at all, or
	do not have users filling out the form. For this reason, this
	document defines a general syntax and semantics independent of
	the application for which it is used, with specific rules for
	web applications noted in context.
      </t>

      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in BCP 14, RFC 2119 <xref
      target="RFC2119"></xref>.</t>
      
    </section>
    <section title="percent-encoding option" anchor="percenthex">
      <t>Within this specification, "percent-encoding" (as defined
      in <xref target="RFC3986"/>) 
      is offered as a possible way of encoding 
      characters in file names that are otherwise
      disallowed, including non-ASCII characters, spaces,
      control characters and so forth.
      The encoding is
      created by replacing each non-ASCII or disallowed character
      with a sequence in which each byte of the UTF-8 encoding
      of the character is represented by a percent sign (%)
      followed by the (case-insensitive) hexadecimal value of that byte.
      
      </t>
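      <t>The encoding described above can be sketched in Python as
      follows. This is a minimal illustration, not a normative
      algorithm: the function names are hypothetical, and treating
      only the RFC 3986 "unreserved" characters as allowed is an
      assumption, since this document does not fix the exact set of
      disallowed characters.</t>
      <figure><artwork><![CDATA[
```python
from urllib.parse import quote, unquote


def percent_encode_filename(name: str) -> str:
    """Percent-encode a file name over the bytes of its UTF-8 encoding.

    Assumption: every character outside the RFC 3986 "unreserved" set
    is treated as disallowed and therefore encoded.
    """
    return quote(name, safe="", encoding="utf-8")


def percent_decode_filename(encoded: str) -> str:
    """Reverse the encoding; hex digits may be upper or lower case."""
    return unquote(encoded, encoding="utf-8")


print(percent_encode_filename("na\u00efve file.txt"))
# prints: na%C3%AFve%20file.txt
```
]]></artwork></figure>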
    </section>


    <section title="Advice for Forms and Form Processing">
      <t>
	The representation and interpretation of forms and the nature
	of form processing are not specified by this document.
	However, for forms and form-processing that result in
	generation of multipart/form-data, some suggestions are
	included.
      </t>
      <t>
	In a form, there is generally a sequence of fields, where each
	field is expected to be supplied with a value, e.g. by a user
	who fills out the form.  Each field has a name.  After a form
	has been filled out and the form's data is "submitted", the
	form processing results in a set of values for each field:
	the "form data".
      </t>
      <t>
	In forms that work with multipart/form-data, field names could
	be arbitrary Unicode strings; however, restricting field names
	to ASCII will help avoid some interoperability issues
	(see <xref target='non-ascii-fields'/>). 
      </t>
      <t>
	Within a given form, ensuring field names are unique is also
	helpful.  Some fields may have default values or presupplied
	values in the form itself. Fields with presupplied values
	might be hidden or invisible; this allows using generic
	processing for form data from a variety of actual forms.
      </t>
    </section>
    <section title='Definition of multipart/form-data'>
      <t>
	The media-type <spanx style='verb'>multipart/form-data</spanx>
	follows the model of multipart MIME data streams as
	specified in <xref target='RFC2046'/> Section 5.1; changes are
	noted in this document.
      </t>
      <t>
	A <spanx style='verb'>multipart/form-data</spanx> body
	contains a series of parts, separated by a boundary.
      </t>

      <section title="Boundary parameter of multipart/form-data">
	<t>
	  As with other multipart types, the parts are delimited with
	  a boundary delimiter, constructed using CRLF, "--", and the value of the
	  boundary parameter.  The boundary is supplied as a "boundary"
	  parameter to the <spanx style='verb'>multipart/form-data</spanx>
	  type. As noted in <xref target='RFC2046'/> Section
	  5.1, the boundary delimiter MUST NOT appear inside any
	  of the encapsulated parts, and it is often necessary to
	  enclose the boundary
	  parameter values in quotes in the Content-Type header.
	</t>
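	<t>
	  One way a sender might satisfy the requirement that the
	  delimiter not appear inside any part is to generate a random
	  boundary and check it against the part bodies. The sketch
	  below assumes all part bodies are available in memory; the
	  function names and the 32-character hexadecimal boundary are
	  illustrative choices, not requirements of this document.
	</t>
	<figure><artwork><![CDATA[
```python
import secrets


def choose_boundary(bodies: list[bytes]) -> str:
    """Pick a random boundary whose "--boundary" form is in no body."""
    while True:
        boundary = secrets.token_hex(16)  # 32 hexadecimal characters
        marker = b"--" + boundary.encode("ascii")
        # Conservative check: reject even without the leading CRLF.
        if not any(marker in body for body in bodies):
            return boundary


def content_type_header(boundary: str) -> str:
    """Quote the boundary value; quoting is always permitted."""
    return f'Content-Type: multipart/form-data; boundary="{boundary}"'
```
]]></artwork></figure>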
      </section>
      <section title='Content-Disposition header for each part'>
	<t>
	  Each part MUST contain a <spanx
	  style='verb'>content-disposition</spanx> header
	  <xref target='RFC2183'/> in which the
	  disposition type is <spanx style='verb'>form-data</spanx>.
	  The <spanx style='verb'>content-disposition</spanx> header
	  MUST also contain an additional parameter of <spanx
	  style='verb'>name</spanx>; the value of the <spanx
	  style='verb'>name</spanx> parameter is the original field
	  name from the form (possibly encoded; see <xref
	  target='non-ascii-fields'/>).  For example, a part might
	  contain a header:

      <figure><artwork>
        Content-Disposition: form-data; name="user"
      </artwork></figure>

      with the body of the part containing the form data of the 
      <spanx style='verb'>user</spanx> field.
	</t>
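	<t>
	  As an illustration, a single text field could be serialized
	  along the following lines. The function name is hypothetical,
	  and the sketch assumes an ASCII field name that needs no
	  escaping inside the quoted string; it also encodes the value
	  as UTF-8, which is an assumption rather than a rule.
	</t>
	<figure><artwork><![CDATA[
```python
def encode_text_part(boundary: str, name: str, value: str) -> bytes:
    """Serialize one text field as a multipart/form-data part.

    Assumption: the field name is ASCII and contains no '"', CR, or
    LF, so it can be placed in the quoted string without escaping.
    """
    if not name.isascii() or any(ch in name for ch in '"\r\n'):
        raise ValueError("field name needs escaping; not handled here")
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        "\r\n"
    )
    return head.encode("ascii") + value.encode("utf-8") + b"\r\n"
```
]]></artwork></figure>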
      </section>

      <section title="filename attribute of content-disposition part header">
	<t>
	  For form data that represents the content of a file, a
	  name for the file SHOULD be supplied as well, by using a
	  <spanx style='verb'>filename</spanx> parameter of the <spanx
	  style='verb'>content-disposition</spanx> header.  The file
	  name isn't mandatory for cases where the file name isn't
	  available or is meaningless or private; this might result,
	  for example, from 
	  selection or drag-and-drop or where the form data content
	  is streamed directly from a device. 
	</t>
	<t>If a filename parameter is supplied, the requirements
	of <xref target="RFC2183"/> Section 2.3 for "receiving MUA"
	apply to receivers of <spanx style='verb'>multipart/form-data</spanx>
	as well: do not use the file name blindly, check and
	possibly change it to match local filesystem conventions if
	applicable, and do not use directory path information that
	may seem to be present. </t>
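	<t>
	  A receiver might apply this advice roughly as shown below.
	  This is only a sketch: the function name, the fallback name,
	  and the conservative allow-list of characters are all
	  assumptions, and real systems will need rules that match
	  their own filesystem conventions.
	</t>
	<figure><artwork><![CDATA[
```python
import re


def safe_filename(received: str, fallback: str = "upload.bin") -> str:
    """Reduce a received filename parameter to a bare, cautious name."""
    # Drop directory information, whichever separator the sender used.
    base = re.split(r"[\\/]", received)[-1]
    # Replace anything outside a small allow-list (an assumption).
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Empty names and names made only of dots are never acceptable.
    if base.strip(".") == "":
        return fallback
    return base
```
]]></artwork></figure>
	<t>
	  Even a cleaned name does not prevent overwriting an existing
	  file; that still has to be checked where the file is stored.
	</t>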
	<t>
	  In most multipart types, the MIME headers in each part are
	  restricted to US-ASCII; for compatibility with those
	  systems, file names normally visible to users MAY be encoded
	  using the percent-encoding method in <xref
	  target="percenthex"/>, following how a "file:" URI <xref target="I-D.ietf-appsawg-file-scheme"/> might be encoded.
	  
	</t>
	<t>NOTE: The encoding method described in <xref target="RFC5987"/>,
	which would add a "filename*" parameter to the "Content-Disposition" header, MUST NOT be used. 
	</t>

	<t>
	  Some commonly deployed systems use multipart/form-data with
	  file names directly encoded including octets outside the
	  US-ASCII range. The encoding used for the file names is
	  typically UTF-8, although HTML forms will use the charset
	  associated with the form.
	</t>
      </section>
	  
      <section title="Multiple files for one form field">
	<t>The form data for a form field might include
	multiple files.</t>
	<t><xref target='RFC2388'/> suggested that multiple files
	for a single form field be transmitted using
	a nested multipart/mixed part. This usage is
	deprecated.
	</t>
	<t>
	  To match widely deployed implementations, multiple files
	  MUST be sent by supplying each file in a separate part,
	  but all with the same <spanx style='verb'>name</spanx>
	  parameter.
	</t>
	<t>
	  Receiving applications intended for wide applicability (e.g.
	  multipart/form-data parsing libraries) SHOULD also support
	  the older method of supplying multiple files.
	</t>
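	<t>
	  The recommended layout, one top-level part per file with a
	  shared name, might be produced as in the following sketch. The
	  function name is hypothetical, and the code assumes ASCII field
	  and file names without quotes and labels every file as
	  application/octet-stream for simplicity.
	</t>
	<figure><artwork><![CDATA[
```python
def encode_files(
    boundary: str, field: str, files: list[tuple[str, bytes]]
) -> bytes:
    """Emit one top-level part per file, all sharing one field name.

    Assumption: field and file names are ASCII without '"', CR, LF.
    """
    out = b""
    for filename, data in files:
        head = (
            f"--{boundary}\r\n"
            "Content-Disposition: form-data; "
            f'name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n"
            "\r\n"
        )
        out += head.encode("ascii") + data + b"\r\n"
    # The closing delimiter ends the multipart body.
    return out + f"--{boundary}--\r\n".encode("ascii")
```
]]></artwork></figure>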
      </section>
	
      <section title='Content-Type header for each part'>
	<t>
	  Each part MAY have an (optional) <spanx
	  style='verb'>content-type</spanx>, which defaults to <spanx
	  style='verb'>text/plain</spanx>.  If the contents of a file
	  are to be sent, the file data SHOULD be labeled with an
	  appropriate media type, if known, or <spanx
	  style='verb'>application/octet-stream</spanx>.
	</t>
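	<t>
	  The defaulting rules above might be expressed as follows. Both
	  function names are hypothetical, and guessing a media type from
	  the file name extension is only one possible way of deciding
	  whether a type is "known".
	</t>
	<figure><artwork><![CDATA[
```python
import mimetypes
from typing import Optional


def file_part_content_type(filename: str) -> str:
    """Sender side: use a guessed type, else application/octet-stream."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def effective_content_type(header_value: Optional[str]) -> str:
    """Receiver side: a part without Content-Type is text/plain."""
    return header_value if header_value else "text/plain"
```
]]></artwork></figure>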

      </section>
      <section title="The charset parameter for text/plain form data">
	<t>
	  In the case where the form data is text, the charset parameter
	  for the <spanx style='verb'>text/plain</spanx> Content-Type
	  MAY be used to indicate the character encoding used in that
	  part.  For example, a form with a text field in which a user
	  typed "Joe owes &lt;eu&gt;100" where &lt;eu&gt; is the Euro
	  symbol might have form data returned as:
	</t>
	<figure><artwork>
    --AaB03x
    content-disposition: form-data; name="field1"
    content-type: text/plain;charset=UTF-8
    content-transfer-encoding: quoted-printable
      
    Joe owes =E2=82=AC100.
    --AaB03x
</artwork></figure>


	<t>In practice, many widely deployed implementations do not
	supply a charset parameter in each part, but, rather, they rely
	on the notion of a "default charset" for a multipart/form-data
	instance. Subsequent sections will explain how the default
	charset is established. </t>
      </section>
	
      <section title="The _charset_ field for default charset">

        <t>Some form processing applications (including HTML) have the
        convention that the value of a form entry with entry name
        <spanx style='verb'>_charset_</spanx> and type <spanx
        style='verb'>hidden</spanx> is automatically set when the form
        is opened; the value is used as the default charset of text
        field values (see form-charset in <xref target="form-charset"
        />).  In such cases, the value of the default charset for each
        text/plain part without a charset parameter is the supplied
        value. For example:</t>
  
  <figure><artwork>
    --AaB03x
    content-disposition: form-data; name="_charset_"
      
    iso-8859-1
    --AaB03x
    content-disposition: form-data; name="field1"
    
    ...text encoded in iso-8859-1 ...
    --AaB03x--
  </artwork></figure>
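        <t>A receiver that has already split the body into parts might
        apply this convention as sketched below. The tuple layout, the
        function name, and the UTF-8 fallback are assumptions made for
        the example; a charset label that is unknown to the decoder
        would raise an error that a real implementation has to
        handle.</t>
  <figure><artwork><![CDATA[
```python
from typing import Optional

# (field name, charset parameter of the part or None, raw body bytes)
Part = tuple[str, Optional[str], bytes]


def decode_text_parts(
    parts: list[Part], fallback: str = "utf-8"
) -> list[tuple[str, str]]:
    """Decode text parts, using the _charset_ field as the default."""
    default = fallback
    for name, _charset, raw in parts:
        if name == "_charset_":
            default = raw.decode("ascii").strip()
            break
    # An explicit charset parameter on a part overrides the default.
    return [(n, raw.decode(cs or default)) for n, cs, raw in parts]
```
]]></artwork></figure>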

      </section>

      <section title="Content-Transfer-Encoding deprecated"
	       anchor='cte'>

	<t>Previously, it was recommended that senders use a <spanx
	style='verb'>Content-Transfer-Encoding</spanx> (such
	as <spanx style='verb'>quoted-printable</spanx>) for each
	non-ASCII part of a multipart/form-data body, because that
	would allow use in transports that only support a <spanx
	style='verb'>7BIT</spanx> encoding. This use is deprecated
	in contexts that support binary data, such as HTTP. Senders
	SHOULD NOT generate any parts with a <spanx
	style='verb'>Content-Transfer-Encoding</spanx> header.</t>

	<t>Currently, no deployed implementations that send such
	bodies have been discovered.</t>

      </section>
	  
      <section title="Other Content- headers">
	<t>The <spanx style='verb'>multipart/form-data</spanx> media
	type does not support any MIME headers in the parts other than
	Content-Type, Content-Disposition, and (in limited
	circumstances) Content-Transfer-Encoding. Other headers
	MUST NOT be included and MUST be ignored.
	</t>
      </section>
    </section>
	    
    <section title="Operability considerations">

      <section title="Non-ASCII field names and values"
	       anchor="non-ascii-fields">
	<t>
	  
	  Normally, MIME headers in multipart bodies are required to
	  consist only of 7-bit data in the US-ASCII character set.
	  While <xref target="RFC2388"/> suggested that non-ASCII
	  field names be encoded according to the method in
	  <xref target="RFC2047"/>, this practice doesn't seem to have been
	  followed widely.
	</t>

	<t>This specification makes three sets of recommendations,
	one for each of three different stages of the workflow.</t>

	<section title="Avoid non-ASCII field names">
          <t>For broadest interoperability with existing deployed
          software, those creating forms SHOULD avoid non-ASCII field
          names. This should not be a burden, because in general the
          field names are not visible to users. The field names
	  in the underlying form need not match what the user
	  sees on the screen.</t>
	  <t>If non-ASCII field names are unavoidable, form
	  or application creators SHOULD use UTF-8 uniformly.
	  This will minimize interoperability problems.
	  </t>
        </section>
	
	<section title="Interpreting forms and creating form-data"
		 anchor="form-charset">
	  <t>Some applications of this specification will supply a
	  character encoding to be used for interpretation of the
	  multipart/form-data body. In particular, <xref
	  target="W3C.REC-html5-20141028">HTML 5</xref> uses:
	  <list style='symbols'>
	    <t>the content of a '_charset_' field, if
	    there is one,</t>
	    <t>the value of an accept-charset attribute of
	    the &lt;form&gt; element, if there is one,</t>
	    <t>the character encoding of the document containing
	    the form, if it is US-ASCII compatible,</t>
	    <t>otherwise, UTF-8.</t>
	  </list>
	  
	  Call this value the form-charset. Any text,
	  whether field name, field value, or (text/plain)
	  form data, that uses characters outside
	  the ASCII range MAY be represented directly
	  encoded in the form-charset.
	  </t>
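	  <t>The order of preference listed above can be sketched as a
	  small selection function. The names are hypothetical, taking
	  only the first label of a multi-valued accept-charset is a
	  simplification, and the test for "US-ASCII compatible" used
	  here (printable ASCII encodes to the same bytes) is an
	  assumption rather than a definition taken from HTML.</t>
	  <figure><artwork><![CDATA[
```python
from typing import Optional


def is_ascii_compatible(label: str) -> bool:
    """Assumed test: printable ASCII encodes to identical bytes."""
    sample = "".join(chr(c) for c in range(0x20, 0x7F))
    try:
        return sample.encode(label) == sample.encode("ascii")
    except (LookupError, UnicodeError):
        return False


def select_form_charset(
    charset_field: Optional[str],
    accept_charset: Optional[str],
    document_encoding: Optional[str],
) -> str:
    """Apply the listed order of preference for the form-charset."""
    if charset_field:
        return charset_field
    labels = accept_charset.split() if accept_charset else []
    if labels:
        return labels[0]  # simplification: first label only
    if document_encoding and is_ascii_compatible(document_encoding):
        return document_encoding
    return "UTF-8"
```
]]></artwork></figure>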
	</section>
	    
	<section title="Parsing and interpreting form data">
	  <t> 
            While this specification provides guidance for creation of
            multipart/form-data, parsers and interpreters 
            should be aware of the variety of implementations. File
            systems differ as to whether and how they normalize
            Unicode names, for example.  The matching of form elements
            to form-data parts may rely on a fuzzier match.  In
            particular, some multipart/form-data generators might have
            followed the previous advice of <xref target="RFC2388"/>
            and used the <xref target="RFC2047"/> "encoded-word"
            method of encoding non-ASCII values:

	    <figure><artwork>
 encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
 </artwork></figure>
          </t>
	  <t>
            Others have been known to follow
            <xref target="RFC2231"/>, to send unencoded UTF-8, or even
            strings encoded in the form-charset.
	  </t>
          <t>
	    For this reason, interpreting <spanx
	    style='verb'>multipart/form-data</spanx> (even from
	    conforming generators) may require knowing the charset
	    used in form encoding, in cases where the _charset_ field
	    value or a charset parameter of a text/plain Content-Type
	    header is not supplied.

          </t>
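	  <t>A tolerant parser might attempt to decode "encoded-word"
	  field names before matching them against the form, as in the
	  sketch below. It uses Python's standard email.header module;
	  the function name and the policy of returning undecodable
	  input unchanged are assumptions made for the example.</t>
	  <figure><artwork><![CDATA[
```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def decode_field_name(raw: str) -> str:
    """Decode RFC 2047 encoded-words if present, else return input."""
    if "=?" not in raw:
        return raw
    try:
        return str(make_header(decode_header(raw)))
    except (HeaderParseError, LookupError, UnicodeError, ValueError):
        return raw  # leave undecodable names untouched


print(decode_field_name("=?UTF-8?Q?na=C3=AFve?="))
```
]]></artwork></figure>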
	</section>
      </section>
      <section title="Ordered fields and duplicated field names">
	<t>
	  Form processors given forms with a well-defined ordering
	  SHOULD send back results in order. (Note that there
	  are some forms which do not define a natural order.)
	  Intermediaries MUST NOT
	  reorder the results.
	  Form parts with identical field names MUST NOT
	  be coalesced. 
	</t>
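	<t>
	  For a receiver, this suggests keeping parsed parts as an
	  ordered list of pairs rather than a plain name-to-value map,
	  which would silently drop repeated names. The sketch below
	  illustrates the difference; the function name and sample data
	  are hypothetical.
	</t>
	<figure><artwork><![CDATA[
```python
def group_preserving_order(
    pairs: list[tuple[str, str]]
) -> dict[str, list[str]]:
    """Group values by field name, keeping every value in order."""
    grouped: dict[str, list[str]] = {}
    for name, value in pairs:
        grouped.setdefault(name, []).append(value)
    return grouped


pairs = [("tag", "a"), ("user", "x"), ("tag", "b")]
print(group_preserving_order(pairs))  # both "tag" values are kept
print(dict(pairs))                    # only the last "tag" survives
```
]]></artwork></figure>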
      </section>	
      <section title="Interoperability with web applications">
	<t>
	  Many web applications use the "application/x-www-form-urlencoded"
	  method for returning data from forms. This format is quite
	  compact, e.g.:
	</t>
	<figure><artwork>
   name=Xavier+Xantico&amp;verdict=Yes&amp;colour=Blue&amp;happy=sad&amp;Utf%F6r=Send
</artwork></figure>
	<t>
	  However, there is no opportunity to label the enclosed
	  data with content type, apply a charset, or use other
	  encoding mechanisms.
	</t>
	<t>
	  Many form-interpreting programs (primarily web browsers)
	  now implement and generate multipart/form-data, but an
	  existing application might need to support the
	  application/x-www-form-urlencoded format as well.
	</t>
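	<t>
	  An application that accepts both formats could parse the
	  URL-encoded form into the same list-of-pairs shape it uses for
	  multipart parts. The sketch below uses Python's standard
	  urllib.parse module on the example above; the function name is
	  hypothetical, and the ISO-8859-1 charset is an assumption that
	  the receiver has to know out of band, since the format cannot
	  label it.
	</t>
	<figure><artwork><![CDATA[
```python
from urllib.parse import parse_qsl


def parse_urlencoded(body: str, charset: str) -> list[tuple[str, str]]:
    """Parse a URL-encoded form body into ordered (name, value) pairs."""
    return parse_qsl(body, keep_blank_values=True, encoding=charset)


body = ("name=Xavier+Xantico&verdict=Yes&colour=Blue"
        "&happy=sad&Utf%F6r=Send")
for name, value in parse_urlencoded(body, "iso-8859-1"):
    print(name, "=", value)
```
]]></artwork></figure>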
      </section>

      <section title="Correlating form data with the original form">
      <t>
	This specification provides no specific mechanism by which
	multipart/form-data can be associated with the form that
	caused it to be transmitted. This separation is intentional;
	many different forms might be used for transmitting the same
	data. In practice, applications may supply a specific form
	processing resource (in HTML, the ACTION attribute in a FORM
	tag) for each different form.  Alternatively, data about the
	form might be encoded in a "hidden field" (a field which is
	part of the form but which has a fixed value to be transmitted
	back to the form-data processor.)
      </t>
      </section>
    </section>

    <section title="IANA Considerations">
      <t>
	Please update the Internet Media Type registration of
	multipart/form-data to point to this document, using the
	template in <xref target='template'/>.  In addition, please
	update the registrations of the "name" parameter and the
	"form-data" value in the "Content Disposition Values and
	Parameters" registry to both point to this document.
      </t>
    </section>

    <section title="Security Considerations" anchor="security">
      <t>
	Applications which receive forms and process them must be
        careful not to supply data back to the requesting form
        processing site that was not intended to be sent.
      </t>
      <t>
	It is important when interpreting the filename of the
	Content-Disposition header to not overwrite files in the
	recipient's file space inadvertently.
      </t>
      <t>
	User applications that request form information from users
	must be careful not to cause a user to send information to the
	requestor or a third party unwillingly or unwittingly. For
	example, a form might request 'spam' information to be sent to
	an unintended third party, or private information to be sent
	to someone that the user might not actually intend. While this
	is primarily an issue for the representation and
	interpretation of forms themselves (rather than the data
	representation of the form data),  the
	transportation of private information must be done in a way
	that does not expose it to unwanted prying.
      </t>
      <t>
	With the introduction of form-data that can reasonably send
	back the content of files from a user's file space, the
	possibility arises that a user might be sent an automated script that
	fills out a form and then sends one of the user's local files to
	another address. Thus, additional caution is required
	when executing automated scripting where form-data might
	include a user's files.
      </t>
      <t>
	Files sent via multipart/form-data may contain arbitrary
	executable content, and precautions against malicious
	content are necessary.
      </t>
      <t>
	The considerations of <xref target="RFC2183"/> Sections 2.3 and 5
	with respect to the filename parameter of the Content-Disposition
	header also apply to its usage here. 
      </t>
      <t>
      	All form processing software should treat user supplied
      	form-data with sensitivity, as it often contains confidential
      	or personally identifying information. There is widespread use
      	of form "auto-fill" features in web browsers; these might be
      	used to trick users into unknowingly sending confidential
      	information when completing otherwise innocuous tasks.
      	Multipart/form-data does not supply any features for checking
      	integrity, ensuring confidentiality, avoiding user confusion,
      	or other security features; those concerns must be addressed
      	by the form-filling and form-data-interpreting applications.
      </t>
    </section>

    <section title="Media type registration for multipart/form-data" anchor="template">
      <t>
      This section is the <xref target="RFC6838"/> media type registration.
	<list style='hanging'>
	<t hangText='Type name:'>
	  multipart
	</t>
	<t hangText='Subtype name:'>
	  form-data
	</t>
	<t hangText='Required parameters:'>
	  boundary
	</t>
	<t hangText='Optional parameters:'>
	  none
	</t>
	<t hangText='Encoding considerations:'>
	  Common use is BINARY. 
	  <vspace />
	  In limited use (or transports that restrict the encoding to
	  7BIT or 8BIT), each part is encoded separately using
	  Content-Transfer-Encoding; see <xref target='cte'/>.
	</t>
	<t hangText='Security considerations:'>
	  See <xref target="security"/> of this document.
	</t>

	<t hangText='Interoperability considerations:'>
	  This document makes several recommendations for
	  interoperability with deployed implementations, including
	  <xref target='cte'/>.
	</t>
	<t hangText='Published specification:'>
	  This document.
	</t>
	<t hangText='Applications that use this media type:'>
	  Numerous web browsers, servers, and web applications.
	</t>
	<t hangText='Fragment identifier considerations:'>
	  None: Fragment identifiers are not defined for this type.
	</t>
	<t hangText='Additional information:'>
	  None: no deprecated alias names, magic numbers, file extensions,
	  or Macintosh file type codes.
	</t>
	<t hangText='Person &amp; email address to contact
		     for further information'>
	  Author of this document.
	</t>
	<t hangText='Intended Usage:'>
	  COMMON
	</t>
	<t hangText='Restrictions on usage:'>
	  none
	</t>
	<t hangText='Author:'>
	  Author of this document.
	</t>
	<t hangText='Change controller:'>
	  IETF
	</t>
	<t hangText='Provisional registration:'>
	  N/A
	</t>
	</list>
      </t>
    </section>
  </middle>
  <back>
    <references title="Normative References">
      &rfc.2119;
      &rfc.2046;
      &rfc.2047;
      &rfc.2231;
      &rfc.2183;
      &rfc.3986;
      
    </references>
    <references title="Informative References">
      &rfc.1867;
      &rfc.2388;
      &rfc.5987;
      &rfc.6838;
      &I-D.file;
      &w3c.html5;
      &w3c.html32;
    </references>
    <section title="Changes from RFC 2388">
      <t>The handling of non-ASCII field names has changed: the
      RFC 2047 method is no longer recommended; instead, senders
      are advised to send UTF-8 field names directly, and file names
      directly in the form-charset.
      </t>
      <t>
	  The handling of multiple files submitted as the result of a
	  single form field (e.g. HTML's &lt;input type=file multiple&gt;
	  element) results in each file having its own top level part with
	  the same name parameter; the method of using a nested
	  <spanx style='verb'>multipart/mixed</spanx>
	  from <xref target="RFC2388"/> is no
	  longer recommended for creators, and not required
	  for receivers as there are
	  no known implementations of senders.
	</t>
	<t>
	  The _charset_ convention and use of an explicit form-data
	  charset is documented.
	</t>
	<t>
	'boundary' is a required parameter in Content-Type.
	</t>
	<t>
	  The relationship of the ordering of fields within a form and
	  the ordering of returned values within multipart/form-data was
	  not defined before, nor was the handling of the case where a
	  form has multiple fields with the same name.
	</t>
	<t>
	  Editorial:
	  Removed obsolete discussion of alternatives in
	  appendix. 
	  Updated references. Moved outline of form processing
          into Introduction.
	</t>
    </section>

    <section title="Alternatives">
	
      <t>There are numerous alternative ways in which form data can be
      encoded; many are listed in <xref target="RFC2388"/> Section
      5.2.  The multipart/form-data encoding is verbose, especially if
      there are many fields with short values. In most use cases, this
      overhead isn't significant. </t>
      <t>More problematic is the resulting ambiguity: implementations
      did not follow <xref target="RFC2388"/>, partly because it used
      "may" instead of "MUST" when specifying the encoding of field
      names and partly for other, unknown reasons. As a result, parsers
      now need to be more complex in order to perform fuzzy matching
      against the possible outputs of the various encoding methods.
      </t>
    </section>
  </back>
</rfc>
       
