Bypassing DOMPurify with good old XML

Introduction

Hello, I’m RyotaK ( @ryotkak ), a security engineer at Flatt Security Inc.

Recently, @slonser_ found a bypass in DOMPurify when it's used to sanitize XML documents. After taking a look at the patch, I found two more bypasses based on XML/HTML confusion, so I'm documenting them here.


As @slonser_ wrote in his post, HTML and XML have slightly different parsing rules.

For example, the following text is parsed as a single node by the XML parser, whereas the HTML parser recognizes the h1 tag.

<?xml-stylesheet ><h1>Hello</h1>)"> ?>

This is because XML defines the structure of Processing Instructions as the following:

'<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

However, HTML enters the bogus comment state when it encounters <?:

    This is an unexpected-question-mark-instead-of-tag-name parse error. Create a comment token whose data is the empty string. Reconsume in the bogus comment state.

Because the bogus comment state ends at > rather than ?>, the HTML parser and the XML parser disagree on where a Processing Instruction ends.

   Switch to the data state. Emit the current comment token.
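
The mismatch can be sketched with two toy scanners. This is a simplified illustration, not either parser's real algorithm: xmlPiEnd and htmlBogusCommentEnd are hypothetical helpers that only locate the terminator each grammar looks for, assuming the input starts with <?.

```javascript
// Toy model, not a real parser: each helper returns the index just past
// the construct that starts at the leading "<?" of `input`.

// XML: a Processing Instruction runs until the first "?>".
function xmlPiEnd(input) {
  const end = input.indexOf("?>", 2);
  return end === -1 ? input.length : end + 2;
}

// HTML: "<?" opens a bogus comment, which runs until the first ">".
function htmlBogusCommentEnd(input) {
  const end = input.indexOf(">", 2);
  return end === -1 ? input.length : end + 1;
}

const payload = "<?xml-stylesheet ><h1>Hello</h1> ?>";

// What is left over after each parser has consumed its construct.
const xmlRest = payload.slice(xmlPiEnd(payload));
const htmlRest = payload.slice(htmlBogusCommentEnd(payload));

console.log(JSON.stringify(xmlRest));  // "" (the whole string is one PI)
console.log(JSON.stringify(htmlRest)); // "<h1>Hello</h1> ?>"
```

Whatever remains after the bogus comment is handed back to the HTML tokenizer as ordinary markup, which is where the injected tag comes from.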

Because of this difference, injecting a Processing Instruction allows a sanitizer bypass if the sanitized XML document is later inserted into an HTML document.

Because DOMPurify didn't scan Processing Instructions, @slonser_ managed to bypass the sanitizer by inserting the following payload:

<?xml-stylesheet > <img src=x onerror="alert('DOMPurify bypassed!!!')"> ?>

Taking a look at the patch

To process the Processing Instructions properly, DOMPurify applied the following patch:

diff --git a/src/purify.js b/src/purify.js
index 4594ba09..5b7bc2aa 100644
--- a/src/purify.js
+++ b/src/purify.js
@@ -909,7 +909,10 @@ function createDOMPurify(window = getGlobal()) {
       root.ownerDocument || root,
       // eslint-disable-next-line no-bitwise
-      NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT | NodeFilter.SHOW_TEXT,
+      NodeFilter.SHOW_ELEMENT |
+        NodeFilter.SHOW_COMMENT |
+        NodeFilter.SHOW_TEXT |
+        NodeFilter.SHOW_PROCESSING_INSTRUCTION,
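
The argument changed in this hunk is the whatToShow bitmask passed when creating the node iterator. Per the DOM specification, a node is visited only when bit (nodeType - 1) is set in that mask. A minimal sketch of that check follows; the constants are copied from the DOM NodeFilter interface into a local object so the snippet runs outside a browser, and isShown is a hypothetical helper name.

```javascript
// Local copy of the relevant NodeFilter constants from the DOM specification,
// so the sketch does not need a browser.
const SHOW = {
  ELEMENT: 0x1,                 // nodeType 1
  TEXT: 0x4,                    // nodeType 3
  CDATA_SECTION: 0x8,           // nodeType 4
  PROCESSING_INSTRUCTION: 0x40, // nodeType 7
  COMMENT: 0x80,                // nodeType 8
};

// Hypothetical helper: a node is visited only if bit (nodeType - 1) is set.
function isShown(whatToShow, nodeType) {
  return (whatToShow & (1 << (nodeType - 1))) !== 0;
}

const PROCESSING_INSTRUCTION_NODE = 7;

const beforePatch = SHOW.ELEMENT | SHOW.COMMENT | SHOW.TEXT;
const afterPatch = beforePatch | SHOW.PROCESSING_INSTRUCTION;

console.log(isShown(beforePatch, PROCESSING_INSTRUCTION_NODE)); // false
console.log(isShown(afterPatch, PROCESSING_INSTRUCTION_NODE));  // true
```

Node types whose bit is missing from the mask are skipped by the iterator, so the sanitizer callback never sees them.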

With the NodeFilter.SHOW_PROCESSING_INSTRUCTION flag set, DOMPurify now visits Processing Instruction nodes and removes them if they are not allowed. So, what could be wrong with this patch?

Confusing nodeName

It turns out that a Processing Instruction returns its target, the name written right after <?, as its nodeName.

The nodeName getter steps are to return the first matching statement, switching on the interface this implements:
    ProcessingInstruction
        Its target.

For example, accessing the nodeName property of the Processing Instruction <?tag ?> returns tag.

Because DOMPurify relies heavily on a node's nodeName to decide whether the node is allowed, this causes confusion when sanitizing the node:

src/purify.js line 992-1013

    /* Now let's check the element's type and name */
    const tagName = transformCaseFunc(currentNode.nodeName);
    /* Remove element if anything forbids its presence */
    if (!ALLOWED_TAGS[tagName] || FORBID_TAGS[tagName]) {

Bypassing DOMPurify with Processing Instructions again

Since a Processing Instruction can carry an arbitrary nodeName, all we have to do is create one whose target is an allowed tag name.

For example, the following Processing Instruction passes through DOMPurify when sanitized as an XML document:

<?img a ?>
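
To see why this slips through, here is a deliberately simplified model of a name-based check. It is not DOMPurify's code: ALLOWED_TAGS is cut down to two entries, isAllowedByName is a hypothetical helper, and the nodes are plain objects carrying the nodeType and nodeName values the DOM would report for each construct.

```javascript
// Simplified model of an allow-list keyed on nodeName (not DOMPurify's real code).
const ALLOWED_TAGS = { img: true, p: true };

// Hypothetical helper: decides purely on the node's name and ignores nodeType.
function isAllowedByName(node) {
  return Boolean(ALLOWED_TAGS[node.nodeName]);
}

// Plain objects standing in for DOM nodes parsed from XML.
const scriptElement = { nodeType: 1, nodeName: "script" };                 // <script/>
const stylesheetInstruction = { nodeType: 7, nodeName: "xml-stylesheet" }; // <?xml-stylesheet ?>
const imgInstruction = { nodeType: 7, nodeName: "img" };                   // <?img a ?>

console.log(isAllowedByName(scriptElement));         // false: not on the list
console.log(isAllowedByName(stylesheetInstruction)); // false: target not on the list
console.log(isAllowedByName(imgInstruction));        // true: target collides with an allowed tag
```

The check cannot tell an img element from a Processing Instruction whose target happens to be img, because both report the same nodeName.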

As we saw earlier, HTML and XML parse Processing Instructions inconsistently.

So, by using the following XML, we can bypass DOMPurify and execute alert(1) if the output is later used in an HTML document:

<?img ><img src onerror=alert(1)>?>

You can confirm it by using the following script with DOMPurify 3.0.10:

document.documentElement.innerHTML = DOMPurify.sanitize("<?img ><img src onerror=alert(1)>?>", {PARSER_MEDIA_TYPE: "application/xhtml+xml"})

Hunting for another bypass

To prevent the issue described above, the following patch was applied to remove all Processing Instructions:

diff --git a/src/purify.js b/src/purify.js
index 061ba1a8..1d984685 100644
--- a/src/purify.js
+++ b/src/purify.js
@@ -1009,6 +1009,12 @@ function createDOMPurify(window = getGlobal()) {
       return true;

+    /* Remove any ocurrence of processing instructions */
+    if (currentNode.nodeType === 7) {
+      _forceRemove(currentNode);
+      return true;
+    }
     /* Remove element if anything forbids its presence */
     if (!ALLOWED_TAGS[tagName] || FORBID_TAGS[tagName]) {
       /* Check if we have a custom element to handle */

Because it removes Processing Instructions entirely, it's no longer possible to exploit the parser inconsistency around them.

But are there any other parsing inconsistencies?

After reading the specification of XML, I noticed that there is an interesting section:

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters that would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>"

Luckily for me, the CDATA section has its own NodeFilter option, which wasn't enabled in DOMPurify.

  const unsigned long SHOW_CDATA_SECTION = 0x8;

So, what I had to do was find an inconsistency between how XML and HTML parsers handle CDATA sections.

At first glance, the HTML parser seems to be parsing a CDATA section in a way compatible with XML:

CDATA sections must consist of the following components, in this order:
   1. The string "<![CDATA[".
   2. Optionally, text, with the additional restriction that the text must not contain the string "]]>".
   3. The string "]]>".

However, upon further investigation, it turns out HTML only supports CDATA sections inside the SVG and MathML namespaces, not in the HTML namespace.

The string "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET character before and after)
    Consume those characters. If there is an adjusted current node and it is not an element in the HTML namespace, then switch to the CDATA section state. Otherwise, this is a cdata-in-html-content parse error. Create a comment token whose data is the "[CDATA[" string. Switch to the bogus comment state.

If the CDATA section appears in the HTML namespace, the parser switches to the bogus comment state, which ends at > rather than ]]>.

   Switch to the data state. Emit the current comment token.

So, similar to the Processing Instructions, the following XML creates the h1 tag when parsed with an HTML parser:

<![CDATA[ ><h1>Hello</h1> ]]>
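
The rule quoted above can be modeled in a few lines. This is a sketch, not the tokenizer itself: htmlCdataEnd is a hypothetical helper, and the inForeignContent flag stands in for the spec's condition that the adjusted current node is not an element in the HTML namespace.

```javascript
// Toy model of how an HTML tokenizer finishes a construct starting with "<![CDATA[".
// Returns the index just past the construct.
function htmlCdataEnd(input, inForeignContent) {
  const start = "<![CDATA[".length;
  // Inside SVG/MathML: a real CDATA section, terminated by "]]>".
  // Inside HTML: a bogus comment, terminated by the first ">".
  const terminator = inForeignContent ? "]]>" : ">";
  const end = input.indexOf(terminator, start);
  return end === -1 ? input.length : end + terminator.length;
}

const payload = "<![CDATA[ ><h1>Hello</h1> ]]>";

const restInSvg = payload.slice(htmlCdataEnd(payload, true));
const restInHtml = payload.slice(htmlCdataEnd(payload, false));

console.log(JSON.stringify(restInSvg));  // "" (the whole string is one CDATA section)
console.log(JSON.stringify(restInHtml)); // "<h1>Hello</h1> ]]>"
```

An XML parser always behaves like the first case, so a sanitizer working on the XML tree sees only inert character data while an HTML parser in the HTML namespace sees live markup.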

As with Processing Instructions, this inconsistency allows a DOMPurify bypass with the following payload:

<![CDATA[ ><img src onerror=alert(1)> ]]>

You can confirm it by using the following script with DOMPurify 3.0.11:

document.documentElement.innerHTML = DOMPurify.sanitize("<![CDATA[ ><img src onerror=alert(1)> ]]>", {PARSER_MEDIA_TYPE: "application/xhtml+xml"})

To fix this inconsistency, DOMPurify applied the following patch:

diff --git a/src/purify.js b/src/purify.js
index 1d984685..72c925a0 100644
--- a/src/purify.js
+++ b/src/purify.js
@@ -913,7 +913,8 @@ function createDOMPurify(window = getGlobal()) {
       NodeFilter.SHOW_ELEMENT |
         NodeFilter.SHOW_COMMENT |
         NodeFilter.SHOW_TEXT |
-        NodeFilter.SHOW_PROCESSING_INSTRUCTION,
+        NodeFilter.SHOW_PROCESSING_INSTRUCTION |
+        NodeFilter.SHOW_CDATA_SECTION,

Since the CDATA section has #cdata-section as the nodeName, this patch can’t be bypassed in the way that I did for the Processing Instructions, unless #cdata-section is explicitly allowed.

The nodeName getter steps are to return the first matching statement, switching on the interface this implements:
    CDATASection
        "#cdata-section".

About us

Flatt Security Inc. provides security assessment services. We welcome inquiries from overseas. If you have any questions, please contact us at .

Thank you for reading this article.

Originally published by RyotaK: Bypassing DOMPurify with good old XML

Copyright notice: posted by admin on April 3, 2024 at 9:30 PM.
When reposting, please credit: Bypassing DOMPurify with good old XML | CTF导航