CVE-2024-20697: WINDOWS LIBARCHIVE REMOTE CODE EXECUTION VULNERABILITY

In this excerpt of a Trend Micro Vulnerability Research Service vulnerability report, Guy Lederfein and Jason McFadyen of the Trend Micro Research Team detail a recently patched remote code execution vulnerability in Microsoft Windows. This bug was originally discovered by the Microsoft Offensive Research & Security Engineering team. Successful exploitation could result in arbitrary code execution in the context of the application using the vulnerable library. The following is a portion of their write-up covering CVE-2024-20697, with a few minimal modifications.

An integer overflow vulnerability exists in the Libarchive library included in Microsoft Windows. The vulnerability is due to insufficient bounds checks on the block length of a RARVM filter used for Intel E8 preprocessing, included in the compressed data of a RAR archive.

A remote attacker could exploit this vulnerability by enticing a target user into extracting a crafted RAR archive. Successful exploitation could result in arbitrary code execution in the context of the application using the vulnerable library.

The Vulnerability

The RAR file format supports data compression, error recovery, and multiple volume spanning. Several versions of the RAR format exist: RAR1.3, RAR1.5, RAR2, RAR3, and the most recent version, RAR5. Different compression and decompression algorithms are used for different versions of RAR.

The following describes the RAR format used by versions 1.5, 2.x, and 3.x. A RAR archive consists of a series of variable-length blocks.

Each block begins with a header. The following table is the common structure of a RAR block header:

The RarBlock Marker is the first block of a RAR archive and serves as the signature of a RAR formatted file:

This block always contains the following byte sequence at the beginning of every RAR file:

           0x52 0x61 0x72 0x21 0x1A 0x07 0x00 (ASCII: "Rar!\x1A\x07\x00")

The ArcHeader is the second block in a RAR file and has the following structure:

The ArcHeader block is followed by one or more FileHeader blocks. These blocks have the following structure:

Note that the offsets above depend on whether the optional fields are present.

The EndBlock block will signify the end of the RAR archive. This block has the following structure:

For each FileHeader block in the RAR archive, if the Method field is not set to “Store” (0x30), then the Data field will contain the compressed file data. The method of decompression depends on the RAR version used to compress the data. The RAR version needed to extract the compressed data is recorded in the UnpVer field of the FileHeader block.

Of relevance to this report is the RAR extraction method used by RAR format version 2.9 (a.k.a. RAR4), which is used when the UnpVer field is set to 29. The compressed data may be compressed either using the Lempel-Ziv (LZ) algorithm or using Prediction by Partial Matching (PPM) compression. This report will not describe in full detail the extraction algorithm, but only summarize the relevant parts for understanding the vulnerability. For a reference implementation of the extraction algorithm, see the Unpack::Unpack29() function in the UnRAR source code.

When the libarchive library attempts to extract the contents of a file from a RAR archive, if the file data is compressed (i.e. the Method field is not set to “Store”), the function read_data_compressed() will be called to extract the compressed data. The compressed data is composed of multiple blocks, each of which can be compressed using the LZ algorithm (denoted by the first bit of the block set to 0) or using PPM compression (denoted by the first bit of the block set to 1). Initially, the function parse_codes() will be called to decode the tables necessary to extract the file data. If a block of data compressed using the LZ algorithm is encountered, the expand() function will be called to decompress the data. In the expand() function, symbols are read from the compressed data by calling read_next_symbol() in a loop. In the function read_next_symbol(), the symbol will be decoded according to the Huffman table decoded in function parse_codes().

If the decoded symbol is 257, the function read_filter() will be called to read a RARVM filter, which has the following structure:

Note that the offsets above depend on whether the optional fields are present.

The size of the Code field is calculated as follows: if the lowest 3 bits of the Flags field (referred to as LENGTH) are less than 6, the code size is (LENGTH + 1). If LENGTH is set to 6, the code size is (LengthExt1 + 7). If LENGTH is set to 7, the code size is (LengthExt1 << 8) | LengthExt2. After the code length is calculated and the code itself is copied into a buffer, the code, its length, and the filter flags are passed to the parse_filter() function to parse the code section.
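
The size rule above can be sketched in a few lines of Python. This is only an illustration of the three cases as described in this write-up, not code taken from libarchive; the function name filter_code_size and its parameter names are hypothetical.

```python
def filter_code_size(flags: int, length_ext1: int = 0, length_ext2: int = 0) -> int:
    """Return the size in bytes of the Code field of a RARVM filter.

    flags       -- the Flags byte of the filter
    length_ext1 -- the LengthExt1 byte (only meaningful when LENGTH is 6 or 7)
    length_ext2 -- the LengthExt2 byte (only meaningful when LENGTH is 7)
    """
    length = flags & 0x07  # LENGTH is the lowest 3 bits of Flags
    if length < 6:
        return length + 1
    if length == 6:
        return length_ext1 + 7
    # LENGTH == 7: the size is a 16-bit value built from the two extension bytes
    return (length_ext1 << 8) | length_ext2
```

For example, a Flags byte whose low 3 bits are 7, followed by extension bytes 0x01 and 0x02, describes a Code field of 0x0102 bytes.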

Within the code section, numbers are parsed by calling the function membr_next_rarvm_number(). This function reads 2 bits, and according to their value, determines how many bits to read to parse the value. If the first 2 bits are 0, 4 value bits will be read; if they are 1, 8 value bits will be read; if they are 2, 16 value bits will be read; and if they are 3, 32 value bits will be read.
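
The number encoding described above can be illustrated with a small bit reader. This is a simplified sketch that follows only the description in this write-up: it assumes bits are consumed most significant bit first, and it does not model any additional special cases the real membr_next_rarvm_number() may implement. The BitReader class and next_rarvm_number() are hypothetical names.

```python
class BitReader:
    """Reads bits from a bytes object, most significant bit first (an assumption)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            if self.pos >= len(self.data) * 8:
                raise EOFError("ran out of bits")
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Selector value (first 2 bits) mapped to the number of value bits that follow.
VALUE_BITS = {0: 4, 1: 8, 2: 16, 3: 32}


def next_rarvm_number(reader: BitReader) -> int:
    """Read a 2-bit selector, then the number of value bits it selects."""
    selector = reader.read(2)
    return reader.read(VALUE_BITS[selector])
```

Under these assumptions, the single byte 0x10 (bits 00 0100 00) decodes to the value 4 using the shortest form: a selector of 0 followed by 4 value bits.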

Function parse_filter() will parse the code section, which has the following structure:

Note that if the READ_REGISTERS flag is not set, the registers will be initialized, such that the 5th register is set to the block length, which is either read from the code section (if the READ_BLOCK_LENGTH flag is set), or carried over from the block length of the previous filter.

After these fields are parsed in parse_filter(), the ByteCode field and its length are sent to the function compile_program(). In this function, the first byte of the bytecode is verified to be equal to the XOR of all other bytes in the bytecode. If true, it will set the fingerprint field of the rar_program_code struct to the value of the CRC-32 algorithm run on the full bytecode, combined with the bytecode length shifted left 32 bits.
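
A minimal sketch of the fingerprint computation described above, assuming the CRC-32 in question is the standard one exposed by Python's zlib module and that "combined" means a bitwise OR with the length placed in the upper 32 bits. The function name program_fingerprint is hypothetical.

```python
import zlib
from functools import reduce
from typing import Optional


def program_fingerprint(bytecode: bytes) -> Optional[int]:
    """Return the 64-bit fingerprint of a RARVM program, or None if the
    XOR check on the first byte fails."""
    if not bytecode:
        return None
    # The first byte must equal the XOR of all remaining bytes.
    xor_rest = reduce(lambda acc, b: acc ^ b, bytecode[1:], 0)
    if bytecode[0] != xor_rest:
        return None
    # CRC-32 of the full bytecode in the low 32 bits, length in the high bits.
    return (len(bytecode) << 32) | zlib.crc32(bytecode)
```

With this layout, a program of length 0x35 whose CRC-32 is 0xAD576887 yields the fingerprint 0x35AD576887, which matches the form of the constants discussed in this report.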

Back in the function parse_filter(), after all fields are calculated for the filter, the rar_filter struct will be initialized by calling create_filter() with the rar_program_code struct containing the fingerprint field and the register values calculated. These values will be set to the prog field and the initialregisters field of the rar_filter struct, respectively.

Once processing of the filter is done, function run_filters() is called to run the parsed filter. This function initializes the vm field of the rar_filters struct with a structure of type rar_virtual_machine. This structure contains a registers field, which is an array of 8 integers, and a memory field of size 0x40004. Then, each filter is executed by calling execute_filter(). If the fingerprint field of the rar_program_code struct associated with the executed filter is equal to either 0x35AD576887 or 0x393CD7E57E, the execute_filter_e8() function is called. This function reads the block length from the 5th field of the initialregisters array. Then, a loop is run for replacing instances of 0xE8 and/or 0xE9 within the VM memory, with the block length used as the loop exit condition.

An integer overflow vulnerability exists in the Libarchive library included in Microsoft Windows. The vulnerability is due to insufficient bounds checks on the block length of a RARVM filter used for Intel E8 preprocessing, included in the compressed data of a RAR archive. Specifically, if the archive contains a RARVM filter whose fingerprint field is calculated as either 0x35AD576887 or 0x393CD7E57E, it will be executed by calling execute_filter_e8(). If the 5th register of the filter is set to a block length of 4, the loop bound in this function, which is computed as the block length minus 5, will wrap around to 0xFFFFFFFF. Since the VM memory has a size of 0x40004, this will result in memory accesses that are out of the bounds of the heap-based buffer representing the VM memory.
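
The arithmetic behind the wrap-around can be reproduced in Python by masking the result to 32 bits. This sketch assumes the block length is handled as an unsigned 32-bit value, which is consistent with the 0xFFFFFFFF result given above; the helper names are hypothetical and the code does not reproduce the actual loop in execute_filter_e8().

```python
VM_MEMORY_SIZE = 0x40004   # size of the VM memory buffer, as stated above
UINT32_MASK = 0xFFFFFFFF   # emulates unsigned 32-bit arithmetic


def e8_loop_bound(block_length: int) -> int:
    """Return (block_length - 5) as an unsigned 32-bit value."""
    return (block_length - 5) & UINT32_MASK


def bound_exceeds_vm_memory(block_length: int) -> bool:
    """True if the loop bound reaches past the end of the VM memory."""
    return e8_loop_bound(block_length) >= VM_MEMORY_SIZE
```

A block length of 4 produces a bound of 0xFFFFFFFF, far larger than 0x40004, whereas a block length of 5 produces a bound of 0.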

A remote attacker could exploit this vulnerability by enticing a target user into extracting a crafted RAR archive, containing a RARVM filter that has its 5th register set to 4. Successful exploitation could result in arbitrary code execution in the context of the application using the vulnerable library.

Notes:

• All multi-byte integers are in little-endian byte order.

• All offsets and sizes are in bytes unless otherwise specified.

• Since there is no official documentation of the RAR4 format, the description is based on the UnRAR and libarchive source code. Field names are either copied from source code or given based on functionality.

Detection Guidance

To detect an attack exploiting this vulnerability, the detection device must monitor and parse traffic on the common ports where a RAR archive might be sent, such as FTP, HTTP, SMTP, IMAP, SMB, and POP3.

The detection device must look for the transfer of RAR files and be able to parse the RAR file format. Currently, there is no official documentation of the RAR file format. This detection guidance is based on the source code for extracting RAR archives provided by the UnRAR program and the libarchive library.

The common structure of a RAR block header is detailed above. The detection device must first look for a RarBlock Marker, which is the first block of a RAR archive and serves as the signature of a RAR formatted file:

The detection device can identify this block by looking for the following byte sequence:

          0x52 0x61 0x72 0x21 0x1A 0x07 0x00 ("Rar!\x1A\x07\x00")
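
As a minimal sketch, the signature check could look like the following in Python. The constant and function names are hypothetical, and a real detection device would apply the check to reassembled file content rather than to a raw buffer.

```python
# The 7-byte RarBlock Marker given above ("Rar!\x1A\x07\x00").
RAR_MARKER = bytes([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00])


def starts_with_rar_marker(data: bytes) -> bool:
    """True if the buffer begins with the RarBlock Marker."""
    return data.startswith(RAR_MARKER)


def find_rar_marker(data: bytes) -> int:
    """Return the offset of the first marker in the buffer, or -1 if absent."""
    return data.find(RAR_MARKER)
```

The second helper is useful when the archive is embedded in a larger stream, such as an HTTP response body or a MIME attachment that has already been decoded.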

If found, the device must then identify the ArcHeader, which is the second block in a RAR file and is detailed above. The ArcHeader block is followed by one or more FileHeader blocks, whose structure is also detailed above. Note that the offsets in that structure depend on whether the optional fields are present.

The detection device must parse each FileHeader block and inspect its Method field. If the value of the Method field is greater than 0x30, the detection device must inspect the Data field of the FileHeader block, containing the compressed file data. The compressed data may be compressed either using the Lempel-Ziv (LZ) algorithm or using Prediction by Partial Matching (PPM) compression. This detection guidance will not describe in full detail the extraction algorithm. For a reference implementation of the extraction algorithm, see the Unpack::Unpack29() function in the UnRAR source code.

The compressed data is composed of multiple blocks, each of which can be compressed using the LZ algorithm (denoted by the first bit of the block set to 0) or using PPM compression (denoted by the first bit of the block set to 1). The detection device must extract each block according to the algorithm used to compress it. If a block compressed using the LZ algorithm is encountered, the detection device must decode the Huffman tables from the beginning of the compressed data. The detection device must then iterate over the remaining compressed data and decode each symbol based on the generated Huffman tables. If the symbol 257 is encountered, the following data must be parsed as a RARVM filter, which has the following structure:

Note that the offsets above depend on whether the optional fields are present.

The detection device must then calculate the size of the Code field as follows: if the lowest 3 bits of the Flags field (referred to as LENGTH) are less than 6, the code size is (LENGTH + 1). If LENGTH is set to 6, the code size is (LengthExt1 + 7). If LENGTH is set to 7, the code size is (LengthExt1 << 8) | LengthExt2. After the size of the Code field is calculated, the Code field must be parsed according to the code section structure described above.

All numerical fields within this structure (FilterNum, BlockStart, BlockLength, register values, and ByteCodeLen) must be read according to the algorithm implemented in the RarVM::ReadData() function of the UnRAR source code. The algorithm first reads 2 bits of data, which determine how many bits of data hold the numerical value. Note that some of the fields in this structure are optional and depend on flags set in the Flags field of the RARVM filter structure.

After extracting all necessary fields, the detection device must check for the following conditions:

• The CRC-32 checksum of the ByteCode field is 0xAD576887 and the ByteCodeLen field is 0x35 OR the CRC-32 checksum of the ByteCode field is 0x3CD7E57E and the ByteCodeLen field is 0x39.

• The READ_REGISTERS flag is set and the value of the 5th register of the Registers field is set to 4 OR the READ_BLOCK_LENGTH flag is set and the value of the BlockLength field is set to 4.

If both of these conditions are met, the traffic should be considered suspicious. An attack exploiting this vulnerability is likely underway.
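
The two conditions above can be combined into a single check once the filter fields have been extracted. The sketch below assumes the CRC-32 is the standard one provided by Python's zlib module and that the 5th register sits at zero-based index 4 of the parsed register list. All function and parameter names are hypothetical.

```python
import zlib
from typing import Optional, Sequence

# (CRC-32 of ByteCode, ByteCodeLen) pairs of the E8 filters named above.
E8_FILTER_SIGNATURES = {(0xAD576887, 0x35), (0x3CD7E57E, 0x39)}
TRIGGER_BLOCK_LENGTH = 4


def matches_e8_signature(bytecode_crc32: int, bytecode_len: int) -> bool:
    """First condition: the bytecode is one of the two E8 filter programs."""
    return (bytecode_crc32, bytecode_len) in E8_FILTER_SIGNATURES


def has_trigger_block_length(read_registers: bool, registers: Sequence[int],
                             read_block_length: bool,
                             block_length: Optional[int]) -> bool:
    """Second condition: the block length reaching the filter is 4."""
    if read_registers and len(registers) > 4 and registers[4] == TRIGGER_BLOCK_LENGTH:
        return True
    return read_block_length and block_length == TRIGGER_BLOCK_LENGTH


def is_suspicious(bytecode: bytes, read_registers: bool, registers: Sequence[int],
                  read_block_length: bool, block_length: Optional[int]) -> bool:
    """True only when both conditions hold for the parsed filter."""
    return (matches_e8_signature(zlib.crc32(bytecode), len(bytecode))
            and has_trigger_block_length(read_registers, registers,
                                         read_block_length, block_length))
```

Keeping the two conditions in separate functions makes it possible to log which one matched, which may help when tuning a rule against false positives.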

Notes:

• All multi-byte integers are in little-endian byte order.

• All offsets and sizes are in bytes unless otherwise specified.

Conclusion

Microsoft patched this vulnerability in January 2024 and assigned it CVE-2024-20697. While they did not recommend any mitigating factors, there are some additional measures you can take to help protect against exploitation of this bug. These include not extracting RAR archive files from untrusted sources and filtering traffic using the guidance provided in the “Detection Guidance” section of this blog. Still, it is recommended to apply the vendor patch to completely address this issue.

Originally published by Trend Micro Research Team: CVE-2024-20697: WINDOWS LIBARCHIVE REMOTE CODE EXECUTION VULNERABILITY

Copyright notice: posted by admin on April 21, 2024 at 2:46 PM.
When reposting, please credit: CVE-2024-20697: WINDOWS LIBARCHIVE REMOTE CODE EXECUTION VULNERABILITY | CTF导航
