Blackbox-Fuzzing of IoT Devices Using the Router TL-WR902AC as Example

All files created in the context of this term paper are also published in full on GitHub in the repository otsmr/blackbox-fuzzing.

Introduction

Fuzzing has become “one of the most effective ways” of finding bugs in software. Many current fuzzing-related papers start with this or a similar claim [google-scholar]. The main goal of our last term paper on the topic “Internet of Vulnerable Things” was to find a memory-related bug and then write an exploit for it. We were able to find a vulnerability by reverse engineering the firmware, but no memory-related bugs were found. Finding a buffer overflow by reversing a binary by hand is not only time-consuming but also requires a lot of experience. Fuzzing, in contrast, aims to be the “most effective way” to find such memory-related vulnerabilities. Google, for example, introduced OSS-Fuzz, which continuously fuzzes open source software and has already found over 10,000 vulnerabilities across 1,000 projects [oss-fuzz].

The goal of this term paper is again to find a memory-related vulnerability, but this time by using fuzzing. The target vulnerability should be exploitable over the network without knowledge of the admin credentials. This paper describes the way to achieve this goal and is separated into two parts. The first part focuses on how to find a promising target, which tools can be used, and what a good fuzzing target should look like. The second part describes how to develop and debug a harness that is able to fuzz a specific function in a binary. The harness is then used by AFL++ to fuzz the target function. Before that, the following sections give a short background and summarize the current state of the art in IoT device fuzzing.

State of the Art

Fuzzing IoT devices is not as easy as fuzzing an open source project. Often the source code is proprietary, which rules out gray-box fuzzing, where the source code is instrumented for the best fuzzing performance [afl-persistent]. In addition, the CPU architecture is often not natively supported by fuzzers. This requires an emulator like QEMU [qemu], which slows down fuzzing [afl-persistent]. Another issue is the hardware peripherals, which complicate the development of a general approach. The paper “Embedded Fuzzing: A Review of Challenges, Tools, and Solutions” [embedded-fuzzing] gives an overview of different fuzzing strategies, such as hardware-based embedded fuzzing. Most of these strategies need the source code of the target program, for example when a fuzzer such as AFL is ported to an ARM-based IoT device so that it runs directly on the IoT hardware. Running the fuzzer on the device’s hardware also has performance problems, because such devices often have low-end CPUs that are slower than normal desktop CPUs. Another approach presented in that paper is emulation-based embedded fuzzing, where either a single targeted program or the full system is executed in an emulator to perform coverage-guided fuzzing.

The above-mentioned approaches all target a binary directly, either by using an emulator or by instrumenting the source code. They require a fuzzing setup that must often be crafted specifically for a single IoT device and is hard to generalize. To address this, researchers created IoTFuzzer, an automated fuzzing framework aimed at “finding memory corruption vulnerabilities without access to their firmware images [iotfuzzer].” IoTFuzzer is based on the observation that most IoT devices have a mobile app to control them, and that such apps contain information about the protocol used to communicate with the device. The program identifies and reuses this program-specific logic to mutate the test cases and thereby test IoT targets effectively [iotfuzzer].

Background

Harness

A harness describes a sequence of API calls that processes the inputs provided by the fuzzer. In contrast to a normal application, which often does not need a harness, a library that implements reusable functions must be called with the correct parameters and in the right sequence, so that the state shared between multiple function calls is set up correctly. Randomly fuzzing the library without building this state machine is unlikely to be successful and will instead create many false-positive crashes, because the dependencies of the library are not enforced. This can happen when, for example, the fuzzer skips a buffer size check, resulting in a spurious buffer overflow.

In this paper, normal applications will be fuzzed, but because of their hardware dependencies and their use of sockets and multi-threading, we need to create a harness for them as well. The harness is loaded in the context of the binary and can call internal functions of the targeted program, as shown in Code 10.

Corpus

The term “corpus” describes valid input samples or test cases and serves as a foundational reference for generating new input data during the fuzzing process. In Code 10 this would be, for example, an HTTP request. Fuzzers then leverage this corpus to create mutated or diversified test cases, aiding in the detection of software vulnerabilities through the exploration of various input scenarios.

Finding a potent target

The most time-consuming part of black-box fuzzing is finding a potentially vulnerable function in the firmware. The first step is to find interesting binaries, for example ones that are accessible over the network, use insecure functions, or lack security features such as stack canaries, which protect against buffer overflows. Our last paper [iovt] already described how to extract the firmware from the targeted router and how to find a potentially dangerous binary. For this, the tool EMBA [emba] was used. EMBA ranks all binaries found in the firmware by their number of calls to insecure functions like strcpy and lists their network access and their security protections, such as stack canaries or the NX bit, which become relevant when exploiting a buffer overflow. The result can be found in Code 1.

[+] STRCPY - top 10 results:
 235   : libcmm.so       : common linux file: no  |  No RELRO  |  No Canary  |  NX disabled  |  No Symbols  |  No Networking |
 77    : wscd            : common linux file: no  |  No RELRO  |  No Canary  |  NX disabled  |  No Symbols  |  Networking    |
 [snip]
 28    : httpd           : common linux file: yes |  RELRO     |  No Canary  |  NX enabled   |  No Symbols  |  Networking    |
 27    : cli             : common linux file: no  |  No RELRO  |  No Canary  |  NX disabled  |  No Symbols  |  No Networking |

Code 1: EMBA’s results for insecure uses of the function strcpy.

Because the goal of this paper is to find a memory vulnerability that can be exploited over the network without the knowledge of the admin credentials, the vulnerable function must be callable over the network and should interact directly with the provided user input. But having network interaction does not mean the binary is also directly accessible over the network. To find out which binaries are listening, we can use the UART root shell, which was already established in [iovt].

 ~ # netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:20002         0.0.0.0:*               LISTEN      1045/tmpd
tcp        0      0 0.0.0.0:1900            0.0.0.0:*               LISTEN      1034/upnpd
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1027/httpd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1224/dropbear
udp        0      0 0.0.0.0:20002           0.0.0.0:*                           1048/tdpd
[...]

Code 2: Using the UART root shell to execute netstat

Reversing the binary

The first binary that looks promising is wscd. Apart from the library libcmm.so, it has the most unsafe strcpy calls, and it has network interaction. In the case of wscd, this means that it connects to a UPnP device rather than listening on a specific port. As shown later, it also contains a function that is easy to fuzz, which is why this binary was selected as the example to explain the general procedure in this paper. Before reversing it, we can use the UART root shell to find out whether the binary is running and how it was started.

$ ps
 PID USER       VSZ STAT COMMAND
 962 admin     1096 S    wscd -i ra0 -m 1 -w /var/tmp/wsc_upnp/
1018 admin     1080 S    wscd_5G -i rai0 -m 1 -w /var/tmp/wsc_upnp_5G/

Code 3: Using the command ps to display all running programs.

With ps we not only see that the binary is running but also with which arguments it was started. The arguments are important for verifying whether a potential target function is called at all. Their meaning can be taken from the CLI help, which is displayed when the binary is called without any arguments.

$ chroot root /qemu-mipsel-static /usr/bin/wscd
Usage: wscd [-i infName] [-a ipaddress] [-p port] [-f descDoc] [-w webRootDir] -m UPnPOpMode -D [-d debugLevel] -h
 -i:  Interface name this daemon will run wsc protocol(if not set, will use the default interface name - ra0)
       e.g.: ra0
 -w: Filesystem path where descDoc and web files related to the device are stored
       e.g.: /etc/xml/
 -m: UPnP system operation mode
       1: Enable UPnP Device service(Support Enrolle or Proxy functions)
       2: Enable UPnP Control Point service(Support Registratr function)
       3: Enable both UPnP device service and Control Point services.
 [...]

Code 4: Options of the binary wscd.

As shown in Code 3 and Code 4, wscd is started with -m 1, which selects “Enable UPnP Device service”; this looks promising. After verifying that the binary is actually running on the router, the binary can be analyzed using Ghidra to search for suspicious functions. For fuzzing, parsing functions are especially interesting because they are usually complex, and the parsed input often carries length fields for the data it contains, much like a network packet header that states the length of its payload.

Figure 1: Using Ghidra to search for parsing functions.

Another benefit of parsing functions is that they often do not interact with other parts of the code and do not perform network interaction themselves. The parsing function can therefore be called directly with the fuzzing input, without modifying the binary or overwriting other functions.

Before starting to fuzz the function, it should be checked whether the function is triggered at all, because it is only interesting when it is called with user-controlled input. For this, Ghidra can be used to search for references to the target function. In the case of the parser_parse function, there are multiple call paths. Because we know how the program is started, they can be reduced to a single call tree, shown in Code 5.

main()
 if ((WscUPnPOpMode & 1) != 0) // Argument -m 1
  WscUPnPDevStart()
   UpnpDownloadXmlDoc() -> my_http_Download() -> http_Download()
     if (http_MakeMessage())
      http_RequestAndResponse()
       http_RecvMessage()

Code 5: Call tree of the function parser_parse

Now that a target function is found, we can create a fuzzing setup for it, which is described in the next part. But first, other promising functions are presented.

Other potentially vulnerable functions

For this paper, multiple potential binaries were manually analyzed for suspect functions. The following is a short summary of other possible targets which were found.

The binary httpd is the backend for the admin web interface and is accessible over the network on port 80. One interesting function in httpd is the httpd_parser_main function. While skimming the parser implementation in Ghidra, several suspicious code parts could be identified. One of them is the parsing of the Content-Type header. A basic HTTP request looks as follows.

POST / HTTP/1.1\r\n
Content-Type: multipart/form-data; boundary=X;\r\n
Host: example.com\r\n
\r\n
\r\n
DATA\r\n

Below is a snippet from the httpd_parser_main function which parses the Content-Type from the user-provided HTTP request.

// user_input_ptr points to the header value, e.g.
//  "multipart/form-data; boundary=X;\r\nHost: example.com\r\n..."
cursor = strstr(user_input_ptr,"multipart/form-data");

if (user_input_ptr == cursor) {
 cursor = strstr(user_input_ptr,"boundary=");
 user_input_ptr = cursor + 9;

 // user_input_ptr points now to "X;\r\nHost: example.com\r\n..."

 if (cursor != (char *)0x0) {

  // Skip leading spaces and tabs in front of the boundary value
  do {
    while (cursor = user_input_ptr, *cursor == ' ') {
      user_input_ptr = cursor + 1;
    }
    user_input_ptr = cursor + 1;
  } while (*cursor == '\t');

  // cursor points now to "X;\r\nHost: example.com\r\n..."

  // strchr returns a pointer to the first occurrence of ';' in the user request.
  // If ';' is not found, the function returns a null pointer.
  user_input_ptr = strchr(cursor, ';');
  if (user_input_ptr != (char *)0x0) {
    // The character ';' is replaced by a null byte to terminate the string
    *user_input_ptr = '\0';
    // cursor points now to "X\0\r\nHost: example.com\r\n..."
  }

  // DAT_00444050 global array from 0x00444050 to 0x0044414f (256 bytes)
  strcpy(&DAT_00444050, cursor);
  // DAT_00444050 contains now "X"
 }
}

Code 6: Snippet of the function httpd_parser_main which parses the Content-Type.

The vulnerability in this code is the call to strcpy combined with the assumption that the boundary value in the Content-Type ends with a semicolon. strcpy copies the buffer up to the next null byte, and, as shown in Code 6, a null byte is only inserted when a semicolon is found. If the semicolon is omitted, the next null byte is at the end of the input buffer, i.e., at the end of the HTTP request. The global variable DAT_00444050 can therefore be overflowed, which overwrites data beyond the address 0x0044414f. The challenge is twofold: an interesting global variable has to be found beyond this address, and the payload cannot contain null bytes because of strcpy. But where there is one such mistake, there are probably more to find.
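
The effect of the missing semicolon can be reproduced outside the firmware. The following is a minimal sketch, not code from httpd: the helper names boundary_copy_len and copy_bounded are hypothetical, and the logic only mirrors the decompiled snippet in Code 6. The first function reports how many bytes strcpy would copy, the second shows one possible bounded alternative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, mirroring Code 6: returns the number of bytes
 * (without the terminator) that strcpy() would copy for the boundary
 * value. The buffer is modified in place, as in the decompiled code. */
size_t boundary_copy_len(char *header_value)
{
    char *cursor = strstr(header_value, "boundary=");
    if (cursor == NULL)
        return 0;
    cursor += strlen("boundary=");

    /* Skip leading spaces and tabs. */
    while (*cursor == ' ' || *cursor == '\t')
        cursor++;

    /* Only a ';' terminates the value early. */
    char *semicolon = strchr(cursor, ';');
    if (semicolon != NULL)
        *semicolon = '\0';

    /* Without a ';' this is the length of the rest of the request. */
    return strlen(cursor);
}

/* Hypothetical bounded copy: refuses input that does not fit into dst
 * (including the terminator) instead of writing past the buffer. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

With the trailing semicolon, only the single byte "X" would be copied; without it, everything up to the end of the request would land in the 256-byte global buffer.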

The binary tdpd is used by the mobile app and is accessible over UDP on the local network. tdpd contains almost the same functions as tmpd, but most of them are never called. The main function only listens for messages on the UDP port and always responds with basic information about the router, such as its name or model. There is barely any interaction with the user-provided input, so the binary is not an interesting fuzzing target.

Another interesting pair of binaries is upnpd and ushare. Both handle UPnP messages and therefore need to parse XML. Because a third-party copyright string can be found in the binary, it can be assumed that these programs were not developed by TP-Link.

$ strings usr/bin/ushare | grep "(C)"
Benjamin Zores (C) 2005-2007, for GeeXboX Team.

Both binaries load the shared libraries libupnp.so and libixml.so, which have the same functions as the open source project pupnp [pupnp]. Because the focus of this paper is black-box fuzzing, these binaries are ignored. Gray-box fuzzing of this library could nevertheless be worthwhile, because in 2021 a memory leak was found in libixml.so [pupnp-mem-leak].

The binary tmpd is the backend of the mobile app. The interesting part is that the router and the mobile app communicate over a custom binary protocol. A message from the client to the server is shown below.

00000000  01 00 05 00 00 08 00 00  00 00 00 17 50 7b 6e fe  |............P{n.|
00000010  01 01 02 00 00 00 00 00                           |........        |

Code 7: Message from the mobile app to the router.

To understand the binary protocol, the binary tmpd was reversed using Ghidra. With this information, the message in Code 7 can be broken down into the following:

01 00 05 00 : Version
00 08 00 00 : Size (8 Bytes)
00 00 00 17 : Datatype
50 7b 6e fe : Checksum (CRC32)
01 01       : Options
02 00       : Function id
00 00 00 00 : Function parameters

Code 8: Custom binary protocol broken down.

This looks promising because such binary protocols must be parsed. But the most suspect part of the binary protocol is not the length field, but the use of the function ID and function parameters.
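
To make the breakdown in Code 8 concrete, the following sketch splits a raw message into the listed fields. It is an illustration under assumptions, not the parser from tmpd: the struct and function names are hypothetical, the 16-byte header length is derived from the field widths in Code 8, and the size is assumed to be the big-endian 16-bit value in bytes 4 and 5, which is only known to match the single sample in Code 7.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TMP_HEADER_LEN 16 /* version + size + datatype + checksum */

/* Field layout taken from Code 8; all names are illustrative. */
struct tmp_message {
    uint8_t version[4];
    uint16_t size;          /* assumption: big-endian value in bytes 4..5 */
    uint8_t datatype[4];
    uint8_t checksum[4];    /* CRC32 according to Code 8, not verified here */
    const uint8_t *payload; /* options, function id, function parameters */
};

/* Returns 0 on success, -1 if the buffer is shorter than the header or
 * the size field does not match the number of payload bytes. */
int tmp_parse(const uint8_t *buf, size_t len, struct tmp_message *out)
{
    if (len < TMP_HEADER_LEN)
        return -1;

    memcpy(out->version, buf, 4);
    out->size = (uint16_t)((buf[4] << 8) | buf[5]);
    memcpy(out->datatype, buf + 8, 4);
    memcpy(out->checksum, buf + 12, 4);

    /* Reject messages whose declared size disagrees with the real one. */
    if ((size_t)out->size != len - TMP_HEADER_LEN)
        return -1;

    out->payload = buf + TMP_HEADER_LEN;
    return 0;
}
```

Checking the declared size against the received length before touching the payload is the kind of validation a fuzzer tends to probe first in length-prefixed protocols.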

Figure 2: Reversed function from tmpd which parses the function id and their parameters.

Figure 2 shows a part of the decompiled parser function of the custom protocol. In line 16, the function ID is extracted, and the corresponding function is then called in line 29. The suspicious behavior is that the function is called with parameters that are extracted from the user-controlled input buffer without any check. We could now try to find a function in the jump table shown in Figure 3 for which this is dangerous, for example one that uses the parameter to index a buffer or interprets it as a string. Instead of manually reversing and searching the more than 100 functions, which would be time-consuming, we can use a fuzzer to do this automatically.
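
Why unchecked parameters are risky in such a dispatcher can be shown with a small sketch. None of the following is taken from tmpd: the handler names, the two-entry table, and the 8-element array are hypothetical and only illustrate the pattern of a jump table indexed by a function ID, here with the checks that the decompiled code appears to lack.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*tmp_handler)(const uint8_t *params, size_t params_len);

/* Hypothetical handler that ignores its parameters. */
static int handle_get_info(const uint8_t *params, size_t params_len)
{
    (void)params;
    (void)params_len;
    return 0;
}

/* Hypothetical handler that uses a parameter as an array index.
 * Without the range check, params[0] could index past the array. */
static int handle_count(const uint8_t *params, size_t params_len)
{
    static int counters[8];
    if (params_len < 1 || params[0] >= 8)
        return -1;
    counters[params[0]]++;
    return 0;
}

static const tmp_handler handlers[] = { handle_get_info, handle_count };

/* Looks up the handler for function_id and calls it; unknown IDs are
 * rejected instead of indexing beyond the table. */
int dispatch(uint16_t function_id, const uint8_t *params, size_t params_len)
{
    if (function_id >= sizeof(handlers) / sizeof(handlers[0]))
        return -1;
    return handlers[function_id](params, params_len);
}
```

A fuzzer that mutates the function ID and the parameter bytes exercises exactly these two checks in every handler, which is what would otherwise have to be reviewed by hand.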

Figure 3: Reversed function from tmpd which parses the function ID and its parameters.

Unfortunately, tmpd only listens on the loopback address (127.0.0.1:20002), as shown in Code 2, so it is not directly reachable over the network. To connect to it, the app first connects to the router via SSH in the mode direct-tcpip, which simply forwards the packets to the local process. The SSH connection is protected by the admin credentials. However, as described in [iovt], the SSH connection can easily be compromised because the app never checks the server host key. By dropping every packet routed to the internet, an attacker can trick the admin into logging in to the router while performing a man-in-the-middle attack to steal the credentials.

Fuzzing with AFL++ and QEMU

In this section, a harness is developed that targets one of the previously found functions. After the harness is developed, the state-of-the-art fuzzer AFL++ [aflpp] is used to fuzz the target function. Because the binaries are compiled for the mipsel architecture, the emulator QEMU is used to execute them. The basic fuzzing setup used in this paper is mostly inspired by the blog entry “Firmware Fuzzing 101” by Adam Van Prooyen [b101].

Fuzzing environment

To easily create a reproducible fuzzing environment, Docker is a good choice. We created a Dockerfile that installs every necessary tool, such as a cross-compiler for the mipsel CPU architecture and gdb-multiarch, which can be used to debug the harness.

Furthermore, AFL++ is downloaded and compiled together with its QEMU mode, a version of QEMU with minor tweaks that allows non-instrumented binaries to be run under afl-fuzz.

FROM debian:latest

RUN apt update && apt install -y \
      curl \
      vim \
      gcc-mipsel-linux-gnu \
      openssh-server \
      qemu-user-static \
      gdb-multiarch
# Qemu statics are installed at /usr/bin/qemu-mipsel-static

# Compiling AFL++
RUN apt install -y git make build-essential clang ninja-build pkg-config libglib2.0-dev libpixman-1-dev
RUN git clone https://github.com/AFLplusplus/AFLplusplus /AFLplusplus
WORKDIR /AFLplusplus
RUN make all
WORKDIR /AFLplusplus/qemu_mode
RUN CPU_TARGET=mipsel ./build_qemu_support.sh

RUN echo "#!/bin/bash\n\nsleep infinity" >> /entry.sh
RUN chmod +x /entry.sh

WORKDIR /share
ENTRYPOINT [ "/entry.sh" ]

Dockerfile which installs necessary tools.

The image can then be built using docker build.

docker build -t fuzz .

Once the image is built, docker run starts a container from it.

docker run -d --rm -v $PWD/:/share --name fuzz fuzz

The option -d starts the container in the background. With docker exec, multiple shells can be started inside the container, which is helpful for running the executable under QEMU in one session and gdb-multiarch in another.

docker exec -it fuzz /bin/bash

Overwrite the main function

In the previous section, a promising fuzz target was identified. The problem is that simply executing the binary never reaches the target, because the parser_parse function is only called once a TCP packet has been received over a socket. Fuzzing through the socket would not only be bad for performance but also hard to set up. The entry point for the fuzzer should therefore be at a different location than the normal main function. For this, the environment variable LD_PRELOAD can be used to inject a harness that has access to internal functions. As described in the man page of ld.so, the dynamic linker that loads the shared libraries needed by an executable at runtime, LD_PRELOAD can be used “to selectively override functions in other shared objects [man-pages].”

The function __uClibc_main is best suited for this purpose. To overwrite this function, a C file must be created that contains a function with the same name.

#include <stdio.h>

// Replaces the startup routine of uClibc, so the original main() never runs
void __uClibc_main(void *main, int argc, char** argv) {
    // Harness code, e.g. call the function parser_append
    printf("My custom __uClibc_main was called!\n");
}

The C file can then be cross-compiled to a shared object for the mipsel architecture using mipsel-linux-gnu-gcc. The option -fPIC enables “Position Independent Code”, which means that the machine code uses relative instead of absolute addressing and therefore does not depend on being loaded at a specific address.

$ mipsel-linux-gnu-gcc parser_parse_hook.c -o parser_parse_hook.o -shared -fPIC

The newly created shared library can then be loaded by adding the environment variable LD_PRELOAD to the QEMU command.

$ chroot root /qemu-mipsel-static -E LD_PRELOAD=/parser_parse_hook.o /usr/bin/wscd
My custom __uClibc_main was called!

The command chroot runs the given command with a different root directory, here the directory root, which contains the firmware file system. This is helpful because the executable wscd opens other files from the firmware, such as shared libraries. We can observe this behavior by adding the argument -strace to QEMU.

chroot root /qemu-mipsel-static -E LD_PRELOAD=/parser_parse_hook.o -strace /usr/bin/wscd /corpus/notify.txt
38180 mmap(NULL,4096,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS|0x4000000,-1,0) = 0x7f7e7000
38180 stat("/etc/ld.so.cache",0x7ffffa48) = -1 errno=2 (No such file or directory)
38180 open("/parser_parse_hook.o",O_RDONLY) = 3
38180 fstat(3,0x7ffff920) = 0
38180 close(3) = 0
38180 munmap(0x7f7e6000,4096) = 0
38180 open("/lib/libpthread.so.0",O_RDONLY) = 3
38180 open("/lib/libc.so.0",O_RDONLY) = 3
[...]

As we can see, the executable opens multiple libraries from the /lib/ folder of the firmware and not from the host.

Developing and debugging the harness

After the setup is created, we can start developing a harness. As described in the background section, the harness is the driver between the fuzzer and the target function. AFL++ stores the fuzz input in a file, and the harness receives the file path as an argument. The harness reads the file into a buffer and then calls the fuzzing target with it; in this case, this is parser_append. Internal functions of the binary can be called through their addresses.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void __uClibc_main(void *main, int argc, char** argv)
{
  // Verify that a filename is provided
  if (argc != 2) exit(1);

  // Create function pointers to the fuzz target (addresses taken from the binary)
  int (*parser_request_init)(void *, int) = (void *) 0x00412564;
  int (*parser_append)(void *, void *, int) = (void *) 0x00412e98;

  // Open the fuzz input file and read at most 2048 bytes of it
  int fd = open(argv[1], O_RDONLY);
  if (fd < 0) exit(1);
  char fuzz_buf[2048 + 1];
  int fuzz_buf_len = read(fd, fuzz_buf, sizeof(fuzz_buf) - 1);
  if (fuzz_buf_len < 0) exit(1);
  close(fd);
  // Null-terminate the input so it can be handled as a string
  fuzz_buf[fuzz_buf_len] = 0;

  // Call the target functions
  uint8_t parsed_data[220];
  parser_request_init(parsed_data, 8);
  int status = parser_append(parsed_data, fuzz_buf, fuzz_buf_len);
  printf("Response is %d\n", status);
  exit(0);
}

Code 10: Harness code with the fuzz target parser_append in the binary wscd.

As shown in Code 10, the function parser_parse is not called directly but through the function parser_append. Before parser_append is called, the initialization function parser_request_init must be called, which initializes the output struct of the parser_parse function.

While the harness for parser_parse is fairly easy to set up, other targets such as the httpd_parser_main function require more sophisticated harnesses. For example, before the target is called, the function http_init_main must be called, and the run ends in a SIGSEGV. To find out where this segmentation fault is caused, it is useful to debug the code with a debugger like gdb. To do this, QEMU can be started with the option -g, which spawns a gdb server at the provided port.

chroot root /qemu-mipsel-static -strace -g 1234 -E LD_PRELOAD="/httpd_parser_main.o" /usr/bin/httpd corpus/httpd/simple.txt

Because the binary is built for the mipsel architecture, gdb-multiarch must be used. After gdb is started, the following init script can be loaded with the gdb command source <path to script>.

set solib-absolute-prefix /share/root/
file /share/root/usr/bin/httpd
target remote :1234
# break before the fuzz target is called
# break __uClibc_main
break http_parser_main
display/4i $pc

Because of the chroot, the script first changes the absolute prefix path so that gdb finds the file when the binary loads a shared object. Then the targeted file is set: because the gdb server of QEMU does not support file transfer, gdb loads the files from disk instead. After gdb is configured, the script connects to the gdb server with target remote and creates a breakpoint at the start of the target function. The display command only improves the output: while stepping, the next four instructions are shown. With si we can step a single instruction, which is useful when the harness runs into a segmentation fault even with the default corpus, which should always work. As shown in Code 11, the binary has a segmentation fault in the function fprintf.

(gdb) si
0x004059b0 in http_parser_makeHeader ()
1: x/4i $pc
=> 0x4059b0 <http_parser_makeHeader+120>:       jalr    t9
   0x4059b4 <http_parser_makeHeader+124>:       addiu   a1,a1,16248
   0x4059b8 <http_parser_makeHeader+128>:       li      v0,200
   0x4059bc <http_parser_makeHeader+132>:       lw      gp,16(sp)
(gdb) ni
0x7f56a8ac in fprintf () from /share/root/lib/libc.so.0
1: x/4i $pc
=> 0x7f56a8ac <fprintf+44>:     bal     0x7f56db80 <vfprintf>
   0x7f56a8b0 <fprintf+48>:     nop
   0x7f56a8b4 <fprintf+52>:     lw      ra,36(sp)
   0x7f56a8b8 <fprintf+56>:     jr      ra
   0x7f56a8bc <fprintf+60>:     addiu   sp,sp,40
(gdb) n
Single stepping until exit from function fprintf,
which has no line number information.

Program received signal SIGSEGV, Segmentation fault.

Code 11: Segmentation fault in fprintf.

To investigate the error, Ghidra can be used to find out which parameters the function is called with.

fprintf(
 *(FILE **)(iVar1 + 0x101c),
 "HTTP/1.1 %d %s\r\n",
 *(undefined4 *)(&DAT_0042ee68 + (uint)(byte)(&DAT_00414570)[statuscode & 0x3f] * 8),
 (&PTR_DAT_0042ee6c)[(uint)(byte)(&DAT_00414570)[statuscode & 0x3f] * 2]
);

The SIGSEGV is probably caused by the first parameter, which is a null pointer instead of a valid FILE pointer. Here, iVar1 is just a reference to the input of the httpd_parser_main function. This means that the fuzzing input must contain a valid FILE pointer at offset 0x101c, so the input must be adjusted to the following struct.

typedef struct  {
  int _a;     // 4 Bytes
  int _b;     // 4 Bytes
  int socket; // 4 Bytes
  int ip;     // 4 Bytes
  int mac;    // 4 Bytes
  unsigned char body[0x1008]; // 0x101c - 4*5 = 0x1008 Bytes
  FILE * fd_out; // expected to be a valid FILE pointer
} HttpMainT;
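
The offset arithmetic in the struct can be double-checked at compile time. The following is a minimal sketch, assuming a 32-bit target where pointers occupy four bytes. HttpMainLayout and http_main_fd_offset are hypothetical names, and the FILE pointer is modeled as a uint32_t so that the check also holds when the file is compiled on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of HttpMainT for a 32-bit target. The FILE * is modeled as a
 * 4-byte slot (assumption: pointers are 32 bits wide on mipsel). */
typedef struct {
  int32_t _a;
  int32_t _b;
  int32_t socket;
  int32_t ip;
  int32_t mac;
  unsigned char body[0x1008];
  uint32_t fd_out; /* stands in for FILE * on the 32-bit target */
} HttpMainLayout;

/* Compilation fails if the padding before fd_out is miscounted. */
_Static_assert(offsetof(HttpMainLayout, fd_out) == 0x101c,
               "fd_out must sit at offset 0x101c");

/* Returns the offset that the target dereferences in the fprintf call. */
size_t http_main_fd_offset(void) {
  return offsetof(HttpMainLayout, fd_out);
}
```

With a real FILE * member, the same struct compiled natively on a 64-bit host would place fd_out at a different offset because of pointer alignment, so the harness has to be built for the target architecture.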

Because fd_out only has to be a valid FILE pointer, it can simply be set to stdout. Executing httpd_parser_main again now produces valid HTTP output.

$ chroot root /qemu-mipsel-static -E LD_PRELOAD=/httpd_parser_main.o \
    /usr/bin/httpd /httpd_corpus.txt

bind: No such file or directory
[ dm_shmInit ] 086:  shmget to exitst shared memory failed. Could not create shared memory.
rdp_getObj is called with: 4274932gdpr_getSystemGDPREntry Error
gdpr_getNewSystemGDPREntry OK
#Msg: getsockname error
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 24257
Set-Cookie: JSESSIONID=deleted; Expires=Thu, 01 Jan 1970 00:00:01 GMT; Path=/; HttpOnly
Connection: close

<!DOCTYPE html>
[...]

The harness now works and can be used to fuzz the function with AFL++, which is explained in the following sections.

Generate corpus data

As mentioned in the background, a seed corpus is a set of valid input samples that serves as a foundational reference for generating new input data during the fuzzing process.

These inputs are typically chosen to represent different aspects of the target program. The fuzzer uses the seed corpus to generate mutated or evolved test cases that are then run against the target software to uncover bugs, crashes, or other problems. The corpus plays an important role in directing the fuzzer to relevant areas of the program and in increasing the probability of detecting vulnerabilities or unexpected behavior. By providing a diverse and representative set of initial inputs, the seed corpus helps the fuzzer explore different paths in the target faster and thereby increases coverage.

For functions that parse network data, these inputs can be created by recording different packets with Wireshark.

For the function httpd_parser_main, four different corpora were created, each targeting different paths in the binary. One example is the login request, which contains the username and password. For this corpus, the harness had to be modified because TP-Link uses (weak) cryptography to “protect” the password. The password is encrypted in the browser using AES and then decrypted on the backend. The password for this step is generated in the browser and then encrypted using RSA. Finally, the encrypted data is signed. Because a fuzzer cannot create a signature or encrypt data, some functions were overwritten so that they now just decode the data from base64. For this, the data was first extracted in plaintext from the browser using the debugger, as shown in Figure 4.

Figure 4: Extracting the data before encryption.

In the target, the function rsa_tmp_decrypt_bypart was then overwritten so that it no longer decrypts the data but just decodes it from base64.

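#include <stdint.h>

// Replaces the RSA decryption with a plain base64 decode.
// The remaining parameters of the original function only carry key data.
int rsa_tmp_decrypt_bypart(uint8_t *input, int input_len, uint8_t *output) {
  // b64_decode at its fixed address inside the httpd binary
  int (*b64_decode)(uint8_t *, int, uint8_t *, int) = (void *) 0x0040bf00;
  b64_decode(output, 0x1000, input, input_len);
  // Global sequence number of httpd, derived here from the input length
  int *seqnumber = (int *) 0x00444db0;
  *seqnumber = 0x3ac28e29 - input_len + 12;
  return 0; // report success
}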

Code 12: Function rsa_tmp_decrypt_bypart now just decodes base64 instead of decrypting the data.
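
Since the overwritten function only decodes base64, the plaintext extracted from the browser has to be stored base64-encoded in the corpus. A small encoder could look like the following sketch. It assumes that the b64_decode routine in the binary expects the standard alphabet with '=' padding, which the source does not state. b64_encode is a hypothetical helper and not part of the published harness.

```c
#include <stddef.h>

/* Encodes len bytes from in as standard base64 (RFC 4648 alphabet with
 * '=' padding) into out and NUL-terminates the result.
 * out must provide at least 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
size_t b64_encode(const unsigned char *in, size_t len, char *out) {
  static const char tbl[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t o = 0;
  for (size_t i = 0; i < len; i += 3) {
    /* Pack up to three input bytes into one 24-bit group. */
    unsigned long v = (unsigned long)in[i] << 16;
    if (i + 1 < len) v |= (unsigned long)in[i + 1] << 8;
    if (i + 2 < len) v |= in[i + 2];
    /* Emit four 6-bit symbols, padding where input bytes are missing. */
    out[o++] = tbl[(v >> 18) & 0x3f];
    out[o++] = tbl[(v >> 12) & 0x3f];
    out[o++] = (i + 1 < len) ? tbl[(v >> 6) & 0x3f] : '=';
    out[o++] = (i + 2 < len) ? tbl[v & 0x3f] : '=';
  }
  out[o] = '\0';
  return o;
}
```

If the decoder in the binary turns out to use a different alphabet or no padding, only the table and the padding branches need to change.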

When executing the corpus, the target function always returned an HTML document with the error “408 Request Timeout”. The problem could be identified using Ghidra and gdb: the error always occurs after the call to http_stream_fgets. The problematic line was the check for the line feed character \n.

if (((cVar1 == '\n') && (param_3 < pcVar4)) && (pcVar4[-1] == '\r')) {

This condition requires every line feed to be preceded by a carriage return, so each line must end with \r\n. After adding the carriage returns to the corpus files, all the created corpora worked.
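
Seeds that were recorded or edited with tools that only write \n can be fixed up before fuzzing. The following sketch inserts the missing carriage returns. lf_to_crlf is a hypothetical helper and not part of the published harness.

```c
#include <stddef.h>

/* Copies src to dst and inserts '\r' before every '\n' that is not
 * already preceded by one. Existing "\r\n" pairs are left untouched.
 * Returns the number of bytes written, or 0 if dst is too small
 * (an empty input also yields 0). */
size_t lf_to_crlf(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap) {
  size_t out = 0;
  for (size_t i = 0; i < src_len; i++) {
    int bare_lf = src[i] == '\n' && (i == 0 || src[i - 1] != '\r');
    size_t need = bare_lf ? 2 : 1;
    if (out + need > dst_cap) return 0;
    if (bare_lf) dst[out++] = '\r';
    dst[out++] = src[i];
  }
  return out;
}
```

In the worst case every input byte is a bare line feed, so a destination buffer of twice the input size is always sufficient.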

Fuzz the target

In the last section, we developed multiple harnesses and executed them using QEMU. In this section, the plain QEMU invocation is replaced by AFL++, which takes the generated corpora as seed input to fuzz the target function. In the section “Fuzzing environment”, a Docker image was created that pulls AFL++ from GitHub and then uses a script provided by AFL++ to build a patched version of QEMU. AFL++ can therefore be started with the following command, which takes several parameters, such as -Q, which tells AFL++ to use the patched version of QEMU.

QEMU_LD_PREFIX=/share/root AFL_PRELOAD=/share/root/httpd_parser_main.o \
  /AFLplusplus/afl-fuzz -Q \
  -i /share/root/corpus/httpd/ -o /share/afl-out/httpd/ \
  -- /share/root/usr/bin/httpd @@

Code 13: Fuzzing the binary httpd using the harness and afl-fuzz.

Unlike before, the chroot command is no longer necessary; it is replaced by the variable QEMU_LD_PREFIX, which tells QEMU where to search for shared objects. Likewise, the LD_PRELOAD variable is replaced by the AFL-specific variant AFL_PRELOAD. The last argument of the command consists of two @ characters, which AFL++ replaces with the path of a file that holds the current fuzzing input. Once started, AFL++ shows its progress in the terminal UI shown in Figure 5.
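
Inside the harness, the path that AFL++ substitutes for @@ arrives as a normal command-line argument. How the published harness reads it is not shown in this section. The following sketch is one possible way, with load_fuzz_input and BODY_CAP as hypothetical names and the buffer size taken from the body field of HttpMainT.

```c
#include <stdio.h>

#define BODY_CAP 0x1008 /* size of the body field in HttpMainT */

/* Loads at most BODY_CAP - 1 bytes from path into body and
 * NUL-terminates the buffer, so oversized inputs are truncated
 * instead of overflowing the harness itself.
 * Returns the number of bytes read, or -1 if the file cannot be opened. */
long load_fuzz_input(const char *path, unsigned char body[BODY_CAP]) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) return -1;
  size_t n = fread(body, 1, BODY_CAP - 1, f);
  fclose(f);
  body[n] = '\0';
  return (long)n;
}
```

Truncating in the harness keeps crashes attributable to the target function rather than to the code that prepares its input.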

Figure 5: The status screen from AFL++.

The AFL++ status screen provides essential insights into the current fuzzing process. The AFL++ docs give a good overview of the terms used in the status screen [afl-screen]. For debugging the corpus, the UI can be disabled with AFL_NO_UI, and AFL_DEBUG enables detailed logging that shows the current fuzzer input and the stdout of the target program.

export AFL_DEBUG=1 && export AFL_NO_UI=1
unset AFL_DEBUG && unset AFL_NO_UI

As shown in Figure 5, fuzzing a binary can take quite some time. According to the docs, it “should be expected to run for days or weeks” and “some jobs will be allowed to run for months.” To keep the required time down, the exec speed should be above 100 execs/sec. When the target httpd_parser_main was fuzzed, for example, the exec speed was initially around 30 execs/sec. To improve the speed, the target binary was searched for functions that were likely to cause the slowdown. One suspect was rsa_gdpr_generate_key, because generating an RSA key is known to be slow. After overwriting this function, the speed improved to 600 execs/sec.

One indicator that helps to decide when to stop fuzzing is the cycle counter. AFL++ highlights the number in green when “the fuzzer has not been seeing any action for a longer while,” which helps to make the call to stop the fuzzer.

But the most interesting number is probably “total crashes”. It counts how often the program crashed on a fuzzing input, which probably indicates a memory-related bug. To verify that a crash is a real bug, gdb can be used again to locate it.

Conclusion

Fuzzing may be the most effective way to find security vulnerabilities. In this term paper, three different functions were fuzzed, but no vulnerabilities were found. While the black-box fuzzing setup itself is not that complex or time-consuming, finding a potent target and developing a working harness are. Most of the time, the harness has to be debugged, and then the underlying logic in the binary must be reverse engineered, which again takes a long time.

Originally published by tsmr: Blackbox-Fuzzing of IoT Devices Using the Router TL-WR902AC as Example

Copyright notice: posted by admin on May 31, 2024, at 7:18 PM.
When reposting, please credit: Blackbox-Fuzzing of IoT Devices Using the Router TL-WR902AC as Example | CTF导航
