Apple Gets an ‘F’ for Slicing Apples

 

Background

I’m currently working on Volume II of the “The Art of Mac Malware” (TAOMM) series. This second book is a comprehensive resource that covers the programmatic detection of macOS malware via behavior-based heuristics. In other words, it’s all about writing tools to detect malware.

There are many ways to detect malware, and a multi-faceted approach is of course recommended. However, one good place to start is to enumerate all running processes and, for each, examine the process’ main binary for anomalies or characteristics that may reveal whether it is malware.

There are a myriad of things to look at when attempting to classify a process’ binary as benign or malicious. Programmatic methods are covered in detail in the soon-to-be-published TAOMM Volume II, but a few include: code signing information, dependencies, exports/imports, and whether the binary is packed or encrypted. And while there are dedicated APIs to extract the code signing information of a binary, if you’re interested in almost anything else, you’ll have to parse the binary yourself.

In this blog post, we’ll show that one of the foundational APIs related to parsing such binaries is fundamentally broken (even on the latest version of macOS, 14.3.1). And while this bug isn’t exploitable, per se, it still has security implications, especially in the context of detecting malware! 👀

Parsing Binaries

Parsing “Apple” binaries is a rather in-depth and nuanced topic. So much so, that there is a whole chapter dedicated exclusively to it in both Volume I, and in the upcoming Volume II. Here though, a high-level discussion will suffice.

The native binary format used by all Apple devices is the venerable Mach-O. If you’re interested in programmatically analyzing binaries that run on macOS (for example, to extract dependencies or to see if one is packed), you’ll have to leverage a Mach-O parser.

However, Mach-Os are often distributed via universal (or “fat”) binaries. Such universal binaries are really just containers for multiple architecture-specific (but, generally speaking, logically equivalent) Mach-O binaries. In Apple parlance, the embedded Mach-Os are known as slices.

A fat (universal) binary contains multiple Mach-Os, or slices

The clear benefit is that a single file can be distributed to users and will run natively on any number of architectures. This works because at runtime (for example, when a user launches a universal binary) the loader parses the universal binary and automatically selects the most compatible Mach-O to run.

Using macOS’s file command, you can examine a binary to see if it’s fat and, if so, which architecture-specific Mach-Os it contains. For example, you can see that LuLu is a universal binary containing two embedded Mach-Os: one for 64-bit Intel (x86_64) systems and another for Apple Silicon (arm64):

% file /Applications/LuLu.app/Contents/MacOS/LuLu 

LuLu: Mach-O universal binary with 2 architectures: 
[x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]

LuLu (for architecture x86_64):   Mach-O 64-bit executable x86_64
LuLu (for architecture arm64):    Mach-O 64-bit executable arm64

Circling back to the programmatic detection of malware: once we have a list of running processes, we’ll want to closely examine each process’ binary. However, if the item was distributed as a fat binary, the on-disk image will be a fat/universal binary, while what’s actually running is just the most compatible Mach-O slice. This slice is what we’re interested in …as, generally speaking, there isn’t much point in examining any of the other embedded Mach-Os. This means that not only will we have to first parse the universal binary, but we’ll also have to identify the slice that the loader decided was the most compatible to run on the system we’re scanning.

Though you might think this is trivial (“on Intel systems it’ll be the Intel slice, while on Apple Silicon it will be the arm slice”), it is actually a bit more complicated. There can be any number of slices, even several with varying degrees of compatibility for the same architecture. There are also emulation cases, where an Intel slice can actually run on an Apple Silicon system thanks to Rosetta.

Internally, as we’ll see, Apple uses the term “grade” and a grading algorithm to select the best slice.

This term and approach make sense, as a universal binary can have multiple slices that are, to varying degrees, compatible and thus could all run on a given system. However, the system should select the “best” or most compatible one (which comes down to the slice whose CPU type and sub type most closely match the host system’s CPU).

Luckily, Apple has provided APIs that can help us find the relevant slice …and once we have that in hand, we can happily scan it!

Interested in Learning More About Apple Binaries?

Volume I of “The Art of Mac Malware” (which focuses on analysis) contains a chapter on both fat (universal) and Mach-O binaries.

You can read this (and all other chapters of Vol I) online for free:

CH 5: Binary Triage

Finding the Best Slice

Efficiency for security tools is paramount. As such, when we encounter a running process that is backed by a universal binary containing multiple slices, we’re interested only in the slice that is actually running.

Traditionally one would use the NXFindBestFatArch API, which, as its name implies, returns the best slice. Behind the scenes this API parses the universal binary’s header and each fat_arch structure that describes the embedded Mach-Os (slices):

Each slice (Mach-O) is described via a fat_arch structure

Here’s the definition of the fat_arch structure, which is found in Apple’s mach-o/fat.h:

struct fat_arch {
    int32_t     cputype;    /* cpu specifier (int) */
    int32_t     cpusubtype; /* machine specifier (int) */
    uint32_t    offset;     /* file offset to this object file */
    uint32_t    size;       /* size of this object file */
    uint32_t    align;      /* alignment as a power of 2 */
};

As shown in the above image, there is one fat_arch structure for each embedded Mach-O (slice). The fat_arch structure describes the CPU type and sub type that the slice is compatible with as well as the offset of the slice in the universal (fat) file.
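
To make the layout concrete, here is a minimal sketch (not Apple’s code) that walks the fat_arch records of a universal binary that has already been read into memory. It assumes the 32-bit fat format, re-declares the magic value and record sizes locally so it builds without Apple’s headers, and uses hypothetical names (sketch_slice, sketch_list_slices) that do not appear in any SDK. On disk the fat header and its fat_arch records are stored big-endian, so each field is decoded byte by byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value mirrored from <mach-o/fat.h> so the sketch builds anywhere. */
#define SKETCH_FAT_MAGIC 0xcafebabeu

/* On-disk sizes: fat_header is 2 x uint32_t, fat_arch is 5 x uint32_t. */
enum { SKETCH_FAT_HEADER_SIZE = 8, SKETCH_FAT_ARCH_SIZE = 20 };

/* Host-order copy of one fat_arch record (hypothetical type). */
struct sketch_slice {
    uint32_t cputype;
    uint32_t cpusubtype;
    uint32_t offset;
    uint32_t size;
    uint32_t align;
};

/* Decode one big-endian 32-bit value, independent of host endianness. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Decode up to `max` slice records from a fat file image.
 * Returns the number of records decoded, or -1 if the buffer does not
 * start with the 32-bit fat magic or is too short for the records it claims.
 */
int sketch_list_slices(const uint8_t *buf, size_t len,
                       struct sketch_slice *out, int max)
{
    assert(buf != NULL && out != NULL);

    if (len < SKETCH_FAT_HEADER_SIZE || read_be32(buf) != SKETCH_FAT_MAGIC)
        return -1;

    /* nfat_arch comes from the file, so check it against the buffer size. */
    uint32_t count = read_be32(buf + 4);
    if (count > (len - SKETCH_FAT_HEADER_SIZE) / SKETCH_FAT_ARCH_SIZE)
        return -1;

    int n = 0;
    while ((uint32_t)n < count && n < max) {
        const uint8_t *rec = buf + SKETCH_FAT_HEADER_SIZE
                                 + (size_t)n * SKETCH_FAT_ARCH_SIZE;
        out[n].cputype    = read_be32(rec);
        out[n].cpusubtype = read_be32(rec + 4);
        out[n].offset     = read_be32(rec + 8);
        out[n].size       = read_be32(rec + 12);
        out[n].align      = read_be32(rec + 16);
        n++;
    }
    return n;
}
```

A caller could hand it the bytes of any file and, on success, print each slice’s CPU type and offset; a return of -1 means the buffer did not look like a (32-bit) fat file.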

Lots more on CPU types and sub types soon, but generally speaking, the CPU type is the primary architecture or family of the CPU, whereas the sub type is the variant. For example, for CPUs whose type is ARM, you’ll encounter variants such as ARMv7 and ARMv7s (used in older mobile devices), and ARM64e (now used by Apple in their latest Apple Silicon devices).
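
As a rough illustration of how type and sub type combine, the sketch below maps a (cputype, cpusubtype) pair to the architecture names that file prints. The numeric constants mirror those in Apple’s mach/machine.h but are re-declared here (with a hypothetical SK_ prefix) so the snippet stands alone. The sk_classify and sk_arch_name helpers are made up for this example, and only the handful of 64-bit variants that show up in this post are covered:

```c
#include <assert.h>
#include <stdint.h>

/* Constants mirrored from <mach/machine.h>. */
#define SK_CPU_ARCH_ABI64         0x01000000u                /* 64-bit ABI flag */
#define SK_CPU_TYPE_X86_64        (7u  | SK_CPU_ARCH_ABI64)  /* 0x01000007 */
#define SK_CPU_TYPE_ARM64         (12u | SK_CPU_ARCH_ABI64)  /* 0x0100000c */
#define SK_CPU_SUBTYPE_MASK       0xff000000u                /* capability bits */
#define SK_CPU_SUBTYPE_X86_64_ALL 3u
#define SK_CPU_SUBTYPE_X86_64_H   8u
#define SK_CPU_SUBTYPE_ARM64_ALL  0u
#define SK_CPU_SUBTYPE_ARM64E     2u

enum sk_arch {
    SK_ARCH_UNKNOWN,
    SK_ARCH_X86_64,
    SK_ARCH_X86_64H,
    SK_ARCH_ARM64,
    SK_ARCH_ARM64E,
    SK_ARCH_COUNT
};

/* The type picks the CPU family; the (masked) sub type picks the variant. */
enum sk_arch sk_classify(uint32_t cputype, uint32_t cpusubtype)
{
    uint32_t variant = cpusubtype & ~SK_CPU_SUBTYPE_MASK;

    if (cputype == SK_CPU_TYPE_X86_64) {
        if (variant == SK_CPU_SUBTYPE_X86_64_ALL) return SK_ARCH_X86_64;
        if (variant == SK_CPU_SUBTYPE_X86_64_H)   return SK_ARCH_X86_64H;
    } else if (cputype == SK_CPU_TYPE_ARM64) {
        if (variant == SK_CPU_SUBTYPE_ARM64_ALL)  return SK_ARCH_ARM64;
        if (variant == SK_CPU_SUBTYPE_ARM64E)     return SK_ARCH_ARM64E;
    }
    return SK_ARCH_UNKNOWN;
}

/* Human-readable name for a classified architecture. */
const char *sk_arch_name(enum sk_arch arch)
{
    static const char *const names[SK_ARCH_COUNT] = {
        "unknown", "x86_64", "x86_64h", "arm64", "arm64e"
    };
    assert(arch < SK_ARCH_COUNT);
    return names[arch];
}
```

Note that the sub type is masked before it is compared: the top byte can carry capability bits, so comparing the raw value would miss otherwise identical variants.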

Let’s now look at a snippet of code that, given a pointer to a universal (fat) header, uses the NXFindBestFatArch API to find and return the best slice:

struct fat_arch* parseFat(struct fat_header* header)
{
    //current fat_arch
    struct fat_arch* currentArch = NULL;

    //local architecture
    const NXArchInfo *localArch = NULL;

    //best matching slice
    struct fat_arch *bestSlice = NULL;

    //get local architecture
    localArch = NXGetLocalArchInfo();

    //swap?
    if(FAT_CIGAM == header->magic)
    {
        //swap fat header
        swap_fat_header(header, localArch->byteorder);

        //swap (all) fat arch
        swap_fat_arch((struct fat_arch*)((unsigned char*)header
         + sizeof(struct fat_header)), header->nfat_arch, localArch->byteorder);
    }

    printf("Fat header\n");
    printf("fat_magic: %#x\n", header->magic);
    printf("nfat_arch: %x\n",  header->nfat_arch);

    //first arch, starts right after fat_header
    currentArch = (struct fat_arch*)((unsigned char*)header + sizeof(struct fat_header));

    //get best slice
    // note: pass the pointer to the first fat_arch (currentArch)
    bestSlice = NXFindBestFatArch(localArch->cputype,
                                  localArch->cpusubtype, currentArch, header->nfat_arch);

    return bestSlice;
}

In a nutshell, this code first gets the current system’s local architecture via the NXGetLocalArchInfo API. Then (after accounting for any difference in endianness), it invokes the aforementioned NXFindBestFatArch API. You can see we pass it the system’s local CPU type and sub type, a pointer to the first fat_arch, and the total number of fat_arch structures.

The NXFindBestFatArch API will iterate over all slices and return a pointer to the fat_arch that describes the most compatible embedded Mach-O. In other words, the one the loader will select and execute on the system when the universal binary is run. Recalling that each fat_arch structure contains the offset of the slice it describes, we have all the information needed to go off and scan the selected Mach-O slice:

//read in binary
NSData* data = [NSData dataWithContentsOfFile:<path to some file>];

//typecast
struct fat_header* fatHeader = (struct fat_header*)data.bytes;

//find best architecture/slice
struct fat_arch* bestArch = parseFat(fatHeader);

//init pointer to best slice
// (cast to a byte pointer first, so the offset is added in bytes)
struct mach_header_64* machoHeader = (struct mach_header_64*)((unsigned char*)data.bytes + bestArch->offset);

//go scan the Mach-O

…all is well and good, right?
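
One thing the snippet above glosses over: the offset and size come straight from the file, so a tool that scans untrusted (possibly malicious) binaries would probably want to validate them before dereferencing the slice pointer. Here is a minimal sketch of such a check. The helper name sk_slice_start is hypothetical, and the only assumption baked in is that a mach_header_64 is 32 bytes (eight 32-bit fields):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sizeof(struct mach_header_64): eight uint32_t fields. */
#define SK_MACH_HEADER_64_SIZE 32u

/*
 * Return a pointer to the start of the slice inside the file buffer, or
 * NULL if the slice described by (offset, size) does not lie fully
 * inside the buffer or is too small to hold a Mach-O header.
 */
const void *sk_slice_start(const void *file, size_t file_len,
                           uint32_t offset, uint32_t size)
{
    assert(file != NULL);

    /* A slice that cannot hold a header is not worth pointing at. */
    if (size < SK_MACH_HEADER_64_SIZE)
        return NULL;

    /* Check offset first, then the remaining room, so nothing overflows. */
    if (offset > file_len || size > file_len - offset)
        return NULL;

    return (const uint8_t *)file + offset;
}
```

In the snippet above, one could call it with data.bytes, data.length, bestArch->offset and bestArch->size, and only go on to scan the Mach-O if the result is non-NULL.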

The New macho_* APIs

In recent versions of macOS Apple decided to deprecate the NXFindBestFatArch API:

extern struct fat_arch *NXFindBestFatArch(cpu_type_t cputype,
                      cpu_subtype_t cpusubtype,
                      struct fat_arch *fat_archs,
                      uint32_t nfat_archs) __CCTOOLS_DEPRECATED_MSG("use macho_best_slice()");

As we can see in the function definition, they now recommend using macho_best_slice, which is declared in mach-o/utils.h. Here you’ll also find other new Mach-O APIs such as macho_for_each_slice and macho_arch_name_for_mach_header.

Unfortunately there isn’t much info on the macho_best_slice API:

The macho_best_slice API is Little Known

…so let’s explore this API more!

At first blush macho_best_slice appears to be a big improvement over NXFindBestFatArch, as it abstracts away a lot of steps and low-level details, such as having to load the file into memory yourself, dealing with the endianness of the headers, and looking up the current system architecture in the first place.

So now in theory you should just be able to call:

macho_best_slice(<path to some file>, ^(const struct mach_header* _Nonnull slice, uint64_t sliceFileOffset, size_t sliceSize) {

        printf("best architecture\n");
        printf("offset: %llu (%#llx)\n", sliceFileOffset, sliceFileOffset);
        printf("size: %zu (%#zx)\n", sliceSize, sliceSize);

});

Ah, one function call to get a pointer to the Mach-O header, the file offset, and the size of the best slice? Amazing.

Ok, well I hate to rain on the parade, but macho_best_slice is broken for everybody except Apple 🤬

What’s wrong with macho_best_slice?

Following Apple’s directives, I updated my code, swapping out the now-deprecated NXFindBestFatArch API for the shiny new macho_best_slice.

Apparently I test my code more than Apple (which, let’s not forget, is one of the most valuable companies in the world) …and immediately noticed a perplexing issue.

Many of my calls to macho_best_slice would return with an error …even though I was 100% sure the path I specified was a universal binary containing a compatible (runnable) Mach-O. 🤔

First, here’s the code I’m using to replicate the issue:

#import <mach-o/utils.h>
#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {

    int result = macho_best_slice(argv[1],
    ^(const struct mach_header* _Nonnull slice, uint64_t sliceFileOffset, size_t sliceSize) {

        printf("Best architecture\n");
        printf(" Name: %s\n\n", macho_arch_name_for_mach_header(slice));
        printf(" Size: %zu (%#zx)\n", sliceSize, sliceSize);
        printf(" Offset: %llu (%#llx)\n", sliceFileOffset, sliceFileOffset);

    });

    if(0 != result)
    {
        printf("ERROR: macho_best_slice failed with %d/%#x\n", result, result);
    }

    return 0;
}

…it’s super simple, ya? Basically we just call the macho_best_slice API with a user-specified file, and for universal binaries that contain a compatible Mach-O slice, the callback block should be invoked.

Let’s compile and run it to show that it does work …sometimes (largely to rule out that I’m doing something totally wrong). We’ll use macOS’s Calculator app, which is a universal binary containing Intel and Arm slices:

% file /System/Applications/Calculator.app/Contents/MacOS/Calculator 
Calculator: Mach-O universal binary with 2 architectures: 
[x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]

Calculator (for architecture x86_64):    Mach-O 64-bit executable x86_64
Calculator (for architecture arm64e):    Mach-O 64-bit executable arm64e

My MacBook has an Apple Silicon chip, thus runs a version of macOS compiled for arm64:

% sw_vers
ProductName:        macOS
ProductVersion:     14.3.1
BuildVersion:       23D60

% uname -a
Darwin Patricks-MacBook-Pro.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:31:00 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6020 arm64

…which means when Calculator is run, the loader should select and execute the arm64e slice. Looking at Activity Monitor you can see the ‘Kind’ column is set to “Apple”, meaning yes, the Apple Silicon slice was indeed, as expected, selected and running.

On an Apple Silicon system, Calculator’s Apple Silicon slice is executed

And macho_best_slice correctly locates and returns this (arm64e) slice to us:

% ./getBestSlice /System/Applications/Calculator.app/Contents/MacOS/Calculator
Best architecture:
 Name: arm64e
 Size: 262304 (0x400a0)
 Offset: 278528 (0x44000)

However, if we (re)run the code on LuLu it fails:

% ./getBestSlice LuLu.app/Contents/MacOS/LuLu                   
ERROR: macho_best_slice failed with 86/0x56

The error (86, or 0x56) maps to EBADARCH. According to Apple this means there is a “bad CPU type in executable” and that, though the specified path exists and is a Mach-O or fat binary, none of the slices are loadable.

…this is (very) strange as I personally compiled LuLu with native arm64 support, and it does run quite happily on my and everybody else’s Apple Silicon systems!

So what’s the issue? We’ll get to it shortly, but first I noticed that macho_best_slice did function as expected when passed any Apple binary …but always failed with EBADARCH for any 3rd-party binary. So what’s the difference? As you might have noticed in the file output for Calculator, all Apple binaries have an architecture type of arm64e. On the other hand, 3rd-party binaries will just be plain old arm64. As we’ll see, this subtle difference ultimately triggers the bug in macho_best_slice.

arm64e is an extended/enhanced (hence the ‘e’) version of ARM used by Apple that supports additional features such as pointer authentication (PAC).

Reversing macho_best_slice

Though the implementation of macho_best_slice is open-source (you can view it here), when getting to the bottom of a bug I like to follow along in a disassembler and debugger.

In a debugger, we can set a breakpoint on macho_best_slice. Then (re)executing my simple code, we see that once the breakpoint is hit, we’re at the function within libdyld:

% lldb getBestSlice LuLu.app/Contents/MacOS/LuLu     
...

(lldb) settings set -- target.run-args  "LuLu.app/Contents/MacOS/LuLu"
(lldb) b macho_best_slice
Breakpoint 1: where = libdyld.dylib`macho_best_slice, address = 0x00000001804643fc

(lldb) r
Process 23832 launched: 'getBestSlice' (arm64)
Process 23832 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000189dfc3fc libdyld.dylib`macho_best_slice
libdyld.dylib`macho_best_slice:
->  0x189dfc3fc <+0>:  pacibsp 

This library’s path has traditionally been /usr/lib/system/libdyld.dylib, though now it’s found in the dyld cache.

Loading up libdyld.dylib and looking at macho_best_slice’s decompilation, we can see it just opens the specified file and then invokes _macho_best_slice_in_fd:

int _macho_best_slice(int arg0, int arg1) {
    ...
    r0 = open(arg0, 0x0);
    if (r0 != -0x1) {
            _macho_best_slice_in_fd(r0, r19);
            close(r0);
    ...
    return r0;
}

…this, as expected, matches its source code:

int macho_best_slice(const char* path, void (^bestSlice)(const struct mach_header* slice, uint64_t sliceFileOffset, size_t sliceSize)__MACHO_NOESCAPE)
{
    int fd = ::open(path, O_RDONLY, 0);
    if ( fd == -1 )
        return errno;

    int result = macho_best_slice_in_fd(fd, bestSlice);
    ::close(fd);

    return result;
}

The _macho_best_slice_in_fd function invokes various dyld3 functions, before invoking macho_best_slice_fd_internal:

int macho_best_slice_in_fd(int fd, void (^bestSlice)(const struct mach_header* slice, uint64_t sliceFileOffset, size_t sliceSize)__MACHO_NOESCAPE)
{
    const Platform     platform    = MachOFile::currentPlatform();
    const GradedArchs* launchArchs = &GradedArchs::forCurrentOS(false, false);
    const GradedArchs* dylibArchs  = &GradedArchs::forCurrentOS(false, false);
    ...

    return macho_best_slice_fd_internal(fd, platform, *launchArchs, *dylibArchs, false, bestSlice);
}

As dyld is open source we can take a look at the dyld functions such as GradedArchs::forCurrentOS.

const GradedArchs& GradedArchs::forCurrentOS(bool keysOff, bool osBinariesOnly)
{
#if __arm64e__
    if ( osBinariesOnly )
        return (keysOff ? arm64e_keysoff_pb : arm64e_pb);
    else
        return (keysOff ? arm64e_keysoff : arm64e);
#elif __ARM64_ARCH_8_32__
    return arm64_32;
#elif __arm64__
    return arm64;
#elif __ARM_ARCH_7K__
    return armv7k;
#elif __ARM_ARCH_7S__
    return armv7s;
#elif __ARM_ARCH_7A__
    return armv7;
#elif __x86_64__
    return isHaswell() ? x86_64h : x86_64;
#elif __i386__
    return i386;
#else
    #error unknown platform
#endif
}

As we’re executing on an arm64e system and GradedArchs::forCurrentOS is invoked with false for both its arguments, this method will return arm64e. We can confirm this in a debugger, by printing out the return value (found in the x0 register):

% lldb getBestSlice LuLu.app/Contents/MacOS/LuLu     
...

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000189df3240 libdyld.dylib`dyld3::GradedArchs::forCurrentOS(bool, bool)
libdyld.dylib`dyld3::GradedArchs::forCurrentOS:
->  0x189df3240 <+0>:  adrp   x8, 13
    
(lldb) finish
Process 27157 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step out
    frame #0: 0x0000000189dfc490 libdyld.dylib`macho_best_slice_in_fd + 48
libdyld.dylib`macho_best_slice_in_fd:
->  0x189dfc490 <+48>: mov    x22, x0
    
(lldb) reg read $x0
x0 = 0x0000000189e00a88 libdyld.dylib`dyld3::GradedArchs::arm64e

As we can see, this is a variable of type const GradedArchs. Looking in the source code, we can see it is set in the following manner: GradedArchs::arm64e = GradedArchs({GRADE_arm64e, 1}); where GRADE_arm64e is #defined as: CPU_TYPE_ARM64, CPU_SUBTYPE_ARM64E, false

Back to the macho_best_slice_fd_internal function: it contains the core logic to identify the best slice. It’s a long function that you can view in its entirety here. In the context of tracking down this bug, much of it is not relevant …but let’s still walk through the relevant parts.

After mapping in the file it’s examining, it calls the dyld3::FatFile::isFatFile function to make sure it’s really looking at a universal file. You can take a look at this open-source function if you’re interested. Not too surprisingly, it simply checks for the FAT magic values FAT_MAGIC and FAT_MAGIC_64.

As the binary we passed, LuLu, is a universal binary, the isFatFile method, as expected, returns a non-zero value (the address where the universal binary has been mapped into memory):

(lldb) x/i $pc
    0x189dfc558: bl     0x189df2e20        ; dyld3::FatFile::isFatFile(void const*)
->  0x189dfc55c  cbz    x0, 0x189dfc65c           
Target 0: (getBestSlice) stopped.

(lldb) reg read $x0
x0 = 0x0000000100010000

Moving on, it then makes use of the dyld3::FatFile::forEachSlice function to iterate over each embedded slice. This function takes a callback block to invoke for each slice. The block passed by macho_best_slice_fd_internal to dyld3::FatFile::forEachSlice performs the following for each slice:

  • Invokes dyld3::MachOFile::isMachO
  • Invokes dyld3::GradedArchs::grade

ff->forEachSlice(diag, statbuf.st_size, ^(uint32_t sliceCpuType, uint32_t sliceCpuSubType, const void* sliceStart, uint64_t sliceSize, bool& stop) {
    if ( const MachOFile* mf = MachOFile::isMachO(sliceStart) ) {
        if ( mf->filetype == MH_EXECUTE ) {
            int sliceGrade = launchArchs.grade(mf->cputype, mf->cpusubtype, isOSBinary);
            if ( (sliceGrade > bestGrade) && launchableOnCurrentPlatform(mf) ) {
                sliceOffset = (char*)sliceStart - (char*)mappedFile;
                sliceLen    = sliceSize;
                bestGrade   = sliceGrade;
            }
        }
        ...

As these functions are part of dyld, they are both open-source. The isMachO method performs a few basic sanity checks, such as looking for Mach-O magic values, in order to ascertain that the current slice really is a Mach-O. The grade function is more interesting, and we’ll dive into it shortly, but basically it checks if the specified CPU type and sub type (for example, those of a Mach-O slice) are compatible with the current system.
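
For a feel of what such a sanity check looks like, here is a small stand-alone sketch. It is not dyld’s implementation: the sk_has_macho_magic name is hypothetical, the two magic values are re-declared from mach-o/loader.h, and the sketch assumes that only magic values in the host’s byte order count as a match:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic values mirrored from <mach-o/loader.h>. */
#define SK_MH_MAGIC    0xfeedfaceu  /* 32-bit Mach-O */
#define SK_MH_MAGIC_64 0xfeedfacfu  /* 64-bit Mach-O */

/*
 * Return true if the buffer starts with a Mach-O magic value in the
 * host's byte order. Byte-swapped magics are treated as "not a Mach-O
 * we can run here" in this sketch.
 */
bool sk_has_macho_magic(const void *slice, size_t len)
{
    assert(slice != NULL);

    uint32_t magic;
    if (len < sizeof magic)
        return false;

    /* memcpy avoids an unaligned read if the slice is oddly placed. */
    memcpy(&magic, slice, sizeof magic);
    return magic == SK_MH_MAGIC || magic == SK_MH_MAGIC_64;
}
```

A real check would go further (the block above, for instance, also looks at the header’s filetype field), but the magic comparison is the first gate a slice has to pass.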

If none of the slices pass the isMachO and “grade” checks, macho_best_slice_fd_internal sets a return variable to EBADARCH (0x56), as shown in the following disassembly:

0x189dfc61c  cbz  w8, loc_189dfc700
...

0x189dfc700  mov   w21, #0x56   ;EBADARCH
...

0x189dfc748  mov        x0, x21
...
0x189dfc764  retab

We can also see this in the source:

if ( bestGrade != 0 ) {
    if ( bestSlice )
        bestSlice((MachOFile*)((char*)mappedFile + sliceOffset), (size_t)sliceOffset, (size_t)sliceLen);
}
else
    result = EBADARCH;

This value is eventually propagated back to macho_best_slice, which returns it to the caller. This explains where the error code that our program printed out is coming from. But we still don’t know why.

Recall we’ve invoked macho_best_slice on LuLu’s universal binary, which contains two slices. Each slice passes the isMachO check, and the first slice (the x86_64 Intel Mach-O), as expected, fails the grade method …as, well, Intel binaries aren’t natively compatible with Apple Silicon systems. So far, so good. However, the 2nd slice is an arm64 Mach-O, which is definitely compatible …yet dyld3::GradedArchs::grade still returns 0 (false/fail). WTF!?

Looking at the source code for the dyld3::GradedArchs::grade method (found in dyld’s MachOFile.cpp file) ultimately gives us insight into what is going wrong.

int GradedArchs::grade(uint32_t cputype, uint32_t cpusubtype, bool isOSBinary) const
{
    for (const CpuGrade* p = _orderedCpuTypes; p->type != 0; ++p) {
        if ( (p->type == cputype) && (p->subtype == (cpusubtype & ~CPU_SUBTYPE_MASK)) ) {
            if ( p->osBinary ) {
                if ( isOSBinary )
                    return p->grade;
            }
            else {
                return p->grade;
            }
        }
    }
    return 0;
}

From the code you can see that, given a CPU type and sub type, it checks these against the values in a (GradedArchs) class array named _orderedCpuTypes. If there is a match, a “grade” is returned; otherwise 0 (fail).
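
To see the matching logic in isolation, here is a small C rendition of the same loop. It is a sketch, not dyld’s code: the sk_ names are hypothetical, the CPU constants are re-declared from mach/machine.h so it compiles anywhere, and the table is assumed to hold only the single arm64e entry shown earlier (GRADE_arm64e, with a grade of 1) followed by the zero terminator:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Constants mirrored from <mach/machine.h>. */
#define SK_CPU_TYPE_X86_64        0x01000007u
#define SK_CPU_TYPE_ARM64         0x0100000cu
#define SK_CPU_SUBTYPE_MASK       0xff000000u
#define SK_CPU_SUBTYPE_X86_64_ALL 3u
#define SK_CPU_SUBTYPE_ARM64_ALL  0u
#define SK_CPU_SUBTYPE_ARM64E     2u

/* Same shape as dyld's CpuGrade entries. */
struct sk_cpu_grade {
    uint32_t type;
    uint32_t subtype;
    bool     os_binary;
    uint16_t grade;
};

/* Assumed table: only the arm64e entry, then a zero terminator. */
const struct sk_cpu_grade sk_arm64e_table[] = {
    { SK_CPU_TYPE_ARM64, SK_CPU_SUBTYPE_ARM64E, false, 1 },
    { 0, 0, false, 0 },
};

/* Return the grade of the first matching entry, or 0 if none matches. */
int sk_grade(const struct sk_cpu_grade *table,
             uint32_t cputype, uint32_t cpusubtype, bool is_os_binary)
{
    assert(table != NULL);

    for (const struct sk_cpu_grade *p = table; p->type != 0; ++p) {
        /* Capability bits in the sub type's top byte are ignored. */
        if (p->type == cputype &&
            p->subtype == (cpusubtype & ~SK_CPU_SUBTYPE_MASK)) {
            /* Entries flagged os_binary only match OS binaries. */
            if (!p->os_binary || is_os_binary)
                return p->grade;
        }
    }
    return 0;
}
```

With that table, sk_grade returns 1 for an arm64e sub type (even with capability bits set in its top byte, since those are masked off) and 0 for both x86_64 and plain arm64, because neither has an entry to match against.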

From the source code we know this grade method was invoked via the launchArchs object:

const GradedArchs* launchArchs = &GradedArchs::forCurrentOS(false, false);

...
int sliceGrade = launchArchs.grade(mf->cputype, mf->cpusubtype, isOSBinary);

Recall that the call to GradedArchs::forCurrentOS returned an object initialized via GradedArchs({GRADE_arm64e, 1}) (where GRADE_arm64e is set to CPU_TYPE_ARM64, CPU_SUBTYPE_ARM64E, false).

Both universal and Mach-O binaries (either stand-alone or slices) contain a header that holds the CPU type and sub type that they are compatible with. We already saw this in the fat_arch structure that describes each embedded Mach-O slice. However, you’ll also find the (same) CPU information in the Mach-O header, whose type is mach_header (defined in mach-o/loader.h):

struct mach_header {
    uint32_t    magic;      /* mach magic number identifier */
    int32_t     cputype;    /* cpu specifier */
    int32_t     cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
};

As we noted, the dyld3::GradedArchs::grade function takes this CPU information for each embedded slice and checks whether it’s compatible with the CPU of the current system. Logically, this makes total sense.

But of course the devil is in the details …and details are best observed in a debugger. Here we’ll focus on the _orderedCpuTypes array that is defined in dyld’s MachOFile.h file, within the GradedArchs class:

// private:
// should be private, but compiler won't statically initialize static members above
    struct CpuGrade { uint32_t type; uint32_t subtype; bool osBinary; uint16_t grade; };
    const CpuGrade    _orderedCpuTypes[3];  // zero terminated
};

Back to the debugger. First we set a breakpoint on dyld3::GradedArchs::grade. In the case of LuLu, which, recall, has two embedded slices, it is called twice.

% lldb getBestSlice LuLu.app/Contents/MacOS/LuLu     
...

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 13.1
    frame #0: 0x0000000189df31ec libdyld.dylib`dyld3::GradedArchs::grade(unsigned int, unsigned int, bool) const
libdyld.dylib`dyld3::GradedArchs::grade:

(lldb) reg read $w1
      w1 = 0x01000007
(lldb) reg read $w2
      w2 = 0x00000003

As grade is a C++ method, the first argument is the class object pointer, here a GradedArchs. This means the CPU type and sub type are the 2nd and 3rd arguments, which on arm64 are passed in the x1 and x2 registers respectively. However, since the grade method declares these arguments as uint32_t, they’ll actually be in w1 and w2 (which represent the lower 32 bits of the 64-bit x registers).

In the debugger output you can see these values are 0x01000007 and 0x00000003.

Using otool to dump the Mach-O header of each of LuLu’s slices we see:

otool -h LuLu.app/Contents/MacOS/LuLu 
LuLu (architecture x86_64):
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777223          3  0x00           2    30       4272 0x00200085

LuLu (architecture arm64):
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777228          0  0x00           2    30       4432 0x00200085

The first slice has a CPU type of 16777223, or 0x1000007 …which is exactly what we see in the debugger. This value maps to x86_64, which identifies LuLu’s first slice as a Mach-O compiled for 64-bit Intel CPUs. Sub type 3 maps to CPU_SUBTYPE_I386_ALL, which means any Intel CPU.

While we’re here, let’s look at the otool output for the second slice (arm64). Its CPU type is set to 16777228 (or 0x100000C) and its sub type to 0. These map to CPU type CPU_TYPE_ARM64 and sub type CPU_SUBTYPE_ARM64_ALL. As its name implies, CPU_SUBTYPE_ARM64_ALL specifies compatibility with all arm64 CPU variants!
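
The decimal values that otool prints can be decoded by hand, as each 64-bit CPU type is just the base type OR'd with the 64-bit ABI flag. The following is a minimal sketch of that arithmetic; the constants mirror those in mach/machine.h but are redefined here (under made-up names) so that it compiles anywhere, and describeCpuType is a hypothetical helper, not a system API.

```cpp
#include <cstdint>
#include <string>

// Values mirroring <mach/machine.h>, redefined so the sketch is portable.
constexpr uint32_t kCpuArchAbi64  = 0x01000000;                   // CPU_ARCH_ABI64
constexpr uint32_t kCpuTypeX86    = 7;                            // CPU_TYPE_X86
constexpr uint32_t kCpuTypeArm    = 12;                           // CPU_TYPE_ARM
constexpr uint32_t kCpuTypeX86_64 = kCpuTypeX86 | kCpuArchAbi64;  // 0x01000007
constexpr uint32_t kCpuTypeArm64  = kCpuTypeArm | kCpuArchAbi64;  // 0x0100000c

// Hypothetical helper: map the cputype value otool prints to a name.
std::string describeCpuType(uint32_t cputype)
{
    switch (cputype) {
        case kCpuTypeX86_64: return "x86_64";
        case kCpuTypeArm64:  return "arm64";
        default:             return "unknown";
    }
}
```

Passing the two decimal values from the otool output (16777223 and 16777228) to describeCpuType yields the names of LuLu’s two slices.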

Before we head back to the debugger, here is the annotated disassembly of dyld3::GradedArchs::grade, which will assist in our debugging endeavors:

0x189df31ec: mov    x8, #0x0                ; current p index/offset
0x189df31f0: and    w9, w2, #0xffffff       ; cpusubtype & ~CPU_SUBTYPE_MASK
0x189df31f4: ldr    w10, [x0, x8]           ; p->type = _orderedCpuTypes[index]->type
0x189df31f8: cbz    w10, 0x189df322c        ; p->type == 0 ? -> go to done
0x189df31fc: cmp    w10, w1                 ; p->type == cputype ?
0x189df3200: b.ne   0x189df3220             ; no match, go to next
0x189df3204: add    x10, x0, x8             ; p = _orderedCpuTypes[index]
0x189df3208: ldr    w11, [x10, #0x4]        ; extract p->subtype
0x189df320c: cmp    w11, w9                 ; p->subtype == cpusubtype ?
0x189df3210: b.ne   0x189df3220             ; no match, go to next
0x189df3214: ldrb   w10, [x10, #0x8]        ; extract p->osBinary
0x189df3218: cbz    w10, 0x189df3234        ; !p->osBinary go to leave, returning p->grade
0x189df321c: tbnz   w3, #0x0, 0x189df3234   ; isOSBinary arg set? go to leave, returning p->grade
0x189df3220: add    x8, x8, #0xc            ; p index++
0x189df3224: cmp    x8, #0x30               ; p index reached end (0x30)?
0x189df3228: b.ne   0x189df31f4             ; not yet, next
0x189df322c: mov    w0, #0x0                ; return 0
0x189df3230: ret

0x189df3234: add    x8, x0, x8              ; p = _orderedCpuTypes[index]
0x189df3238: ldrh   w0, [x8, #0xa]          ; return p->grade
0x189df323c: ret

The main takeaway is the _orderedCpuTypes array can be found at x0 (which makes sense as it’s an instance variable of the GradedArchs class).

Want to learn more about reversing/understanding Arm disassembly?

Grab Maria Markstedter’s excellent “Blue Fox: Arm Assembly Internals and Reverse Engineering” book.

From the definition of the _orderedCpuTypes array (in dyld’s MachOFile.h) we know it contains at maximum 3 CpuGrade structures. Also as we have the definition of the CpuGrade structure we know its size (0xC) and members (the first two being the CPU type and sub type, as 32-bit values).
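
The size and member offsets can be checked mechanically. The sketch below copies the CpuGrade definition from above and asserts, at compile time, the offsets that the disassembly relies on. It assumes the usual arm64/x86_64 ABI layout rules (natural alignment, one-byte bool); a compiler with different rules would fail these assertions.

```cpp
#include <cstddef>
#include <cstdint>

// Copy of the CpuGrade structure shown in dyld's MachOFile.h.
struct CpuGrade { uint32_t type; uint32_t subtype; bool osBinary; uint16_t grade; };

// Each assertion's message is the instruction that uses the offset.
static_assert(offsetof(CpuGrade, type)     == 0x0, "ldr  w10, [x0, x8]");
static_assert(offsetof(CpuGrade, subtype)  == 0x4, "ldr  w11, [x10, #0x4]");
static_assert(offsetof(CpuGrade, osBinary) == 0x8, "ldrb w10, [x10, #0x8]");
static_assert(offsetof(CpuGrade, grade)    == 0xa, "ldrh w0, [x8, #0xa]");
static_assert(sizeof(CpuGrade)             == 0xc, "add  x8, x8, #0xc");
```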

Finally, from the source code and disassembly, we know that the _orderedCpuTypes array will be zero terminated. We can see this in the source code as the loop termination condition: p->type != 0; and in the disassembly: ldr w10, [x0, x8] / cbz w10, 0x189df322c.

This is all pertinent as we can then dump the _orderedCpuTypes array to see which CPU types/sub types it contains:

(lldb) x/wx $x0
0x0100000c 0x00000002 0x00010000 
0x00000000 0x00000000 0x00000000 
0x00000000 0x00000000 0x00000000 
0x00000000 0x00000000 0x00000000

The first (and only) entry contains CPU type 0x0100000c (CPU_TYPE_ARM64) and sub type 0x2 (CPU_SUBTYPE_ARM64E). This matches what was returned by GradedArchs::forCurrentOS (i.e. GRADE_arm64e).
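
To read the remaining fields of that entry, the three dumped words can be reinterpreted as a CpuGrade. This is a minimal sketch, assuming a little-endian host that lays out the structure the same way as the arm64 target; decodeFirstEntry and kDumpedWords are illustrative names, not part of dyld.

```cpp
#include <cstdint>
#include <cstring>

// Copy of the CpuGrade structure shown in dyld's MachOFile.h.
struct CpuGrade { uint32_t type; uint32_t subtype; bool osBinary; uint16_t grade; };

// First three 32-bit words of the lldb dump: one 0xC-byte CpuGrade entry.
constexpr uint32_t kDumpedWords[3] = { 0x0100000c, 0x00000002, 0x00010000 };

// Reinterpret the dumped words as a CpuGrade (little-endian host assumed).
CpuGrade decodeFirstEntry()
{
    CpuGrade entry;
    static_assert(sizeof(entry) == sizeof(kDumpedWords), "unexpected layout");
    std::memcpy(&entry, kDumpedWords, sizeof(entry));
    return entry;
}
```

The third word, 0x00010000, splits into osBinary (the byte at offset 0x8, here 0, so false) and grade (the 16-bit value at offset 0xa, here 1), which lines up with the {GRADE_arm64e, 1} initializer.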

Back to the debugger, we’ll ignore the first call to dyld3::GradedArchs::grade, as it makes sense that the first slice (x86_64) will receive a failing grade on an Apple Silicon system, since Intel code is not natively compatible.

When dyld3::GradedArchs::grade is called again, we can confirm that it’s grading LuLu’s arm64 slice, which we know is compatible and thus should clearly receive a passing grade:

% lldb getBestSlice LuLu.app/Contents/MacOS/LuLu     
...

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 13.1
    frame #0: 0x0000000189df31ec libdyld.dylib`dyld3::GradedArchs::grade(unsigned int, unsigned int, bool) const
libdyld.dylib`dyld3::GradedArchs::grade:

(lldb) reg read $w1
      w1 = 0x0100000c
(lldb) reg read $w2
      w2 = 0x00000000

Recall CPU_TYPE_ARM64 is 0x100000c and CPU_SUBTYPE_ARM64_ALL is 0x0, which are the values that the grade method is invoked with the 2nd time.

We know that LuLu’s arm64 slice has CPU_TYPE_ARM64 (0x100000c) and CPU_SUBTYPE_ARM64_ALL (0x0), and that the _orderedCpuTypes array only has an entry for CPU_TYPE_ARM64 (0x100000c) and CPU_SUBTYPE_ARM64E (0x2). So although the dyld3::GradedArchs::grade function will see that the CPU type CPU_TYPE_ARM64 is a match, there will be a sub type mismatch, as when compared directly CPU_SUBTYPE_ARM64_ALL (0x0) != CPU_SUBTYPE_ARM64E (0x2).

We noted that, as its name implies, CPU_SUBTYPE_ARM64_ALL specifies compatibility with all arm64 CPU variants. This means that yes, even an arm CPU with a sub type of CPU_SUBTYPE_ARM64E (e.g. Apple Silicon) can execute code whose sub type is CPU_SUBTYPE_ARM64_ALL. Thus LuLu’s second slice (arm64) should be graded as ok, even though its sub type is CPU_SUBTYPE_ARM64_ALL.
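
The mismatch described above can be reproduced outside of dyld with a small model of the grading loop. This is only a sketch: GradedArchsModel and kArm64eOnly are stand-ins invented for illustration, the constants are redefined from mach/machine.h, and the table holds the single arm64e entry observed in the debugger.

```cpp
#include <cstdint>

constexpr uint32_t kCpuTypeArm64       = 0x0100000c; // CPU_TYPE_ARM64
constexpr uint32_t kCpuSubtypeArm64All = 0x0;        // CPU_SUBTYPE_ARM64_ALL
constexpr uint32_t kCpuSubtypeArm64e   = 0x2;        // CPU_SUBTYPE_ARM64E
constexpr uint32_t kCpuSubtypeMask     = 0xff000000; // CPU_SUBTYPE_MASK

struct CpuGrade { uint32_t type; uint32_t subtype; bool osBinary; uint16_t grade; };

// Simplified stand-in for GradedArchs: a zero-terminated list of entries.
struct GradedArchsModel {
    CpuGrade entries[4];

    // Same matching logic as the grade method shown earlier.
    int grade(uint32_t cputype, uint32_t cpusubtype, bool isOSBinary) const
    {
        for (const CpuGrade* p = entries; p->type != 0; ++p) {
            if (p->type == cputype && p->subtype == (cpusubtype & ~kCpuSubtypeMask)) {
                if (!p->osBinary || isOSBinary)
                    return p->grade;
            }
        }
        return 0; // no entry matched: the slice "fails"
    }
};

// The table observed in the debugger: arm64e only, grade 1.
constexpr GradedArchsModel kArm64eOnly = {{
    { kCpuTypeArm64, kCpuSubtypeArm64e, false, 1 }, {}, {}, {}
}};
```

With this table, grading LuLu’s arm64 slice (kCpuTypeArm64, kCpuSubtypeArm64All) returns 0, while an arm64e slice returns 1, since the sub type comparison is a strict equality.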

It would appear that the loader does not make use of the macho_best_slice API. Why? Well, when LuLu is launched, the loader correctly identifies LuLu’s arm64 slice as compatible with Apple Silicon systems and executes it appropriately.

So (finally) we understand the issue, which, depending on your viewpoint, is either:

  1. The dyld3::GradedArchs::grade method does not take into account the nuances of the *_ALL CPU sub types. Namely, that any CPU with a more specific sub type (e.g. CPU_SUBTYPE_ARM64E) can still execute code compiled with a CPU sub type of *_ALL (e.g. CPU_SUBTYPE_ARM64_ALL).

  2. When the GradedArchs “launchArchs” object was initialized, it should have been initialized not with just GRADE_arm64e, but also with GRADE_arm64 (and likely GRADE_x86_64 too, as Intel Mach-Os can technically run on macOS thanks to Rosetta’s emulation).

Update:

After posting this write-up, Marc-Etienne noted that the bug is likely the latter case, in which the wrong GradedArchs were used.

Moreover he pointed out that Apple likely should have invoked not GradedArchs::forCurrentOS but rather GradedArchs::launchCurrentOS. As we noted, the former returns just GRADE_arm64e, which means arm64 slices are deemed incompatible. We look at the latter, launchCurrentOS, below.

As Marc-Etienne pointed out, the issue is likely that the wrong GradedArchs were passed to macho_best_slice_fd_internal. Recall this incorrect grade was initialized via:


const GradedArchs* launchArchs = &GradedArchs::forCurrentOS(false, false);

…which returns only GRADE_arm64e.

However, if Apple had instead invoked GradedArchs::launchCurrentOS, grades for arm64e, arm64, and x86_64 would have been returned, and our universal binary would have been processed correctly, with the arm64 Mach-O being identified both as valid and as the best slice.

Let’s look at the GradedArchs::launchCurrentOS method.

const GradedArchs& GradedArchs::launchCurrentOS(const char* simArches)
{
#if TARGET_OS_SIMULATOR
    // on Apple Silicon, there is both an arm64 and an x86_64 (under rosetta) simulators
    // You cannot tell if you are running under rosetta, so CoreSimulator sets SIMULATOR_ARCHS
    if ( strcmp(simArches, "arm64 x86_64") == 0 )
        return launch_AS_Sim;
    else
        return x86_64;
#elif TARGET_OS_OSX
  #if __arm64__
    return launch_AS;
  #else
    return isHaswell() ? launch_Intel_h : launch_Intel;
  #endif
#else
    // all other platforms use same grading for executables as dylibs
    return forCurrentOS(true, false);
#endif
}

You can see that on macOS (TARGET_OS_OSX), where any Apple Silicon system will have __arm64__ set, the method returns launch_AS. launch_AS is a GradedArchs containing all three grades:

const GradedArchs GradedArchs::launch_AS =  GradedArchs({GRADE_arm64e,  3}, {GRADE_arm64,  2}, {GRADE_x86_64, 1});
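
To see what difference this table makes, here is a small model that grades LuLu’s two slices against it and picks the best one. It is a sketch under stated assumptions: GRADE_arm64 and GRADE_x86_64 are assumed to expand to the *_ALL sub types (0 and 3) with osBinary false, and gradeWith, bestSliceIndex, Slice and the k-prefixed names are all invented for illustration.

```cpp
#include <cstdint>

constexpr uint32_t kCpuTypeX86_64  = 0x01000007; // CPU_TYPE_X86_64
constexpr uint32_t kCpuTypeArm64   = 0x0100000c; // CPU_TYPE_ARM64
constexpr uint32_t kCpuSubtypeMask = 0xff000000; // CPU_SUBTYPE_MASK

struct CpuGrade { uint32_t type; uint32_t subtype; bool osBinary; uint16_t grade; };

// Model of launch_AS (assumed macro expansions), plus a zero terminator.
constexpr CpuGrade kLaunchAS[4] = {
    { kCpuTypeArm64,  2 /* ARM64E     */, false, 3 },
    { kCpuTypeArm64,  0 /* ARM64_ALL  */, false, 2 },
    { kCpuTypeX86_64, 3 /* X86_64_ALL */, false, 1 },
    { 0, 0, false, 0 },
};

// Same matching logic as the grade method, over an explicit table.
int gradeWith(const CpuGrade* table, uint32_t cputype, uint32_t cpusubtype, bool isOSBinary)
{
    for (const CpuGrade* p = table; p->type != 0; ++p) {
        if (p->type == cputype && p->subtype == (cpusubtype & ~kCpuSubtypeMask)) {
            if (!p->osBinary || isOSBinary)
                return p->grade;
        }
    }
    return 0;
}

struct Slice { uint32_t cputype; uint32_t cpusubtype; };

// Index of the highest-graded slice, or -1 if no slice is compatible.
int bestSliceIndex(const CpuGrade* table, const Slice* slices, int count)
{
    int best = -1, bestGrade = 0;
    for (int i = 0; i < count; ++i) {
        int g = gradeWith(table, slices[i].cputype, slices[i].cpusubtype, false);
        if (g > bestGrade) { bestGrade = g; best = i; }
    }
    return best;
}

// LuLu's two slices, as reported by otool: x86_64 first, then arm64.
constexpr Slice kLuLuSlices[2] = { { kCpuTypeX86_64, 3 }, { kCpuTypeArm64, 0 } };
```

Under these assumptions both of LuLu’s slices receive a non-zero grade (1 for x86_64, 2 for arm64), and the arm64 slice at index 1 is selected as the best.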

One more thing to point out: it should now make sense why macho_best_slice succeeds for Apple binaries! Such binaries have slices with the CPU sub type set to CPU_SUBTYPE_ARM64E, and thus the check in dyld3::GradedArchs::grade, which compares against the CPU type and sub type from (just) GRADE_arm64e, succeeds.

We can confirm this by invoking macho_best_slice on an Apple binary. As expected, we can see in the debugger that the CPU type passed to the grade method is 0x100000C / CPU_TYPE_ARM64, and the sub type is 0x80000002, which (after masking out the CPU capability bits found in the top byte) maps to CPU_SUBTYPE_ARM64E (0x2):

% lldb getBestSlice /System/Applications/Calculator.app/Contents/MacOS/Calculator 
...

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 13.1
    frame #0: 0x0000000189df31ec libdyld.dylib`dyld3::GradedArchs::grade(unsigned int, unsigned int, bool) const
libdyld.dylib`dyld3::GradedArchs::grade:

(lldb) reg read $w1
      w1 = 0x100000c
(lldb) reg read $w2
      w2 = 0x80000002

…and thus macho_best_slice happily returns and our program is given the right slice:

% ./getBestSlice /System/Applications/Calculator.app/Contents/MacOS/Calculator
Best architecture:
 Name: arm64e
 Size: 262304 (0x400a0)
 Offset: 278528 (0x44000)
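
The masking step that lets Calculator’s sub type match can also be checked in isolation. A minimal sketch, with the mask value taken from mach/machine.h (and consistent with the and w9, w2, #0xffffff instruction in the disassembly); baseSubtype and capabilityBits are hypothetical helper names.

```cpp
#include <cstdint>

constexpr uint32_t kCpuSubtypeMask   = 0xff000000; // CPU_SUBTYPE_MASK
constexpr uint32_t kCpuSubtypeArm64e = 0x2;        // CPU_SUBTYPE_ARM64E

// Sub type with the capability bits (top byte) cleared, as grade does.
constexpr uint32_t baseSubtype(uint32_t cpusubtype)
{
    return cpusubtype & ~kCpuSubtypeMask;
}

// Only the capability bits (top byte).
constexpr uint32_t capabilityBits(uint32_t cpusubtype)
{
    return cpusubtype & kCpuSubtypeMask;
}

// The value seen in w2 for Calculator reduces to CPU_SUBTYPE_ARM64E.
static_assert(baseSubtype(0x80000002) == kCpuSubtypeArm64e, "arm64e after masking");
```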

Conclusion

After observing that the macho_best_slice API was broken for any 3rd party binary, we took a deep dive into its (and also dyld’s) internals to figure out exactly why.

What we found was that the “grading” algorithm used to identify compatible slices contained a fundamental flaw: either it does not take into account the nuances of the *_ALL CPU sub types, or it was not invoked with the appropriate grades.

And while this might seem like a rather unimpactful bug, in the context of malware detection this is not the case. As we noted at the start, security tools depend on macOS’s APIs to return the correct slice in order to scan binaries for malicious code. And if the system APIs are flawed, it’s possible that malicious binaries can avoid being scanned, and thus will pass unnoticed!

So instead of fighting the efforts of the EU to make the Apple ecosystem more open and user-friendly, maybe Apple should instead make sure its code works in the first place? 🫣

Originally published by Patrick Wardle: Apple Gets an ‘F’ for Slicing Apples

Copyright notice: posted by admin on February 28, 2024 at 3:31 PM.