Smashing the state machine: the true potential of web race conditions

For too long, web race condition attacks have focused on a tiny handful of scenarios. Their true potential has been masked thanks to tricky workflows, missing tooling, and simple network jitter hiding all but the most trivial, obvious examples.

In this paper, I’ll introduce new classes of race condition that go far beyond the limit-overrun exploits you’re probably already familiar with. With these I’ll exploit both multiple high-profile websites and Devise, a popular authentication framework for Rails.

I’ll also introduce the single-packet attack, a jitter-dodging strategy that can squeeze 30 requests sent from Melbourne to Dublin into a sub-1ms execution window. This paper is accompanied by a full complement of free online labs, so you’ll be able to try out your new skill set immediately.

This research paper accompanies a presentation at Black Hat USA, DEF CON & Nullcon.

It is also available in a print/download-friendly PDF format.

Background

Race condition fundamentals

To begin, let’s recap race condition fundamentals. I’ll keep this brief – if you’d prefer an in-depth introduction, check out our new Web Security Academy topic.

Most websites handle concurrent requests using multiple threads, all reading and writing from a single, shared database. Application code is rarely crafted with concurrency risks in mind and as a result, race conditions plague the web. Exploits are typically limit-overrun attacks – they use synchronized requests to overcome some kind of limit, for example:

The underlying cause of these is also similar – they all exploit the time-gap between the security check and the protected action. For example, two threads may simultaneously query a database and confirm that the TOP10 discount code hasn’t been applied to the cart, then both attempt to apply the discount, resulting in it being applied twice. You’ll often find these referred to as ‘time of check, time of use’ (TOCTOU) flaws for this reason.
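To make the time gap concrete, here is a minimal Python sketch of a check-then-act flaw. The `Cart` class, the integer prices, and the barrier that stands in for the race window are all illustrative assumptions, not code from any real target. Note that the write itself is locked; the bug is that the check happened earlier.

```python
import threading


class Cart:
    """Toy cart with a gap between the discount check and the discount write."""

    def __init__(self, total_pence):
        self.total_pence = total_pence
        self.applied_codes = set()
        self._write_lock = threading.Lock()

    def apply_discount(self, code, percent, race_window):
        # Time of check: has this code already been applied to the cart?
        already_applied = code in self.applied_codes
        # Stand-in for the race window: wait until every thread has done its check.
        race_window.wait()
        # Time of use: act on a check result that may now be stale.
        if not already_applied:
            with self._write_lock:
                self.total_pence = self.total_pence * (100 - percent) // 100
                self.applied_codes.add(code)


def race(cart, attempts=2):
    """Apply the same code from several threads at once and return the total."""
    race_window = threading.Barrier(attempts)
    threads = [
        threading.Thread(target=cart.apply_discount, args=("TOP10", 10, race_window))
        for _ in range(attempts)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return cart.total_pence
```

Calling `race(Cart(10000))` applies the 10% discount twice, because both threads pass the check before either records the code.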

Please note that race conditions are not limited to a specific web-app architecture. It’s easiest to reason about a multi-threaded single-database application, but more complex setups typically end up with state stored in even more places, and ORMs just hide the dangers under layers of abstraction. Single-threaded systems like Node.js are slightly less exposed, but can still end up vulnerable.

Beyond limit-overrun exploits

I used to think race conditions were a well-understood problem. I had discovered and exploited plenty, implemented the ‘last-byte sync’ technique in Turbo Intruder, and used that to exploit various targets including Google reCAPTCHA. Over time, Turbo Intruder has become the de facto tool for hunting web race conditions.

However, there was one thing I didn’t understand. A blog post from 2016 by Josip Franjković detailed four vulnerabilities, and while three of them made perfect sense to me, one didn’t. In the post, Josip explained how he “somehow succeeded to confirm a random email address” by accident, and neither he nor Facebook’s security team were able to identify the cause until two months later. The bug? Changing your Facebook email address to two different addresses simultaneously could trigger an email containing two distinct confirmation codes, one for each address:

/[email protected]&c=13475&code=84751

I had never seen a finding like this before, and it confounded every attempt to visualize what might be happening server-side. One thing was for sure – this wasn’t a limit-overrun.

Seven years later, I decided to try and figure out what happened.

The true potential of web race conditions

The true potential of race conditions can be summed up in a single sentence. Every pentester knows that multi-step sequences are a hotbed for vulnerabilities, but with race conditions, everything is multi-step.

To illustrate this, let’s plot the state machine for a serious vulnerability that I discovered by accident a while back. When a user logged in, they were presented with a ‘role selection’ page containing a range of buttons that would assign a role, and redirect to a specific application. The request flow looked something like:

POST /login          302 Found
GET /role            200 OK
POST /role           302 Found
GET /application     200 OK

In my head, the state machine for the user’s role looked like this:

I attempted to elevate privileges by forcibly browsing directly from the role selection page to an application without selecting a role, but this didn’t work and so I concluded that it was secure.

However, this state machine had a mistake. I had incorrectly assumed that the GET /role request didn’t change the application state. In actual fact, the application was initialising every session with administrator privileges, then overwriting them as soon as the browser fetched the role selection page. Here’s an accurate state machine:

By refusing to follow the redirect to /role and skipping straight to an application, anyone could gain super-admin privileges.

I only discovered this through extreme luck, and it took me hours of retrospective log digging to figure out the cause. This vulnerability pattern is frankly a weird one, but we can learn something valuable from the near-miss.

My primary mistake was the assumption that the GET request wouldn’t change the application state. However, there’s a second assumption that’s even more common – that “requests are atomic”. If we ditch this assumption too, we realize this pattern could occur in the span of a single login request:

This scenario captures the essence of ‘with race conditions, everything is multi-step’. Every HTTP request may transition an application through multiple fleeting, hidden states, which I’ll refer to as ‘sub-states’. If you time it right, you can abuse these sub-states for unintended transitions, break business logic, and achieve high-impact exploits. Let’s get started.
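As a rough illustration, the sketch below models a single login request that passes through a sub-state. The `sessions` store, the handler names, and the two `threading.Event` hooks are hypothetical; the events only exist to hold the race window open so the effect is repeatable, where a real attack relies on timing.

```python
import threading

sessions = {}  # hypothetical server-side session store


def login(session_id, entered_sub_state, resume):
    # Step 1: the session is created with a default role (the hidden sub-state).
    sessions[session_id] = {"role": "admin"}
    entered_sub_state.set()  # demo hook: announce that the race window is open
    resume.wait()            # demo hook: keep the window open until told otherwise
    # Step 2: the intended role overwrites the default before the request completes.
    sessions[session_id]["role"] = "user"


def can_access_admin_panel(session_id):
    return sessions.get(session_id, {}).get("role") == "admin"


def observe_sub_state(session_id="abc"):
    """Return (access during the sub-state, access after login completes)."""
    entered_sub_state, resume = threading.Event(), threading.Event()
    worker = threading.Thread(target=login, args=(session_id, entered_sub_state, resume))
    worker.start()
    entered_sub_state.wait()
    during = can_access_admin_panel(session_id)  # second request lands mid-login
    resume.set()
    worker.join()
    after = can_access_admin_panel(session_id)
    return during, after
```

Here `observe_sub_state()` returns `(True, False)`: the admin check only succeeds while the login request is still in flight.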

Single-packet attack

A sub-state is a short-lived state that an application transitions through while processing a single request, and exits before the request completes. Sub-states are only occupied for a brief time window – often around 1ms (0.001s). I’ll refer to this time window as the ‘race window’.

To discover a sub-state, you need an initial HTTP request to trigger a transition through the sub-state, and a second request that interacts with the same resource during the race window. For example, to discover the vulnerability mentioned earlier you would send a request to log in, and a second request that attempted to access the admin panel. Vulnerabilities with small race windows have historically been extremely difficult to discover thanks to network jitter. Jitter erratically delays the arrival of TCP packets, making it tricky to get multiple requests to arrive close together, even when using techniques like last-byte sync:

In search of a solution, I’ve developed the ‘single-packet attack’. Using this technique, you can make 20-30 requests arrive at the server simultaneously – regardless of network jitter:

I implemented the single-packet attack in the open-source Burp Suite extension Turbo Intruder. To benchmark it, I repeatedly sent a batch of 20 requests 17,000km from Melbourne to Dublin, and measured the gap between the start-of-execution timestamp of the first and last request in each batch. I’ve published the benchmark scripts in the examples folder so you can try them for yourself if you like.

Technique               Median spread   Standard deviation
Last-byte sync          4ms             3ms
Single-packet attack    1ms             0.3ms

By these measures, the single-packet attack is 4 to 10 times more effective: its median spread is 4 times tighter and its standard deviation is 10 times lower. When replicating one real-world vulnerability, the single-packet attack was successful after around 30 seconds, whereas last-byte sync took over two hours.

One great side effect of this is that we’ve been able to launch a Web Security Academy topic containing labs with realistic race windows, without alienating users who live far away from our servers or have high-jitter connections. You can try the single-packet attack out for yourself by tackling our limit-overrun lab with the Turbo Intruder template. The race window on this lab ended up so small that exploitation is near-impossible using multiple packets. The single-packet attack is also available in Repeater via the new ‘Send group in parallel’ option in Burp Suite.

Let’s take a look under the hood.

Developing the single-packet attack

The single-packet attack was inspired by the 2020 USENIX presentation Timeless Timing Attacks. In that presentation, they place two entire HTTP/2 requests into a single TCP packet, then look at the response order to compare the server-side processing time of the two requests:

This is a novel possibility with HTTP/2 because it allows HTTP requests to be sent over a single connection concurrently, whereas in HTTP/1.1 they have to be sequential.

The use of a single TCP packet completely eliminates the effect of network jitter, so this clearly has potential for race condition attacks too. However, two requests isn’t enough for a reliable race attack thanks to server-side jitter – variations in the application’s request-processing time caused by uncontrollable variables like CPU contention.

I spotted an opportunity to adapt a trick from the HTTP/1.1 ‘last-byte sync’ technique. Since servers only process a request once they regard it as complete, maybe by withholding a tiny fragment from each request we could pre-send the bulk of the data, then ‘complete’ 20-30 requests with a single TCP packet:

After a few weeks of experimenting, I’d built an implementation that worked on all tested HTTP/2 servers.

Rolling your own implementation

This concept is honestly pretty obvious, and after implementing it I discovered someone else had the same idea back in 2020. Nobody noticed at the time, though, and their algorithm and implementation didn’t receive the polish, testing and integration essential to prove its true value. The reason I’m so excited about the single-packet attack is that it’s powerful, universal, and trivial. Even after spending months refining it to work on all major webservers, the algorithm is still so simple it fits on a single page, and so easy to implement that I expect it to end up in all major web testing tools.

The primary reason it’s so easy to implement is that thanks to some creative abuse of Nagle’s algorithm, it doesn’t require a custom TCP or TLS stack. You can just pick an HTTP/2 library to hook into (trust me, coding your own is not much fun), and apply the following steps:

First, pre-send the bulk of each request:

  • If the request has no body, send all the headers, but don’t set the END_STREAM flag. Withhold an empty data frame with END_STREAM set.
  • If the request has a body, send the headers and all the body data except the final byte. Withhold a data frame containing the final byte.

You might be tempted to send the full body and rely on not sending END_STREAM, but this will break on certain HTTP/2 server implementations that use the content-length header to decide when a message is complete, as opposed to waiting for END_STREAM.
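The pre-send rules above can be sketched as a small frame-splitting helper. This is an illustration rather than Turbo Intruder’s code: `split_request` is a hypothetical name, `header_block` is assumed to be an HPACK-encoded block produced by whichever HTTP/2 library you hook into, and both the header block and the body are assumed to fit in a single frame without hitting flow-control limits.

```python
import struct

HEADERS, DATA = 0x1, 0x0            # HTTP/2 frame types
END_STREAM, END_HEADERS = 0x1, 0x4  # HTTP/2 frame flags


def frame(frame_type, flags, stream_id, payload=b""):
    """Serialize one HTTP/2 frame: 24-bit length, type, flags, 31-bit stream id."""
    length = struct.pack(">I", len(payload))[1:]
    stream = struct.pack(">I", stream_id & 0x7FFFFFFF)
    return length + bytes([frame_type, flags]) + stream + payload


def split_request(stream_id, header_block, body=b""):
    """Return (pre_send, withheld) bytes for a single request."""
    # Headers always go out early, and never carry END_STREAM.
    pre_send = frame(HEADERS, END_HEADERS, stream_id, header_block)
    if not body:
        # No body: withhold an empty DATA frame with END_STREAM set.
        return pre_send, frame(DATA, END_STREAM, stream_id)
    if len(body) > 1:
        # Body: pre-send everything except the final byte.
        pre_send += frame(DATA, 0, stream_id, body[:-1])
    # Withhold a DATA frame holding the final byte, with END_STREAM set.
    return pre_send, frame(DATA, END_STREAM, stream_id, body[-1:])
```

Each withheld frame is only 9 or 10 bytes long, which is what makes it practical to fit 20-30 of them into one TCP packet.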

Next, prepare to send the final frames:

  • Wait for 100ms to ensure the initial frames have been sent.
  • Ensure TCP_NODELAY is disabled – it’s crucial that Nagle’s algorithm batches the final frames.
  • Send a ping packet to warm the local connection. If you don’t do this, the OS network stack will place the first final-frame in a separate packet.

Finally, send the withheld frames. You should be able to verify that they landed in a single packet using Wireshark.
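The final steps might look roughly like the sketch below, written against Python’s standard `socket` API. It makes several assumptions: the ping is sent as an HTTP/2 PING frame, `sock` is an already-connected socket carrying the HTTP/2 connection (in practice the writes would go through a TLS wrapper), and `release_withheld_frames` is a hypothetical helper name. Whether the frames really share one packet depends on the OS network stack, so check with Wireshark.

```python
import socket
import struct
import time


def ping_frame(opaque=b"\x00" * 8):
    """HTTP/2 PING frame: type 0x6, no flags, stream 0, 8 bytes of opaque data."""
    return struct.pack(">I", 8)[1:] + bytes([0x6, 0x0]) + struct.pack(">I", 0) + opaque


def release_withheld_frames(sock, withheld_frames, settle_seconds=0.1):
    # Give the pre-sent frames time to leave the machine.
    time.sleep(settle_seconds)
    # TCP_NODELAY = 0 keeps Nagle's algorithm on, so small writes get batched.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 0)
    # Warm the connection with a ping, then queue every final frame behind it.
    sock.sendall(ping_frame())
    for final_frame in withheld_frames:
        sock.sendall(final_frame)
```

The intent is that the ping goes out immediately, and the small writes that follow are held back until they can leave together.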

This approach worked on all dynamic endpoints on all tested servers. It doesn’t work for static files on certain servers but as static files aren’t relevant to race condition attacks, I haven’t attempted to find a workaround for this. In Turbo Intruder, the static-file quirk results in a negative timestamp as the response is received before the request is completed. This behavior can be used as a way of testing if a file is static or not.

If you’re not sure which HTTP/2 stack to build on, I think Golang’s might be a good choice – I’ve seen that successfully extended for advanced HTTP/2 attacks in the past. If you’d like to see a reference implementation in Kotlin, feel free to use Turbo Intruder. The relevant code can be found in SpikeEngine and SpikeConnection.

Adapting to the target architecture

It’s worth noting that many applications sit behind a front-end server, and these may decide to forward some requests over existing connections to the back-end, and to create fresh connections for others.

As a result, it’s important not to attribute inconsistent request timing to application behavior such as locking mechanisms that only allow a single thread to access a resource at once. Also, front-end request routing is often done on a per-connection basis, so you may be able to smooth request timing by performing server-side connection warming – sending a few inconsequential requests down your connection before performing the attack. You can try this technique out for yourself on our multi-endpoint lab.

Methodology

Now that we’ve established ‘everything is multi-step’, and developed a technique to allow accurate request synchronization and make race conditions reliable, it’s time to start hunting vulnerabilities. Classic limit-overrun vulnerabilities can be discovered using a trivial methodology: identify a limit, and try to overrun it. Discovering exploitable sub-states for more advanced attacks is not quite so simple.

Over months of testing, I’ve developed the following black-box methodology to help. I recommend using this approach even if you have source-code access; in my experience it’s extremely challenging to identify race conditions through pure code analysis.

Predict potential collisions

Prediction is about efficiency. Since everything is multi-step, ideally we’d test every possible combination of endpoints on the entire website. This is impractical – instead, we need to predict where vulnerabilities are likely to occur. One tempting approach is to simply try and find replicas of the vulnerabilities described later in this paper – this is nice and easy, but you’ll miss out on exciting, undiscovered variants.

To start, identify objects with security controls that you’d like to bypass. This will typically include users and sessions, plus some business-specific concepts like orders.

For each object, we then need to identify all the endpoints that either write to it, or read data from it and then use that data for something important. For example, users might be stored in a database table that is modified by registration, profile-edits, password reset initiation, and password reset completion. Also, a website’s login functionality might read critical data from the users table when creating sessions.

A race condition vulnerability requires a ‘collision’ – two concurrent operations on a shared resource. We can use three key questions to rule out endpoints that are unlikely to cause collisions. For each object and the associated endpoints, ask:

1) How is the state stored?

Data that’s stored in a persistent server-side data structure is ideal for exploitation. Some endpoints store their state entirely client-side, such as password resets that work by emailing a JWT – these can be safely skipped.

Applications will often store some state in the user session. These are often somewhat protected against sub-states – more on that later.

2) Are we editing or appending?

Operations that edit existing data (such as changing an account’s primary email address) have ample collision potential, whereas actions that simply append to existing data (such as adding an additional email address) are unlikely to be vulnerable to anything other than limit-overrun attacks.

3) What’s the operation keyed on?

Most endpoints operate on a specific record, which is looked up using a ‘key’, such as a username, password reset token, or filename. For a successful attack, we need two operations that use the same key. For example, picture two plausible password reset implementations:

In the first implementation, the user’s password reset token is stored in the users table in the database, and the supplied userid acts as the key. If an attacker uses two requests to trigger a reset for two different userids at the same time, two different database records will be altered so there’s no potential for a collision. By identifying the key, you’ve identified that this attack is probably not worth attempting.

In the second implementation, the state is stored in the user’s session, and the token-storage operation is keyed on the user’s sessionid. If an attacker uses two requests to trigger a reset for two different emails at the same time, both threads will attempt to alter the same session’s token and userid attributes, and the session may end up containing one user’s userid, and a token that was sent to the other user.
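The session-keyed collision can be reproduced deterministically with a small model. In this sketch each handler is a generator, and every `yield` marks a point where another thread could run; the handler, the `outbox` dictionary, and the email addresses are all hypothetical.

```python
import secrets


def request_reset(session, outbox, email, user_id):
    """One password-reset request; each yield is a possible thread switch."""
    token = secrets.token_hex(8)
    outbox[email] = token          # the token is emailed to the requested address
    session["userid"] = user_id    # first write, keyed on the session
    yield
    session["token"] = token       # second write, keyed on the same session
    yield


def collide():
    """Run two resets in one session with an attacker-friendly interleaving."""
    session, outbox = {}, {}
    attacker = request_reset(session, outbox, "attacker@example.com", "attacker")
    victim = request_reset(session, outbox, "victim@example.com", "victim")
    # Attacker's first write, both of the victim's writes, then the attacker's second.
    for handler in (attacker, victim, victim, attacker):
        next(handler)
    return session, outbox
```

After `collide()`, the session holds the victim’s userid alongside the token that was sent to the attacker. Had the two requests been keyed on different records, neither write could have landed on the other’s data.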

Probe for clues

Now that we’ve selected some high-value endpoints, it’s time to probe for clues – hints that hidden sub-states exist. We don’t need to cause a meaningful exploit yet – our objective at this point is simply to evoke a clue. As such, you’ll want to send a large number of requests to maximize the chance of visible side-effects, and mitigate server-side jitter. Think of this as a chaos-based strategy – if we see something interesting, we’ll figure out what actually happened later.

Prepare your blend of requests, targeting endpoints and parameters to trigger all relevant code paths. Where possible, use multiple requests to trigger each code path multiple times, with different input values.

Next, benchmark how the endpoints behave under normal conditions by sending your request-blend with a few seconds between each request.

Finally, use the single-packet attack (or last-byte sync if HTTP/2 isn’t supported) to issue all the requests at once. You can do this in Turbo Intruder using the single-packet-attack template, or in Repeater using the ‘Send group in parallel’ option.

Analyze the results and look for clues in the form of any deviation from the benchmarked behavior. This could be a change in one or more responses, or a second-order effect like different email contents or a visible change in your session. Clues can be subtle and counterintuitive so if you skip the benchmark step, you’ll miss vulnerabilities.

Pretty much anything can be a clue, but pay close attention to the request processing time. If it’s shorter than you’d expect, this can indicate that data is being passed to a separate thread, greatly increasing the chances of a vulnerability. If it’s longer than you expect, that could indicate resource limits – or that the application is using locking to avoid concurrency issues. Note that PHP locks on the sessionid by default, so you need to use a separate session for every request in your batch or they’ll get processed sequentially.
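Comparing the two runs can be partly automated. The sketch below is one possible approach, assuming you have recorded each response body and its timing yourself; the `Observation` type, the `find_clues` helper, and the `slowdown_factor` threshold are illustrative choices. Second-order effects such as email contents still need a manual look.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Observation:
    body: str          # response body as received
    elapsed_ms: float  # time from sending the request to receiving the response


def find_clues(baseline, attack, slowdown_factor=2.0):
    """List deviations between the sequential baseline and the synchronized run."""
    clues = []
    expected = Counter(o.body for o in baseline)
    seen = Counter(o.body for o in attack)
    # Any response body that never appeared in the baseline deserves attention.
    for body, count in sorted(seen.items()):
        if body not in expected:
            clues.append(("new_response", body, count))
    # Much slower processing under load can point to locking or resource limits.
    baseline_ms = median(o.elapsed_ms for o in baseline)
    attack_ms = median(o.elapsed_ms for o in attack)
    if attack_ms > slowdown_factor * baseline_ms:
        clues.append(("slower_than_baseline",))
    return clues
```

Feed it the benchmark run as `baseline` and the synchronized run as `attack`, then investigate whatever comes back.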

Prove the concept

If you spot a clue, the final step is to prove the concept and turn it into a viable attack. The exact steps here will depend on the attack you’re attempting, but there are a few general pointers that may be useful:

When you send a batch of requests, you may find that an early request pair triggers a vulnerable end-state, but later requests overwrite/invalidate it and the final state is unexploitable. In this scenario, you’ll want to eliminate all unnecessary requests – two should be sufficient for exploiting most vulnerabilities.

Dropping to two requests will make the attack more timing-sensitive, so you may need to retry the attack multiple times or automate it. On a couple of targets I ended up writing a Turbo Intruder script to repeatedly trigger emails, retrieve them from Burp Collaborator, and extract and visit the links within. You can find an example in the email-extraction template.

Finally, don’t forget to escalate! Think of each race condition as a structural weakness, rather than an isolated vulnerability. Advanced race conditions can cause unusual and unique primitives, so the path to maximum impact isn’t always obvious. For example, in one case I ended up with different endpoints on a single website disagreeing about what my email address was. During this research I personally missed out on ~$5k due to overlooking one exploit avenue until after the vulnerability was patched.

Case studies

Let’s take a look at the methodology and tooling in action, with some real-life case studies. These vulnerabilities are focused on email-related functionality, as my primary objective was to understand the mysterious Facebook exploit.

First, a disclaimer. During research, I usually accrue a large number of case studies affecting high-profile companies by using automation to test tens of thousands of sites. Race conditions aren’t suitable for this scale of automation, so every example that follows is brought to you by hours of mostly manual testing. On the bright side, this means I’ve tested only a tiny proportion of websites with bug bounty programs, and left a lot of money on the table for everyone else.

Object masking via limit-overrun

We’ll start with an object masking vulnerability in Gitlab. Gitlab lets you invite users to administer projects via their email address. I decided to try a probe with six identical requests:

POST /api/…/invitations HTTP/2
{"email":"[email protected]"}

To build a baseline, I sent these requests sequentially with a small delay between each. This resulted in the response {"status":"success"} six times, and one invitation email.

Next, I sent the requests simultaneously, using the single-packet attack. This resulted in one response containing {"status":"success"}, five responses saying {"message":"The member's email address has already been taken"}, and two emails.

Receiving two emails from six requests is a clear clue that I’ve hit a sub-state, and further testing is warranted. The difference in the responses is also a clue. Note that if I hadn’t benchmarked Gitlab’s baseline behavior, I wouldn’t have regarded the five “The member’s email address has already been taken” responses as suspicious. Finally, there was also a second-order clue: after an attack, any attempt to edit the resulting invitation triggered an error.

After some more digging, I was able to arrive at a low-severity exploit. The page that lists active invitations only displays one invitation for a given email address. Using the race condition, I was able to create a dummy low-privilege invitation which gets replaced by an admin-level invitation if it’s revoked.

The impact here wasn’t great, but it hinted at deeper problems to come.

Multi-endpoint collisions

Classic multi-step exploits can provide inspiration for race condition attacks. While testing an online shop a while ago, I discovered that I could start a purchase flow, pay for my order, and then add an extra item to my basket before I loaded the order confirmation page – effectively getting the extra item for free. We later made a replica of this vulnerability for training purposes.

There’s a documented race condition variation of this attack that can occur when the payment and order confirmation are performed by a single request.

On Gitlab, emails are important. The ability to ‘verify’ an email address you don’t own would let you gain administrator access to other projects by hijacking pending invitations. Furthermore, since Gitlab acts as an OpenID IDP, it could also be abused to hijack accounts on third-party websites that naively trust Gitlab’s email verification.

The basket attack might not sound relevant to exploiting Gitlab, but I realized that when visualized, Gitlab’s email verification flow looks awfully similar:

Perhaps by verifying an email address and changing it at the same time, I could trick Gitlab into incorrectly marking the wrong address as verified?

When I attempted this attack, I noticed that the confirmation operation was executing before the email-change every time. This suggested that the email-change endpoint was doing more processing than the email-confirmation endpoint before it hit the vulnerable sub-state, so sending the two requests in sync was missing the race window:

Delaying the confirmation request by 90ms fixed the issue, and achieved a 50/50 spread between the email-change landing first, and the email-confirmation landing first.

Note that adding a client-side delay means you can’t use the single-packet attack, so on high-jitter targets it won’t work reliably regardless of what delay you set:

If you encounter this problem, you may be able to solve it by abusing a common security feature. Webservers often have ‘leaky bucket’ rate-limits which delay processing of requests if they’re sent too quickly. You can abuse this by sending a large number of dummy requests to trigger the rate-limit and cause a server-side delay, making the single-packet attack viable even when delayed execution is required:
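To get a feel for how many dummy requests that takes, here is a deliberately simplified model. It assumes the rate limiter queues a batch that arrives together and releases one request every `drain_interval_ms`; that interval, the helper names, and the 10ms figure in the usage note are assumptions you would need to measure on the real target.

```python
import math


def build_batch(first, delayed, delay_ms, drain_interval_ms):
    """Order a single-packet batch so `delayed` starts about delay_ms after `first`."""
    slots = max(1, math.ceil(delay_ms / drain_interval_ms))
    return [first] + ["dummy"] * (slots - 1) + [delayed]


def start_times(batch, drain_interval_ms):
    """Model: the limiter releases one queued request every drain_interval_ms."""
    return [(name, index * drain_interval_ms) for index, name in enumerate(batch)]
```

Under this model, `build_batch("email-change", "email-confirm", 90, 10)` places eight dummy requests between the two real ones, so the confirmation starts processing 90ms after the email change.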
如果遇到此问题,可以通过滥用常见的安全功能来解决它。Web 服务器通常具有“泄漏桶”速率限制,如果请求发送得太快,则会延迟请求的处理。您可以通过发送大量虚拟请求来触发速率限制并导致服务器端延迟来滥用此漏洞,即使需要延迟执行,单数据包攻击也可行:
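
The effect of the dummy requests can be sketched with a toy leaky-bucket model. This assumes the server lets a small burst through immediately and then starts one queued request per leak interval; real rate-limiters vary, and the function name, burst size, and interval are all hypothetical.

```python
def start_times_ms(n_requests, burst, leak_interval_ms):
    """Processing start time of each request when n_requests arrive together.

    Toy model: the first `burst` requests start immediately, and each request
    beyond that waits one more leak interval than the previous one.
    """
    return [max(0, i - burst + 1) * leak_interval_ms for i in range(n_requests)]


# One packet: the email-change first, padding in the middle, confirmation last.
packet = ["email-change"] + ["dummy"] * 8 + ["email-confirm"]
times = start_times_ms(len(packet), burst=1, leak_interval_ms=10.0)

change_start = times[0]    # starts immediately
confirm_start = times[-1]  # pushed back by the server itself
```

With these assumed numbers, eight dummy requests push the confirmation back by 90ms without any client-side delay, so everything can still travel in a single packet.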


Back on Gitlab, lining the race window up revealed two clues – the email confirmation request intermittently triggered a 500 Internal Server Error, and sometimes the confirmation token was sent to the wrong address! Unfortunately, the misdirected code was only valid for confirming the already-confirmed address, making it useless.

Still, thanks to the misdirected code we know there’s at least one sub-state hidden inside Gitlab’s email-change endpoint. Maybe we just need a different angle to exploit this?

Single-endpoint collisions

Race conditions thrive on complexity – they get progressively more likely the more data gets saved, written, read, altered, and handed off between classes, threads, and processes. When an endpoint is sufficiently complex, you don’t even need any other endpoints to cause an exploitable collision.

On Gitlab, I noticed that when I tried to change my email address, the response time was 220ms – faster than I’d expect for an operation that sends an email. This hinted that the email might be sent by a different thread – exactly the kind of complexity we need.

I decided to probe Gitlab by changing my account’s email address to two different addresses at the same time:

POST /-/profile HTTP/2

user[email]=[email protected]

POST /-/profile HTTP/2

user[email]=[email protected]

This revealed a massive clue:

Subject: Confirmation instructions

Click the link below to confirm your email address.

Confirm your email address

The address the message was sent to didn’t always match the address in the body. Crucially, the confirmation token in the misrouted email was often valid. By submitting two requests, containing my own email address and [email protected], I was able to obtain the latter as a validated address. You can still view it on my profile.

More importantly, this unlocked the invitation-hijacking and OpenID attacks mentioned earlier.

I’ve recorded a video demonstrating the full discovery process on a remote Gitlab installation:


Code analysis

Although my exploit worked, I still had no idea what had actually happened.

The vulnerability seemed to originate from the way Gitlab had integrated Devise, a popular authentication framework for Ruby on Rails. I explored the Devise codebase via Confirmable.rb, and Gitlab via their patch for my finding. Analyzing the race condition from a white-box perspective proved quite challenging, especially around the boundary between Devise and Gitlab, but here’s my best shot at explaining the inner workings of this vulnerability.

If you request an email change, Devise updates user.unconfirmed_email, saves a security token in user.confirmation_token, and emails a link containing the token to user.unconfirmed_email:

self.unconfirmed_email = # from 'email' parameter...
self.confirmation_token = @raw_confirmation_token = Devise.friendly_token

# this eventually gets handed off to a different thread to render & send the email
send_devise_notification(:confirmation_instructions, @raw_confirmation_token, { to: unconfirmed_email } )

# an email is queued to the unconfirmed_email argument
# but the body is generated via a template engine that reads the variables back from the database
- confirmation_link = confirmation_url(@resource, confirmation_token: @token)
- if @resource.unconfirmed_email.present? || !@resource.created_recently?
  = email_default_heading(@resource.unconfirmed_email ||
  %p= _('Click the link below to confirm your email address.')
  = link_to _('Confirm your email address'), confirmation_link

The vulnerability arises from an inconsistency between how Devise knows where to send the email, and how it knows what to put inside the email. The email is sent to a variable passed directly in an argument to send_devise_notification. However, the variables used to populate the email body, including the confirmation_link, are retrieved from the database using a server-side template engine. This creates a race window between send_devise_notification being invoked, and the email body being generated, where another thread can update user.unconfirmed_email in the database.
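
The mismatch between "recipient passed as an argument" and "body read back from the database" can be reproduced in a few lines. The following is a toy single-process model, not Devise or Gitlab code: the dictionary standing in for the database, the queue, and all function names are hypothetical, and it ignores the starting-state requirement discussed next.

```python
import secrets

db = {"unconfirmed_email": None, "confirmation_token": None}
mail_queue = []  # recipients captured at request time
outbox = []      # (recipient, body) pairs that were actually sent


def request_email_change(new_email):
    """Request thread: store the new state, then queue a mail to a fixed recipient."""
    db["unconfirmed_email"] = new_email
    db["confirmation_token"] = secrets.token_urlsafe(16)
    mail_queue.append(new_email)  # recipient is passed along as an argument


def run_mailer():
    """Mailer thread: render each body from whatever the database holds now."""
    while mail_queue:
        to = mail_queue.pop(0)
        body = f"Confirm {db['unconfirmed_email']} with token {db['confirmation_token']}"
        outbox.append((to, body))


# Both requests land before the mailer has rendered either body.
request_email_change("attacker@example.com")
request_email_change("victim@example.com")
run_mailer()
```

The first message in the outbox is addressed to the attacker, but its body names the victim's address and carries the token currently stored in the database.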

While attempting to replicate this vulnerability on a local Gitlab installation, I noticed an important detail that I overlooked during the original discovery. Although it’s easy to trigger an email that gets sent to the wrong address, the confirmation token within is only valid if the application is in the right starting state. For a successful exploit, you need to trigger Devise’s ‘resend existing token’ code path. You can do this by hitting the resend_confirmation_token endpoint if it’s exposed, or simply by requesting a change to the same email address twice.

We’ve built a replica of this vulnerability so you can practise your single-endpoint exploitation skills.

Testing other Devise sites

I reported this vulnerability to Gitlab, and they assigned it CVE-2022-4037 and patched it in release 15.7.2 on the 4th Jan 2023. Note that they classified it as medium severity, but I’d personally classify it as high due to the invitation hijacking exploit which I discovered later.

While reading about Devise, I noticed that NCC described it as “far and away the most popular authentication system for Rails”. Over the following 200 days I made multiple attempts to report this issue via three different security-contact addresses without success, so I thought I’d share my experience hunting down other targets built on Devise. Devise can be easily detected using the unauthenticated endpoint /users/confirmation. Scanning for this quickly revealed a number of interesting sites including -temporarily redacted-. Unfortunately for me, -redacted- was wisely not putting much trust in email verification, so the only impact I could identify was the ability to bypass domain-based access controls, which only functioned as a defense-in-depth measure.

On another target, the email confirmation text didn’t tell you who the code was for, so you had to click every confirmation link and reload your profile to see if the confirmed email address matched your expectations. Since there was no visible clue and the exploit only worked intermittently, this would have been an easy vulnerability to overlook. I ended up writing a Turbo Intruder script to automate the detection of no-clue token misrouting findings like this, which you can find in the email-extraction template.

Deferred collisions

So far, we’ve exploited endpoints where the collision occurs more or less straight away. It’s a mistake to think that an immediate collision is guaranteed – websites may do critical data processing in batches periodically behind the scenes. In this scenario, you don’t need careful request timing to trigger a race condition – the application will do that part for you. I’ll refer to these as deferred collisions.

I discovered one of these while probing for code-misrouting races on a major website that really doesn’t want me to name them. Confirmation emails took quite a while to arrive, and didn’t state which address they were intended to confirm, but I noticed that trying to change my email to two different addresses simultaneously sometimes resulted in two emails to the same address.

It looked similar to the Devise vulnerability until I realized that the two conflicting email-change requests could be sent with a 20-minute delay between them. Deferred race conditions like this one are inherently difficult to identify, as they’ll never trigger immediate clues like different responses. Instead, detection is reliant on second-order clues such as changed application behavior or inconsistent emails at a later date. Since the collisions aren’t dependent on synchronized requests, clues may appear without any deliberate testing. Over time I’ve begun to regard spotting anomalies as the single most important skill for finding race conditions.

I reported this finding, and the initial fix attempt made the misrouted token invalid most of the time, but not always. A different company’s initial fix for their vulnerability was also incomplete, suggesting race condition patches definitely deserve scrutiny.

Future research

In this paper, I’ve focused on a collection of closely related exploit scenarios and vulnerability patterns. Race conditions permeate every area of the web, so I suspect there are a range of other undocumented scenarios leading to high impact exploits. These will no doubt prove fruitful for whoever discovers them, and contribute a lot of value if they’re shared with the wider security community.

Partial construction attacks

One pattern that’s just about visible is partial construction vulnerabilities. These occur when an object is created in multiple steps, creating an insecure middle state. For example, during account registration, the application may create the user in the database and set the user’s password in two separate SQL statements, leaving a tiny window open where the password is null. This type of attack is most likely to work on applications where you can provide an input value that will match against the uninitialized database value – such as null in JSON, or an empty array in PHP. In the case of a password input, you’ll want something that makes the password hash function return null. We’ve made a lab for this attack class but be warned it’s quite tricky.
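
A minimal sketch of the registration example follows. It assumes a hypothetical application whose hash helper returns None for non-string input, which is the kind of behavior the attack relies on. All names are illustrative, and SHA-256 is used only to keep the sketch short, not as a recommended password hash.

```python
import hashlib

users = {}


def hash_password(password):
    """Hypothetical lenient hasher: returns None for non-string input."""
    if not isinstance(password, str):
        return None
    return hashlib.sha256(password.encode()).hexdigest()


def register_step_1(username):
    users[username] = {"password_hash": None}  # row created, password not yet set


def register_step_2(username, password):
    users[username]["password_hash"] = hash_password(password)


def login(username, password):
    user = users.get(username)
    return user is not None and user["password_hash"] == hash_password(password)


register_step_1("victim")
during_window = login("victim", None)  # e.g. {"password": null} sent as JSON
register_step_2("victim", "correct horse")
after_window = login("victim", None)
```

The null password only matches while the record is half-built, which is why the real-world window is tiny.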

If you’re interested in this attack class I’d highly recommend reading Natalie Silvanovich’s WebRTC research.

Unsafe data structures

Another angle for further research is exploring the root cause of race conditions – unsafe combinations of data structures and locking strategies. I’ve encountered three main strategies:

Locking

Some data structures aggressively tackle concurrency issues by using locking to only allow a single worker to access them at a time. One example of this is PHP’s native session handler – if you send PHP two requests in the same session at the same time, they get processed sequentially! This approach is secure against session-based race conditions but it’s terrible for performance, and quite rare as a result.

It’s extremely important to spot this strategy when you’re testing because it can mask exploitable vulnerabilities. For example, if you try to use two requests in the same session to probe for a database-layer race you’ll miss it every time, but the vulnerability will be trivially exploitable using two separate sessions.

Batching

Most session handlers and ORMs batch updates to a given session. When they start to process a request, they read in an entire record (for example, all the variables in a particular session). Subsequent read/write operations are applied to a local in-memory copy of this record, and when request processing completes, the entire record is serialized back to the database.

This use of a separate in-memory copy per request makes them internally consistent during the request lifecycle and avoids the creation of sub-states. However, if two requests operate on the same record simultaneously, one will end up overwriting the database changes from the other. This means they can’t be used to defend against attacks affecting other storage layers.
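
The overwrite behavior can be shown with a toy batched session handler. The class, the store, and the field names below are hypothetical and only model the read-everything, write-everything pattern described above.

```python
import copy

store = {"sess1": {"cart": [], "credit": 10}}


class BatchedSession:
    """Loads the whole record on start, writes the whole record on finish."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.data = copy.deepcopy(store[session_id])  # private in-memory copy

    def save(self):
        store[self.session_id] = copy.deepcopy(self.data)


# Two requests on the same session start at the same time.
req_a = BatchedSession("sess1")
req_b = BatchedSession("sess1")

req_a.data["cart"].append("gift card")  # A only ever sees its own changes...
req_b.data["credit"] -= 10              # ...and so does B

req_a.save()
req_b.save()  # B's stale copy of "cart" overwrites A's update
```

Each request stays internally consistent, but the final record contains only the changes from whichever request saved last.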

No defense

Finally, some data structures update shared resources in real time with no batching, locking, or synchronization. You’ll see this most often with custom, application-specific data structures, and anything stored in databases without consistent use of transactions.

You might also encounter it with custom session handlers, especially those built on low-latency storage like redis or a local database. I have personally encountered a vulnerable session handler, but it doesn’t make for a good case study because I obliviously coded it myself!

If you spot a custom session handler, heavy testing is advised as a vulnerable implementation can undermine critical functionality such as login. Here are three snippets of code that are highly exploitable when combined with the session-handler that has no defenses:

# Bypass code-based password reset
session['reset_username'] = username
session['reset_code'] = randomCode()
# Exploit: Simultaneous reset for $your-username and $victim-username

# Bypass 2FA
session['user'] = username
if 2fa_enabled:
    session['require2fa'] = true
# Exploit: Simultaneous login and sensitive page fetch

# Session-swap
session['user'] = username
# Detect: Simultaneous login to two separate accounts from same session
# Exploit: Force anon session cookie on victim, then log in simultaneously
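
To make the 2FA case concrete, here is a small deterministic model of the interleaving. It assumes a session that is shared and updated in real time, and uses a generator to mark the point where another request could be scheduled. The handler names are hypothetical.

```python
session = {}  # shared and updated in real time: no batching, no locking


def login(username, twofa_enabled):
    """Login handler, yielding wherever another request could be scheduled."""
    session["user"] = username
    yield  # sub-state: logged in, 2FA flag not yet set
    if twofa_enabled:
        session["require2fa"] = True
    yield


def fetch_sensitive_page():
    """Allowed only for a logged-in session with no pending 2FA challenge."""
    return "user" in session and not session.get("require2fa", False)


handler = login("victim", twofa_enabled=True)
next(handler)                         # the first assignment has run
mid_login = fetch_sensitive_page()    # request lands inside the sub-state
next(handler)                         # login finishes and sets the flag
after_login = fetch_sensitive_page()
```

The page fetch succeeds only when it lands between the two session writes, which is the window a simultaneous login and page fetch is trying to hit.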

Hopefully we’ll quickly arrive at a consensus that for a core data-structure like a session handler or ORM, failure to be atomic is a vulnerability.

Single-packet attack enhancements

There are three key areas where the single-packet attack could be developed further.

My implementation lets you complete up to 20-30 HTTP requests with a single packet. It’s probably possible to improve this number further using TCP/TLS-layer techniques such as forcing the maximum segment size up, or deliberately issuing TCP packets out of order.

As we saw earlier, multi-endpoint attacks often require requests to start processing at different times. Abusing server rate-limits can solve this, but only on some systems. A more generic, reliable way to delay the processing of specific requests in a single packet would be valuable.

Finally, my implementation opts to batch requests at the TCP layer, rather than TLS. This is probably the easiest approach, but if you could instead squeeze the requests into a single TLS record, this would make the single-packet attack work through any proxy that doesn’t break TLS – including SOCKS.

Defence

When a single request can push an application through invisible sub-states, understanding and predicting its behavior becomes extremely difficult, which makes defense impractical. To secure an application, I recommend eliminating sub-states from all sensitive endpoints by applying the following strategies:

  • Avoid mixing data from different sources. The Devise library read a token from the database, then emailed it to an address held in an instance variable. If it had read both the token and the email address from the database, or passed them both in instance variables, it would not have been vulnerable.
  • Ensure sensitive endpoints make state-changes atomic by using the datastore’s concurrency features. For example, use a single database transaction to check the payment matches the cart value and confirm the order.
  • As a defence in depth measure, take advantage of datastore integrity/consistency features like uniqueness constraints.
  • Don’t attempt to use one data storage layer to secure another. For example, sessions aren’t suitable for preventing limit-overrun attacks on databases.
  • Ensure your session handling framework keeps sessions internally consistent. Updating session variables individually instead of in a batch might be a tempting optimization, but it’s extremely dangerous. This goes for ORMs too – by hiding away concepts like transactions, they’re taking on full responsibility for them.
  • In some architectures, it may be appropriate to avoid server-side state entirely, instead using encryption to push the state client-side, for example with JWTs.
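
As one possible illustration of the atomic state-change and uniqueness-constraint advice, the sketch below uses SQLite from Python's standard library. The table layout and the gift-code scenario are hypothetical; the point is that the check and the update happen in one conditional statement inside one transaction, with a uniqueness constraint as a backstop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gift_codes (code TEXT PRIMARY KEY, redeemed INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE redemptions (code TEXT UNIQUE, user TEXT);
    INSERT INTO gift_codes (code) VALUES ('WELCOME10');
""")


def redeem(code, user):
    """Check and state change happen together, inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE gift_codes SET redeemed = 1 WHERE code = ? AND redeemed = 0",
                (code,),
            )
            if cur.rowcount != 1:
                return False  # already redeemed, or unknown code
            conn.execute(
                "INSERT INTO redemptions (code, user) VALUES (?, ?)", (code, user)
            )
            return True
    except sqlite3.IntegrityError:
        return False  # the uniqueness constraint caught a duplicate


first = redeem("WELCOME10", "alice")
second = redeem("WELCOME10", "alice")
```

Because the check lives in the UPDATE's WHERE clause, there is no separate read-then-write step for a second request to slip into. How well this holds under real concurrency depends on the datastore's own locking and isolation behavior.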

Takeaways

HTTP request processing isn’t atomic – any endpoint might be sending an application through invisible sub-states. This means that with race conditions, everything is multi-step.

The single-packet attack solves network jitter, making it as though every attack is on a local system. This exposes vulnerabilities that were previously near-impossible to detect or exploit.

Spotting anomalies is the single most important skill for finding race conditions.

Good luck!

– albinowax

