What we learned from C++ atomics and memory model standardization (The Future of Weak Memory 2024)

Sun 14 - Sat 20 January 2024 London, United Kingdom

Track

The Future of Weak Memory 2024

When

Mon 15 Jan 2024, 11:45 - 12:07 (GMT), at Turing Lecture. Session 2, Chair(s): John Wickerson

Abstract

The C++ memory model was first included, together with thread support, in C++11, and has been incrementally updated in later revisions. I plan to summarize what I learned, both as a C++ standards committee member and, more recently, as a frequent user of this model, covering as many of the following points as I have time for:

The C++ committee began with the view that higher-level synchronization facilities like mutexes and barriers should constitute perhaps 90% of thread synchronization, sequentially consistent atomics maybe another 9%, and weakly ordered atomics the remaining 1%. What I’ve observed in C++ code is often very far from that. I see roughly as much use of atomics as of mutexes, in spite of some official encouragement to the contrary. Much of that uses weakly ordered atomics. In the code I work with, I see essentially no clever lock-free data structures along the lines of lock-free linked lists. I do see a lot of atomic flags, counters, fixed-size caches implemented with atomics, and the like. Code bases vary, but I think this is not atypical.
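As a rough illustration of the "atomic flags and counters" pattern mentioned above, here is a minimal C++ sketch. All names (`event_count`, `ready`, `payload`, `count_events`, `publish_and_read`) are hypothetical and do not come from the talk; the sketch assumes a hosted C++11 (or later) toolchain with thread support.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical names throughout; none of this is taken from the talk.
std::atomic<long> event_count{0};  // statistics counter
std::atomic<bool> ready{false};    // one-shot "payload is published" flag
int payload = 0;                   // plain (non-atomic) data guarded by `ready`

// Each worker bumps the counter. Relaxed ordering is enough here because the
// total is only read after every worker has been joined, and join() itself
// provides the needed happens-before edge.
long count_events(int num_threads, int per_thread) {
    event_count.store(0, std::memory_order_relaxed);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i)
                event_count.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return event_count.load(std::memory_order_relaxed);
}

// The producer writes the payload and then sets the flag with release
// ordering. The consumer spins on the flag with acquire ordering, so once it
// sees `true` it is guaranteed to see the payload written before the store.
int publish_and_read(int value) {
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread consumer([&seen] {
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;
    });
    payload = value;
    ready.store(true, std::memory_order_release);
    consumer.join();
    return seen;
}
```

Calling `count_events(4, 10000)` should return 40000, and `publish_and_read(42)` should return 42. Both functions would also be correct with the default `memory_order_seq_cst`; the weaker orderings are shown only because the abstract notes how common they are in practice.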

In spite of their frequent use, the pay-off from weakly ordered atomics is decreasing, and is much smaller than it was in the Pentium 4 era. The perceived benefit on most modern mainstream CPUs seems to significantly exceed the actual benefit, though probably not so on GPUs. In my mind this casts a bit of doubt on the need to expose dependency-based ordering, as in the unsuccessful memory_order_consume, to the programmer, in spite of an abundance of use cases. Even memory_order_seq_cst is often not significantly slower. I’ll illustrate with a microbenchmark.
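The microbenchmark used in the talk is not reproduced on this page. The following is only a sketch of what such a comparison might look like, assuming a single uncontended thread and a hypothetical helper `time_stores`; real results depend heavily on the CPU, the compiler, and contention, so no numbers are claimed here.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical benchmark target; not taken from the talk.
std::atomic<std::uint64_t> cell{0};

// Performs `iterations` atomic stores with the given memory order and returns
// the elapsed wall-clock time in nanoseconds. Valid orders for a store are
// relaxed, release, and seq_cst.
template <std::memory_order Order>
std::int64_t time_stores(std::uint64_t iterations) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 1; i <= iterations; ++i)
        cell.store(i, Order);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```

One might compare `time_stores<std::memory_order_relaxed>(n)`, `time_stores<std::memory_order_release>(n)`, and `time_stores<std::memory_order_seq_cst>(n)` for a large `n`. This measures only the uncontended cost of the store itself, which is a deliberate simplification; it says nothing about behavior under contention or on GPUs.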

We initially knew far too little about implementability on various architectures. This came back to bite us recently [Lahav et al.]. This remains scary in places. Hardware constraints forced us into a change that makes the interaction between acquire/release and seq_cst hard to explain, and far less intuitive than I would like. It seems to be generally believed that this is hard or impossible to avoid with very high levels of concurrency, as with GPUs.

We knew at the start that the out-of-thin-air problem would be an issue. We initially tried to side-step it, which was a worse disaster than the current hand-waving. This has not stopped memory_order_relaxed from being widely used. Practical code seems to work, but it is not provably correct given the C++ spec, and I will argue that the line between this and non-working code will inherently remain too fuzzy for working programmers. [P1217]
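For readers unfamiliar with the out-of-thin-air problem, the usual two-thread litmus test can be sketched as below. The names (`x`, `y`, `run_copy_litmus`) are hypothetical. Each thread copies one variable into the other using only memory_order_relaxed. Intuitively the only sensible result is r1 == r2 == 0, since no thread ever writes a nonzero value. The difficulty is that a formal model permissive enough to allow real compiler and hardware reorderings tends to also admit a self-justifying result such as r1 == r2 == 42, and the standard only says that implementations should not produce such values.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical litmus-test variables; both start at zero.
std::atomic<int> x{0}, y{0};

// Runs the two-thread "copy" litmus test once and returns (r1, r2).
std::pair<int, int> run_copy_litmus() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = 0, r2 = 0;
    std::thread a([&r1] {
        r1 = x.load(std::memory_order_relaxed);  // read x ...
        y.store(r1, std::memory_order_relaxed);  // ... and copy it into y
    });
    std::thread b([&r2] {
        r2 = y.load(std::memory_order_relaxed);  // read y ...
        x.store(r2, std::memory_order_relaxed);  // ... and copy it into x
    });
    a.join();
    b.join();
    return {r1, r2};
}
```

On existing implementations this is expected to return (0, 0) every time. The point of the example is not the observed behavior but the gap between that behavior and what can be proved from the specification.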

Unsurprisingly, programmers very rarely read the memory model in the standard. We learned that compiler writers commonly do not either. The real audience for language memory models mostly consists of researchers who generate instruction mapping tables for particular architectures. The translation from a mathematical model to standardese is both error-prone and largely pointless. We need to find a way to avoid the standardese.

Atomics mappings are part of the platform application binary interface, and need to be standardized. They often include arbitrary conventions that need to be consistently followed by all compilers on a system for all programming languages. Later evolution of these conventions is not always practical. I’ll give a recent RISC-V example of such a problem.

File attachments

slides (C++ memory model retrospective.pdf)	535KiB

Session Program

Mon 15 Jan (times shown in GMT, London)

11:00 - 12:30	Session 2, The Future of Weak Memory, at Turing Lecture. Chair(s): John Wickerson (Imperial College London)

11:00	22m	Talk	Weak Memory Demands Model-based Compiler Testing. Luke Geeson (University College London (UCL)). File attached.
11:22	22m	Talk	On the need for available, functional, and reusable memory models. Hernán Ponce de León (Huawei Dresden Research Center)
11:45	22m	Talk	What we learned from C++ atomics and memory model standardization. Hans-J. Boehm (Google). File attached.
12:07	22m	Talk	Why Languages Should Preserve Load-Store Order. Stephen Dolan (Jane Street)
