Learner's Notes: Exploring C++ Undefined Behavior (Part 3) – Best Practices for Low-Level Object Manipulation

Quick Note:
If you’re into the nitty-gritty details, be sure to check out Part 1 and Part 2. But if you’re here for practical advice, welcome to Part 3!

Best Practices

As promised in Part 2, here are the three key rules to keep in mind when dealing with low-level object manipulation:

  1. Always use arrays of std::byte or unsigned char as memory buffers instead of char.
  2. Always check whether std::launder is necessary.
  3. Always check whether you need to call std::destroy_at.

That’s it! If you just wanted the TL;DR, you’re good to go. But if you’re curious about why these rules matter, stick around. The rest of this post takes an FAQ-style approach, going through the key questions that came up during my own deep dive into these best practices.

Rule 1 FAQ: Using arrays of std::byte as Memory Buffers

Q: What do arrays of std::byte or unsigned char provide that arrays of char don’t?
A: Besides implicitly creating objects of implicit-lifetime types (as covered in previous posts), arrays of std::byte or unsigned char also provide storage for those created objects, as stated in [intro.object]/3.

Combining this with [intro.object]/4 and [basic.life]/1.5, we see that the lifetime of a std::byte or unsigned char array does not end when we reuse its storage for another object. In contrast, reusing storage from a char array does end its lifetime.

Q: Why does it matter that the lifetime of arrays of std::byte or unsigned char doesn’t end when reusing their storage?
A: This is surprisingly tricky to answer just by reading the C++23 standard, so let’s use an example to illustrate why it’s important:

#include <iostream>
#include <new>       // placement new

struct SimpleIntBuffer {
    alignas(int) char buf[64];
    void f() { std::cout << "Hello World\n";}
};

int main() {
    SimpleIntBuffer s;
    ::new(s.buf) int{2};
    s.f(); // Undefined behavior: `s`'s lifetime has ended
}

When we reuse the storage of buf to create an int object, we unintentionally end the lifetime of s itself.

Why? Because the int object isn’t considered nested within buf, which means it also isn’t nested within s. According to [basic.life]/1.5, this means the lifetime of s is over. And once s’s lifetime has ended, calling f() on it is undefined behavior, as per [basic.life]/7.2.

Rule 2 FAQ: Checking Whether std::launder Is Necessary

Q: When do you generally need to use std::launder?
A: We’ve already seen its necessity for implicitly created objects, but more generally, std::launder is typically required to access an object after placement new if:

  1. You don’t use the pointer returned by placement new, and
  2. You access the object via the buffer using a reinterpret_cast.

If both conditions hold, then a call to std::launder is almost always needed. The rare exceptions are outlined in [basic.life]/8, but they usually don’t apply to memory buffer use cases.

Here’s a simple example:

#include <cstddef>   // std::byte
#include <iostream>
#include <new>       // placement new, std::launder

int main() {
    alignas(int) std::byte arr[16];
    ::new(arr) int{100};
    std::cout << *std::launder(reinterpret_cast<int*>(arr)) << std::endl; // std::launder is required here
}

Although it may not be necessary in the future due to proposal P3006, std::launder is still required at the moment, which is why I’ve included it here.

Rule 3 FAQ: Checking Whether std::destroy_at Is Necessary

Q: What is std::destroy_at and when should we call it?
A: For the purpose of our discussion, std::destroy_at is essentially used to manually call the destructor of an object (we’ll skip discussing array types here for simplicity). It’s crucial to know when to invoke a destructor because, as we saw in [basic.life]/5, you can end an object’s lifetime without calling its destructor if you reuse or release the storage.

Here’s a simple illustration of why this matters:

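#include <cstddef>   // std::byte
#include <iostream>
#include <memory>    // std::destroy_at
#include <mutex>
#include <new>       // placement new

int main() {
    std::mutex m;
    // new std::byte[N] yields storage aligned for any object with fundamental alignment
    // that fits in it; alignas on the pointer would only align the pointer variable itself.
    std::byte* arr = new std::byte[sizeof(std::lock_guard<std::mutex>)];

    std::lock_guard<std::mutex>* lk_ptr = ::new(arr) std::lock_guard<std::mutex>{m};
    std::destroy_at(lk_ptr);  // Without this line, m stays locked and lk2 below typically deadlocks
    delete[] arr;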

    std::lock_guard<std::mutex> lk2{m};
    std::cout << "Hello World\n";
}

The rule is simple: if you reuse or release the storage occupied by an object with a non-trivial destructor, call std::destroy_at (or invoke the destructor directly) first, so that its side effects (like cleanup) actually happen.

Conclusion

And that wraps up our 3-part series! For the language lawyers, I hope you found something useful in Part 1 and Part 2. For the practical software developers out there, I hope Part 3 helps you recognize common pitfalls and adopt best practices when it comes to low-level object manipulation.

Till next time!

References

  1. Working Draft, Standard for Programming Language C++ (C++23)
  2. Discussion in the Comments of “What is the significance of std::byte and unsigned char arrays providing storage in C++?”
  3. cppreference
This post is licensed under CC BY 4.0 by the author.