class: center, middle

# Three lies your operating system tells you about memory

???

- Thanks to Jen for organizing our Lunch & Learns!
- This topic will be most useful to C/C++ programmers, but hopefully it's an interesting peek under the hood for non-programmers as well.
- There's a lot of history here, which I'm going to mostly skip over. Some things I say won't always have been that way, and some things I call lies were once true. If we go down all those rabbit holes, this would be like a 6-hour talk.
- Feel free to stop me and ask questions

---

class: center, middle

# What do we ~~know~~ believe as programmers or users?

---

# Let's start from CS 101

.fit[]

???

In school we're taught (or at least I was, I'm old enough that we started in C) that there are two areas of memory, the _stack_ and the _heap_.

For non-programmers (or those who didn't learn it this way): the _stack_ is a scratch-pad-like area where the current function's local variables are stored until the function returns and they go out of scope.

- Like a stack of index cards: add a card when you call a new function, throw it out when the function is done.

Conversely, the heap is a nebulous area where things live as long as you need them to - either manually managed by the programmer, or managed for you by a garbage collector in languages like C#, Python, or JavaScript.

---

# Linear memory

.fit[]

- Memory is a linear collection of bytes
- Numbered from 0 to 2<sup>32</sup> or 2<sup>64</sup>
- Not all of the numbers are valid; your computer doesn't have that much RAM
- There's no such thing as memory at address 0 (aka `NULL` or `nullptr`)
- My program is given access to certain ranges for its stack and heap (and other sections)

???

As we advance as programmers, we learn that it's all one big area called "main memory"

- linear address space, with a valid range the size of physical RAM
  - on 32-bit systems, zero to 4 billion
  - on 64-bit systems, zero to 18 quintillion
- _something_ doles out portions of it to be the stack & heap
  - gives more as the heap needs to grow, until you're OOM
- other programs are getting portions of that RAM too
  - too many and we'll run out quickly

This is the mental model that many programmers end up keeping.

---

# Who gives you memory?

- OS Program Loader
  - Creates a stack area
  - Also loads code segments
  - And static, global, constant data

???

Let's dive into that "something" that doles out memory...

- Stack: the OS program loader
- Also gives the program memory for:
  - Code segments
  - Static and global variables
  - (often forgotten when teachers generalize to "stack" and "heap")

--

- The allocator
  - In charge of the heap
  - Function of the C library
  - Uses several ways to get memory from the OS (`mmap`, `sbrk` on Unix, `HeapCreate` on Windows)

???

- Heap: the allocator

As game programmers, you probably know more about the allocator than your average programmer.

- Default one is part of the C or C++ standard libs
- There are plenty of alternate allocators out there: jemalloc, dlmalloc, etc.
- Uses different ways to get memory from the OS

---

class: middle, center

# And now for a bunch of lies

---

# Lie 1: Memory is linear

???

Ok, the first lie is linear memory.

- Seems to be the one that more people know at least something about.
For example:

- For a long time, my mental model of Virtual Memory was what I think of as the "Windows 3.1" model - it's that disk space I tell Windows to use in addition to my physical RAM so that I can run this new game

--

name: lie1

### Virtual memory

- ***Physical addresses*** are what's used on the computer's memory bus
  - Set by the manufacturer (or firmware)
  - Full of holes and things that aren't RAM (like MMIO)

???

First let's talk about where your memory really is.

- It's all chips connected to the memory bus
  - bus addresses chosen by your computer's manufacturer
- But the memory bus is used for more than just memory.
  - memory-mapped I/O is the most common way to do I/O on modern machines

---

```
00 0000 0000 - 00 0005 8000 (352 KiB)  Conventional memory
00 0005 8000 - 00 0005 9000 (4 KiB)    Reserved by the firmware
00 0005 9000 - 00 0009 e000 (276 KiB)  Conventional memory
00 0009 e000 - 00 000a 0000 (8 KiB)    Reserved by the firmware
00 0010 0000 - 00 651e 4000 (1.6 GiB)  Conventional memory
00 651e 4000 - 00 6522 4000 (256 KiB)  EFI Boot Services data
00 6522 4000 - 00 72d7 a000 (219 MiB)  Conventional memory
00 72d7 a000 - 00 750f d000 (35 MiB)   EFI data
00 750f d000 - 00 750f e000 (4 KiB)    ACPI data
00 750f e000 - 00 7e5e b000 (148 MiB)  EFI data
00 7e5e b000 - 00 7e84 2000 (2 MiB)    Conventional memory
00 7e84 2000 - 00 7f19 e000 (9 MiB)    EFI code
00 7f19 e000 - 00 7f66 0000 (4 MiB)    Reserved by the firmware
00 7f66 0000 - 00 7f6a b000 (300 KiB)  ACPI data (reclaimable)
00 7f6a b000 - 00 7fac 5000 (4 MiB)    ACPI data
00 7fac 5000 - 00 8000 0000 (4 MiB)    EFI data & code
00 e000 0000 - 00 f000 0000 (256 MiB)  MMIO (PCI Express)
00 fe00 0000 - 00 fe01 1000 (68 KiB)   MMIO (??)
00 fec0 0000 - 00 fec0 1000 (4 KiB)    MMIO (??)
00 fed0 0000 - 00 fed0 1000 (4 KiB)    MMIO (HPET)
00 fee0 0000 - 00 fee0 1000 (4 KiB)    MMIO (CPU local APIC)
00 ff00 0000 - 01 0000 0000 (16 MiB)   MMIO (UART/Serial)
01 0000 0000 - 01 4000 0000 (1 GiB)    Conventional memory
01 4000 0000 - 01 4002 b000 (172 KiB)  EFI Bootloader code
01 4002 b000 - 08 7100 0000 (28.7 GiB) Conventional memory
```

???

This is the memory map I pulled from the firmware of a NUC I have (one of the ones that ran TL3 servers at conventions in 2018!).

We can see it's got a few contiguous regions of actual memory:

- 640 KiB starting at address 0
- 2 GiB (minus 1 MiB) starting 1 MiB into the address space
- 30 GiB (minus 240 MiB) starting 4 GiB into the address space

And between the 2 GiB and 30 GiB blocks we can see a range of addresses used for MMIO.

---

template: lie1

- ***Virtual Addresses*** are what we use in our programs
- Translated to physical addresses by the MMU in 4 KiB\* _pages_
- _Page Tables_ created by the OS define the virtual -> physical mapping
- Page Tables are (usually) unique to a process - every process thinks it has the entirety of the virtual address space to itself
- When accessing a mapping that doesn't exist, the CPU raises a _page fault_ to the OS

???

Ok, back to linear memory:

- So, the MMU translates virtual addresses into physical
  - Usually based on page tables - which are themselves in memory!
- This setup allows for _memory protection_
  - Page tables are swapped on process switch
  - Each process thinks it's the only one in memory - impossible to write to another process' memory
  - Page tables also allow the OS to specify R/W/X per-page
  - When violated, the CPU raises the dreaded _page fault_

Now the mechanism of the _page fault_ might give a hint about the next lie...
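
Before we move on - if a quick demo of "page tables are per-process" helps, here's a minimal POSIX-only sketch (the global `value` is just made up for this demo). After `fork()`, parent and child print the same virtual address for the same variable, yet the child's write never shows up in the parent: two sets of page tables, one virtual address.

```cpp
// Minimal POSIX-only demo: the same virtual address holds different values
// in two processes, because each process has its own page tables.
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

static int value = 42;

int main() {
    pid_t pid = fork();
    if (pid == 0) {            // child: copy-on-write gives it its own physical page
        value = 1337;
        std::printf("child:  &value = %p, value = %d\n", static_cast<void *>(&value), value);
        return 0;
    }
    waitpid(pid, nullptr, 0);  // wait so the two lines don't interleave
    std::printf("parent: &value = %p, value = %d\n", static_cast<void *>(&value), value);
    return 0;
}
```

Both lines print the same `%p` address, but `42` in the parent and `1337` in the child.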
---

name: lie2

# Lie 2: The OS gave you some memory

```cpp
static constexpr size_t GiB = 0x4000'0000;

char *novel = new char [2 * GiB];
```

???

So, we talked about how your memory allocator gets memory from the OS. Let's dive into that a little more.

Let's say I'm allocating a text buffer big enough to write my new novel.

What is the value of `novel` here? (It's a pointer. To what?)

--

name: lie21

- The OS doesn't allocate any pages, only a _virtual address_ range
- Writing to the memory address in `novel` causes a page fault

???

So if the OS didn't allocate any actual memory, what happens when I try to start writing my novel?

There's no physical page mapped to the address in `novel`, so boom, I get a page fault and everything explodes, right?

--

.center[]

???

This is, actually, fine.

---

template: lie21

- The OS catches the page fault and allocates a new page to that address
- The CPU jumps back to the failed instruction and tries again

???

This is how your CPU's page faults are intended to be used. Broadly, there are two kinds of CPU errors when running code: ones the OS can't do anything useful about, like _division by zero_ or an _unknown opcode_, and _faults_, which are things the OS can try to fix.

So the CPU issues the fault to your OS, then tries the instruction again and _et voilà_, the page is there this time and everything goes on like nothing happened... until you hit the next page and we start again.

Maybe you can see how this all ties in to the last lie...

---

layout: true

# Lie 3: You aren't out of memory

---

name: lie31

```cpp
static constexpr size_t GiB = 0x4000'0000;

char *novel = new char [2 * GiB];

if (novel == nullptr)
    std::cout << "novel is null!" << std::endl;
else
    std::cout << "novel is not null!" << std::endl;
```

What does this program print when your OS reports that you have 1 GiB of RAM free?

???

I don't know about you, but I had it drilled into me early in my days of learning C that you always, always check the return value of allocating memory because it will return NULL when out of memory.

And you should, there are plenty of reasons you might get a NULL value back - but being out of memory is pretty much never one of them.

---

## Swapping Pages

.fit[]

- The most well-known part of this lie
- Write the contents of a page to disk, then give that page to another process
- On many platforms the CPU happily helps out by setting "accessed" or "dirty" flags

???

You're probably all aware of "paging" or "swapping". This goes hand-in-hand with the "Windows 3.1" mental model of virtual memory: you run out of RAM and suddenly your hard disk starts grinding as the OS swaps pages from memory to disk to make room.

---

template: lie31

- What does "free" even mean? How do you count what's _NOT_ free?

1. Allocated by `malloc()` or `new`?
2. Committed to a process through a page fault?
3. Ready to be replaced in swap?

???

So given all that, what does "free" even mean?

On top of this list, your OS will also make use of physical pages that aren't currently being used as a cache to speed up things like file I/O. Are those cache pages "free" or "in use"?

The discrepancy between #1 and #2 here points us at a more subtle part of this lie...
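
One way to watch that discrepancy between #1 and #2 live - a minimal Linux-only sketch (the `vm_rss` helper is just for this demo, and it assumes a couple of spare GiB of RAM or swap). The big `new[]` barely moves the resident set size; it's only when the pages are actually touched that they get committed, one page fault at a time.

```cpp
// Linux-only sketch: a large new[] only reserves virtual address space;
// physical pages get committed as we touch them. Resident set size is
// read from /proc/self/status (the VmRSS line).
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

static std::string vm_rss() {
    std::ifstream status("/proc/self/status");
    for (std::string line; std::getline(status, line);)
        if (line.rfind("VmRSS:", 0) == 0) return line;
    return "VmRSS: ?";
}

int main() {
    constexpr std::size_t GiB = 0x4000'0000;
    std::cout << "before new:   " << vm_rss() << '\n';

    char *novel = new char[2 * GiB];    // a virtual address range, nothing more
    std::cout << "after new:    " << vm_rss() << '\n';

    std::memset(novel, 'x', 2 * GiB);   // now every page faults in
    std::cout << "after writes: " << vm_rss() << '\n';

    delete[] novel;
    return 0;
}
```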
---

## Memory Overcommit

- Processes generally don't use all the memory they allocate right away, if ever
- This lets modern OSes "overcommit" memory - handing out more allocations than the amount of physical RAM
- The bet is that physical RAM usage won't actually go above the physical limit
- The OS can rely on swap to cover the gap if it does

???

Memory overcommitment is the shell game that the OS is playing when it comes to managing physical RAM.

Usually this is a good thing and lets you make more use of your system's resources. The problems only start when you get too close to filling your RAM _and_ your swap space.

---

template: lie31

- Linux: `novel` will _never_\* be null for OOM reasons
- Windows: `novel` will only be null once you've filled both RAM and swap

???

This one is especially OS-dependent.

- Linux is technically configurable, but the default that no one ever changes is that it will never say no to "reasonable" requests.
  - If you try to allocate 10 GiB on a machine with 4 GiB of RAM it will say no
  - So `novel` is never NULL for OOM reasons
  - But if you run out of RAM and swap space, the kernel's OOM killer will just start killing processes it deems expendable.
- Windows is a bit more conservative
  - It won't overcommit past the amount that can be backed by swap
  - But this does mean there can be situations where RAM is left under-utilized
- But still always check the return value of allocations, seriously.

---

layout: true

---

class: middle

# Lies your OS tells you about memory

## 1. Memory is linear

## 2. The OS gave you some memory

## 3. You aren't out of memory

???

Any questions?
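
If anyone asks how far the OS will actually let allocations go, a small probe I'd keep as backup (my own sketch, standard library only): keep doubling a `malloc()` request until it's refused. Where the "NULL" line lands depends on the OS's overcommit policy (on Linux, `/proc/sys/vm/overcommit_memory`), not on how much physical RAM is currently free.

```cpp
// Backup demo: double a malloc() request until the OS finally says no.
// The cutoff reflects the overcommit policy, not free physical RAM.
#include <cstddef>
#include <cstdio>
#include <cstdlib>

int main() {
    constexpr std::size_t GiB = 0x4000'0000;
    for (std::size_t size = GiB; size != 0; size *= 2) {
        void *p = std::malloc(size);
        std::printf("%zu GiB -> %s\n", size / GiB, p ? "ok" : "NULL");
        if (p == nullptr)
            break;
        std::free(p);   // hand the (purely virtual) range straight back
    }
    return 0;
}
```

On a default Linux configuration this typically keeps printing "ok" well past the machine's physical RAM; Windows stops once the request can no longer be backed by RAM plus the page file.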