class: center, middle

# What happens before `main()`?

???

- Thanks to Christian for suggesting the topic.
- not exactly "Advanced C++": C/C++ but also any program
- presentation format
  - easier to show things than terminal
  - happy to dive into things more

---

class: middle

# A contrived example

```c
#include <stdio.h>

int main(int argc, const char **argv) {
    for (int i = 0; i < argc; ++i)
        printf("'%s' ", argv[i]);
    puts("\n");
    return 123;
}
```

```bash
justin@mars-wsl ~> clang -o p1 p1.c
justin@mars-wsl ~> ./p1 one two three
'./p1' 'one' 'two' 'three'

justin@mars-wsl ~> echo $status  # program exit code. In bash, use `$?`
123
justin@mars-wsl ~>
```

???

- Here's a super simple program `p1`
  - barely more than Hello World
  - Assuming people are familiar with `argc` and `argv`
- Compile and run, does what we expect: prints args
- You can't see it, but `fish` shell turned red
  - Because program returned non-0, the 123 from `main()`
- Aside: note that I am using Linux / WSL
  - I know underpinnings better in Linux
  - All my examples will be Linux
  - Almost everything is conceptually identical on Windows
  - I'll call out interesting differences

---

class: center, middle

# Ok, so what _really_ just happened?

???

- What happened?
  - We typed a string of text at a command line
  - our program ran
  - it printed the arguments from the command line
  - it returned a value which ended up in `$status`
- Seems straightforward, what we would expect
- But what questions can we ask to dig deeper? [Anyone?]
  - How did the shell actually make our program run?
  - How did the function return value from `main()` get back to the shell?
  - How did our program's `main()` get passed the correct arguments?

---

class: middle

# Three main steps

1. **A system call from the parent process**
   - Linux: `execl("./p1", "./p1", "one", "two", "three", NULL)`
   - Windows: `CreateProcess("p1", "./p1 one two three", ...)`
2. The OS loads the program into a child process
3. The OS runs the child process

???
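In case someone asks what the parent process actually does in step 1: a minimal sketch for Linux, assuming the usual `fork()` + `execv()` + `waitpid()` pattern. `run_and_wait` is a made-up helper name for illustration, not what any real shell calls it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` in a child process and
 * return the child's exit code. Returns -1 if fork/waitpid fail or the
 * child did not exit normally. A failed execv() shows up as exit code
 * 127, the same convention shells use for "command not found". */
int run_and_wait(const char *path, char *const argv[]) {
    pid_t pid = fork();                 /* parent: clone this process */
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        execv(path, argv);              /* child: replace image with `path` */
        _exit(127);                     /* only reached if execv() failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)   /* parent: wait for the child */
        return -1;

    /* The value returned from the child's main() is encoded in status. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_and_wait("./p1", args)` with `args` set to `{"./p1", "one", "two", "three", NULL}` should return 123, which is the value the shell stores in `$status`.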
- Can break what happened into three parts
  - The parent process' part
  - OS loading the program
  - OS running the child process
- Going to gloss over the system call semantics / implementation
  - This is different in a few ways between OSs
  - The `exec` family on Linux, `CreateProcess` family on Windows
  - Mainly, calling a system library to say "run this program with these args"

And then, the OS starts loading our program.

---

# Loading: Layout of a binary (ELF)

.left-column[
- Sections
  - `.text`: Executable code (marked executable)
  - `.data`, `.bss`: Global and static variables (marked writeable)
  - `.rodata`: Constant global data, and literal values (neither)
- Program segments
  - Groups of sections with the same attributes, to be loaded into memory contiguously
- Execution entrypoint
]

.right-column[
<img src="elf_layout.svg" height="350">

.footnote[.tiny[Image based on original by Surueña, <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons]]
]

???

- Layout of an executable binary file on disk
  - ELF: Executable and Linkable Format
  - Windows uses PE. Nearly identical, more legacy headers.
- Binaries laid out to be easy to load
  - Bunch of sections' data
  - Program segments group them into similar contiguous memory
  - Just copy them right from file into memory
- On modern OSs, it's even lazier than that
  - Mark a region of memory as mapping file data. Done.
  - When a page fault (`#PF`) happens, copy it then
  - I can go into virtual memory another time, lots of cool stuff there
- One last important thing in the binary: entrypoint address
  - Where the OS jumps to to start running your program

But before we can do that...

---

# Running: Building a stack

.left-column[
### The OS: Before calling entrypoint

- allocates stack space for the `main()` thread
- fills the "bottom" of the stack with data
- sets the stack pointer `rsp`

### And now we jump to our program! Done, right?
]

.right-column[
```
...
    + 0x80: 'SHELL=fish\0'
    + 0x74: 'USER=justin\0'
    + 0x6d: 'three\0'
    + 0x69: 'two\0'
    + 0x65: 'one\0'
    + 0x60: './p1\0'
...
...         // More init data
...
    + 0x40: NULL          (env[2])
    + 0x38: [sp + 0x80 ]  (env[1])
    + 0x30: [sp + 0x74 ]  (env[0])
    + 0x28: NULL          (argv[4])
    + 0x20: [sp + 0x6d ]  (argv[3])
    + 0x18: [sp + 0x69 ]  (argv[2])
    + 0x10: [sp + 0x65 ]  (argv[1])
    + 0x08: [sp + 0x60 ]  (argv[0])
rsp + 0x00: 4             (argc)
```
]

???

- To run more than simple asm, we need a stack.
- Waaay back in the old days, before AMD saved us
  - Only had 6, maybe 7 general purpose registers
  - function arguments were mostly passed on the stack
  - makes a lot of sense for OS to put `main()`'s args on the stack
- OS allocates some memory for the stack
  - Pushes initialization data, env vars, and `main()`'s args
  - See example, "top" of stack at bottom

And now we're done, right? We've got `main()`'s arguments on the stack and the OS jumps to it?

---

# Running: The `_start()` function

### So where is this `_start()` function?

- Linux: `crt1.o` with glibc, historically `crt0.o` (not even a library!)
- Windows: ??? (Provided dynamically in MSVCRT, maybe?)

### So _why_ is this `_start()` function?

- Call all necessary initialization for the runtime libraries
- Call all global and `static` constructors
- Arrange the program arguments for `main()`, and call it
- Call all functions registered with `atexit()`
- Clean up and call* `exit(mains_return_value)`

???

Not quite - `main()` is not the entrypoint.

- On ELF, entrypoint usually called `_start()`
  - Provided with system libraries
  - Just an object file the linker sneaks into your program
  - I don't actually know where or what it's called on Windows [Anyone?]
    - Microsoft intentionally obfuscates it, like their syscalls
    - Good for keeping people from relying on implementation details
- `_start()`
  - sets up the runtime libraries like glibc
  - sets up arguments for `main()`, it's not the 90s anymore
  - calls `main()`
  - runs library cleanup
  - tells the OS we're done and the exit code

Ok, so _now_ we're done, right?
We've seen where `main()` gets called.

---

# The loader strikes back: dynamic loading

- The OS loads `p1` into memory, but sees that it's dynamically linked
- The OS loads the dynamic loader into the same process and runs _that_ instead of `p1`
- The dynamic loader loads the needed shared libraries
- The dynamic loader looks up the address of every dynamic symbol in `p1`, and populates its _global offset table_ and _procedure linkage table_.
- The dynamic loader jumps to `p1` just like the OS would have

???

One more thing - we skipped a step in loading.

- Almost all modern programs have _some_ dynamic linking
  - Windows hides syscalls behind DLLs
  - Linux adds the vDSO into every process, whether it wants it or not
  - glibc is usually dynamically linked
- Dynamic linking implementation is probably the part with the most differences on Windows
  - Still similar, concepts carry over
- Let's talk about `p1` again
  - The OS loads the dynamic loader on top of `p1`
  - The dynamic loader is just another program!
  - Jumps to dynamic loader's entrypoint instead
  - Loads libs from disk or existing memory (hence, shared libs)
- "Binding" dynamic symbols
  - Looks up address of every symbol listed as dynamic in `p1` from other libs
  - Records their address in GOT or PLT
  - PLT is a lazy-binding optimization
- And then the dynamic loader calls `p1`'s entrypoint

---

class: middle

# That's it! We've called `main()`!
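???

If anyone wants to check the `_start()` ordering for themselves: a minimal sketch, assuming GCC or Clang, since `__attribute__((constructor))` is a compiler extension and not standard C. The names `before_main`, `after_main`, `startup_events`, and `register_exit_hook` are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counts how many startup hooks have run so far. */
static int startup_count = 0;

/* GCC/Clang extension: the runtime calls this before main(). */
__attribute__((constructor))
static void before_main(void) {
    startup_count++;
}

/* Runs during exit(), after main() has returned. */
static void after_main(void) {
    puts("after main()");
}

/* Number of constructors that had already run when this is called. */
int startup_events(void) {
    return startup_count;
}

/* Registers after_main() with atexit(); returns 0 on success. */
int register_exit_hook(void) {
    return atexit(after_main);
}
```

Calling `startup_events()` as the first statement of `main()` should already return 1, and once `register_exit_hook()` has been called the message is printed after `main()` returns.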