
Flags

  • Current CFLAGS:
-pipe -O2 -flto=auto -flto-compression-level=3 -fuse-linker-plugin -fstack-protector-strong -fstack-clash-protection -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-plt -march=x86-64-v3 -malign-data=cacheline -mtls-dialect=gnu2
  • Current CXXFLAGS are identical to CFLAGS
  • Current LDFLAGS:
-Wl,-O1,-s,-z,noexecstack,-z,now,-z,pack-relative-relocs,-z,relro,-z,x86-64-v3,--as-needed,--gc-sections,--sort-common,--hash-style=gnu
  • Use pipes rather than temporary files
  • Uses more RAM but reduces disk usage
  • Does it get ignored by clang?
  • Compiling with -g0 or without -g at all results in no debugging information in the binaries
  • Some build systems might misinterpret -g0 as -g; this is a bug and should be reported to the relevant upstream
  • Spawns N parallel LTO jobs based on the number of available CPU threads; similar to make -j
  • Use instead of -flto alone to get rid of the 128 LTRANS serial jobs
  • gcc’s version of ThinLTO is WHOPR. It used to be enabled with -fwhopr, but it is now the default LTO mode and -fwhopr has been removed from gcc’s options; -fno-fat-lto-objects is now the default as well
  • Available when using zstd as a backend for LTO as it results in smaller binaries
  • Graphite is not well maintained in gcc and will likely end up being removed entirely
  • Most of its developers have moved to Polly in LLVM/Clang
  • Graphite does not deliver effective optimizations compared to baseline gcc
  • The optimizations it was supposed to make are just being implemented via other methods
  • It’s not necessarily buggy, but its benefits are rather doubtful
  • https://dl.acm.org/doi/full/10.1145/3674735
  • Required for isl to work
  • Used to be known as -floop-optimize-isl
  • A new way to implement Graphite.
  • Replaces -floop-interchange, -ftree-loop-linear, -floop-strip-mine and -floop-block
  • Still considered experimental?
  • Enabled by default if gcc is built with lto enabled
  • There is no actual guarantee that -fuse-linker-plugin will be used in cases where gcc is built without lto support and binutils is built without plugins support
  • Without this flag in the case above, gcc might fall back to -fwhole-program, which is not a good idea; pass it explicitly so that gcc relies on a linker plugin and can forward the LTO objects to another linker (e.g. mold) successfully
  • Abandoned and needs a major redesign
  • Does not scale, at least for now according to openSUSE
  • Increases memory usage and compilation time
  • Prone to crashing the compiler with an internal compiler error (segfault), which leads to all kinds of weird errors like “duplicate case value” (affected packages: bash, gcc, inetutils, libarchive, libedit, netbsd-curses, util-linux)
  • -funwind-tables is enabled by default, but gcc’s documentation says that you normally do not need to enable this option; instead, a language processor that needs this handling enables it on your behalf
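
As a reading aid for the LDFLAGS value at the top of this page: gcc splits the argument of -Wl, at every comma and passes each piece to the linker as a separate option. A minimal Python sketch of that splitting follows; the function name split_wl is made up for illustration and is not part of any tool.

```python
def split_wl(flag: str) -> list[str]:
    """Split a gcc -Wl,... argument into the options the linker receives."""
    prefix = "-Wl,"
    if not flag.startswith(prefix):
        raise ValueError(f"not a -Wl, flag: {flag!r}")
    # gcc splits on every comma, so "-z,now" becomes the two options "-z" "now"
    return flag[len(prefix):].split(",")


# LDFLAGS copied from the top of this page
LDFLAGS = (
    "-Wl,-O1,-s,-z,noexecstack,-z,now,-z,pack-relative-relocs,-z,relro,"
    "-z,x86-64-v3,--as-needed,--gc-sections,--sort-common,--hash-style=gnu"
)

linker_args = split_wl(LDFLAGS)
print(" ".join(linker_args))
```

Read this way, every -z keyword (noexecstack, now, pack-relative-relocs, relro, x86-64-v3) is a separate -z option to the linker.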

-fmodulo-sched, -fmodulo-sched-allow-regmoves, -fgcse-sm and -fgcse-las

  • Aggressive common subexpression elimination (CSE) and scheduling (particularly modulo scheduling) can dramatically increase register pressure, leading to more loads and stores and making performance worse than it would be without them. Especially on register-starved machines like x86, it makes sense to have some of these off by default
  • Should in theory help decrease register pressure before allocation
  • Can decrease code size by preventing register pressure and subsequent spills in register allocation
  • No idea if it works with -fschedule-insns (which is not enabled by default at -O2 and -Os)
  • No idea if it works with -fschedule-insns2 (which is enabled by default at -O2 and -Os)
  • Permits the speculative motion of some load instructions before register allocation to minimize execution stalls due to data dependencies
  • Works with -fschedule-insns
  • -fschedule-insns2 is enabled by default at -O2 and -Os
  • Similar to -fsched-spec-load
  • Avoid options with “dangerous” in the name
  • This option is only for shared libraries/dynamic linking and breaks static binaries and libraries
  • Makes code built with -fPIC and LTO faster, and improves performance in general; might cause subtle ABI breakages
  • Breaks LD_PRELOAD which in turn breaks custom memory allocators like mimalloc
  • Contrary to popular belief, enabling this flag globally is safe (unless interposing symbols is required, for example when using different allocators on system libraries), but the reason for it not being enabled by default is to comply with the ELF standard. In contrast, this flag is part of the default when using Clang
  • https://maskray.me/blog/2021-05-09-fno-semantic-interposition
  • This takes a number. By default GCC enables it only for PowerPC and disables it for other architectures; it also appears to be unsupported by clang: https://reviews.llvm.org/D4565
  • Has no use when -falign-functions is not used
  • Enabled after decisions by PGO and shouldn’t be manually used everywhere; may cause regressions and produce bigger code that may or may not be fast

-ffunction-sections, -fdata-sections and -Wl,--gc-sections

  • It is better not to specify these options globally, as we don’t know whether they will be passed when building an executable or a shared library (passing -fpie/-fPIE when building a shared library is not a good thing)
  • It is better to have gcc configured with --enable-default-pie so that it knows when to pass these options
  • These options do not conflict with -fno-plt
  • -fno-pic can only be used by executables
  • -fpic can be used by both executables and shared objects
  • -fpie can only be used by executables
  • https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected
  • Speeds up compilation time when -g is used
  • Does not make sense when -g is not used
  • Enabled for -O2 and disabled for -Os and -Oz
  • Increases register pressure, which can cause spills and increase code size
  • Removes redundant load instructions which can reduce register pressure by reusing loaded values
  • Might reduce code size
  • Enables simple constant folding optimizations
  • Enabled by default on most targets; no need to mess with it
  • x86-64 does not have delay slots rendering this “legacy” optimization irrelevant
  • Experimental option that might produce unreliable results and increase code size
  • Do not enable manually; let PGO decide
  • Modulo scheduling is a software pipelining technique; thus it might increase code size with no proven performance gain
  • Most x86-64-v3 CPUs have hardware pipelining

-fselective-scheduling, -fselective-scheduling2, -fsel-sched-pipelining and -fsel-sched-pipelining-outer-loops

  • IA64 is probably the only target left requiring selective scheduling
  • Selective scheduling itself is in a poor state nowadays
  • -fsel-sched-pipelining has no effect without -fselective-scheduling or -fselective-scheduling2
  • -fsel-sched-pipelining-outer-loops has no effect without -fsel-sched-pipelining
  • https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692322.html
  • Prevents many gcc optimizations in order to produce output suitable for live-patching
  • Does not work with -flto
  • Isolates UB paths from main control and turns them into a trap
  • gcc’s -O2 might enable this in the future
  • -O2 uses the value 5, while -Os uses 1 which is more aggressive
  • x86-64-v3 provides better performance and battery life
  • Automatically detected on modern 64-bit hosts and Linux targets
  • Redundant when -fomit-frame-pointer is used (which is the default with -O2 and -Os)
  • Remove -flto=auto -flto-compression-level=3 -fuse-linker-plugin
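
The removal in the last item can be sketched in Python as a simple filter over the CFLAGS string from the top of this page. The helper name without_lto and the LTO_FLAGS set are illustrative assumptions; the actual build tooling may handle this differently.

```python
# CFLAGS copied from the top of this page
CFLAGS = (
    "-pipe -O2 -flto=auto -flto-compression-level=3 -fuse-linker-plugin "
    "-fstack-protector-strong -fstack-clash-protection -fno-unwind-tables "
    "-fno-asynchronous-unwind-tables -fno-plt -march=x86-64-v3 "
    "-malign-data=cacheline -mtls-dialect=gnu2"
)

# The three LTO-related flags named in the item above
LTO_FLAGS = {"-flto=auto", "-flto-compression-level=3", "-fuse-linker-plugin"}


def without_lto(flags: str) -> str:
    """Return the flag string with the LTO-related flags dropped."""
    return " ".join(f for f in flags.split() if f not in LTO_FLAGS)


print(without_lto(CFLAGS))
```

Matching whole flags rather than substrings keeps unrelated options untouched; the order of the remaining flags is preserved.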

-Wl,--gc-sections and -Wl,-z,start-stop-gc

  • Provides “Stack Execution Protection”
  • Should already be gcc’s default behavior; it does not mark the stack as executable by default, and warns when that happens
  • Enables linker optimizations which can reduce code size
  • No higher value
  • Ignored by mold
  • It does not make sense to compress “nonexistent” debug sections as we’re stripping everything with -s

--no-keep-memory and --reduce-memory-overheads

  • Make memory consumption reasonable especially with the optimizations we are using (mainly LTO), at the expense of a slight increase in link time

-x, --discard-all and -X, --discard-locals

  • Using -z,separate-code is good for security
  • Adding --rosegment when -z,separate-code is used makes resulting binaries smaller
  • Using -z,noseparate-code is a bad idea; remember how passing --disable-separate-code to binutils bloated every executable and shared library by at least 2 MB (for better huge page support)

Default program headers: With traditional -z noseparate-code, GNU ld defaults to a RX/R/RW program header layout. With -z separate-code (default on Linux/x86 from binutils 2.31 onwards), GNU ld defaults to a R/RX/R/RW program header layout. ld.lld defaults to R/RX/RW(RELRO)/RW(non-RELRO). With --no-rosegment, ld.lld uses RX/RW(RELRO)/RW(non-RELRO). Placing all R before RX is preferable because it can save one program header and reduce alignment costs. ld.lld’s split of RW saves one maxpagesize alignment and can make the linked image smaller. This breaks some assumptions that the (so-called) “text segment” precedes the (so-called) “data segment”.
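
To see which of these layouts a given binary actually has, the PT_LOAD entries of its program header table can be read directly. The following is a minimal Python sketch that assumes a little-endian ELF64 file; the function name load_segment_layout is made up for illustration, and the sketch reports permissions only, so it cannot tell RW(RELRO) from RW(non-RELRO).

```python
import struct

PT_LOAD = 1                  # loadable segment
PF_X, PF_W, PF_R = 1, 2, 4   # segment permission bits


def load_segment_layout(data: bytes) -> str:
    """Return the PT_LOAD permissions of an ELF64 LE image, e.g. 'R/RX/R/RW'."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only little-endian ELF64 is handled in this sketch")
    # ELF64 header: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    layout = []
    for i in range(e_phnum):
        # Each Elf64_Phdr starts with p_type and p_flags (4 bytes each)
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        perms = ""
        if p_flags & PF_R:
            perms += "R"
        if p_flags & PF_W:
            perms += "W"
        if p_flags & PF_X:
            perms += "X"
        layout.append(perms)
    return "/".join(layout)
```

Calling load_segment_layout(open(path, "rb").read()) on a linked executable or shared library returns a string in the same notation as the paragraph above; readelf -lW shows the same information in more detail.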

  • If you use bfd’s noseparate-code or lld’s --no-rosegment, .rodata and .text will be placed in the same PT_LOAD segment
  • --no-rosegment combines the read-only and the RX segments (output file will consume less address space at run-time)
  • AArch64 and PowerPC64 have a default MAXPAGESIZE of 65536, so defaulting to -z noseparate-code ensures that they will not experience an unnecessary size increase
  • -z noseparate-code layouts waste half a huge page on unrelated content; switching to -z separate-code reclaims that half huge page but increases the file size
  • ld.bfd’s -z separate-code is essentially split into two options in lld: -z separate-code and --rosegment
  • GitHub Actions for rad uses ubuntu-latest, whose ld does not recognize --rosegment:
Nim Output /usr/bin/ld: unrecognized option '--rosegment'
... /usr/bin/ld: use the --help option for usage information
... collect2: error: ld returned 1 exit status
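
One way to avoid this failure is to add --rosegment only when the linker advertises it. The Python sketch below checks the output of ld --help; this is a heuristic and an assumption about how the flag could be gated, not how rad does it, and the helper names are hypothetical.

```python
import subprocess


def help_mentions(option: str, help_text: str) -> bool:
    """True if `option` appears as a whole token in the linker's --help text."""
    tokens = help_text.replace(",", " ").split()
    return option in tokens


def linker_supports(option: str, linker: str = "ld") -> bool:
    """Ask `linker --help` whether it lists `option`; False if it cannot be run."""
    try:
        out = subprocess.run(
            [linker, "--help"], capture_output=True, text=True, check=False
        )
    except OSError:
        return False
    return help_mentions(option, out.stdout)


ldflags = ["-Wl,-z,separate-code"]
if linker_supports("--rosegment"):
    # Only pass the option when the installed ld knows about it
    ldflags.append("-Wl,--rosegment")
print(" ".join(ldflags))
```

Matching whole tokens means a linker that lists only --no-rosegment is not treated as supporting --rosegment. A test link with the flag would be a stricter check than parsing the help text.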