Flags

  • Current CFLAGS:
-pipe -O2 -fgraphite-identity -floop-nest-optimize -flto=auto -flto-compression-level=19 -fuse-linker-plugin -fstack-protector-strong -fstack-clash-protection -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-plt -march=x86-64-v3 -mfpmath=sse -mabi=sysv -malign-data=cacheline -mtls-dialect=gnu2
  • Current CXXFLAGS are identical to CFLAGS
  • Current LDFLAGS:
-Wl,-O1,-s,-z,noexecstack,-z,now,-z,pack-relative-relocs,-z,relro,-z,x86-64-v3,--as-needed,--gc-sections,--sort-common,--hash-style=gnu

CFLAGS

-pipe

  • Use pipes rather than temporary files
  • Uses more RAM but avoids disk I/O for intermediate files

-g0

  • Compiling with -g0 or without -g at all results in no debugging information in the binaries
  • Some build systems might misinterpret -g0 as -g; this is a bug and should be reported to the relevant upstream

-flto=auto

  • Spawns N parallel LTO jobs, where N is based on the number of available CPU threads (or taken from make’s jobserver when one is available); similar to make -j
  • Use instead of plain -flto, which compiles the 128 LTRANS jobs serially
  • gcc’s counterpart to ThinLTO is WHOPR. It used to be enabled with -fwhopr, but it is now the default LTO mode and the option has been removed from gcc
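
To make the comparison with make -j concrete, here is a minimal sketch that prints an explicit -flto=N flag sized to the online CPU threads. The helper name lto_jobs_flag is hypothetical, and the sketch only approximates the autodetection that -flto=auto performs when no make jobserver is present.

```shell
# Print an explicit -flto=N flag sized to the online CPU threads.
# lto_jobs_flag is a hypothetical helper name, not a gcc feature.
lto_jobs_flag() {
    # nproc comes from GNU coreutils; getconf is the fallback; 1 is the last resort.
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    printf '%s\n' "-flto=${n}"
}

lto_jobs_flag
```

In most setups -flto=auto is still the simpler choice, since it needs no helper and adapts to the machine the build runs on.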

-flto-compression-level=19

  • Level 19 is available when gcc uses zstd as the compression backend for LTO; higher levels result in smaller LTO object files
  • Check if level 22 is supported
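
The open question about level 22 can be probed directly against the installed compiler. The sketch below is one possible way to do that: cc_accepts_flag is a hypothetical helper, and it assumes the compiler is reachable as gcc (or via the CC variable). It only tells you whether the option is accepted, not what the compression backend does with it.

```shell
# Return 0 if the compiler accepts the given flag, non-zero otherwise.
# cc_accepts_flag is a hypothetical helper; CC defaults to gcc here.
cc_accepts_flag() {
    # -fsyntax-only parses the options and the source without writing output;
    # -Werror turns "unsupported option" warnings into failures.
    printf 'int main(void) { return 0; }\n' |
        "${CC:-gcc}" -x c -fsyntax-only -Werror "$1" - >/dev/null 2>&1
}

if cc_accepts_flag -flto-compression-level=22; then
    echo "level 22 accepted"
else
    echo "level 22 rejected (or no compiler found)"
fi
```

The same helper can be reused for any other compiler flag on this page before adding it to CFLAGS.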

-fgraphite-identity

  • Graphite is not well maintained in gcc and will likely end up being removed entirely
  • Most of its developers have moved to Polly in LLVM/Clang?
  • The optimizations it was supposed to make are just being implemented via other methods
  • It’s not necessarily buggy, but its benefits are rather doubtful

-floop-nest-optimize

  • Required for isl to work
  • Used to be known as -floop-optimize-isl
  • A new way to implement Graphite
  • Replaces -floop-interchange, -ftree-loop-linear, -floop-strip-mine and -floop-block
  • Still considered experimental?

-fno-plt

-fuse-linker-plugin

  • Enabled by default if gcc is built with lto enabled
  • There is no guarantee that the linker plugin will actually be used when gcc is built without lto support and binutils is built without plugin support
  • If the flag is left out in that case, gcc might fall back to -fwhole-program, which is not a good idea. Pass the flag explicitly so that gcc relies on a linker plugin and can forward the LTO data to another linker (e.g. mold) successfully

-fdevirtualize-at-ltrans

  • gcc disables it by default as it increases the size of streamed data
  • In practice, it is buggy and really slow

-fipa-pta

  • Abandoned and needs a major redesign
  • Does not scale, at least for now according to openSUSE
  • Increases memory usage and compilation time
  • Prone to crashing the compiler with a segfault or an internal compiler error, and to causing all kinds of weird errors such as duplicate case value (affected packages: bash, gcc, inetutils, libarchive, libedit, netbsd-curses, util-linux)

-fno-unwind-tables

  • -funwind-tables is enabled by default, but gcc’s documentation says that you normally do not need to enable this option; instead, a language processor that needs this handling enables it on your behalf

-fno-asynchronous-unwind-tables

-O3

-fmerge-all-constants

-fmodulo-sched, -fmodulo-sched-allow-regmoves, -fgcse-sm and -fgcse-las

  • Aggressive common subexpression elimination (CSE) and scheduling (particularly modulo scheduling) can dramatically increase register pressure, leading to more loads and stores and making performance worse than it would be without them. On register-starved machines like x86, it makes sense to keep some of these off by default

-fsched-pressure

  • Enabled by default when -fschedule-insns is used at -O2 and higher

-fira-loop-pressure

-fno-semantic-interposition

  • Makes code built with -fPIC and LTO faster, and improves performance in general, but breaks LD_PRELOAD which in turn breaks allocators like mimalloc
  • Contrary to popular belief, enabling this flag globally is safe unless symbol interposition is required (for example when using a different allocator for system libraries). gcc does not enable it by default in order to comply with the ELF standard, whereas Clang behaves this way by default. The option only applies to shared libraries/dynamic linking and breaks static libraries

-floop-parallelize-all

-fvariable-expansion-in-unroller

  • Takes a number. By default GCC enables it only for PowerPC and disables it for other architectures
  • Apparently not supported by clang: https://reviews.llvm.org/D4565

-falign-functions=32

-flimit-function-alignment

  • Has no use when -falign-functions is not used

-ftracer and -funroll-loops

  • Normally enabled by PGO based on profile data and should not be applied manually everywhere; they may cause regressions and produce bigger code that is not necessarily faster

-ffunction-sections and -fdata-sections

  • According to GCC’s optimization manual, these options affect code generation, and should only be used when there are significant benefits from doing so
  • They make the assembler and linker create larger and slower objects and executables
  • Prevent optimizations by the compiler and assembler using relative locations inside a translation unit since the locations are unknown until link time
  • An example of such an optimization is relaxing calls to short call instructions

-fvisibility-inlines-hidden

-march=x86-64-v3

  • Compared with baseline x86-64, x86-64-v3 provides better performance and battery life

Disable LTO

  • Remove -flto=auto -flto-compression-level=19 -fuse-linker-plugin
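
Removing the three flags can be scripted when the flag string is built elsewhere. This is a minimal sketch, assuming the flags live in a space-separated shell variable; strip_lto is a hypothetical helper name, and the shortened CFLAGS value is only for illustration.

```shell
# Drop the LTO-related flags named above from a space-separated flag string.
# strip_lto is a hypothetical helper name.
strip_lto() {
    out=""
    for flag in $1; do    # intentional word splitting on whitespace
        case "$flag" in
            -flto|-flto=*|-flto-compression-level=*|-fuse-linker-plugin) ;;
            *) out="${out:+$out }$flag" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Shortened example flag set, not the full list from this page.
CFLAGS="-pipe -O2 -flto=auto -flto-compression-level=19 -fuse-linker-plugin -fno-plt"
CFLAGS=$(strip_lto "$CFLAGS")
echo "$CFLAGS"    # prints: -pipe -O2 -fno-plt
```

Filtering by pattern rather than by exact string keeps the helper working if the -flto= value or the compression level changes later.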

LDFLAGS

-Wl,--gc-sections

-Wl,-z,noexecstack

  • Provides “Stack Execution Protection”
  • Should already be the default behavior: gcc does not mark the stack as executable by default, and a warning is emitted when a stack does end up executable
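
Whether a given binary really ends up with a non-executable stack can be checked from its program headers. The sketch below reads the output of readelf -lW on stdin and inspects the GNU_STACK entry; stack_mode is a hypothetical helper name, and the sample line is a hand-written illustration of the column layout readelf is assumed to print, not output from a real binary.

```shell
# Read `readelf -lW <binary>` output on stdin and report the stack mode.
# stack_mode is a hypothetical helper name.
stack_mode() {
    awk '
        $1 == "GNU_STACK" {
            found = 1
            # The flags column sits just before the final Align column.
            if ($(NF - 1) ~ /E/) is_exec = 1
        }
        END {
            if (!found)       print "no GNU_STACK header"
            else if (is_exec) print "executable stack"
            else              print "non-executable stack"
        }
    '
}

# Typical use (the binary path is an assumption): readelf -lW ./a.out | stack_mode
# Hand-written sample line, for illustration only:
printf '  GNU_STACK  0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10\n' | stack_mode
```

Parsing the text on stdin keeps the helper independent of where the binary lives, so it can also be fed saved readelf output.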

-Wl,-O1

  • No higher value
  • Ignored by mold

Resources