Here's a demo to show what I mean:

garloff@pckurt:~/Physics/numerix/lina/include $ time g++ -O2 -o tbci_test tbci_test.cc -DTBCI_USE_COLS -DGCC295_FRIEND_BUG -DHAVE_UNISTD_H -finline-limit-X

            time    peak    stripped    exec.
      X     u+s     mem     size        time
            [s]     [MB]    [B]         [s]
    150    12.22      21     136028     0.18
    250    15.02      22     158668     0.17
    400    17.67      29     166472     0.15
    600    16.16      29     160192     0.16
   1000    22.26      35     188444     0.16
   1500    24.52      37     200752     0.15
   2000    30.42      44     210308     0.15
   2500    50.33      77     268800     0.15
   3500   154.44     204     351528     0.15
   5000    79.15     109     371008     0.15
  10000   463.59     513     552088     0.15

Execution times are u+s, and the best of 5 runs (output sent to >/dev/null) is taken.

Note that this is not a benchmark program, and the short execution time does not allow for more than a first check of runtime effects. Basically the program does not do much ...
To create a benchmark, I could put loops around the numerical operations. I'll do so for some future testing.

(I can also post the code for the program, if somebody wants to read it. The reason for the crazy inlining is probably that somewhere I return a std::string inside a class declaration, i.e. the function gets inlined. And g++ allegedly has a crazy lot of work to do for those.)

But this was mainly to see how much harm this crazy inline limit of 10000 can create. You just blow up your executables and your compiler resources for basically nothing. 513MB for compiling a couple of k of source code!
(I did the test on a machine with 640MB of RAM, otherwise compilation would just have failed with OOM!)
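
For what it's worth, putting loops around the numerical operations might look roughly like the sketch below. It is not taken from tbci_test.cc: axpy() and best_of() are made-up names, and axpy() on a std::vector<double> is only a stand-in for whatever the real program computes. It uses std::clock(), which reports processor time and so should be roughly comparable to the u+s figures, and it keeps the best of several runs as in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for one numerical operation: y += a * x.
inline void axpy(double a, const std::vector<double>& x, std::vector<double>& y)
{
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] += a * x[i];
}

// Call op() `reps` times per measurement, repeat the measurement `runs`
// times, and return the smallest processor time (in seconds) of any run.
template <typename Op>
double best_of(int runs, int reps, Op op)
{
    assert(runs > 0 && reps > 0);
    double best = 0.0;
    for (int r = 0; r < runs; ++r) {
        const std::clock_t start = std::clock();
        for (int i = 0; i < reps; ++i)
            op();
        const double secs =
            static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
        if (r == 0 || secs < best)
            best = secs;
    }
    return best;
}
```

One would pick reps large enough that a single run takes at least a second or so, call best_of(5, reps, ...) with the operation wrapped in a function object, and make sure the results are actually used afterwards so that the optimizer cannot throw the loop away.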