On Fri, Aug 24, 2001 at 09:13:26AM +0200, Kurt Garloff wrote:
> > >              Compile   total     test1     test2     test3
> > >              time (u)  size      (u+s)     (u+s)     (u+s)
> > > 2.95.3     : 35:29     49773     70.67     2.66      127.54
> > > 2.95.3-k260: 28:57     47265     73.93     2.81      128.29
> > > 2.95.3-k240: 28:31     46949     74.02     2.84      128.17
> > >
> > > k260vs295  : (+)22.6%  (+)5.3%   (-)4.6%   (-)5.6%   (-)0.6%
> > > k240vs295  : (+)24.4%  (+)6.0%   (-)4.7%   (-)6.8%   (-)0.5%
> >
> > Same code with -k240 and a 4x bonus for leaf functions:
> > 2953-k240*4: 28:41     46977     74.38     2.81      128.11
> >
> > k2404vs295 : (+)23.7%  (+)6.0%   (-)5.5%   (-)5.6%   (-)0.4%
>
> Same machine:
> 2.95.3-k300: 29:41     47713     73.78     2.95      128.23
>              (+)19.5%  (+)4.3%   (-)4.4%   (-)10.9%  (-)0.5%
> 2.95.3-k360: 29:43     47365     73.48     2.81      128.13
>              (+)19.4%  (+)5.1%   (-)4.0%   (-)5.6%   (-)0.5%
> 2.95.3-k400: 29:43     47361     72.42     2.80      127.35
>              (+)19.4%  (+)5.1%   (-)2.5%   (-)5.3%   (+)0.1%
  2.95.3-k500: 29:58     47561     72.17     2.82      127.42
               (+)18.4%  (+)4.7%   (-)2.1%   (-)6.0%   (+)0.1%

As this machine is basically idle and has an up-to-date 7.2, I base my fine
tuning upon this one. The other results are more for illustration.

> > Same code on an Athlon-700 (256MB, SuLi 7.0) but busy in bg with dnetc and
> > 2.95.3     : 41:29     52855     76.53     3.08      79.60
> > 2.95.3-k260: 33:20     50119     79.19     3.44      77.99
> >
> > k260vs295  : (+)24.5%  (+)5.5%   (-)3.5%   (-)11.7%  (+)2.1%
>
> 2.95.3-k400: 32:54     50227     73.64     3.05      70.39
>              (+)26.1%  (+)5.2%   (+)3.9%   (+)1.7%   (+)13.1%
>
> This last result unfortunately has to be taken with care: I had mpg123
> running (remote) instead of xmms, and the CPU usage of the bench program
> was slightly higher ... resulting in less cache pollution and better
> performance. I'll repeat this test when I'm back at university, however.

With slightly higher load (concluding from the ratio of elapsed to user+sys
time) than during the plain 2.95.3 tests:
  2.95.3-k400: 34:24     50223     78.68     3.33      79.71
               (+)20.1%  (+)5.2%   (-)2.8%   (-)8.1%   (-)0.1%

The truth is somewhere in between, so for this machine expect a 23% compile
time improvement, a 5% smaller executable, the same speed for test1, a slight
disadvantage (5%) in the mini-test2 and an advantage (6%) in test3.

> Last but not least: A third machine: 2xiPIII-700, 640MB, SuLi 7.0,
> with background load (2xdnetc):
>
> 2.95.3     : -         -         68.12     2.52      100.86
> 2.95.3-k240: -         47154     71.37     2.73      159.60
>                                  (-)4.8%   (-)8.3%   (-)58.2%
> 2.95.3-k300: -         -         68.24     2.65      100.16
>                                  (-)0.2%   (-)5.2%   (+)0.7%
  2.95.3-k400: -         47404     66.33     2.74      105.79
                                   (+)2.7%   (-)8.7%   (-)4.9%
> 2.95.3-k420: -         47590     66.49     2.66      105.75
>                                  (+)2.5%   (-)5.6%   (-)4.8%
> 2.95.3-k480: -         47778     67.26     2.59      105.73
>                                  (+)1.3%   (-)2.8%   (-)4.8%

Conclusion:
The results are mixed. Leaving aside the extremely short test2 (which I
therefore do not consider representative), you can see both degradations
and improvements of runtime performance, depending on the CPU and the
-finline-limit-X setting.
It looks like the K7 tends to profit more, but this might also be a
consequence of it not being idle. (If another process gets scheduled from
time to time, more compact code gains some advantage.)

* If we want to be very conservative, I can bump the default inline-limit
  to 1000. This leaves compile times and code sizes almost unchanged, except
  for the rare cases where we inline way too heavily, and there it is
  certainly no mistake to limit inlining to improve both compile and runtime
  performance.
* If we're less conservative, we should go with the 400/3 I built. We risk
  that some apps which rely on heavy inlining vary +-5% in performance as
  compared to std g++-2.95.3, while being compiled significantly faster and
  coming out a bit smaller.
* Performance testing on other archs may be a good idea.
* Note that -O3 and -Os should work a bit better now as well (all my tests
  were done with -O2, the code has the keyword inline at the interesting
  places, maybe even at too many places.)

Regards,
-- 
Kurt Garloff                                             Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE GmbH, Nuernberg, DE                                SCSI, Security
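
PS: For anyone who wants to recompute the (+)/(-) columns: the k260vs295 and
k240vs295 rows are consistent with taking the ratio of the larger value to
the smaller one, minus one, and marking the result (+) when the patched
compiler's value is lower (faster or smaller) and (-) when it is higher.
This reading is inferred from the numbers, it is not spelled out anywhere in
this thread, and the function names below are made up for illustration. A
minimal Python sketch under that assumption:

```python
def mmss_to_seconds(text):
    """Convert a 'MM:SS' compile time such as '35:29' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_change(baseline, patched):
    """Signed percentage, rounded to one decimal.

    Assumption (inferred from the tables, not stated in the mail):
    positive means the patched value is lower than the baseline, and the
    magnitude is always larger/smaller - 1.
    """
    if patched <= baseline:
        return round((baseline / patched - 1.0) * 100.0, 1)
    return -round((patched / baseline - 1.0) * 100.0, 1)


def format_change(value):
    """Render a signed percentage in the (+)x.y% / (-)x.y% style used here."""
    sign = "(+)" if value >= 0 else "(-)"
    return f"{sign}{abs(value):.1f}%"


if __name__ == "__main__":
    # Rows '2.95.3' and '2.95.3-k260' from the first table.
    base = [mmss_to_seconds("35:29"), 49773, 70.67, 2.66, 127.54]
    k260 = [mmss_to_seconds("28:57"), 47265, 73.93, 2.81, 128.29]
    print(" ".join(format_change(relative_change(b, p))
                   for b, p in zip(base, k260)))
```

Fed the 2.95.3 and 2.95.3-k260 rows of the first table, this gives back the
k260vs295 line. Not every row in the thread reproduces exactly this way, so
treat it as a reading aid rather than the script the numbers came from.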