STREAM comparison data
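This section summarizes memory-access RT, jitter, and bandwidth measured with STREAM on several CPUs. The focus is bandwidth; latency is not important in this test.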
| CPU | Min RT | Max RT | Max copy bandwidth (MB/s) | Min copy bandwidth (MB/s) |
|---|---|---|---|---|
| Sunway (申威) 3231 (2 NUMA nodes) | 7.09 | 8.75 | 2256.59 | 1827.88 |
| Phytium (飞腾) 2500 (16 NUMA nodes) | 2.84 | 10.34 | 5638.21 | 1546.68 |
| Kunpeng (鲲鹏) 920 (4 NUMA nodes) | 1.84 | 3.87 | 8700.75 | 4131.81 |
| Hygon (海光) 7280 (8 NUMA nodes) | 1.38 | 2.58 | 11591.48 | 6206.99 |
| Hygon (海光) 5280 (4 NUMA nodes) | 1.22 | 2.52 | 13166.34 | 6357.71 |
| Intel 8269CY (2 NUMA nodes) | 1.12 | 1.52 | 14293.68 | 10551.71 |
| Intel E5-2682 (2 NUMA nodes) | 1.58 | 2.02 | 10092.31 | 7914.25 |
| AMD EPYC 7T83 (4 NUMA nodes) | 0.49 | 1.39 | 32561.30 | 11512.93 |
| Y7 | 1.83 | 3.48 | 8764.72 | 4593.25 |
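The data shows performance improving row by row from Sunway (申威) 3231 down to Intel 8269CY. On its slow cores, Phytium (飞腾) 2500 has a latency close to 10 times that of Intel 8269CY, and its average latency is more than 5 times higher. The latency data is largely consistent with single-core sysbench TPS results. Performance is roughly: constant × clock frequency / RT.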
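The ratios quoted above can be recomputed from the table. The following is a minimal sketch, not the author's tooling: the names `STREAM_ROWS`, `stream_summary`, and `relative_perf` are made up for illustration, CPU names are written in English, and the clock frequencies passed to `relative_perf` must come from the reader because the table does not list them.

```python
# Rows copied from the STREAM table:
# (cpu, min RT, max RT, max copy MB/s, min copy MB/s)
STREAM_ROWS = [
    ("Sunway 3231", 7.09, 8.75, 2256.59, 1827.88),
    ("Phytium 2500", 2.84, 10.34, 5638.21, 1546.68),
    ("Kunpeng 920", 1.84, 3.87, 8700.75, 4131.81),
    ("Hygon 7280", 1.38, 2.58, 11591.48, 6206.99),
    ("Hygon 5280", 1.22, 2.52, 13166.34, 6357.71),
    ("Intel 8269CY", 1.12, 1.52, 14293.68, 10551.71),
    ("Intel E5-2682", 1.58, 2.02, 10092.31, 7914.25),
    ("AMD EPYC 7T83", 0.49, 1.39, 32561.30, 11512.93),
    ("Y7", 1.83, 3.48, 8764.72, 4593.25),
]


def stream_summary(rows, baseline="Intel 8269CY"):
    """Per-CPU jitter ratios, plus worst RT relative to the baseline's best RT."""
    base_min_rt = next(r[1] for r in rows if r[0] == baseline)
    return {
        name: {
            "rt_spread": rt_max / rt_min,      # RT jitter: slowest vs fastest
            "bw_spread": bw_max / bw_min,      # bandwidth jitter: best vs worst
            "worst_rt_vs_baseline": rt_max / base_min_rt,
        }
        for name, rt_min, rt_max, bw_max, bw_min in rows
    }


def relative_perf(freq_a, rt_a, freq_b, rt_b):
    """Ratio of (constant * freq / RT) for A over B; the constant cancels out.

    Frequencies are caller-supplied assumptions; they are not in the table.
    """
    return (freq_a / rt_a) / (freq_b / rt_b)


if __name__ == "__main__":
    for name, s in stream_summary(STREAM_ROWS).items():
        print(f"{name:15s} RT x{s['rt_spread']:.2f}  BW x{s['bw_spread']:.2f}  "
              f"worst RT vs 8269CY best x{s['worst_rt_vs_baseline']:.2f}")
```

With these inputs, the slowest Phytium 2500 RT comes out at roughly 9 times the best Intel 8269CY RT, which matches the "close to 10 times" statement.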
lat_mem_rd comparison data
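Run lat_mem_rd on cores from different nodes, always accessing memory on node 0, and record the RT. Only the latency at the largest size (64 MB) is kept. The latency tracks the NUMA node distance exactly.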
| CPU | RT by core (ns) |
|---|---|
| Phytium (飞腾) 2500 (16 NUMA nodes) | core:0 149.976, core:8 168.805, core:16 191.415, core:24 178.283, core:32 170.814, core:40 185.699, core:48 212.281, core:56 202.479, core:64 426.176, core:72 444.367, core:80 465.894, core:88 452.245, core:96 448.352, core:104 460.603, core:112 485.989, core:120 490.402 |
| Kunpeng (鲲鹏) 920 (4 NUMA nodes) | core:0 117.323, core:24 135.337, core:48 197.782, core:72 219.416 |
| Hygon (海光) 7280 (8 NUMA nodes) | numa0 106.839, numa1 168.583, numa2 163.925, numa3 163.690, numa4 289.628, numa5 288.632, numa6 236.615, numa7 291.880<br>With die interleaving enabled: core:0 153.005, core:16 152.458, core:32 272.057, core:48 269.441 |
| Hygon (海光) 5280 (4 NUMA nodes) | core:0 102.574, core:8 160.989, core:16 286.850, core:24 231.197 |
| Hygon (海光) 7260 (1 NUMA node) | core:0 265 |
| Intel 8269CY (2 NUMA nodes) | core:0 69.792, core:26 93.107 |
| Sunway (申威) 3231 (2 NUMA nodes) | core:0 215.146, core:32 282.443 |
| AMD EPYC 7T83 (4 NUMA nodes) | core:0 71.656, core:32 80.129, core:64 131.334, core:96 129.563 |
| Y7 (2 dies, 2 nodes, 1 socket) | core:8 42.395, core:40 36.434, core:104 105.745, core:88 124.384<br>core:24 62.979, core:8 69.324, core:64 137.233, core:88 127.250<br>133 ns, 205 ns (to be tested) |
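To compare how much remote access costs on each CPU, the table can be reduced to one ratio per row: worst remote latency divided by local (node 0) latency. This is a rough sketch under stated assumptions: `LAT_NS` and `numa_penalty` are illustrative names, the first value in each list is taken to be the local access, and the Hygon 7260 (single value) and Y7 (rows not yet clearly labeled) entries are left out.

```python
# lat_mem_rd latency in ns at the 64 MB size, copied from the table.
# Assumption: the first value in each list is the core (or node) local to
# node 0; the remaining values are cores on other nodes reading node 0 memory.
LAT_NS = {
    "Phytium 2500": [149.976, 168.805, 191.415, 178.283, 170.814, 185.699,
                     212.281, 202.479, 426.176, 444.367, 465.894, 452.245,
                     448.352, 460.603, 485.989, 490.402],
    "Kunpeng 920": [117.323, 135.337, 197.782, 219.416],
    "Hygon 7280": [106.839, 168.583, 163.925, 163.690,
                   289.628, 288.632, 236.615, 291.880],
    "Hygon 5280": [102.574, 160.989, 286.850, 231.197],
    "Intel 8269CY": [69.792, 93.107],
    "Sunway 3231": [215.146, 282.443],
    "AMD EPYC 7T83": [71.656, 80.129, 131.334, 129.563],
}


def numa_penalty(latencies):
    """Worst remote latency divided by the local (first) latency."""
    local = latencies[0]
    return max(latencies) / local


if __name__ == "__main__":
    # Print CPUs from the smallest to the largest remote-access penalty.
    for cpu, lat in sorted(LAT_NS.items(), key=lambda kv: numa_penalty(kv[1])):
        print(f"{cpu:15s} local {lat[0]:8.3f} ns  worst {max(lat):8.3f} ns  "
              f"penalty x{numa_penalty(lat):.2f}")
```

The ratio is only a summary: it hides the stepwise pattern visible in the raw numbers, such as the jump between core:56 and core:64 on Phytium 2500.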