JMH 測試 stream 平行處理多執行緒

效能測試：平行處理比較快？

謝長洺 2020/11/27 10:38:50

0 1236

在回饋客戶硬體建議規格時，往往依據經驗列出了可支持應用系統順暢運行的多核心中央處理器的需求清單。為了評估應用系統是否擅用了多核心處理器以提高執行效能，可使用Java Microbenchmark Harness (JMH)測試工具。JMH是由OpenJDK社群團隊開發維護，可對個別method進行測試，精度可達到十億分之一秒，深度應用可參考JMH官方網站。

本篇由建立maven專案開始，由JMH測試三種由1累加至1,000,000,000方法的處理效能，專案目錄架構如下圖：

首先規劃好pom.xml依賴環境：

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>thinkpower</groupId>
  <artifactId>test</artifactId>
  <version>1.0.0</version>
  <packaging>jar</packaging>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.openjdk.jmh</groupId>
      <artifactId>jmh-core</artifactId>
      <version>1.26</version>
    </dependency>
    <dependency>
      <groupId>org.openjdk.jmh</groupId>
      <artifactId>jmh-generator-annprocess</artifactId>
      <version>1.26</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.0</version>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <finalName>test</finalName>
              <transformers>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>org.openjdk.jmh.Main</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

接著撰寫測試主程式AccumulationBenchmark.java：

package thinkpower.test;

import java.util.concurrent.TimeUnit;
import java.util.stream.LongStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Threads(1)
@Fork(value = 2, jvmArgs = {"-Xms3G", "-Xmx3G"})
@Measurement(iterations = 5)
@Warmup(iterations = 5)
public class AccumulationBenchmark {

  private static final long N = 1_000_000_000L;

  @Benchmark
  public long iterative() {
    long result = 0;

    for (long i = 1L; i <= N; i++) {
      result += i;
    }

    return result;
  }

  @Benchmark
  public long sequential() {
    return LongStream
        .rangeClosed(1, N)
        .reduce(0L, Long::sum);
  }

  @Benchmark
  public long parallel() {
    return LongStream
        .rangeClosed(1, N)
        .parallel()
        .reduce(0L, Long::sum);
  }

  @TearDown(Level.Invocation)
  public void tearDown() {
    System.gc();
  }
}

其中@Thread為指定測試時會使用的執行緒數目，本篇將在分別使用1及2個執行緒的基礎上比較效能。

iterative()使用基本的for each累加；sequential()採用LongStream單執行緒處理；parallel()採用LongSteam之平行處理版本。

經過maven打包編譯之後，產生test.jar檔案，在命令列中輸入：

java -jar .\test.jar AccumulationBenchmark

系統將開始熱身與測試，並依序輸出結果：

java -jar .\test.jar AccumulationBenchmark
# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.iterative

# Run progress: 0.00% complete, ETA 00:10:00
# Fork: 1 of 2
# Warmup Iteration   1: 4456.206 ms/op
# Warmup Iteration   2: 5946.085 ms/op
# Warmup Iteration   3: 5940.744 ms/op
# Warmup Iteration   4: 5930.648 ms/op
# Warmup Iteration   5: 5956.760 ms/op
Iteration   1: 5932.714 ms/op
Iteration   2: 5933.015 ms/op
Iteration   3: 5930.179 ms/op
Iteration   4: 5938.809 ms/op
Iteration   5: 5936.512 ms/op

# Run progress: 16.67% complete, ETA 00:10:04
# Fork: 2 of 2
# Warmup Iteration   1: 4449.944 ms/op
# Warmup Iteration   2: 5933.952 ms/op
# Warmup Iteration   3: 5935.393 ms/op
# Warmup Iteration   4: 5940.638 ms/op
# Warmup Iteration   5: 5926.879 ms/op
Iteration   1: 5921.653 ms/op
Iteration   2: 5932.864 ms/op
Iteration   3: 5929.409 ms/op
Iteration   4: 5926.152 ms/op
Iteration   5: 5926.998 ms/op

Result "thinkpower.test.AccumulationBenchmark.iterative":
  5930.830 ±(99.9%) 7.670 ms/op [Average]
  (min, avg, max) = (5921.653, 5930.830, 5938.809), stdev = 5.073
  CI (99.9%): [5923.161, 5938.500] (assumes normal distribution)

# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.parallel

# Run progress: 33.33% complete, ETA 00:08:02
# Fork: 1 of 2
# Warmup Iteration   1: 1253.240 ms/op
# Warmup Iteration   2: 1243.111 ms/op
# Warmup Iteration   3: 1248.442 ms/op
# Warmup Iteration   4: 1267.132 ms/op
# Warmup Iteration   5: 1281.695 ms/op
Iteration   1: 1710.087 ms/op
Iteration   2: 7878.918 ms/op
Iteration   3: 8210.969 ms/op
Iteration   4: 8593.274 ms/op
Iteration   5: 9107.577 ms/op

# Run progress: 50.00% complete, ETA 00:06:15
# Fork: 2 of 2
# Warmup Iteration   1: 1649.726 ms/op
# Warmup Iteration   2: 1714.278 ms/op
# Warmup Iteration   3: 1720.990 ms/op
# Warmup Iteration   4: 1713.735 ms/op
# Warmup Iteration   5: 1740.331 ms/op
Iteration   1: 1713.652 ms/op
Iteration   2: 1722.397 ms/op
Iteration   3: 2491.294 ms/op
Iteration   4: 9500.024 ms/op
Iteration   5: 9537.059 ms/op

Result "thinkpower.test.AccumulationBenchmark.parallel":
  6046.525 ±(99.9%) 5449.336 ms/op [Average]
  (min, avg, max) = (1710.087, 6046.525, 9537.059), stdev = 3604.398
  CI (99.9%): [597.190, 11495.861] (assumes normal distribution)

# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.sequential

# Run progress: 66.67% complete, ETA 00:04:09
# Fork: 1 of 2
# Warmup Iteration   1: 15654.114 ms/op
# Warmup Iteration   2: 5325.045 ms/op
# Warmup Iteration   3: 6018.046 ms/op
# Warmup Iteration   4: 6051.297 ms/op
# Warmup Iteration   5: 6009.306 ms/op
Iteration   1: 6144.531 ms/op
Iteration   2: 6029.512 ms/op
Iteration   3: 6017.933 ms/op
Iteration   4: 6019.122 ms/op
Iteration   5: 6028.526 ms/op

# Run progress: 83.33% complete, ETA 00:02:04
# Fork: 2 of 2
# Warmup Iteration   1: 15624.997 ms/op
# Warmup Iteration   2: 5321.495 ms/op
# Warmup Iteration   3: 6049.290 ms/op
# Warmup Iteration   4: 6008.154 ms/op
# Warmup Iteration   5: 6017.732 ms/op
Iteration   1: 6023.221 ms/op
Iteration   2: 6046.884 ms/op
Iteration   3: 6029.584 ms/op
Iteration   4: 6019.114 ms/op
Iteration   5: 6028.419 ms/op

Result "thinkpower.test.AccumulationBenchmark.sequential":
  6038.685 ±(99.9%) 57.640 ms/op [Average]
  (min, avg, max) = (6017.933, 6038.685, 6144.531), stdev = 38.125
  CI (99.9%): [5981.045, 6096.324] (assumes normal distribution)

# Run complete. Total time: 00:12:26

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                         Mode  Cnt     Score      Error  Units
AccumulationBenchmark.iterative   avgt   10  5930.830 ±    7.670  ms/op
AccumulationBenchmark.parallel    avgt   10  6046.525 ± 5449.336  ms/op
AccumulationBenchmark.sequential  avgt   10  6038.685 ±   57.640  ms/op

在單一執行緒環境下，三者程式寫法的效能測試結果並無顯著差異，parallel()及sequential()還須付出Stream轉換成本。

接著測試在兩個執行緒環境下所得到的結果：

java -jar .\test.jar AccumulationBenchmark
# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.iterative

# Run progress: 0.00% complete, ETA 00:10:00
# Fork: 1 of 2
# Warmup Iteration   1: 4851.613 ms/op
# Warmup Iteration   2: 6012.493 ms/op
# Warmup Iteration   3: 5950.634 ms/op
# Warmup Iteration   4: 6001.263 ms/op
# Warmup Iteration   5: 6159.523 ms/op
Iteration   1: 6075.403 ms/op
Iteration   2: 5956.874 ms/op
Iteration   3: 5962.192 ms/op
Iteration   4: 5961.672 ms/op
Iteration   5: 5963.367 ms/op

# Run progress: 16.67% complete, ETA 00:15:06
# Fork: 2 of 2
# Warmup Iteration   1: 4660.383 ms/op
# Warmup Iteration   2: 6320.753 ms/op
# Warmup Iteration   3: 6091.674 ms/op
# Warmup Iteration   4: 6099.851 ms/op
# Warmup Iteration   5: 6028.244 ms/op
Iteration   1: 6008.184 ms/op
Iteration   2: 6098.875 ms/op
Iteration   3: 6109.764 ms/op
Iteration   4: 6145.786 ms/op
Iteration   5: 6112.700 ms/op

Result "thinkpower.test.AccumulationBenchmark.iterative":
  6039.482 ±(99.9%) 114.968 ms/op [Average]
  (min, avg, max) = (5956.874, 6039.482, 6145.786), stdev = 76.044
  CI (99.9%): [5924.514, 6154.449] (assumes normal distribution)

# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.parallel

# Run progress: 33.33% complete, ETA 00:12:11
# Fork: 1 of 2
# Warmup Iteration   1: 2450.662 ms/op
# Warmup Iteration   2: 2675.631 ms/op
# Warmup Iteration   3: 2767.538 ms/op
# Warmup Iteration   4: 3086.775 ms/op
# Warmup Iteration   5: 3276.890 ms/op
Iteration   1: 3261.641 ms/op
Iteration   2: 3289.541 ms/op
Iteration   3: 15242.769 ms/op
Iteration   4:
PS C:\test> java -jar .\test.jar AccumulationBenchmark
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.openjdk.jmh.util.Utils (file:/C:/Test/test.jar) to field java.io.Console.cs
WARNING: Please consider reporting this to the maintainers of org.openjdk.jmh.util.Utils
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.iterative

# Run progress: 0.00% complete, ETA 00:10:00
# Fork: 1 of 2
# Warmup Iteration   1: 12041.970 ms/op
# Warmup Iteration   2: 6520.502 ms/op
# Warmup Iteration   3: 6395.763 ms/op
# Warmup Iteration   4: 6382.029 ms/op
# Warmup Iteration   5: 6403.139 ms/op
Iteration   1: 6426.603 ms/op
Iteration   2: 6385.655 ms/op
Iteration   3: 6412.165 ms/op
Iteration   4: 6373.420 ms/op
Iteration   5: 6372.247 ms/op

# Run progress: 16.67% complete, ETA 00:15:44
# Fork: 2 of 2
# Warmup Iteration   1: 4859.628 ms/op
# Warmup Iteration   2: 6036.120 ms/op
# Warmup Iteration   3: 6110.761 ms/op
# Warmup Iteration   4: 6069.649 ms/op
# Warmup Iteration   5: 6005.840 ms/op
Iteration   1: 6218.517 ms/op
Iteration   2: 6479.342 ms/op
Iteration   3: 6622.461 ms/op
Iteration   4: 6503.406 ms/op
Iteration   5: 6626.705 ms/op

Result "thinkpower.test.AccumulationBenchmark.iterative":
  6442.052 ±(99.9%) 185.617 ms/op [Average]
  (min, avg, max) = (6218.517, 6442.052, 6626.705), stdev = 122.774
  CI (99.9%): [6256.435, 6627.670] (assumes normal distribution)

# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.parallel

# Run progress: 33.33% complete, ETA 00:12:31
# Fork: 1 of 2
# Warmup Iteration   1: 3238.765 ms/op
# Warmup Iteration   2: 3355.833 ms/op
# Warmup Iteration   3: 3133.194 ms/op
# Warmup Iteration   4: 3249.205 ms/op
# Warmup Iteration   5: 3252.772 ms/op
Iteration   1: 3232.813 ms/op
Iteration   2: 3312.203 ms/op
Iteration   3: 3291.101 ms/op
Iteration   4: 3262.520 ms/op
Iteration   5: 3277.272 ms/op

# Run progress: 50.00% complete, ETA 00:08:29
# Fork: 2 of 2
# Warmup Iteration   1: 3332.932 ms/op
# Warmup Iteration   2: 3312.197 ms/op
# Warmup Iteration   3: 3281.602 ms/op
# Warmup Iteration   4: 3402.100 ms/op
# Warmup Iteration   5: 3226.125 ms/op
Iteration   1: 3325.027 ms/op
Iteration   2: 3234.206 ms/op
Iteration   3: 3384.664 ms/op
Iteration   4: 3095.578 ms/op
Iteration   5: 3299.872 ms/op

Result "thinkpower.test.AccumulationBenchmark.parallel":
  3271.526 ±(99.9%) 115.434 ms/op [Average]
  (min, avg, max) = (3095.578, 3271.526, 3384.664), stdev = 76.352
  CI (99.9%): [3156.092, 3386.960] (assumes normal distribution)

# JMH version: 1.26
# VM version: JDK 11.0.9, OpenJDK 64-Bit Server VM, 11.0.9+11
# VM invoker: C:\Program Files\Java\jdk-11.0.9+11\bin\java.exe
# VM options: -Xms3G -Xmx3G
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: thinkpower.test.AccumulationBenchmark.sequential

# Run progress: 66.67% complete, ETA 00:05:24
# Fork: 1 of 2
# Warmup Iteration   1: 10609.940 ms/op
# Warmup Iteration   2: 6775.638 ms/op
# Warmup Iteration   3: 6745.000 ms/op
# Warmup Iteration   4: 6766.871 ms/op
# Warmup Iteration   5: 6757.104 ms/op
Iteration   1: 6771.728 ms/op
Iteration   2: 6777.884 ms/op
Iteration   3: 6732.919 ms/op
Iteration   4: 6921.757 ms/op
Iteration   5: 6800.023 ms/op

# Run progress: 83.33% complete, ETA 00:02:49
# Fork: 2 of 2
# Warmup Iteration   1: 11353.900 ms/op
# Warmup Iteration   2: 7132.713 ms/op
# Warmup Iteration   3: 6798.452 ms/op
# Warmup Iteration   4: 6961.987 ms/op
# Warmup Iteration   5: 6786.564 ms/op
Iteration   1: 6940.743 ms/op
Iteration   2: 6776.996 ms/op
Iteration   3: 6771.191 ms/op
Iteration   4: 6740.755 ms/op
Iteration   5: 6749.857 ms/op

Result "thinkpower.test.AccumulationBenchmark.sequential":
  6798.385 ±(99.9%) 110.156 ms/op [Average]
  (min, avg, max) = (6732.919, 6798.385, 6940.743), stdev = 72.861
  CI (99.9%): [6688.229, 6908.541] (assumes normal distribution)

# Run complete. Total time: 00:17:31

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                         Mode  Cnt     Score     Error  Units
AccumulationBenchmark.iterative   avgt   10  6442.052 ± 185.617  ms/op
AccumulationBenchmark.parallel    avgt   10  3271.526 ± 115.434  ms/op
AccumulationBenchmark.sequential  avgt   10  6798.385 ± 110.156  ms/op

採用平行處理的parallel()在兩個執行緒環境下，得到顯著效能提升。

然而，平行處理過程尚牽涉到fork/join演算邏輯、處理標的之資料結構、運行中是否遇到等待時間等因素，皆會影響到效能測試結果。藉由JMH測試工具逐步調校程式寫法，可有效並有依據地優化系統執行效能。