Why is my string manipulation slow when using lambda expressions?

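A method takes comma-separated words as a String and returns a String of comma-separated words in natural sort order, with no 4-letter words, all words in UPPER case, and no duplicates. The first approach is quite slow compared to the second. Can you help me understand why, and how I can improve my approach?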

Approach 1:

    public String stringProcessing(String s) {
        Stream<String> tokens = Arrays.stream(s.split(","));
        return tokens.filter(t -> t.length() != 4)
                     .distinct()
                     .sorted()
                     .collect(Collectors.joining(","))
                     .toUpperCase(); // upper-cases the joined result, after distinct() and sorted()
    }

Approach 2:

    public String processing(String s) {
        String[] tokens = s.split(",");
        Set<String> resultSet = new TreeSet<>();
        for (String t : tokens) {
            if (t.length() != 4)
                resultSet.add(t.toUpperCase()); // upper-cases each token before it enters the set
        }
        StringBuilder result = new StringBuilder();
        resultSet.forEach(key -> {
            result.append(key).append(",");
        });
        // Removes the trailing comma; note this throws if every token was filtered out.
        result.deleteCharAt(result.length() - 1);
        return result.toString();
    }

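Without the JRE version, the input data sets, and the benchmark methodology being documented, the performance comparison is not suitable for drawing any conclusions.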

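Further, there are fundamental differences between your variants. Your first variant applies distinct() to the original strings, so it potentially keeps more elements than the second variant, then joins all of them into one string before converting the complete result to upper case. In contrast, your second variant converts the individual elements before adding them to the set, so only strings with a distinct upper-case representation are processed further. The second variant may therefore need significantly less memory and process fewer elements when joining.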
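To make the difference concrete, here is a small sketch that runs both of the question's approaches on a mixed-case input. The class name `CaseOrderDemo`, the method names, and the sample input are only illustrative; the method bodies mirror the question's code. Besides the cost, the two approaches can also return different strings:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CaseOrderDemo {

    // Approach 1: de-duplicates and sorts the original tokens, upper-cases last.
    static String upperCaseLast(String s) {
        return Arrays.stream(s.split(","))
                .filter(t -> t.length() != 4)
                .distinct()
                .sorted()
                .collect(Collectors.joining(","))
                .toUpperCase();
    }

    // Approach 2: upper-cases each token first, then de-duplicates and sorts.
    static String upperCaseFirst(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {
            if (t.length() != 4) {
                resultSet.add(t.toUpperCase());
            }
        }
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        String input = "b,B,a";
        System.out.println(upperCaseLast(input));  // prints B,A,B
        System.out.println(upperCaseFirst(input)); // prints A,B
    }
}
```

With approach 1, "b" and "B" both survive distinct() and are sorted while still in mixed case, so the final string can contain duplicates and be out of order.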

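So, when the two methods do entirely different things, there is not much sense in comparing their performance. A better comparison would be between these two variants: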

    public String variant1(String s) {
        Stream<String> tokens = Arrays.stream(s.split(","));
        return tokens.filter(t -> t.length() != 4)
                     .map(String::toUpperCase)
                     .sorted().distinct()
                     .collect(Collectors.joining(","));
    }

    public String variant2(String s) {
        String[] tokens = s.split(",");
        Set<String> resultSet = new TreeSet<>();
        for (String t : tokens) {
            if (t.length() != 4)
                resultSet.add(t.toUpperCase());
        }
        return String.join(",", resultSet);
    }

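Note that I changed the order of sorted() and distinct(); as discussed in another answer, applying distinct() directly after sorted() lets the distinct operation exploit the sorted nature of the stream.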
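The idea behind that remark can be sketched outside the Stream API: once the input is sorted, equal elements are adjacent, so removing duplicates only needs a comparison with the previous element instead of a set of everything seen so far. The helper below (`dedupeSorted`, a hypothetical name) illustrates the principle only; it is not the JDK's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class SortedDistinctSketch {

    // Assumes 'sorted' is already in sorted order, so duplicates sit next to each other.
    static List<String> dedupeSorted(List<String> sorted) {
        List<String> result = new ArrayList<>();
        String previous = null;
        boolean first = true;
        for (String current : sorted) {
            // Keep the element only if it differs from the one just before it.
            if (first || !Objects.equals(previous, current)) {
                result.add(current);
            }
            previous = current;
            first = false;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("APPLE", "APPLE", "FIG", "PLUM", "PLUM");
        System.out.println(dedupeSorted(sorted)); // prints [APPLE, FIG, PLUM]
    }
}
```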

You may also consider not creating a temporary array holding all substrings before streaming over them:

    public String variant1(String s) {
        // splitAsStream produces the tokens lazily instead of filling an array first
        return Pattern.compile(",").splitAsStream(s)
                      .filter(t -> t.length() != 4)
                      .map(String::toUpperCase)
                      .sorted().distinct()
                      .collect(Collectors.joining(","));
    }
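If the method is called often, one further option is to compile the pattern once and reuse it, since `Pattern` instances are immutable and safe to share. This is a minimal sketch, assuming a constant named `COMMA` and a wrapper class `SplitAsStreamSketch` (both names are mine, not from the answer); whether it is measurably faster depends on the workload and was not measured here.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitAsStreamSketch {

    // Compiled once and shared; Pattern is immutable and thread-safe.
    private static final Pattern COMMA = Pattern.compile(",");

    static String variant1(String s) {
        return COMMA.splitAsStream(s)
                .filter(t -> t.length() != 4)
                .map(String::toUpperCase)
                .sorted().distinct()
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(variant1("pear,kiwi,Apple,fig,apple")); // prints APPLE,FIG
    }
}
```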

You could also add a third variant:

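    public String variant3(String s) {
        Set<String> resultSet = new TreeSet<>();
        int o = 0, p; // o: start of the current token, p: index of the next comma
        for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
            if (p - o == 4) continue; // skip 4-letter tokens without creating a substring
            resultSet.add(s.substring(o, p).toUpperCase());
        }
        // handle the last token, which has no comma after it
        if (s.length() - o != 4) resultSet.add(s.substring(o).toUpperCase());
        return String.join(",", resultSet);
    }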

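This variant does not create an array of substrings and even skips creating the substring for tokens that fail the filter. This is not meant to suggest going that low-level in production code. The point is that there may always be a faster variant, so what matters is not whether the variant you use is the fastest, but whether it runs at a reasonable speed while staying maintainable.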
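When replacing `split` with manual index scanning, it is worth checking that the two variants still agree. The sketch below copies `variant2` and `variant3` into a hypothetical `VariantCheck` class and compares them. As far as I can tell they match on ordinary input but differ when the input ends with a comma, because `String.split` drops trailing empty strings while the manual loop keeps the empty last token.

```java
import java.util.Set;
import java.util.TreeSet;

class VariantCheck {

    static String variant2(String s) {
        Set<String> resultSet = new TreeSet<>();
        for (String t : s.split(",")) {
            if (t.length() != 4) resultSet.add(t.toUpperCase());
        }
        return String.join(",", resultSet);
    }

    static String variant3(String s) {
        Set<String> resultSet = new TreeSet<>();
        int o = 0, p;
        for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
            if (p - o == 4) continue;
            resultSet.add(s.substring(o, p).toUpperCase());
        }
        if (s.length() - o != 4) resultSet.add(s.substring(o).toUpperCase());
        return String.join(",", resultSet);
    }

    public static void main(String[] args) {
        String ordinary = "pear,kiwi,Apple,fig,apple";
        System.out.println(variant2(ordinary)); // prints APPLE,FIG
        System.out.println(variant3(ordinary)); // prints APPLE,FIG

        String trailingComma = "a,b,";
        System.out.println(variant2(trailingComma)); // prints A,B
        System.out.println(variant3(trailingComma)); // prints ,A,B (empty last token kept)
    }
}
```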

I guess it was only a matter of time before someone actually posted some JMH tests. I took Holger's methods and tested them:

    import java.util.Arrays;
    import java.util.Set;
    import java.util.TreeSet;
    import java.util.concurrent.TimeUnit;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    @BenchmarkMode(value = { Mode.AverageTime })
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @State(Scope.Benchmark)
    public class StreamVsLoop {

        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder().include(StreamVsLoop.class.getSimpleName())
                                              .build();
            new Runner(opt).run();
        }

        // Inputs with 3, 8 and 11 tokens. The split is on "," only,
        // so every token after the first keeps its leading space.
        @Param(value = {
            "a, b, c",
            "a, bb, ccc, dddd, eeeee, ffffff, ggggggg, hhhhhhhh",
            "a, bb, ccc, dddd, eeeee, ffffff, ggggggg, hhhhhhhh, ooooooooo, tttttttttttttt, mmmmmmmmmmmmmmmmmm" })
        String s;

        @Benchmark
        @Fork(1)
        public String stream() {
            Stream<String> tokens = Arrays.stream(s.split(","));
            return tokens.filter(t -> t.length() != 4)
                         .map(String::toUpperCase)
                         .sorted().distinct()
                         .collect(Collectors.joining(","));
        }

        @Benchmark
        @Fork(1)
        public String loop() {
            String[] tokens = s.split(",");
            Set<String> resultSet = new TreeSet<>();
            for (String t : tokens) {
                if (t.length() != 4) {
                    resultSet.add(t.toUpperCase());
                }
            }
            return String.join(",", resultSet);
        }

        @Benchmark
        @Fork(1)
        public String sortedDistinct() {
            return Pattern.compile(",").splitAsStream(s)
                          .filter(t -> t.length() != 4)
                          .map(String::toUpperCase)
                          .sorted()
                          .distinct()
                          .collect(Collectors.joining(","));
        }

        @Benchmark
        @Fork(1)
        public String distinctSorted() {
            return Pattern.compile(",").splitAsStream(s)
                          .filter(t -> t.length() != 4)
                          .map(String::toUpperCase)
                          .distinct()
                          .sorted()
                          .collect(Collectors.joining(","));
        }
    }

Here are the results:

    Benchmark        Input      Avg time (ns/op)
    stream           3 args      574.042
    loop             3 args      393.364
    sortedDistinct   3 args      829.077
    distinctSorted   3 args      836.558

    stream           8 args     1144.488
    loop             8 args     1014.756
    sortedDistinct   8 args     1533.968
    distinctSorted   8 args     1745.055

    stream           11 args    1829.571
    loop             11 args    1514.138
    sortedDistinct   11 args    1940.256
    distinctSorted   11 args    2591.715

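The result is fairly obvious: streams are slower, but not by that much, so readability probably wins. Also, Holger is right (but he rarely, if ever, isn't).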

I took some time to build a test that I would be fairly happy with, so that I could actually judge the numbers I get:

    import java.util.Arrays;
    import java.util.Set;
    import java.util.TreeSet;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    @BenchmarkMode(value = { Mode.AverageTime })
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @State(Scope.Benchmark)
    public class StreamVsLoop {

        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder().include(StreamVsLoop.class.getSimpleName())
                                              .jvmArgs("-ea") // enable the asserts in tearDown
                                              .shouldFailOnError(true)
                                              .build();
            new Runner(opt).run();
        }

        @State(Scope.Thread)
        public static class StringInput {

            private String[] letters = { "q", "a", "z", "w", "s", "x", "e", "d", "c", "r", "f", "v", "t",
                    "g", "b", "y", "h", "n", "u", "j", "m", "i", "k", "o", "l", "p" };

            public String s = "";

            @Param(value = { "1000", "10000", "100000" })
            int next;

            @TearDown(Level.Iteration)
            public void tearDown() {
                // sanity check on the generated input: number of 5-letter tokens
                if (next == 1000) {
                    long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                    assert count == 99;
                }
                if (next == 10000) {
                    long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                    assert count == 999;
                }
                if (next == 100000) {
                    long count = Arrays.stream(s.split(",")).filter(x -> x.length() == 5).count();
                    assert count == 9999;
                }
                // reset to "" rather than null, so the next setUp does not build on the text "null"
                s = "";
            }

            /**
             * A very brute-force attempt to have half of the elements filtered and half not.
             * Highly inefficient, but this is not part of the measurement, so who cares?
             */
            @Setup(Level.Iteration)
            public void setUp() {
                for (int i = 0; i < next; i++) {
                    int index = ThreadLocalRandom.current().nextInt(0, letters.length);
                    String letter = letters[index];
                    if (next == 1000) {
                        if (i < 500 && i % 4 == 0) {
                            s = s + "," + letter;
                        } else if (i > 500 && i % 5 == 0) {
                            s = s + "," + letter;
                        } else {
                            s = s + letter;
                        }
                    } else if (next == 10000) {
                        if (i < 5000 && i % 4 == 0) {
                            s = s + "," + letter;
                        } else if (i > 5000 && i % 5 == 0) {
                            s = s + "," + letter;
                        } else {
                            s = s + letter;
                        }
                    } else if (next == 100000) {
                        if (i < 50000 && i % 4 == 0) {
                            s = s + "," + letter;
                        } else if (i > 50000 && i % 5 == 0) {
                            s = s + "," + letter;
                        } else {
                            s = s + letter;
                        }
                    }
                }
            }
        }

        @Benchmark
        @Fork // no explicit fork count here, so JMH's default applies
        public String stream(StringInput si) {
            Stream<String> tokens = Arrays.stream(si.s.split(","));
            return tokens.filter(t -> t.length() != 4)
                         .map(String::toUpperCase)
                         .sorted().distinct()
                         .collect(Collectors.joining(","));
        }

        @Benchmark
        @Fork(1)
        public String loop(StringInput si) {
            String[] tokens = si.s.split(",");
            Set<String> resultSet = new TreeSet<>();
            for (String t : tokens) {
                if (t.length() != 4) {
                    resultSet.add(t.toUpperCase());
                }
            }
            return String.join(",", resultSet);
        }

        @Benchmark
        @Fork(1)
        public String sortedDistinct(StringInput si) {
            return Pattern.compile(",").splitAsStream(si.s)
                          .filter(t -> t.length() != 4)
                          .map(String::toUpperCase)
                          .sorted()
                          .distinct()
                          .collect(Collectors.joining(","));
        }

        @Benchmark
        @Fork(1)
        public String distinctSorted(StringInput si) {
            return Pattern.compile(",").splitAsStream(si.s)
                          .filter(t -> t.length() != 4)
                          .map(String::toUpperCase)
                          .distinct()
                          .sorted()
                          .collect(Collectors.joining(","));
        }

        @Benchmark
        @Fork(1)
        public String variant3(StringInput si) {
            String s = si.s;
            Set<String> resultSet = new TreeSet<>();
            int o = 0, p;
            for (p = s.indexOf(','); p >= 0; p = s.indexOf(',', o = p + 1)) {
                if (p - o == 4) {
                    continue;
                }
                resultSet.add(s.substring(o, p).toUpperCase());
            }
            if (s.length() - o != 4) {
                resultSet.add(s.substring(o).toUpperCase());
            }
            return String.join(",", resultSet);
        }
    }

    Benchmark                                 Size      Avg time (ms/op)
    streamvsLoop.StreamVsLoop.distinctSorted  1000      0.028
    streamvsLoop.StreamVsLoop.sortedDistinct  1000      0.024
    streamvsLoop.StreamVsLoop.loop            1000      0.016
    streamvsLoop.StreamVsLoop.stream          1000      0.020
    streamvsLoop.StreamVsLoop.variant3        1000      0.012

    streamvsLoop.StreamVsLoop.distinctSorted  10000     0.394
    streamvsLoop.StreamVsLoop.sortedDistinct  10000     0.359
    streamvsLoop.StreamVsLoop.loop            10000     0.274
    streamvsLoop.StreamVsLoop.stream          10000     0.304 ± 0.006
    streamvsLoop.StreamVsLoop.variant3        10000     0.234

    streamvsLoop.StreamVsLoop.distinctSorted  100000    4.950
    streamvsLoop.StreamVsLoop.sortedDistinct  100000    4.432
    streamvsLoop.StreamVsLoop.loop            100000    5.457
    streamvsLoop.StreamVsLoop.stream          100000    3.927 ± 0.048
    streamvsLoop.StreamVsLoop.variant3        100000    3.595

Holger's approach is the winner, but boy, the difference between the other solutions is small once the code gets hot enough.