Moving Average in Spark Java

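I have real-time streaming data coming into Spark, and I want to produce a moving-average forecast on this time series. Is there a way to implement this with Spark in Java?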

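I have already looked at https://gist.github.com/samklr/27411098f04fc46dcd05/revisions and Apache Spark Moving Average, but both are written in Scala. Since I am not familiar with Scala, I cannot tell whether they would be useful, let alone convert the code to Java. Is there a direct implementation of this kind of forecasting in Spark Java?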

I took the question you referred to and spent a couple of hours translating the Scala code into Java:

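```java
import java.util.Date;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.rdd.RDDFunctions;
import org.apache.spark.rdd.RDD;
import scala.Tuple2;
import scala.reflect.ClassTag;

// Assumes sc is a JavaSparkContext and slidingWindow is the window size (an int).

// Read a file containing the stock quotations.
// You can also parallelize a collection of objects to create an RDD.
JavaRDD<String> linesRDD = sc.textFile("some sample file containing stock prices");

// Convert the lines into our business objects
JavaRDD<StockQuotation> quotationsRDD = linesRDD.flatMap(new ConvertLineToStockQuotation());

// We need these two objects in order to use the MLlib RDDFunctions object
ClassTag<StockQuotation> classTag = scala.reflect.ClassManifestFactory.fromClass(StockQuotation.class);
RDD<StockQuotation> rdd = JavaRDD.toRDD(quotationsRDD);

// Instantiate an RDDFunctions object to work with
RDDFunctions<StockQuotation> rddFs = RDDFunctions.fromRDD(rdd, classTag);

// Apply the sliding function and return the (DATE, SMA) tuples
JavaPairRDD<Date, Double> smaPerDate =
        rddFs.sliding(slidingWindow).toJavaRDD().mapToPair(new MovingAvgByDateFunction());
List<Tuple2<Date, Double>> smaPerDateList = smaPerDate.collect();
```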
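If it is unclear what `sliding(slidingWindow)` hands on to `mapToPair`, the following minimal plain-Java sketch (no Spark dependency) may help. It assumes the MLlib semantics of step-1 windows of `slidingWindow` consecutive elements, with no output at all when the window is larger than the input. The class `SlidingWindowSketch`, its `sliding` helper and the sample prices are hypothetical; the sketch only illustrates the grouping and averaging, not how Spark spreads the work across partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingWindowSketch {

    // Hypothetical helper: groups items into every run of windowSize consecutive
    // elements (step 1), mirroring MLlib's sliding(windowSize). Returns an empty
    // list when windowSize exceeds items.size(), as the MLlib version does.
    public static <T> List<List<T>> sliding(List<T> items, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<List<T>> windows = new ArrayList<>();
        for (int start = 0; start + windowSize <= items.size(); start++) {
            // subList is only a view, so copy it to get an independent window
            windows.add(new ArrayList<>(items.subList(start, start + windowSize)));
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(10.0, 11.0, 12.0, 13.0, 14.0);
        // Average each window, as MovingAvgByDateFunction does for each StockQuotation[]
        for (List<Double> window : sliding(prices, 3)) {
            double sma = window.stream().mapToDouble(Double::doubleValue).average().getAsDouble();
            System.out.println(window + " -> " + sma);
        }
        // Prints:
        // [10.0, 11.0, 12.0] -> 11.0
        // [11.0, 12.0, 13.0] -> 12.0
        // [12.0, 13.0, 14.0] -> 13.0
    }
}
```

With 5 prices and a window of 3 there are 5 - 3 + 1 = 3 windows, which is also how many (DATE, SMA) pairs the Spark pipeline would emit for the same input.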

Then you need a new function class that performs the actual computation for each window of data:

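```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// sliding() hands each window over as an Object (at runtime a StockQuotation[]),
// hence the Object input type and the cast below.
// Assumes StockQuotation.getTimestamp() returns java.util.Date.
public class MovingAvgByDateFunction implements PairFunction<Object, Date, Double> {

    private static final long serialVersionUID = 9220435667459839141L;

    @Override
    public Tuple2<Date, Double> call(Object t) throws Exception {
        StockQuotation[] stocks = (StockQuotation[]) t;
        List<StockQuotation> stockList = Arrays.asList(stocks);
        // Sum the quotation values in this window...
        Double result = stockList.stream().collect(Collectors.summingDouble(new ToDoubleFunction<StockQuotation>() {
            @Override
            public double applyAsDouble(StockQuotation value) {
                return value.getValue();
            }
        }));
        // ...and divide by the window size to get the simple moving average
        result = result / stockList.size();
        // Key the average by the timestamp of the first quotation in the window
        return new Tuple2<>(stockList.get(0).getTimestamp(), result);
    }
}
```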
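The per-window computation can also be written more compactly with `DoubleStream.average()`. The following is a minimal sketch without Spark, assuming a hypothetical `StockQuotation` holder whose `getTimestamp()` returns `java.util.Date` and whose `getValue()` returns a `double`; `WindowAverageSketch` and `averageOfWindow` are illustrative names, and a `Map.Entry` stands in for Spark's `Tuple2` so the example runs on its own.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Date;
import java.util.Map;

public class WindowAverageSketch {

    // Hypothetical stand-in for the answer's StockQuotation class; only the two
    // getters used by MovingAvgByDateFunction are modelled here.
    public static final class StockQuotation {
        private final Date timestamp;
        private final double value;

        public StockQuotation(Date timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }

        public Date getTimestamp() { return timestamp; }
        public double getValue() { return value; }
    }

    // Same result as the body of MovingAvgByDateFunction.call: the mean of the
    // window's values, keyed by the timestamp of the window's first element.
    // average() replaces the summingDouble collector plus the division.
    public static Map.Entry<Date, Double> averageOfWindow(StockQuotation[] window) {
        double mean = Arrays.stream(window)
                .mapToDouble(StockQuotation::getValue)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("empty window"));
        return new AbstractMap.SimpleEntry<>(window[0].getTimestamp(), mean);
    }

    public static void main(String[] args) {
        StockQuotation[] window = {
            new StockQuotation(new Date(0L), 10.0),
            new StockQuotation(new Date(86_400_000L), 11.0),
            new StockQuotation(new Date(172_800_000L), 12.0)
        };
        Map.Entry<Date, Double> sma = averageOfWindow(window);
        System.out.println(sma.getKey().getTime() + " -> " + sma.getValue()); // 0 -> 11.0
    }
}
```

Keying by `window[0]` labels each average with the start of its window. A trailing moving average is more commonly labelled with the latest quotation, which would mean keying by `window[window.length - 1]` instead; which label fits depends on how the forecast is consumed.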

If you want more details, I wrote about Simple Moving Averages here: https://t.co/gmWltdANd3
