Tag: streaming

How to convert a JavaPairInputDStream to a Dataset/DataFrame in Spark

I am trying to receive streaming data from Kafka. In the process, I am able to receive the streaming data and store it in a JavaPairInputDStream. Now I need to analyze this data without storing it in any database, so I want to convert this JavaPairInputDStream to a Dataset or DataFrame. What I have tried so far:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalog.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.AbstractJavaDStreamLike;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

import kafka.serializer.StringDecoder;
import scala.Tuple2;

//Streaming Working
```
[…]
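For reference, a minimal sketch of the usual pattern, assuming Spark 2.x with the spark-streaming-kafka-0-8 connector shown in the imports above: convert each micro-batch inside `foreachRDD` with `SparkSession.createDataFrame`. The stream variable name `directKafkaStream` and the two-column String schema are assumptions here, since the original code is truncated:

```java
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Assumed schema for the Kafka (key, value) pairs; adjust to the real payload.
StructType schema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("key", DataTypes.StringType, true),
        DataTypes.createStructField("value", DataTypes.StringType, true)));

// directKafkaStream is assumed to be the JavaPairInputDStream<String, String>
// returned by KafkaUtils.createDirectStream(...).
directKafkaStream.foreachRDD(rdd -> {
    // Reuse (or lazily create) a SparkSession from the RDD's SparkContext.
    SparkSession spark = SparkSession.builder()
            .config(rdd.context().getConf())
            .getOrCreate();

    // Map each (key, value) Tuple2 to a Row, then build a DataFrame
    // for this micro-batch.
    JavaRDD<Row> rows = rdd.map(t -> RowFactory.create(t._1(), t._2()));
    Dataset<Row> df = spark.createDataFrame(rows, schema);

    // Analyze the batch in memory instead of persisting it, e.g. via Spark SQL.
    df.createOrReplaceTempView("kafka_batch");
    spark.sql("SELECT count(*) FROM kafka_batch").show();
});
```

Each invocation of the `foreachRDD` callback sees one micro-batch, so the resulting DataFrame only covers that batch interval; cross-batch aggregation would need stateful operations or Structured Streaming instead.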