Apache Spark MLlib with the DataFrame API throws java.net.URISyntaxException on createDataFrame() or read().csv(…)

In a standalone application (running on Java 8 and Windows 10, with spark-xxx_2.11:2.0.0 as jar dependencies), the following code fails:

    /* this: */
    Dataset<Row> logData = spark_session.createDataFrame(Arrays.asList(
        new LabeledPoint(1.0, Vectors.dense(4.9, 3, 1.4, 0.2)),
        new LabeledPoint(1.0, Vectors.dense(4.7, 3.2, 1.3, 0.2))
    ), LabeledPoint.class);

    /* or this: */
    /* values tried for logFile:
       "C:\files\project\file.csv",
       "C:\\files\\project\\file.csv",
       "C:/files/project/file.csv",
       "file:/C:/files/project/file.csv",
       "file:///C:/files/project/file.csv",
       "/file.csv" */
    Dataset<Row> logData = spark_session.read().csv(logFile);

Exception:

    java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:/files/project/spark-warehouse
        at org.apache.hadoop.fs.Path.initialize(Path.java:206)
        at org.apache.hadoop.fs.Path.<init>(Path.java:172)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
        at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
        at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
        at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
        at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
        at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
        at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
        at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:373)
        at ...

How can I load a csv file into a Dataset from Java code?

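There is a problem with how the file system path is handled. Note that the path in the exception is the default spark-warehouse directory, not the csv file. See the JIRA issue https://issues.apache.org/jira/browse/SPARK-15899 . As a workaround, you can set "spark.sql.warehouse.dir" on the SparkSession, as shown below.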

    SparkSession spark = SparkSession
        .builder()
        .appName("JavaALSExample")
        .config("spark.sql.warehouse.dir", "/file:C:/temp")
        .getOrCreate();
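
To see where the "Relative path in absolute URI" message comes from, the same error can be reproduced with plain java.net.URI, without Spark or Hadoop. This is only a minimal sketch: the class name WarehouseUriDemo and the helper tryFileUri are made up for illustration, and it assumes (based on the stack trace, not on reading the Hadoop source) that Path.initialize builds its URI from separate scheme and path components in a similar way.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Spark or Hadoop.
public class WarehouseUriDemo {

    /**
     * Tries to build a "file" URI from a scheme plus a path component.
     * Returns the URI as a string, or "error: <message>" if the JDK rejects it.
     */
    public static String tryFileUri(String path) {
        try {
            // The multi-argument constructor rejects a non-empty path that does
            // not start with '/' whenever a scheme is also given.
            return new URI("file", null, path, null, null).toString();
        } catch (URISyntaxException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Path without a leading slash, as in the exception from the question.
        System.out.println(tryFileUri("C:/files/project/spark-warehouse"));
        // Same path with a leading slash is accepted.
        System.out.println(tryFileUri("/C:/files/project/spark-warehouse"));
    }
}
```

The first call reports the same message as the exception in the question, and the second returns file:/C:/files/project/spark-warehouse. This suggests the failure is tied to the drive-letter path lacking a leading slash, which may be why setting the warehouse directory explicitly avoids it.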