Reading HDFS and local files in Java

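I want to read files from paths regardless of whether they are HDFS or local paths. Currently, I pass local paths with the prefix file:// and HDFS paths with the prefix hdfs://, and have written some code like the following: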

    Configuration configuration = new Configuration();
    FileSystem fileSystem = null;
    if (filePath.startsWith("hdfs://")) {
        fileSystem = FileSystem.get(configuration);
    } else if (filePath.startsWith("file://")) {
        fileSystem = FileSystem.getLocal(configuration).getRawFileSystem();
    }

From here I use the FileSystem API to read the file.

If there is a better way than this, could you let me know?

Does this make sense?

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Enter the file path...");
        String filePath = br.readLine();

        // The FileSystem is chosen from the path itself (hdfs://, file://, ...).
        Path path = new Path(filePath);
        FileSystem fs = path.getFileSystem(conf);

        // try-with-resources closes the stream even if the read fails.
        try (FSDataInputStream inputStream = fs.open(path)) {
            System.out.println(inputStream.available());
        }
        fs.close();
    }

If you do it this way, you don't have to do the check. Get the FileSystem directly from the Path, then do whatever you want with it.

You can get the FileSystem in the following way:

    Configuration conf = new Configuration();
    Path path = new Path(stringPath);
    FileSystem fs = FileSystem.get(path.toUri(), conf);

You don't need to check whether the path starts with hdfs:// or file://. This API will do the work.
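To see why no prefix check is needed, it can help to look at what the path's URI already carries. The sketch below uses only java.net.URI, not Hadoop, to show the scheme that a call such as FileSystem.get(path.toUri(), conf) has available to pick an implementation. The class name SchemeDemo, the host namenode:8020 and the example paths are made up for illustration, and the label returned for scheme-less paths is an assumption standing in for whatever default filesystem the configuration names.

```java
import java.net.URI;

final class SchemeDemo {

    // Returns the URI scheme of a path string, or a placeholder label when
    // the string has no scheme (for example a bare "/tmp/a.txt").
    // Note: URI.create rejects strings with illegal characters such as
    // spaces, so this sketch assumes well-formed input.
    static String schemeOf(String pathString) {
        String scheme = URI.create(pathString).getScheme();
        return scheme == null ? "(default filesystem)" : scheme;
    }

    public static void main(String[] args) {
        // Hypothetical example paths.
        String[] samples = {
            "hdfs://namenode:8020/data/a.txt",
            "file:///tmp/a.txt",
            "/tmp/a.txt"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + schemeOf(sample));
        }
    }
}
```

Because the scheme travels with the path, the caller can hand any of these strings to the same code path and leave the choice of filesystem to the library.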

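Please check the following code snippet, which lists the files under an HDFS path, i.e. a path string starting with hdfs://. If you provide the Hadoop configuration and a local path, it will also list files on the local file system, i.e. a path string starting with file://.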

    // Helper method to get the list of files under an HDFS path.
    public static List<String> listFilesFromHDFSPath(Configuration hadoopConfiguration,
                                                     String hdfsPath,
                                                     boolean recursive) {
        // resulting list of files
        List<String> filePaths = new ArrayList<String>();
        FileSystem fs = null;

        try {
            // new Path() throws IllegalArgumentException; all other calls throw only IOException
            Path path = new Path(hdfsPath);
            fs = path.getFileSystem(hadoopConfiguration);

            // resolve hdfsPath first to check that the path exists (a real directory or a real file);
            // resolvePath() returns the fully qualified variant of the path
            path = fs.resolvePath(path);

            if (recursive) {
                // (heap issues with the recursive approach) => use a queue
                Queue<Path> fileQueue = new LinkedList<Path>();
                fileQueue.add(path);

                while (!fileQueue.isEmpty()) {
                    Path filePath = fileQueue.remove();
                    if (fs.isFile(filePath)) {
                        filePaths.add(filePath.toString());
                    } else {
                        // filePath is a directory: list its entries and add them to the queue
                        FileStatus[] fileStatuses = fs.listStatus(filePath);
                        for (FileStatus fileStatus : fileStatuses) {
                            fileQueue.add(fileStatus.getPath());
                        }
                    }
                }
            } else {
                // non-recursive approach => no heap overhead
                if (fs.isDirectory(path)) {
                    FileStatus[] fileStatuses = fs.listStatus(path);
                    for (FileStatus fileStatus : fileStatuses) {
                        // only files go into the resulting list
                        if (fileStatus.isFile()) {
                            filePaths.add(fileStatus.getPath().toString());
                        }
                    }
                } else {
                    // the path is a file: it is the one and only entry
                    filePaths.add(path.toString());
                }
            }
        } catch (Exception ex) {
            // catches every exception, including IOException and IllegalArgumentException
            ex.printStackTrace();
            // if a problem occurs, return an empty list
            return new ArrayList<String>();
        } finally {
            // close the filesystem; no more operations follow
            try {
                if (fs != null) {
                    fs.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        // the list can be empty if the given path is an empty directory
        return filePaths;
    }

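If you really want to use the java.io.File API, the following method will help you list files from the local file system only. Note that java.io.File expects a plain pathname, so a path string starting with file:// has to be stripped of that prefix or converted before it is passed in.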

    // Helper method to list files under a local path in the local file system.
    public static List<String> listFilesFromLocalPath(String localPathString, boolean recursive) {
        // resulting list of files
        List<String> localFilePaths = new ArrayList<String>();

        File localPath = new File(localPathString);

        // localPathString does not exist => it is neither a file nor a directory
        if (!localPath.exists()) {
            System.err.println("\n" + localPathString
                    + " is neither a file nor a directory; please provide correct local path");
            return new ArrayList<String>();
        }

        // at this point localPath exists, either as a directory or as a file
        if (recursive) {
            // recursive listing, implemented with a queue
            Queue<File> fileQueue = new LinkedList<File>();
            fileQueue.add(localPath);

            while (!fileQueue.isEmpty()) {
                File file = fileQueue.remove();
                if (file.isFile()) {
                    localFilePaths.add(file.getAbsolutePath());
                } else {
                    // a directory: add its entries to the queue
                    // (listFiles() returns null if the directory cannot be read)
                    File[] listedFiles = file.listFiles();
                    if (listedFiles != null) {
                        for (File listedFile : listedFiles) {
                            fileQueue.add(listedFile);
                        }
                    }
                }
            }
        } else {
            // non-recursive approach
            if (localPath.isDirectory()) {
                File[] listedFiles = localPath.listFiles();
                if (listedFiles != null) {
                    for (File listedFile : listedFiles) {
                        // only files go into the resulting list
                        if (listedFile.isFile()) {
                            localFilePaths.add(listedFile.getAbsolutePath());
                        }
                    }
                }
            } else {
                // the path is a file: it is the one and only entry
                localFilePaths.add(localPath.getAbsolutePath());
            }
        }

        // the list can be empty if the given path is an empty directory
        return localFilePaths;
    }
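For the local-only case, java.nio.file may offer a shorter route than a hand-written queue. The following is a minimal sketch, not a drop-in replacement: the class name LocalFileLister is hypothetical, results are sorted only to make the output stable, and the file:// handling assumes a well-formed URI of the form file:///absolute/path (a string like file://tmp/a, where "tmp" would be parsed as a host, is rejected by Paths.get).

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LocalFileLister {

    // Accepts either a plain local pathname or a file:/// URI string.
    static Path toLocalPath(String pathString) {
        if (pathString.startsWith("file:")) {
            return Paths.get(URI.create(pathString));
        }
        return Paths.get(pathString);
    }

    // Lists regular files under the given path. With recursive == false only
    // the direct children of a directory are considered. If the path is a
    // file, the result contains just that file. A missing path makes
    // Files.walk throw an IOException (NoSuchFileException).
    static List<String> listFiles(String pathString, boolean recursive) throws IOException {
        Path root = toLocalPath(pathString);
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        // The stream holds open directory handles, so close it when done.
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths.filter(Files::isRegularFile)
                        .map(p -> p.toAbsolutePath().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass a local path or file:/// URI as the first argument.
        String target = args.length > 0 ? args[0] : ".";
        for (String file : listFiles(target, true)) {
            System.out.println(file);
        }
    }
}
```

Unlike the java.io.File version, this sketch reports a missing path through an exception instead of returning an empty list, so the caller can tell "no files" apart from "no such path".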