Counting the number of words in a file

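I'm having trouble counting the number of words in a file. My approach is to count one word every time I see a space or a newline.

The problem is that if there are several blank lines between paragraphs, I end up counting those as words too. If you look at the readFile() method, you can see what I'm doing.

Could you help me and point me towards a way to fix this?

Sample input file (including blank lines):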

word word word word word word word word

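I would change your approach. First, I would use a BufferedReader to read the file line by line with readLine(). Then split each line on whitespace with String.split("\\s") and use the size of the resulting array to see how many words are on that line. To get the number of characters, you can look at the size of each line or of each split word (depending on whether you want to count whitespace as characters).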
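Here is a rough sketch of that line-by-line idea, in case it helps. The class and method names (LineWordCounter, countWords) are made up for illustration. It makes two assumptions beyond the description above: it splits on "\\s+" rather than "\\s" so that a run of spaces counts as one separator, and it skips blank lines explicitly, since splitting an empty line still returns an array of length 1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class; the names are illustrative only.
final class LineWordCounter {

    // Counts whitespace-separated words, reading the input line by line.
    static int countWords(Reader source) throws IOException {
        int count = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // blank line: contributes no words
                }
                // "\\s+" treats a run of spaces or tabs as a single separator
                count += trimmed.split("\\s+").length;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample with blank lines between paragraphs.
        String sample = "word word word\n\n\nword  word\n";
        System.out.println("Number of words: " + countWords(new StringReader(sample)));
    }
}
```

For a real file you would pass something like new FileReader("sample.txt") instead of the StringReader.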

You can use a Scanner with a FileInputStream instead of a BufferedReader with a FileReader. For example:

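    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Scanner;

    File file = new File("sample.txt");
    try (Scanner sc = new Scanner(new FileInputStream(file))) {
        int count = 0;
        // hasNext()/next() skip any run of whitespace, so blank lines are never counted
        while (sc.hasNext()) {
            sc.next();
            count++;
        }
        System.out.println("Number of words: " + count);
    }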

This is just an idea, but there is a very simple way to do this. If you only need the number of words and not the words themselves, you can use Apache WordUtils:

    import org.apache.commons.lang.WordUtils;

    public class CountWord {
        public static void main(String[] args) {
            String str = "Just keep a boolean flag around that lets you know if the previous character was whitespace or not pseudocode follows";
            String initials = WordUtils.initials(str);
            System.out.println(initials);
            // so number of words in your file will be
            System.out.println(initials.length());
        }
    }

Just keep a boolean flag around that lets you know whether the previous character was whitespace (pseudocode follows):

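    // reader is any java.io.Reader positioned at the start of the input
    boolean prevWhitespace = true;   // treat the start of the input as whitespace
    int wordCount = 0;
    int ch;
    while ((ch = reader.read()) != -1) {
        boolean whitespace = Character.isWhitespace(ch);
        // a word starts at each transition from whitespace to non-whitespace,
        // so the last word is counted even without trailing whitespace
        if (prevWhitespace && !whitespace) {
            wordCount++;
        }
        prevWhitespace = whitespace;
    }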

    import java.io.BufferedReader;
    import java.io.FileReader;

    public class CountWords {
        public static void main(String[] args) throws Exception {
            System.out.println("Counting Words");
            int count = 0;
            try (BufferedReader br = new BufferedReader(new FileReader("c:\\Customer1.txt"))) {
                String line = br.readLine();
                while (line != null) {
                    String trimmed = line.trim();
                    // skip blank lines; split on runs of whitespace so repeated spaces are not counted
                    if (!trimmed.isEmpty()) {
                        count += trimmed.split("\\s+").length;
                    }
                    line = br.readLine();
                }
            }
            System.out.println(count);
        }
    }

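Hacky solution

You can read the text file into a String variable, then split the String into an array using a single space as the delimiter: stringVar.split(" ").

The array length equals the number of "words" in the file. Of course, this won't give you a line count.

I think the right way to do it is with a regex: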

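    String fileContent = ...; // the text read from the file
    // trim() first, otherwise leading whitespace yields an empty first element
    String[] words = Pattern.compile("\\s+").split(fileContent.trim());
    System.out.println("File has " + words.length + " words");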

Hope that helps. The meaning of "\\s+" is explained in the Pattern javadoc.
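To make the difference between splitting on a single space and splitting on "\\s+" a bit more concrete, here is a small comparison sketch. The class and method names are invented for this example. It assumes the whole file has already been read into a String, and it treats input containing only whitespace as zero words.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class; the names are illustrative only.
final class SplitComparison {

    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Splits on every single space; newlines are not separators and
    // two spaces in a row produce an empty "word" in between.
    static int countBySingleSpace(String text) {
        return text.split(" ").length;
    }

    // Splits on runs of any whitespace (spaces, tabs, newlines).
    static int countByWhitespaceRun(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // split() on an empty string would still return one element
        }
        return WHITESPACE_RUN.split(trimmed).length;
    }

    public static void main(String[] args) {
        String sample = "one two\nthree  four";
        System.out.println("single space:   " + countBySingleSpace(sample));
        System.out.println("whitespace run: " + countByWhitespaceRun(sample));
    }
}
```

The single-space version can come out too high or too low depending on the input, which is why it only gives an approximate count.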

Three steps: consume all whitespace, check for a newline, consume all non-whitespace.

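    int c = inFile.read();
    while (c != -1) {
        // consume whitespace, counting each newline on the way
        while (c != -1 && Character.isWhitespace(c)) {
            if (c == '\n') {
                numberLines++;
            }
            c = inFile.read();
        }
        if (c == -1) {
            break; // the input ended in whitespace
        }
        // consume one word
        while (c != -1 && !Character.isWhitespace(c)) {
            numberChars++;
            c = inFile.read();
        }
        numberWords++;
    }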
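If you want to try the three steps on their own, a self-contained version might look roughly like this. TextStats and scan are hypothetical names, not part of the answer above. The sketch assumes that "lines" means the number of newline characters and that "chars" means non-whitespace characters only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical holder for the three counters; names are illustrative only.
final class TextStats {
    int lines; // number of '\n' characters seen
    int words; // runs of non-whitespace characters
    int chars; // non-whitespace characters only

    static TextStats scan(Reader in) throws IOException {
        TextStats stats = new TextStats();
        int c = in.read();
        while (c != -1) {
            // Steps 1 and 2: skip whitespace, noting each newline on the way
            while (c != -1 && Character.isWhitespace(c)) {
                if (c == '\n') {
                    stats.lines++;
                }
                c = in.read();
            }
            if (c == -1) {
                break; // input ended in whitespace
            }
            // Step 3: consume one word
            while (c != -1 && !Character.isWhitespace(c)) {
                stats.chars++;
                c = in.read();
            }
            stats.words++;
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        TextStats s = scan(new StringReader("word word\n\n\nword\n"));
        System.out.println(s.lines + " lines, " + s.words + " words, " + s.chars + " chars");
    }
}
```

Because blank lines are swallowed in the whitespace step, they add to the line count but never to the word count.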

Word count of a file

If the words are separated by certain symbols, you can split on those symbols and count the resulting words.

    try (Scanner sc = new Scanner(new FileInputStream(new File("Input.txt")))) {
        int count = 0;
        while (sc.hasNext()) {
            // split each token on optional digits followed by one of the symbols . @ : = # -
            String[] s = sc.next().split("\\d*[.@:=#-]");
            for (int i = 0; i < s.length; i++) {
                if (!s[i].isEmpty()) {
                    System.out.println(s[i]);
                    count++;
                }
            }
        }
        System.out.println("Word-Count : " + count);
    }

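Have a look at my solution; it should work. The idea is to strip all unwanted symbols from the words, then separate the words and store them in another variable (I'm using an ArrayList). By adjusting the "excludedSymbols" variable, you can add more symbols to exclude from the words.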

    public static void countWords () {
        String textFileLocation = "c:\\yourFileLocation";
        String readWords = "";
        ArrayList extractOnlyWordsFromTextFile = new ArrayList<>();
        // excludedSymbols can be extended to whatever you want to exclude from the file
        String[] excludedSymbols = {" ", ",", ".", "/", ":", ";", "<", ">", "\n"};
        String readByteCharByChar = "";
        boolean testIfWord = false;
        try {
            InputStream inputStream = new FileInputStream(textFileLocation);
            byte byte1 = (byte) inputStream.read();
            while (byte1 != -1) {
                readByteCharByChar += String.valueOf((char) byte1);
                for(int i=0;i

This can be done very neatly with Java 8:

    Files.lines(Paths.get(file))
         .flatMap(str -> Stream.of(str.split("[ ,.!?\r\n]")))
         .filter(s -> s.length() > 0)
         .count();
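For anyone who wants to play with the stream version without a file on disk, here is a small sketch along the same lines. StreamWordCounter and countWords are hypothetical names. Unlike the snippet above, this version assumes words are separated by whitespace only and does not treat punctuation as a separator.

```java
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
final class StreamWordCounter {

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Counts whitespace-separated words across a stream of lines.
    static long countWords(Stream<String> lines) {
        return lines
                .flatMap(WHITESPACE::splitAsStream)  // one stream of tokens per line
                .filter(word -> !word.isEmpty())     // blank lines and leading whitespace yield ""
                .count();
    }

    public static void main(String[] args) {
        // In-memory lines standing in for the contents of a file.
        Stream<String> sample = Stream.of("word word word", "", "", "word  word");
        System.out.println("Number of words: " + countWords(sample));
    }
}
```

With a real file, the stream returned by Files.lines(...) holds an open file handle, so it is usually wrapped in try-with-resources, for example try (Stream<String> lines = Files.lines(Paths.get("sample.txt"))) { ... }.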

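    try (BufferedReader bf = new BufferedReader(new FileReader("G://Sample.txt"))) {
        String line = bf.readLine();
        while (line != null) {
            String trimmed = line.trim();
            // a blank line has zero words; otherwise split on runs of whitespace
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            System.out.println("this line contains " + words + " words");
            line = bf.readLine();
        }
    }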

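The following code works with Java 8.

// Read the file into a String

    String fileContent = new String(Files.readAllBytes(Paths.get("MyFile.txt")), StandardCharsets.UTF_8);

// Split on the delimiter and store the pieces in a list of strings

    List<String> words = Arrays.asList(fileContent.split("\\PL+"));
    int count = 0;
    for (String x : words) {
        // skip the empty string that split() produces when the text starts with a non-letter
        if (!x.isEmpty()) count++;
    }
    System.out.println(count);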
