Why isn't this BufferedReader reading in the specified UTF-8 format?

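I'm scraping a few websites, some of which contain non-Latin characters and special characters such as “ for quotes instead of " and ’ for apostrophes instead of '.

Here's the real curveball...

I'm printing the relevant text to the console. When I run it in my IDE (Netbeans), everything is encoded fine. But when I run it on my computer outside the IDE, “I Need Your Help” prints as: ΓÇ£I Need Your HelpΓÇ¥ ...

Before anyone says I need to set my JAVA_TOOL_OPTIONS environment variable to -Dfile.encoding=UTF8, let me say that I've already done that and it's still a problem. Besides, shouldn't specifying "UTF-8" as the encoding for the buffered reader override it anyway?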

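Here's some info:

I'm using JDK 7 with the target platform set to 1.7
I'm running on Windows 7, as are all of the machines where I run this and see the same problem (some don't have JAVA_TOOL_OPTIONS set, but that doesn't seem to make any difference).
I think the default encoding it's using is Cp1252 ...

Here's my code. Let me know if you need more info. Thanks!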

    /**
     * Using the given url, this method creates and returns the buffered reader for that url
     *
     * @param urlString
     * @return
     * @throws MalformedURLException
     * @throws IOException
     */
    public synchronized static BufferedReader getBufferedReader(String urlString) throws MalformedURLException, IOException {
        URL url = new URL(urlString);
        InputStream is = url.openStream();
        BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"));
        return br;
    }

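There are two possibilities here. As user1291492 says, it may be that you are reading the content correctly, but the terminal uses a different encoding from the one the IDE uses.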
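One way to check the first possibility is to reproduce the garbling on purpose. The sketch below encodes the text as UTF-8 and then decodes those bytes with a different charset, the way a mismatched console would. It assumes the Windows console is using OEM code page 437 (named `IBM437` in Java); that code page is a guess that happens to match the output in the question, and it may differ on your machines. `MojibakeDemo` and `misread` are made-up names for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    /** Encodes text as UTF-8, then decodes the bytes with another charset, as a mismatched console would. */
    static String misread(String text, Charset consoleCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, consoleCharset);
    }

    public static void main(String[] args) {
        // Assumption: the console code page is 437 ("IBM437" in Java).
        Charset cp437 = Charset.forName("IBM437");

        // \u201C and \u201D are the curly quotes from the question.
        String original = "\u201CI Need Your Help\u201D";
        String garbled = misread(original, cp437);

        // Compare against the garbled text from the question, written with escapes
        // so the check does not depend on this source file's own encoding.
        String seenInQuestion = "\u0393\u00C7\u00A3I Need Your Help\u0393\u00C7\u00A5";
        System.out.println(garbled.equals(seenInQuestion)); // prints "true"
    }
}
```

If this prints `true`, the reader decoded the page correctly and the bytes written to the terminal were valid UTF-8; only the terminal interpreted them with a different code page.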

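The other possibility is that the source data isn't UTF-8. If you are scraping a website, you should pay attention to the encoding the site tells you it is using via the Content-Type header, rather than assuming it is always UTF-8.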
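A minimal sketch of honoring the Content-Type header might look like the following. It uses `URLConnection.getContentType()` and falls back to UTF-8 when the header is missing or names a charset the JVM doesn't know. The class and method names (`CharsetAwareReader`, `charsetFromContentType`, `open`) are made up for this example, and the parsing is deliberately simple; it does not look at `<meta>` tags inside the HTML, which some pages use instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetAwareReader {

    /** Extracts the charset parameter from a Content-Type value, or returns the fallback. */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // A typical value looks like: text/html; charset=ISO-8859-1
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback instead.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Opens the URL and decodes it with the charset the server declares (UTF-8 if none). */
    static BufferedReader open(String urlString) throws IOException {
        URLConnection connection = new URL(urlString).openConnection();
        Charset charset = charsetFromContentType(connection.getContentType(), StandardCharsets.UTF_8);
        return new BufferedReader(new InputStreamReader(connection.getInputStream(), charset));
    }
}
```

With this in place, `CharsetAwareReader.open(urlString)` could stand in for the `getBufferedReader` method from the question.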

The IDE's output "window" probably has the ability to understand and print UTF-8 characters. The console may not be that advanced.
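If the console is the weak link, one thing to try is choosing the output encoding explicitly instead of relying on the JVM default. The sketch below wraps a stream in a `PrintStream` with a named encoding. It assumes the terminal has been switched to UTF-8 first (on Windows, for instance with `chcp 65001`, and with a font that contains the characters); whether that works well on a given Windows 7 console is not something this snippet can guarantee. `ConsoleOutput` and `withEncoding` are made-up names.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleOutput {

    /** Wraps a stream in an auto-flushing PrintStream that encodes text with the named charset. */
    static PrintStream withEncoding(OutputStream out, String encodingName) {
        try {
            return new PrintStream(out, true, encodingName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encodingName, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the terminal is set to UTF-8 (e.g. "chcp 65001" on Windows).
        PrintStream console = withEncoding(System.out, "UTF-8");
        console.println("\u201CI Need Your Help\u201D");
    }
}
```

Passing the console's actual code page name instead of "UTF-8" is another option; characters that code page can't represent would then be replaced rather than shown as several garbled characters.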

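    // Fragment: reader, in and tv are declared elsewhere in the enclosing code (not shown).
    try {
        // Decode the input stream explicitly as UTF-8
        reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
    } catch (UnsupportedEncodingException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    String line = "";
    String s = "";
    try {
        line = reader.readLine();
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Append each line plus a newline until the end of the stream
    while (line != null) {
        s = s + line;
        s = s + "\n";
        try {
            line = reader.readLine();
        } catch (IOException e) {
            e.printStackTrace();
            line = null; // stop after an I/O error instead of looping forever
        }
    }
    tv.setText("" + s);
    } // presumably closes the enclosing method, whose opening is not shown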

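Socket, BufferedReader hangs at readLine()

How to set a timeout for BufferedReader and PrintWriter in Java 1.4?

Fastest way to read and write large files line by line in Java

How to read a BufferedReader twice or multiple times?

Java variable not initialized error

Java Reader vs. Stream

BufferedReader not saying "ready" when it should

Randomizing a text file read in Java

BufferedReader not receiving data from the socket

How do java.io.Buffer* streams differ from normal streams?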