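How do I save Chinese characters to a file with Java?

I use the following code to save Chinese characters to a .txt file, but when I open the file with WordPad I can't read it.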

    StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
    boolean Append = true;

    FileOutputStream fos;
    fos = new FileOutputStream(FileName, Append);
    for (int i = 0; i < Shanghai_StrBuf.length(); i++) {
        fos.write(Shanghai_StrBuf.charAt(i));
    }
    fos.close();

What can I do? I know that if I cut and paste Chinese characters into WordPad, I can save them to a .txt file. How do I do that in Java?

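Several factors are at work here:

Text files have no intrinsic metadata describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular).
The default encoding on Windows is still an 8-bit (or double-byte) "ANSI" character set with a limited range of values. Text files written in this format are not portable.
To tell Unicode files from ANSI files, Windows applications rely on a byte order mark (BOM) at the start of the file (not strictly true; Raymond Chen explains). In theory, the BOM is there to tell you the endianness (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to work out automatically that the file is Unicode (though you'll note that Notepad has an encoding option in its open/save dialogs).
It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, for example, writing a BOM to a script file would be an error, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want one on Windows either, such as when you are appending data to an existing file: fos = new FileOutputStream(FileName, Append);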
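To make the marker bytes concrete, here is a small sketch that prints what U+FEFF encodes to in the three UTF encodings discussed in this thread. The class and method names (BomBytes, bomHex) are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomBytes {
    // Returns the bytes that U+FEFF encodes to in the given charset, as hex.
    static String bomHex(Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : "\uFEFF".getBytes(charset)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:    " + bomHex(StandardCharsets.UTF_8));    // EF BB BF
        System.out.println("UTF-16LE: " + bomHex(StandardCharsets.UTF_16LE)); // FF FE
        System.out.println("UTF-16BE: " + bomHex(StandardCharsets.UTF_16BE)); // FE FF
    }
}
```

A Windows application that sees EF BB BF at the start of a file will typically treat the rest as UTF-8.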

Here is a way to reliably append UTF-8 data to a file:

    // Needs java.io.* and java.nio.charset.Charset
    private static void writeUtf8ToFile(File file, boolean append, String data)
            throws IOException {
        // Write a BOM only when starting a new (or empty) file
        boolean skipBOM = append && file.isFile() && (file.length() > 0);
        Closer res = new Closer();
        try {
            OutputStream out = res.using(new FileOutputStream(file, append));
            Writer writer = res.using(new OutputStreamWriter(out, Charset.forName("UTF-8")));
            if (!skipBOM) {
                writer.write('\uFEFF');
            }
            writer.write(data);
        } finally {
            res.close();
        }
    }

Usage:

    public static void main(String[] args) throws IOException {
        String chinese = "\u4E0A\u6D77";
        boolean append = true;
        writeUtf8ToFile(new File("chinese.txt"), append, chinese);
    }

Note: if the file already exists, you choose to append, and the existing data is not UTF-8 encoded, the only thing this code will create is a mess.
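One way to guard against that mess is to check that the existing bytes decode as UTF-8 before appending. The sketch below is only an illustration: Utf8Check, isValidUtf8 and isSafeToAppendUtf8 are hypothetical names, the whole file is read into memory, and passing the check does not prove the file was meant to be UTF-8 (plain ASCII passes too).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Utf8Check {
    // Hypothetical helper: true if the bytes decode as UTF-8 without errors.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hypothetical helper: a missing file, or one holding valid UTF-8, is
    // treated as safe to append UTF-8 text to.
    static boolean isSafeToAppendUtf8(Path file) throws IOException {
        return !Files.exists(file) || isValidUtf8(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isSafeToAppendUtf8(Paths.get("chinese.txt")));
    }
}
```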

Here is the Closer type used in this code:

    public class Closer implements Closeable {
        private Closeable closeable;

        // Remembers the most recently registered resource and hands it back
        public <T extends Closeable> T using(T t) {
            closeable = t;
            return t;
        }

        @Override
        public void close() throws IOException {
            if (closeable != null) {
                closeable.close();
            }
        }
    }
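On Java 7 or later, the same append logic can probably be written without a Closer helper by using try-with-resources. This is a sketch under that assumption, not the answerer's code; the class name Utf8Appender is invented here.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8Appender {
    static void writeUtf8ToFile(File file, boolean append, String data) throws IOException {
        // Write a BOM only when starting a new (or empty) file
        boolean skipBOM = append && file.isFile() && file.length() > 0;
        // Both resources are closed automatically, in reverse order
        try (OutputStream out = new FileOutputStream(file, append);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            if (!skipBOM) {
                writer.write('\uFEFF');
            }
            writer.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        writeUtf8ToFile(new File("chinese.txt"), true, "\u4E0A\u6D77");
    }
}
```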

This code makes a Windows-style best guess at how to read the file, based on byte order marks:

    // Needs java.io.* and java.nio.charset.Charset
    private static final Charset[] UTF_ENCODINGS = {
        Charset.forName("UTF-8"),
        Charset.forName("UTF-16LE"),
        Charset.forName("UTF-16BE")
    };

    // Returns the charset whose BOM starts the stream (consuming the BOM),
    // or the platform default if no BOM matches
    private static Charset getEncoding(InputStream in) throws IOException {
        charsetLoop:
        for (Charset encodings : UTF_ENCODINGS) {
            byte[] bom = "\uFEFF".getBytes(encodings);
            in.mark(bom.length);
            for (byte b : bom) {
                if ((0xFF & b) != in.read()) {
                    in.reset();
                    continue charsetLoop;
                }
            }
            return encodings;
        }
        return Charset.defaultCharset();
    }

    private static String readText(File file) throws IOException {
        Closer res = new Closer();
        try {
            InputStream in = res.using(new FileInputStream(file));
            InputStream bin = res.using(new BufferedInputStream(in));
            Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
            StringBuilder out = new StringBuilder();
            for (int ch = reader.read(); ch != -1; ch = reader.read())
                out.append((char) ch);
            return out.toString();
        } finally {
            res.close();
        }
    }

Usage:

    public static void main(String[] args) throws IOException {
        System.out.println(readText(new File("chinese.txt")));
    }

(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
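If the console is known to interpret UTF-8, one option is to wrap System.out in a PrintStream with an explicit encoding. This is a minimal sketch; Utf8Console and utf8 are illustrative names, and it only controls the bytes Java emits, not how the terminal displays them.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wraps a stream so that printed text is encoded as UTF-8, with autoflush on.
    static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        out.println("\u4E0A\u6D77");
    }
}
```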

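This reminds me of:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)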

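If you can rely on the default character encoding being UTF-8 (or some other Unicode encoding), you can use the following:

    Writer w = new FileWriter("test.txt");
    w.append("上海");
    w.close();

The safest way is to always specify the encoding explicitly:

    Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
    w.append("上海");
    w.close();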
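On Java 7 and later, the java.nio.file API offers another way to state the encoding explicitly. The following is a sketch rather than part of the original answer; NioUtf8Write and its write method are illustrative names. Note that it writes no BOM and, with the default options, replaces any existing file contents.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class NioUtf8Write {
    // Writes the text as UTF-8, creating the file or truncating an existing one.
    static void write(Path path, String text) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        write(Paths.get("test.txt"), "\u4E0A\u6D77");
    }
}
```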

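P.S. You can use any Unicode characters in Java source code, even in method and variable names, provided the -encoding parameter for javac is configured correctly. That makes the source code more readable than the escaped \uXXXX form.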

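Be very careful with the approaches proposed here. Even specifying the encoding for the file, as follows:

    Writer w = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");

will not work if you are running under an operating system like Windows. Even setting the file.encoding system property to UTF-8 does not fix the issue. This is because Java does not write a byte order mark (BOM) to the file. Even if you specify the encoding when writing the file, opening it in an application such as WordPad will display the text as garbage, because the application does not detect a BOM. I tried running the examples here on Windows (with a platform/container encoding of CP1252).

The following bug report describes the issue in Java: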

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

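The solution for the time being is to write the byte order mark yourself, so that the file opens correctly in other applications. For more details on the BOM, see: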

http://mindprod.com/jgloss/bom.html
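As a rough illustration of writing the mark yourself, the sketch below prepends U+FEFF before encoding and strips it again after decoding, since Java's UTF-8 decoder leaves a leading BOM in the text as an ordinary character. BomRoundTrip and its method names are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

class BomRoundTrip {
    static final char BOM = '\uFEFF';

    // Encodes the text as UTF-8 with a leading BOM (EF BB BF).
    static byte[] encodeWithBom(String text) {
        return (BOM + text).getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes and drops a leading BOM if one is present.
    static String decodeStrippingBom(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeWithBom("\u4E0A\u6D77");
        System.out.println(bytes.length);                          // 9: 3 BOM bytes + 6 text bytes
        System.out.println(decodeStrippingBom(bytes).length());    // 2
    }
}
```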

For a more correct solution, see the following link:

http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html

Here is one way among many. Basically, we just specify that the text be converted to UTF-8 before the bytes are written to the FileOutputStream:

    String FileName = "output.txt";

    StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
    boolean Append = true;

    Writer writer = new OutputStreamWriter(new FileOutputStream(FileName, Append), "UTF-8");
    writer.write(Shanghai_StrBuf.toString(), 0, Shanghai_StrBuf.length());
    writer.close();

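I validated the output manually against the images at http://www.fileformat.info/info/unicode/char/. In future, please follow Java coding conventions, including lower-case variable names. It improves readability.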
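To see why the conversion step matters, the sketch below compares the bytes produced by the loop in the question with the bytes produced through an OutputStreamWriter. OutputStream.write(int) keeps only the low-order 8 bits of each value, so each Chinese character collapses to a single byte. LowByteDemo and its helpers are illustrative names, and a ByteArrayOutputStream stands in for the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class LowByteDemo {
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Mimics the question's loop: each char is passed to write(int),
    // which stores only its low-order 8 bits.
    static byte[] lowBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            out.write(s.charAt(i));
        }
        return out.toByteArray();
    }

    // Mimics the answer: chars are converted to UTF-8 before being written.
    static byte[] utf8Bytes(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(s);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String shanghai = "\u4E0A\u6D77";
        System.out.println(hex(lowBytes(shanghai)));  // 0A 77
        System.out.println(hex(utf8Bytes(shanghai))); // E4 B8 8A E6 B5 B7
    }
}
```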

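Try this:

    StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
    boolean Append = true;

    Writer out = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream(FileName, Append), "UTF8"));

    // The source is cut off after "for (int i=0;i"; the rest of this loop
    // is reconstructed from the loop in the question.
    for (int i = 0; i < Shanghai_StrBuf.length(); i++) {
        out.write(Shanghai_StrBuf.charAt(i));
    }
    out.close();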


