Java: How to convert a file to UTF-8

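I have a file that contains some non-UTF-8 characters (it is encoded in something like "ISO-8859-1"), so I want to convert that file to UTF-8 (or read it as UTF-8). How can I do that?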

The code looks like this:

    File file = new File("some_file_with_non_utf8_characters.txt");
    /* some code to convert the file to an utf8 file */
    ...

Edit: adding an encoding example

    String charset = "ISO-8859-1"; // or what corresponds
    BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream(file), charset));
    String line;
    while ((line = in.readLine()) != null) {
        ....
    }

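At that point you have the decoded text. You can write it back out with the symmetric Writer/OutputStream approach, using whichever encoding you prefer (for example UTF-8).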

The following code converts a file from srcEncoding to tgtEncoding:

    public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
        BufferedReader br = null;
        BufferedWriter bw = null;
        try {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
            bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
            char[] buffer = new char[16384];
            int read;
            while ((read = br.read(buffer)) != -1)
                bw.write(buffer, 0, read);
        } finally {
            try {
                if (br != null)
                    br.close();
            } finally {
                if (bw != null)
                    bw.close();
            }
        }
    }

-- Edit --

Using try-with-resources (Java 7):

    public static void transform(File source, String srcEncoding, File target, String tgtEncoding) throws IOException {
        try (
            BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(source), srcEncoding));
            BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(target), tgtEncoding));
        ) {
            char[] buffer = new char[16384];
            int read;
            while ((read = br.read(buffer)) != -1)
                bw.write(buffer, 0, read);
        }
    }
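The same read-decode-encode-write loop can also be sketched with java.nio.file.Files, which takes Charset objects instead of encoding names. This is only an illustration, not part of the answer above: the class name NioTranscoder and the temporary demo files are made up for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named only for this example.
class NioTranscoder {

    /** Copies source to target, decoding with srcCharset and re-encoding with tgtCharset. */
    static void transform(Path source, Charset srcCharset, Path target, Charset tgtCharset)
            throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, srcCharset);
             BufferedWriter writer = Files.newBufferedWriter(target, tgtCharset)) {
            char[] buffer = new char[16384];
            int read;
            // Stream the text in chunks so large files are never held in memory at once.
            while ((read = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo input (an assumption): one Latin-1 byte, 0xE9, which is the letter e-acute.
        Path source = Files.createTempFile("latin1-", ".txt");
        Path target = Files.createTempFile("utf8-", ".txt");
        Files.write(source, new byte[] {(byte) 0xE9});

        transform(source, StandardCharsets.ISO_8859_1, target, StandardCharsets.UTF_8);
        System.out.println("target size in bytes: " + Files.size(target));
    }
}
```

One difference worth knowing: the reader and writer returned by Files throw an IOException on malformed or unmappable input, whereas InputStreamReader and OutputStreamWriter silently substitute a replacement character. With ISO-8859-1 as the source every byte is valid, so this only matters for other source encodings.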

You need to know the encoding of the input file. For example, if the file is Latin-1, you would do something like this:

    FileInputStream fis = new FileInputStream("test.in");
    InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
    Reader in = new BufferedReader(isr);
    FileOutputStream fos = new FileOutputStream("test.out");
    OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
    Writer out = new BufferedWriter(osw);

    int ch;
    while ((ch = in.read()) > -1) {
        out.write(ch);
    }

    out.close();
    in.close();
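To see why the input encoding has to be known, here is a small sketch that decodes the same bytes with two different charsets. The class name WrongCharsetDemo and the sample bytes are assumptions made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named only for this example.
class WrongCharsetDemo {

    /** Decodes the raw bytes as ISO-8859-1 (Latin-1), where every byte maps to one character. */
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Decodes the raw bytes as UTF-8; invalid sequences become U+FFFD replacement characters. */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, as it would be stored in a Latin-1 file.
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};

        String right = decodeLatin1(raw);
        String wrong = decodeUtf8(raw);

        // Compare against escaped literals so the result does not depend on console encoding.
        System.out.println("Latin-1 decode matches: " + right.equals("caf\u00E9"));
        System.out.println("UTF-8 decode damaged:   " + wrong.contains("\uFFFD"));
    }
}
```

The byte 0xE9 is a complete character in Latin-1 but only the start of a multi-byte sequence in UTF-8, so decoding with the wrong charset loses the letter. The bytes themselves give no reliable way to tell the encodings apart, which is why it has to be supplied.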

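Do you just want to read it as UTF-8? For a similar problem I ran into recently, the fix was to start the JVM with -Dfile.encoding=UTF-8 and then read and print as usual. I don't know whether that applies in your case.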

With that option:

    System.out.println("á é í ó ú");

prints the characters correctly. Otherwise it prints ? symbols instead.
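The ? symbols come from encoding characters into a charset that cannot represent them. A rough sketch of that effect, and of an alternative that sets the output encoding in code instead of through a JVM flag, might look like this. The class name ConsoleEncodingDemo and the helper printedBytes are hypothetical, and whether -Dfile.encoding actually changes System.out depends on the JDK version, so treat this as an illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical demo class, named only for this example.
class ConsoleEncodingDemo {

    /** Returns the bytes a PrintStream configured with the given charset emits for the text. */
    static byte[] printedBytes(String text, String charsetName) throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The charset the JVM falls back to when none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());

        String accented = "\u00E1"; // a-acute, written as an escape to avoid source-encoding issues

        // US-ASCII cannot represent the character, so the stream substitutes '?'.
        System.out.println("US-ASCII byte count: " + printedBytes(accented, "US-ASCII").length);
        // UTF-8 can represent it, using a multi-byte sequence.
        System.out.println("UTF-8 byte count:    " + printedBytes(accented, "UTF-8").length);

        // Wrapping System.out fixes the output encoding regardless of JVM flags.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(accented);
    }
}
```

Even with a UTF-8 PrintStream, the terminal itself has to interpret the bytes as UTF-8 for the characters to display correctly.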