How do I correctly read URL content that contains UTF-8 characters?

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    public class URLReader {
        public static byte[] read(String from, String to, String string) {
            try {
                String text = "http://translate.google.com/translate_a/t?"
                        + "client=o&text=" + URLEncoder.encode(string, "UTF-8")
                        + "&hl=en&sl=" + from + "&tl=" + to + "";
                URL url = new URL(text);
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(url.openStream(), "UTF-8"));
                String json = in.readLine();
                byte[] bytes = json.getBytes("UTF-8");
                in.close();
                return bytes;
                //return text.getBytes();
            } catch (Exception e) {
                return null;
            }
        }
    }

And:

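    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class AbcServlet extends HttpServlet {
        public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            resp.setContentType("text/plain;charset=UTF-8");
            resp.getWriter().println(new String(URLReader.read("pl", "en", "koń")));
        }
    }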

When I run this, I get:

    {"sentences"[{"trans":"end","orig":"koďż˝","translit":"","src_translit":""}],"src":"pl","server_time":30}

so UTF-8 is not working properly. But if I return the encoded URL instead:

    http://translate.google.com/translate_a/t?client=o&text=ko%C5%84&hl=en&sl=pl&tl=en

and paste it into the browser's address bar, I get the correct result:

    {"sentences":[{"trans":"horse","orig":"koń","translit":"","src_translit":""}],"dict":[{"pos":"noun","terms":["horse"]}],"src":"pl","server_time":76}

This line in URLReader:

    byte[] bytes = json.getBytes("UTF-8");

gives you a UTF-8 byte sequence, so URLReader.read also returns a UTF-8 byte sequence.
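To make "a UTF-8 byte sequence" concrete, here is a small sketch (the class `Utf8BytesDemo` and its `toHex` helper are hypothetical names for illustration, not part of the original answer). It shows that `koń` is 3 characters but 4 bytes in UTF-8, and that `URLEncoder` percent-encodes exactly those bytes, which is where `ko%C5%84` in the question's URL comes from.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the original answer
public class Utf8BytesDemo {
    // Hypothetical helper: formats each byte as two upper-case hex digits, space-separated
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String word = "ko\u0144"; // "kon" with an acute n, written as an ASCII-only escape

        // 3 chars but 4 bytes: U+0144 is the two-byte sequence C5 84 in UTF-8
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        System.out.println(word.length() + " chars -> " + toHex(utf8)); // 3 chars -> 6B 6F C5 84

        // URLEncoder percent-encodes those same UTF-8 bytes
        System.out.println(URLEncoder.encode(word, "UTF-8")); // ko%C5%84
    }
}
```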

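But you then decode those bytes without specifying a charset, in new String(URLReader.read("pl", "en", "koń")), so Java decodes them with your platform's default encoding, which on your system is not UTF-8.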

Try:

    new String(URLReader.read("pl", "en", "koń"), "UTF-8")
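A minimal sketch of why the explicit charset matters. It assumes the asker's platform default was windows-1250 (a guess based on the Polish text and on the exact garbage in the question), and `DecodeDemo`/`decode` are hypothetical names. The same UTF-8 bytes decode correctly as UTF-8 but turn into mojibake as windows-1250. The last line reproduces the question's `koďż˝`: it is the UTF-8 encoding of U+FFFD (the replacement character) read as windows-1250, which suggests the server had already received a damaged `koń`, the second problem the Update addresses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the original answer
public class DecodeDemo {
    // Hypothetical helper: decode with an explicitly named charset, never the platform default
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1250 stands in for the asker's default charset
        Charset cp1250 = Charset.forName("windows-1250");

        byte[] utf8 = "ko\u0144".getBytes(StandardCharsets.UTF_8); // 6B 6F C5 84

        // Correct: the bytes were produced as UTF-8, so decode them as UTF-8
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals("ko\u0144")); // true

        // Wrong charset: C5 -> U+0139 and 84 -> U+201E in windows-1250
        System.out.println(decode(utf8, cp1250).equals("ko\u0139\u201E")); // true

        // U+FFFD is EF BF BD in UTF-8; as windows-1250 that reads U+010F U+017C U+02DD
        byte[] replacement = "ko\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(replacement, cp1250).equals("ko\u010F\u017C\u02DD")); // true
    }
}
```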

Update

Here is code that works correctly on my machine:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.net.URLEncoder;

    public class URLReader {
        public static byte[] read(String from, String to, String string) {
            try {
                String text = "http://translate.google.com/translate_a/t?"
                        + "client=o&text=" + URLEncoder.encode(string, "UTF-8")
                        + "&hl=en&sl=" + from + "&tl=" + to;
                URL url = new URL(text);
                URLConnection conn = url.openConnection();
                // Making the request look like it comes from a web browser avoids the 403 error
                conn.setRequestProperty("User-Agent",
                        "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)");
                // try-with-resources closes the reader even if reading fails
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                    String json = in.readLine();
                    return json.getBytes("UTF-8");
                }
            } catch (Exception e) {
                System.out.println(e);
                // Be careful with returning null: a caller that uses the result
                // directly will throw a NullPointerException.
                return null;
            }
        }
    }
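The answer hard-codes UTF-8 and reads only the first line of the response. A possible refinement, sketched here under the assumption that the server declares its charset in the Content-Type header, is to take the charset from `URLConnection.getContentType()` and decode the whole body at once. `ResponseDecoding`, `charsetOf` and `readBody` are hypothetical names, and `readAllBytes` needs Java 9 or later. The `main` method feeds a fake in-memory response so it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, not from the original answer
public class ResponseDecoding {
    // Extracts the charset parameter from a value like "application/json; charset=UTF-8";
    // returns the fallback when the parameter is missing, illegal or unsupported.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // IllegalCharsetNameException and UnsupportedCharsetException both land here
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the entire body (not just the first line) and decodes it in one step (Java 9+)
    static String readBody(InputStream in, Charset charset) throws IOException {
        try (InputStream body = in) {
            return new String(body.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake response body standing in for conn.getInputStream()
        byte[] fake = "{\"orig\":\"ko\u0144\"}".getBytes(StandardCharsets.UTF_8);
        Charset cs = charsetOf("application/json; charset=UTF-8", StandardCharsets.UTF_8);
        String json = readBody(new ByteArrayInputStream(fake), cs);
        System.out.println(json.equals("{\"orig\":\"ko\u0144\"}")); // true
    }
}
```

With a real connection the calls would be `charsetOf(conn.getContentType(), StandardCharsets.UTF_8)` and `readBody(conn.getInputStream(), cs)`; keeping UTF-8 as the fallback preserves the answer's behavior when no charset is declared.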

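Don't forget to escape ń as \u0144. The Java compiler may misread non-ASCII characters in a source file if the file's actual encoding differs from the one javac assumes (see its -encoding option), so it is safer to write such literals in pure ASCII.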
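If you have many such literals, a small helper can produce the ASCII-only form for you. This is a hypothetical sketch (`AsciiEscaper` and `escapeNonAscii` are made-up names): every char at or above 0x80 becomes a Java Unicode escape, and characters outside the BMP come out as two escapes for their surrogate pair, which is the form javac expects.

```java
// Hypothetical helper class, not from the original answer
public class AsciiEscaper {
    // Rewrites every non-ASCII UTF-16 code unit as a Java Unicode escape
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c); // plain ASCII is copied unchanged
            } else {
                // the doubled backslash yields one literal backslash in the output
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "ko\u0144"; // the same String as the servlet's literal
        String ascii = escapeNonAscii(word);
        // The result is the 8-character ASCII text that can be pasted into source code
        System.out.println(ascii.equals("ko\\u0144")); // true
        System.out.println(ascii.length()); // 8
    }
}
```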

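    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class AbcServlet extends HttpServlet {
        @Override
        public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            resp.setContentType("text/plain;charset=UTF-8");
            // Unicode escape for U+0144 (n with acute) keeps this source file pure ASCII
            byte[] read = URLReader.read("pl", "en", "ko\u0144");
            // read already holds UTF-8 bytes, so write them to the response unchanged
            resp.getOutputStream().write(read);
        }
    }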
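The two fixes in this answer (decode with UTF-8 and print through getWriter(), or write the raw bytes through getOutputStream()) should send the same bytes, because the servlet's writer encodes with the charset set via setContentType, provided it is set before getWriter() is called. Below is a servlet-free sketch of that equivalence, assuming an in-memory stream stands in for the response; `WriterVsBytes`, `viaWriter` and `viaBytes` are hypothetical names, and println's trailing line separator is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not from the original answer
public class WriterVsBytes {
    // Like resp.getWriter() after setContentType("text/plain;charset=UTF-8"):
    // characters are encoded to UTF-8 on the way out
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return out.toByteArray();
    }

    // Like resp.getOutputStream().write(read): bytes that are already UTF-8 pass through as-is
    static byte[] viaBytes(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] fromUrlReader = "ko\u0144".getBytes(StandardCharsets.UTF_8);
        String decoded = new String(fromUrlReader, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(viaWriter(decoded), viaBytes(fromUrlReader))); // true
    }
}
```

Writing the bytes directly skips a decode/encode round trip; going through the writer is more convenient when you need to work with the text as a String first.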