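How to detect the presence of a URL in a string

I have an input string, say Please go to http://stackoverflow.com. Many browsers, IDEs and applications detect the URL part of such a string and automatically turn it into an anchor, so the text becomes Please go to http://stackoverflow.com with the URL clickable.

I need to do the same thing in Java.

Use java.net.URL!

Hey, why not use the core Java class for this, java.net.URL, and let it validate the URL?

Although the code below violates the golden rule "use exceptions only for exceptional conditions", it makes no sense to reinvent the wheel for something that is already mature on the Java platform.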

Here is the code:

    import java.net.URL;
    import java.net.MalformedURLException;

    // Replaces URLs with HTML anchors
    public class URLInString {
        public static void main(String[] args) {
            String s = args[0];
            // Separate the input by whitespace (URLs don't contain spaces)
            String[] parts = s.split("\\s+");

            // Attempt to convert each item into a URL.
            for (String item : parts) {
                try {
                    URL url = new URL(item);
                    // If that works, replace the item with an anchor...
                    System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
                } catch (MalformedURLException e) {
                    // ...otherwise it was not a URL, so print it unchanged.
                    System.out.print(item + " ");
                }
            }

            System.out.println();
        }
    }

With the following input:

    "Please go to http://stackoverflow.com and then mailto:oscarreyes@wordpress.com to download a file from ftp://user:pass@someserver/someFile.txt"

it produces the following output:

    Please go to <a href="http://stackoverflow.com">http://stackoverflow.com</a> and then <a href="mailto:oscarreyes@wordpress.com">mailto:oscarreyes@wordpress.com</a> to download a file from <a href="ftp://user:pass@someserver/someFile.txt">ftp://user:pass@someserver/someFile.txt</a>

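Of course, different protocols can be handled in different ways. For instance, you can get all the information through the getters of the URL class:

    url.getProtocol();

Or the rest of the attributes: spec, port, file, query, ref, etc.

http://java.sun.com/javase/6/docs/api/java/net/URL.html

This handles all the protocols (at least all of those the Java platform knows about). As an extra benefit, if a kind of URL that Java does not currently recognize is eventually incorporated into the URL class (through a library update), you get it transparently!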
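To make the getters mentioned above concrete, here is a minimal sketch that prints the parts of one URL. The class name UrlParts, the method describe and the example address are made up for illustration. Note that the URL(String) constructor still works but is deprecated as of Java 20, so expect a warning on recent JDKs.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of the original answer.
final class UrlParts {
    // Returns the main components of a URL as one readable line.
    static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return "protocol=" + url.getProtocol()
                + " host=" + url.getHost()
                + " port=" + url.getPort()
                + " path=" + url.getPath()
                + " query=" + url.getQuery()
                + " ref=" + url.getRef();
        } catch (MalformedURLException e) {
            // Unknown protocol or unparsable string.
            throw new IllegalArgumentException("Not a URL: " + spec, e);
        }
    }
}
```

Calling UrlParts.describe("http://example.com:8080/docs/index.html?lang=en#top") should give protocol=http host=example.com port=8080 path=/docs/index.html query=lang=en ref=top.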

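While it isn't Java-specific, Jeff Atwood recently posted an article about the pitfalls you might run into when trying to locate and match URLs in arbitrary text:

The Problem With URLs

It gives a good regex that can be used together with the snippet of code you need to handle parens properly (more or less).

The regex:

    \(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]

The paren cleanup:

    if (s.StartsWith("(") && s.EndsWith(")"))
    {
        return s.Substring(1, s.Length - 2);
    }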
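The paren cleanup above is C#, so here is a rough Java sketch of the same idea: find candidates with the regex, then strip one pair of enclosing parentheses. The class and method names (AtwoodUrlFinder, findUrls) are my own, and like the regex it only covers http://.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of the regex + paren cleanup shown above.
final class AtwoodUrlFinder {
    private static final Pattern URL = Pattern.compile(
        "\\(?\\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    // Returns every URL candidate found in the text, in order of appearance.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String s = m.group();
            // A match wrapped as "(url)" picked up the surrounding parens: drop them.
            if (s.startsWith("(") && s.endsWith(")")) {
                s = s.substring(1, s.length() - 1);
            }
            urls.add(s);
        }
        return urls;
    }
}
```

With this, "Visit (http://example.com) now." yields http://example.com, while a Wikipedia-style URL that merely ends in ")" keeps its closing parenthesis because it does not start with "(".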

You could do something like this (adjust the regex to suit your needs):

    String originalString = "Please go to http://www.stackoverflow.com";
    String newString = originalString.replaceAll(
        "http://.+?(com|net|org)/{0,1}", "<a href=\"$0\">$0</a>");

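The following code makes these modifications to the "Atwood approach":

Detects https in addition to http (adding other schemes is trivial)
The CASE_INSENSITIVE flag is used, since HtTpS:// is valid.
Matching sets of parentheses are peeled off (they can be nested to any level). In addition, any remaining unmatched left parentheses are stripped, but trailing right parentheses are left intact (to respect Wikipedia-style URLs)
The URL is HTML-encoded in the link text.
The target attribute is passed in via a method parameter. Other attributes can be added as desired.
It does not use \b to identify a word break before matching a URL. URLs can begin with a left parenthesis or http[s]:// with no other requirement.

Notes:

Apache Commons Lang's StringUtils is used in the code below
The call to HtmlUtil.encode() below is a util which ultimately calls some Tomahawk code to HTML-encode the link text, but any similar utility will do.
See the method comment for usage in JSF or other environments where output is HTML-encoded by default.

This was written in response to a client's request, and we feel it represents a reasonable compromise between the characters allowed by the RFC and common usage. It is offered here in the hope that it will be useful to others.

It could be extended further to allow any Unicode characters to be entered (i.e. not escaped with %XX (two-digit hex)) and hyperlinked, but that would require accepting all Unicode letters plus limited punctuation, then splitting on the "acceptable" delimiters (e.g. ., %, |, #, etc.), URL-encoding each part and gluing them back together. For example, http://en.wikipedia.org/wiki/Björn_Andrésen (which the Stack Overflow generator does not detect) would be "http://en.wikipedia.org/wiki/Bj%C3%B6rn_Andr%C3%A9sen" in the href, but would contain Björn_Andrésen in the link text on the page.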

    // Requires java.util.regex.Matcher, java.util.regex.Pattern and
    // org.apache.commons.lang.StringUtils; HtmlUtil is our own helper (see the notes above).

    // NOTES: 1) \w includes 0-9, a-z, A-Z, _
    //        2) The leading '-' is the '-' character. It must go first in the character class expression
    private static final String VALID_CHARS = "-\\w+&@#/%=~()|";
    private static final String VALID_NON_TERMINAL = "?!:,.;";

    // Notes on the expression:
    //  1) Any number of leading '(' (left parenthesis) accepted. Will be dealt with.
    //  2) s? ==> the s is optional so either [http, https] accepted as scheme
    //  3) Any number of valid chars (terminal or not) accepted, followed by one terminal valid char
    //  4) Case insensitive so that the scheme can be hTtPs (for example) if desired
    private static final Pattern URI_FINDER_PATTERN = Pattern.compile(
        "\\(*https?://[" + VALID_CHARS + VALID_NON_TERMINAL + "]*[" + VALID_CHARS + "]",
        Pattern.CASE_INSENSITIVE );

    /**
     * Finds all "URL"s in the given _rawText, wraps them in
     * HTML link tags and returns the result (with the rest of the text
     * html encoded).
     *
     * We employ the procedure described at:
     * http://www.codinghorror.com/blog/2008/10/the-problem-with-urls.html
     * which is a must-read.
     *
     * Basically, we allow any number of left parenthesis (which will get stripped away)
     * followed by http:// or https://. Then any number of permitted URL characters
     * (based on http://www.ietf.org/rfc/rfc1738.txt) followed by a single character
     * of that set (basically, those minus typical punctuation). We remove all sets of
     * matching left & right parentheses which surround the URL.
     *
     * This method *must* be called from a tag/component which will NOT
     * end up escaping the output.
     *
     * Reason: we are adding <a href="..."> tags to the output *and*
     * encoding the rest of the string. So, encoding the output will result in
     * double-encoding data which was already encoded - and encoding the a href
     * (which will render it useless).
     *
     * @param _rawText - if null, returns "" (empty string).
     * @param _target - if not null or "", adds a target attribute to the generated link, using _target as the attribute value.
     */
    public static final String hyperlinkText( final String _rawText, final String _target ) {

        String returnValue = null;

        if ( !StringUtils.isBlank( _rawText ) ) {

            final Matcher matcher = URI_FINDER_PATTERN.matcher( _rawText );

            if ( matcher.find() ) {

                final int originalLength = _rawText.length();

                final String targetText =
                    ( StringUtils.isBlank( _target ) ) ? "" : " target=\"" + _target.trim() + "\"";
                final int targetLength = targetText.length();

                // Counted 15 characters aside from the target + 2 of the URL (max if the whole string is URL)
                // Rough guess, but should keep us from expanding the Builder too many times.
                final StringBuilder returnBuffer =
                    new StringBuilder( originalLength * 2 + targetLength + 15 );

                int currentStart;
                int currentEnd;
                int lastEnd = 0;
                String currentURL;

                do {
                    currentStart = matcher.start();
                    currentEnd = matcher.end();
                    currentURL = matcher.group();

                    // Adjust for URLs wrapped in ()'s ... move start/end markers
                    // and substring the _rawText for new URL value.
                    while ( currentURL.startsWith( "(" ) && currentURL.endsWith( ")" ) ) {
                        currentStart = currentStart + 1;
                        currentEnd = currentEnd - 1;
                        currentURL = _rawText.substring( currentStart, currentEnd );
                    }

                    // Strip any remaining unmatched left parentheses.
                    while ( currentURL.startsWith( "(" ) ) {
                        currentStart = currentStart + 1;
                        currentURL = _rawText.substring( currentStart, currentEnd );
                    }

                    // Text since last match
                    returnBuffer.append( HtmlUtil.encode( _rawText.substring( lastEnd, currentStart ) ) );

                    // Wrap matched URL (the link text is HTML-encoded)
                    returnBuffer.append( "<a href=\"" + currentURL + "\"" + targetText + ">"
                        + HtmlUtil.encode( currentURL ) + "</a>" );

                    lastEnd = currentEnd;

                } while ( matcher.find() );

                if ( lastEnd < originalLength ) {
                    returnBuffer.append( HtmlUtil.encode( _rawText.substring( lastEnd ) ) );
                }

                returnValue = returnBuffer.toString();
            }
        }

        if ( returnValue == null ) {
            returnValue = HtmlUtil.encode( _rawText );
        }

        return returnValue;
    }
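The Unicode extension described in the notes before this code is not implemented in it. As a rough sketch of just the encoding step (not the matching), java.net.URI can produce the percent-encoded form for the href: URI accepts non-ASCII letters, and toASCIIString() escapes them as UTF-8 %XX sequences. The class name HrefEncoder and the method toAsciiHref are made up for illustration.

```java
import java.net.URI;

// Hypothetical helper: turns a URL containing non-ASCII letters into its
// percent-encoded form, suitable for an href attribute.
final class HrefEncoder {
    static String toAsciiHref(String url) {
        // URI.create throws IllegalArgumentException if the string is not a valid URI
        // (for example if it contains spaces).
        return URI.create(url).toASCIIString();
    }
}
```

For the Wikipedia example in the notes, this should return http://en.wikipedia.org/wiki/Bj%C3%B6rn_Andr%C3%A9sen, while the original string can still be used (HTML-encoded) as the visible link text.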

I made a small library which does exactly this:

https://github.com/robinst/autolink-java

Some tricky examples and the links that it detects:

http://example.com. → http://example.com
http://example.com, → http://example.com
(http://example.com) → http://example.com
(... (see http://example.com)) → http://example.com
https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda) → https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)
http://üñîçøðé.com/ → http://üñîçøðé.com/

Primitive:

    String msg = "Please go to http://stackoverflow.com";
    String withURL = msg.replaceAll("(?:https?|ftps?)://[\\w/%.-]+", "<a href=\"$0\">$0</a>");
    System.out.println(withURL);

This needs refinement to match proper URLs, and particularly GET parameters (?foo=bar&x=25).

You're asking two separate questions.

What is the best way to identify URLs in strings? See this post
How do you code the above solution in Java? Other responses illustrating String.replaceAll usage have already addressed this

A good refinement to PhiLho's answer would be:

    msg.replaceAll("(?:https?|ftps?)://[\\w/%.-][/\\??\\w=?\\w?/%.-]?[/\\?&\\w=?\\w?/%.-]*", "<a href=\"$0\">$0</a>");
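To see how a pattern along these lines behaves on a URL with GET parameters, here is a small sketch. It assumes the replacement is meant to wrap each match in an anchor ($0 refers to the whole match), and the class name Linkifier is mine. Be aware that this character class also swallows a trailing period, and that the & in the href is not HTML-escaped.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the refined pattern, for illustration only.
final class Linkifier {
    private static final Pattern URL = Pattern.compile(
        "(?:https?|ftps?)://[\\w/%.-][/\\??\\w=?\\w?/%.-]?[/\\?&\\w=?\\w?/%.-]*");

    // Wraps every match in an anchor; $0 is the full matched URL.
    static String linkify(String msg) {
        return URL.matcher(msg).replaceAll("<a href=\"$0\">$0</a>");
    }
}
```

For example, Linkifier.linkify("see http://example.com/page?foo=bar&x=25 now") keeps the whole query string inside the anchor instead of stopping at the "?".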

I wrote my own URI/URL extractor and figured someone might find it useful, since IMHO it is better than the other answers because:

It is Stream-based and can be used on large documents
It is extensible, handling all sorts of "Atwood paren" problems through a strategy chain.

Since the code is somewhat long for a post (albeit only one Java file), I have put it in a gist on GitHub.

Here is the signature of one of the main methods, to show how the points above apply:

    public static Iterator extractURIs(
        final Reader reader,
        final Iterable strategies,
        String ... schemes);

There is a default strategy chain which handles most of the Atwood problems.

    public static List DEFAULT_STRATEGY_CHAIN = ImmutableList.of(
        new RemoveSurroundsWithToURIStrategy("'"),
        new RemoveSurroundsWithToURIStrategy("\""),
        new RemoveSurroundsWithToURIStrategy("(", ")"),
        new RemoveEndsWithToURIStrategy("."),
        DEFAULT_STRATEGY,
        REMOVE_LAST_STRATEGY);

Enjoy!
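For readers who have not opened the gist, the strategy-chain idea can be illustrated with a much simplified sketch. This is not the gist's code: the class StrategyChainSketch, its methods and the restriction to http/https are all assumptions of mine. Each strategy either turns a token into a URI or declines, and the first one that succeeds wins.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical, simplified illustration of a strategy chain (not the gist's code).
final class StrategyChainSketch {

    // Default strategy: accept the token as-is if it is an absolute http(s) URI with a host.
    static Optional<URI> parse(String token) {
        try {
            URI uri = new URI(token);
            String scheme = uri.getScheme();
            boolean ok = scheme != null
                && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                && uri.getHost() != null;
            return ok ? Optional.of(uri) : Optional.empty();
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    // Strategy: strip an enclosing open/close pair, then parse what is left.
    static Function<String, Optional<URI>> removeSurrounds(String open, String close) {
        return token -> token.length() > open.length() + close.length()
                && token.startsWith(open) && token.endsWith(close)
            ? parse(token.substring(open.length(), token.length() - close.length()))
            : Optional.empty();
    }

    // Strategy: strip a trailing suffix such as ".", then parse what is left.
    static Function<String, Optional<URI>> removeEndsWith(String suffix) {
        return token -> token.endsWith(suffix)
            ? parse(token.substring(0, token.length() - suffix.length()))
            : Optional.empty();
    }

    static final Function<String, Optional<URI>> DEFAULT = StrategyChainSketch::parse;

    // More specific cleanups come first, the plain parse last.
    static final List<Function<String, Optional<URI>>> CHAIN = List.of(
        removeSurrounds("(", ")"),
        removeEndsWith("."),
        DEFAULT);

    // Runs the token through the chain and returns the first successful result.
    static Optional<URI> toUri(String token) {
        for (Function<String, Optional<URI>> strategy : CHAIN) {
            Optional<URI> result = strategy.apply(token);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }
}
```

Putting the cleanup strategies ahead of the default matters: a token like http://example.com. would otherwise be handed to the plain parser with its trailing period still attached.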

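A more convenient way to do it, suggested in 2017:

On Android, set android:autoLink on the TextView, or android:autoLink="all" for all kinds of links.

There is a very good JavaScript framework that renders the links directly in the browser: https://github.com/gregjacobs/Autolinker.js

It supports: HTML, email, (US only) phone numbers, Twitter and hashtags.

It also renders links that lack the http:// prefix.

You could also use jSoup; see this (very detailed) example:

http://jsoup.org/cookbook/extracting-data/example-list-links

To detect a URL you just need this:

    String text = yourtextview.getText().toString();
    if (text.contains("www") || text.contains("http://")) {
        // your code here if the text contains a URL
    }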
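A plain contains("www") also fires on words such as "awww". If that matters, a slightly stricter check is easy to sketch with a regex. The class name UrlPresence and the definition of "URL" used here (an http://, https:// or www. prefix followed by at least one non-space character) are assumptions, not a full validator.

```java
import java.util.regex.Pattern;

// Hypothetical helper: reports whether a text appears to contain a URL.
final class UrlPresence {
    // Assumption: a URL starts at a word boundary with http://, https:// or www.
    private static final Pattern URL_HINT =
        Pattern.compile("(?i)\\b(?:https?://|www\\.)\\S+");

    static boolean containsUrl(CharSequence text) {
        return text != null && URL_HINT.matcher(text).find();
    }
}
```

It accepts any CharSequence, so the result of getText() can be passed in without calling toString() first.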
