Converting window.open(hyperlink) JavaScript code to a plain absolute URL using Java

I am using the Java Jsoup library on a website to extract some hyperlinks:

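    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Document doc = Jsoup.connect("http://www.saudisale.com/SS_a_mpg.aspx").get();
    Elements script = doc.select("script");
    // Print the onClick attribute of the input found in each table
    for (Element elementary : doc.select("table")) {
        System.out.println(elementary.select("tbody").select("tr").select("td").select("input").attr("onClick"));
    }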

Sample output:

    window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('http://ads.saudisale.com/dalel.html','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');
    window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');

Since Jsoup does not execute JavaScript, I have to write some Java code by hand to convert the window.open(hyperlink) JavaScript code into absolute hyperlinks.

For example, the following JavaScript output

    window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode=1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')

must be converted to: http://saudisale.com/arPrivatePage.aspx?id=21871638

and

    window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');

to: http://www.saudisale.com/SS_a_car.aspx?carid=37149

Can someone guide me on how to accomplish this in Java?

Use a regular expression. This will do what you want:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input = "window.open('http://saudisale.com/arPrivatePage.aspx?id=21871638','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1');";
    // Group 1 captures the first argument of window.open(...), without quotes or trailing whitespace
    String regex = "window\\.open\\(['\"]*(.*?)\\s*['\"]*,";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        String output = matcher.group(1);
        System.out.println(output);
    }

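Your last two URLs are relative, so you will have to convert them to absolute URLs by resolving them against the URL of the page they came from.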
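As a rough illustration of that resolution step, here is a minimal sketch using `java.net.URI.resolve` from the standard library. The class name `ResolveSketch` and the helper `toAbsolute` are made up for this example, and it assumes the page URL from the question is the right base.

```java
import java.net.URI;

class ResolveSketch {
    // Hypothetical helper: resolves a link against the URL of the page it was found on.
    // Links that are already absolute come back unchanged.
    public static String toAbsolute(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.saudisale.com/SS_a_mpg.aspx"; // page fetched in the question
        System.out.println(toAbsolute(page, "SS_a_car.aspx?carid=37149"));
        System.out.println(toAbsolute(page, "http://ads.saudisale.com/dalel.html"));
    }
}
```

Note that `URI.create` throws `IllegalArgumentException` for strings that are not valid URIs (for example, a link containing an unencoded space in the middle), so real input may need extra handling.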

For the relative URLs, I used this code. It works fine.

    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input2 = "window.open('SS_a_car.aspx?carid=37149','_blank','channelmode =1,scrollbars=1,status=0,titlebar=0,toolbar=0,resizable=1')";
    URL baseURL = new URL("http://saudisale.com/");
    // Group 1 captures the first argument of window.open(...), without quotes or trailing whitespace
    String regex = "window\\.open\\(['\"]*(.*?)\\s*['\"]*,";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input2);
    while (matcher.find()) {
        String output = matcher.group(1);
        // Resolve the extracted link against the base URL
        URL url = new URL(baseURL, output);
        System.out.println(url);
    }
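Putting the two steps together, the following is one possible sketch that handles an onClick string containing several window.open calls, mixing absolute and relative links. The class name `WindowOpenLinks`, the method `extract`, and the slightly stricter pattern are assumptions for illustration, not code from the answers above. The pattern assumes the first argument is a quoted string literal that contains no quote characters.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WindowOpenLinks {
    // Captures the first quoted argument of window.open(...), excluding surrounding whitespace.
    private static final Pattern OPEN_CALL =
            Pattern.compile("window\\.open\\(\\s*['\"]\\s*([^'\"]*?)\\s*['\"]");

    // Hypothetical helper: returns every window.open target as an absolute URL string.
    public static List<String> extract(String pageUrl, String onClick) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = OPEN_CALL.matcher(onClick);
        while (m.find()) {
            // resolve() leaves absolute links unchanged and completes relative ones
            links.add(base.resolve(m.group(1)).toString());
        }
        return links;
    }

    public static void main(String[] args) {
        String onClick =
                "window.open('http://ads.saudisale.com/dyaralez.html ','_blank','channelmode =1'); "
                + "window.open('SS_a_car.aspx?carid=37240','_blank','channelmode =1');";
        for (String link : extract("http://www.saudisale.com/SS_a_mpg.aspx", onClick)) {
            System.out.println(link);
        }
    }
}
```

Using the fetched page's own URL as the base, rather than a hard-coded host, keeps the `www.` prefix that the question expects in the resolved result.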