Connecting to product page URLs with Jsoup

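I have a website that I need to parse data from, working from the results of a keyword search. However, not all fields are visible in the product previews. It seems that some fields (product color, description, old price) can only be scraped from each individual product page. A product page URL looks like this: https://www.aboutyou.de/p/new-look/basecap-in-satin-optik-3649077. I don't know how to request these pages in a generic way, so that I don't have to go through each product by hand. I can find the name and brand of an item, but I don't know how to build the URL from them: convert all letters to lowercase and put dashes between the words? This is what I get for the brand and product name: NEW LOOK Basecap in Satin-Optik.

So how do I determine the URL of each product?

Here is my code so far: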

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class CapsScraper {
        public static void main(String[] args) throws IOException {
            String url = "https://www.aboutyou.de/frauen/accessoires/huete-und-muetzen/caps";
            Document doc = Jsoup.connect(url).get();
            System.out.println("Title: " + doc.title());

            // Full CSS path from the page root down to a single product tile
            String mainPath = "section.layout_11glwo1-o_O-stretchLayout_1jug6qr > "
                    + "div.content_1jug6qr > "
                    + "div.container > "
                    + "div.mainContent_10ejhcu > "
                    + "div.productStream_6k751k > "
                    + "div > "
                    + "div.wrapper_8yay2a > "
                    + "div.col-sm-6.col-md-4 > "
                    + "div.wrapper_1eu800j > "
                    + "div > "
                    + "div.categoryTileWrapper_e296pg";
            String searchPath = mainPath + " > a.anchor_wgmchy > "
                    + "div.details_197iil9 > "
                    + "div.meta_1ihynio";
            String linksPath = mainPath + " > a.anchor_wgmchy";
            String brandPath = mainPath + " > a.anchor_wgmchy > "
                    + "div.details_197iil9 > "
                    + "div.meta_1ihynio > "
                    + "div.description_ya0ltb > "
                    + "strong.brand_ke66rm";

            Elements result = doc.body().select("main#app");
            for (Element element : result) {
                Elements products = element.select(searchPath);
                Elements links = element.select(linksPath);
                Elements brands = element.select(brandPath);

                for (Element product : products) {
                    System.out.println(product.text());
                }

                for (Element link : links) {
                    String linkHref = link.attr("href");
                    // The product id is the last dash-separated token of the href
                    String[] linksText = linkHref.split("-");
                    String id = linksText[linksText.length - 1];
                    System.out.println("id: " + id);
                    System.out.print("link attr:" + linkHref + ", ");
                }
                System.out.print("\nbrands" + brands.text());
            }
        }
    }

Maybe there is a library for this? I would appreciate any advice!

Most of the required details can be scraped from divs like the following:

 

Grabbing the text of these divs gives you something like this:

 -10%9,90€ -10 % EXTRA8,90€ NEW LOOK Basecap in Satin-Optik 8,01€ 
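The prices are fused with the discount labels in that text, so they need to be pulled apart if you work from the plain text rather than from the individual price elements. Below is a minimal sketch using only the standard library. The class name `TilePriceParser` and the method `extractPrices` are made up for illustration, and it assumes every price is written as digits, a comma, two decimals and a euro sign, as in the sample above.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or of the site being scraped.
public class TilePriceParser {

    // Assumed price format: digits, comma, two decimals, optional space, euro sign.
    // \u20AC is the euro sign, escaped so the file compiles under any source encoding.
    private static final Pattern PRICE = Pattern.compile("(\\d+,\\d{2})\\s*\u20AC");

    // Returns every price found in the text, in order of appearance.
    public static List<BigDecimal> extractPrices(String tileText) {
        List<BigDecimal> prices = new ArrayList<>();
        Matcher matcher = PRICE.matcher(tileText);
        while (matcher.find()) {
            // German decimal comma -> dot, so BigDecimal can parse it
            prices.add(new BigDecimal(matcher.group(1).replace(',', '.')));
        }
        return prices;
    }

    public static void main(String[] args) {
        String tileText = "-10%9,90\u20AC -10 % EXTRA8,90\u20AC "
                + "NEW LOOK Basecap in Satin-Optik 8,01\u20AC";
        System.out.println(extractPrices(tileText)); // prints [9.90, 8.90, 8.01]
    }
}
```

In this sample the last value is the final price and the earlier ones are the previous prices. Prices with a thousands separator (for example 1.299,00€) are not handled by this pattern.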

Example code that separates out some of the details and makes a sub-request to each product page for the color details:

    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class ProductDetailsScraper {
        public static void main(String[] args) {
            String url = "https://www.aboutyou.de/frauen/accessoires/huete-und-muetzen/caps";
            String userAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.108 Safari/537.36";

            try {
                Document doc = Jsoup.connect(url).userAgent(userAgent).get();

                // Match on class-name prefixes, so the generated suffixes do not matter
                Elements elements = doc.select("div[class^='categoryTileWrapper_']");
                for (Element element : elements) {
                    String brand = element.select("strong[class^='brand_']").first().text();
                    String name = element.select("p[class^='name_']").first().text();
                    System.out.println(brand + " - " + name);

                    // Take the product URL from the tile's link instead of building it
                    String href = element.select("a[class^='anchor_']").first().absUrl("href");

                    // Sub-request: the color details are only on the product page
                    Document subDoc = Jsoup.connect(href).userAgent(userAgent).get();
                    String color = subDoc.select("div[class^='attributeWrapper_']").first().text();
                    System.out.println("\t" + href);
                    System.out.println("\t" + color);

                    String finalPrice = element.select("div[class^='finalPrice_']").first().text();
                    // Previous prices are listed in a <ul> when the item is discounted
                    if (element.select("ul").size() > 0) {
                        for (Element listItem : element.select("ul").first().select("li")) {
                            System.out.println("\tprice was: " + listItem.select("span[class^='price_']").first().text());
                        }
                    }
                    System.out.println("\tfinal price: " + finalPrice);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

Output:

    NEW LOOK - Basecap in Satin-Optik
        https://www.aboutyou.de/p/new-look/basecap-in-satin-optik-3649077
        Textil Unifarben
        price was: 9,90€
        price was: 8,90€
        final price: 8,01€
    WOOD WOOD - Weiche 'Baseball cap'
        https://www.aboutyou.de/p/wood-wood/weiche-baseball-cap-3687779
        Logoprint
        price was: 39,90€
        price was: 29,90€
        final price: 20,93€
    [... truncated]
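On the original question of how to build the product URL: there is no need to reconstruct it from the brand and name, because the link is already on each tile. The sketch below shows, with the standard library only, roughly what resolving such a link against the listing URL amounts to (Jsoup's `absUrl("href")` does this for you), plus how to take the product id off the end. The class `ProductLink`, its method names, and the assumption that the tile's `href` is a relative path are illustrative and not confirmed by the site.

```java
import java.net.URI;

// Hypothetical helper for illustration only.
public class ProductLink {

    // Resolves an href (relative or already absolute) against the listing page URL.
    public static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    // Assumes the product id is everything after the last dash, as in the URLs above.
    // If the URL contains no dash at all, the whole string is returned unchanged.
    public static String productId(String productUrl) {
        return productUrl.substring(productUrl.lastIndexOf('-') + 1);
    }

    public static void main(String[] args) {
        String listing = "https://www.aboutyou.de/frauen/accessoires/huete-und-muetzen/caps";
        // Assumed shape of a tile's href attribute
        String href = "/p/new-look/basecap-in-satin-optik-3649077";

        String productUrl = toAbsolute(listing, href);
        System.out.println(productUrl);            // https://www.aboutyou.de/p/new-look/basecap-in-satin-optik-3649077
        System.out.println(productId(productUrl)); // 3649077
    }
}
```

Using `lastIndexOf` rather than splitting on every dash avoids allocating an array just to read its last element.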