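How do I remove duplicate objects from a List when the class has no equals/hashCode?

I have to remove duplicate objects from a list. It is a List of Blog objects, and Blog looks like this:

    public class Blog {
        private String title;
        private String author;
        private String url;
        private String description;
        ...
    }

A duplicate object is one whose title, author, url and description are all equal to those of another object.

I can't alter the object. I can't add new methods to it.

How do I do this?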

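If you can't edit the source of the class (why not?), then you need to iterate over the list and compare each item based on the four criteria mentioned ("title, author, url and description").

To do this efficiently, I would create a new class, something like BlogKey, which contains those four elements and which properly implements equals() and hashCode(). You can then iterate over the original list, constructing a BlogKey for each item and adding it to a HashMap: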

    Map<BlogKey, Blog> map = new HashMap<BlogKey, Blog>();
    for (Blog blog : blogs) {
        BlogKey key = createKey(blog);
        // keep only the first Blog seen for each key
        if (!map.containsKey(key)) {
            map.put(key, blog);
        }
    }
    Collection<Blog> uniqueBlogs = map.values();
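The answer leaves BlogKey and createKey(blog) undefined, so here is a minimal sketch of what they might look like. It assumes Blog exposes getters for its four fields (the question only shows the private fields), uses a constructor in place of createKey, and swaps HashMap for LinkedHashMap so that the original order is kept. The Blog class below is only a stand-in for the third-party one, and BlogKeyDemo is a made-up name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Stand-in for the unmodifiable third-party class; the getters are an assumption.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

// Hypothetical key class: copies the four identifying fields and defines equality on them.
final class BlogKey {
    private final String title, author, url, description;

    BlogKey(Blog blog) {
        this.title = blog.getTitle();
        this.author = blog.getAuthor();
        this.url = blog.getUrl();
        this.description = blog.getDescription();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BlogKey)) return false;
        BlogKey k = (BlogKey) o;
        return Objects.equals(title, k.title)
            && Objects.equals(author, k.author)
            && Objects.equals(url, k.url)
            && Objects.equals(description, k.description);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, author, url, description);
    }
}

class BlogKeyDemo {
    // Keeps the first Blog seen for each key; LinkedHashMap preserves encounter order.
    static List<Blog> dedupe(List<Blog> blogs) {
        Map<BlogKey, Blog> map = new LinkedHashMap<>();
        for (Blog blog : blogs) {
            map.putIfAbsent(new BlogKey(blog), blog);
        }
        return new ArrayList<>(map.values());
    }

    public static void main(String[] args) {
        List<Blog> blogs = new ArrayList<>();
        blogs.add(new Blog("A", "sam", "a", "desc"));
        blogs.add(new Blog("B", "ram", "b", "desc"));
        blogs.add(new Blog("A", "sam", "a", "desc"));
        System.out.println(dedupe(blogs).size()); // prints 2
    }
}
```

Because the key copies the field values, Blog itself never needs equals() or hashCode(); the cost is one small extra object per list element.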

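However, the simplest thing by far is to edit the original source code of Blog so that it correctly implements equals() and hashCode().

Here is complete code that works for this scenario: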

    import java.util.Objects;

    class Blog {
        private String title;
        private String author;
        private String url;
        private String description;

        Blog(String title, String author, String url, String description) {
            this.title = title;
            this.author = author;
            this.url = url;
            this.description = description;
        }

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public String getAuthor() { return author; }
        public void setAuthor(String author) { this.author = author; }
        public String getUrl() { return url; }
        public void setUrl(String url) { this.url = url; }
        public String getDescription() { return description; }
        public void setDescription(String description) { this.description = description; }

        @Override
        public boolean equals(Object obj) {
            if (obj instanceof Blog) {
                Blog temp = (Blog) obj;
                // compare String contents with equals(), not references with ==
                return Objects.equals(title, temp.title)
                    && Objects.equals(author, temp.author)
                    && Objects.equals(url, temp.url)
                    && Objects.equals(description, temp.description);
            }
            return false;
        }

        @Override
        public int hashCode() {
            // null-safe, and consistent with equals()
            return Objects.hash(title, author, url, description);
        }
    }

Here is the main method, which eliminates the duplicates:

    public static void main(String[] args) {
        Blog b1 = new Blog("A", "sam", "a", "desc");
        Blog b2 = new Blog("B", "ram", "b", "desc");
        Blog b3 = new Blog("C", "cam", "c", "desc");
        Blog b4 = new Blog("A", "sam", "a", "desc");
        Blog b5 = new Blog("D", "dam", "d", "desc");

        List<Blog> list = new ArrayList<Blog>();
        list.add(b1);
        list.add(b2);
        list.add(b3);
        list.add(b4);
        list.add(b5);

        // Removing duplicates
        Set<Blog> s = new HashSet<Blog>();
        s.addAll(list);
        list = new ArrayList<Blog>();
        list.addAll(s);
        // Now the list contains only the unique elements
    }

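Make sure Blog defines equals(Object) and hashCode(), and then addAll(list) to a new HashSet() (or a new LinkedHashSet() if the order matters).

Better still, use a Set instead of a List from the start. You evidently don't want duplicates, so it is better for your data model to reflect that than to remove them after the fact.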

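Use a Set:

    yourList = new ArrayList<Blog>(new LinkedHashSet<Blog>(yourList));

This creates a list without duplicates, and the element order is the same as in the original list.

Just don't forget to implement hashCode() and equals() for your class Blog.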

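1. Override hashCode() and equals(..) using these 4 fields
2. Use new HashSet(blogList) - this gives you a Set, which by definition has no duplicates

Update: since you can't change the class, here is an O(n^2) solution:

1. Create a new list
2. Iterate over the first list
3. In an inner loop, iterate over the second list and check whether it already contains an element with the same fields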
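The three steps above could be sketched as follows. This is only an illustration under assumptions: the Blog class is a stand-in with assumed getters, and sameFields and QuadraticDedup are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Stand-in for the unmodifiable third-party class; the getters are an assumption.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

class QuadraticDedup {
    // Hypothetical helper: true when all four fields match (null-safe).
    static boolean sameFields(Blog a, Blog b) {
        return Objects.equals(a.getTitle(), b.getTitle())
            && Objects.equals(a.getAuthor(), b.getAuthor())
            && Objects.equals(a.getUrl(), b.getUrl())
            && Objects.equals(a.getDescription(), b.getDescription());
    }

    static List<Blog> dedupe(List<Blog> blogs) {
        List<Blog> result = new ArrayList<>();      // step 1: the new list
        for (Blog candidate : blogs) {              // step 2: iterate the first list
            boolean alreadyKept = false;
            for (Blog kept : result) {              // step 3: scan the second list
                if (sameFields(candidate, kept)) {
                    alreadyKept = true;
                    break;
                }
            }
            if (!alreadyKept) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

The inner scan makes the running time grow with the square of the list size, which is fine for small lists but not for millions of elements.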

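You can make this more efficient if you feed a HashSet data structure with externalized hashCode() and equals(..) methods, that is, methods defined outside the Blog class.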
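One possible reading of "externalized" equality, sketched here as an assumption rather than as what the answer's author had in mind: use a List of the four field values as the key, since java.util.List already defines element-wise equals() and hashCode(). The Blog stand-in, its getters, and the name ExternalKeyDedup are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for the unmodifiable third-party class; the getters are an assumption.
class Blog {
    private final String title, author, url, description;

    Blog(String title, String author, String url, String description) {
        this.title = title;
        this.author = author;
        this.url = url;
        this.description = description;
    }

    String getTitle() { return title; }
    String getAuthor() { return author; }
    String getUrl() { return url; }
    String getDescription() { return description; }
}

class ExternalKeyDedup {
    static List<Blog> dedupe(List<Blog> blogs) {
        // Keys already seen; List supplies the equals()/hashCode() that Blog lacks.
        Set<List<String>> seen = new HashSet<>();
        List<Blog> result = new ArrayList<>();
        for (Blog blog : blogs) {
            // Arrays.asList tolerates null fields, unlike List.of.
            List<String> key = Arrays.asList(
                blog.getTitle(), blog.getAuthor(), blog.getUrl(), blog.getDescription());
            // Set.add returns false when the key was already present.
            if (seen.add(key)) {
                result.add(blog);
            }
        }
        return result;
    }
}
```

This keeps the first occurrence of each blog in its original position and needs only one hash lookup per element.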

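If your Blog class has an appropriate equals() method defined on it (together with a matching hashCode()), the simplest way is to create a Set out of your list, which automatically removes duplicates:

    List<Blog> blogList = ...; // your initial list
    Set<Blog> noDups = new HashSet<Blog>(blogList);

Chances are this will work transparently with the rest of your code. If you are only iterating over the contents, for example, then any instance of Collection is as good as another. (If iteration order matters, you may prefer a LinkedHashSet instead, which preserves the original ordering of the list.)

If you really need the result to be a List, then, keeping with the straightforward approach, you can convert it straight back by wrapping it in an ArrayList (or similar). If your collections are relatively small (fewer than a thousand elements, say), the apparent inefficiency of this approach is likely to be immaterial.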

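You can override the equals() method using title, author, url and description (and hashCode() too, because if you override one you should override the other). Then use a HashSet of that type.

The first step you need is to implement the equals method and compare your fields. After that, the steps vary.

You could create a new empty list, loop over the original list, and add each item only if (!list2.contains(item)).

Another quick way is to cram them all into a Set and pull them back out into a List. This works because Sets do not allow duplicates to begin with.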

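I can't alter the object. I can't add new methods to it.

How do I do this?

In case you also mean how to make the object immutable and prevent subclassing: use the final keyword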

    public final class Blog { // final classes can't be extended/subclassed
        private final String title; // final members have to be set in the constructor and can't be changed
        private final String author;
        private final String url;
        private final String description;
        ...
    }

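Edit: I just saw some of your comments, and it seems you want to change the class but can't (third party, I assume).

To prevent duplicates you could use a wrapper that implements appropriate equals() and hashCode(), and then use the Set approach mentioned by the others: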

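    import java.util.Objects;

    class BlogWrapper {
        private final Blog blog;

        BlogWrapper(Blog blog) {
            this.blog = blog;
        }

        public Blog getBlog() {
            return blog;
        }

        @Override
        public int hashCode() {
            // Objects.hash tolerates null field values
            return Objects.hash(blog.getTitle(), blog.getAuthor(),
                                blog.getUrl(), blog.getDescription());
        }

        @Override
        public boolean equals(Object other) {
            // instanceof also rejects null
            if (!(other instanceof BlogWrapper)) {
                return false;
            }
            Blog otherBlog = ((BlogWrapper) other).getBlog();
            return Objects.equals(blog.getTitle(), otherBlog.getTitle())
                && Objects.equals(blog.getAuthor(), otherBlog.getAuthor())
                && Objects.equals(blog.getUrl(), otherBlog.getUrl())
                && Objects.equals(blog.getDescription(), otherBlog.getDescription());
        }
    }

Note that this is only a rough and simple version: it assumes Blog exposes getters for the four fields, and it does not guard against the wrapped Blog itself being null.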

Finally, use a Set, loop over all the blogs and try to add new BlogWrapper(blog) to the set. In the end, the set should contain only unique (wrapped) blogs.

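I tried several ways of removing duplicates from a list of Java objects.
Some of them are:
1. Override the equals and hashCode methods, convert the list to a set by passing the list to the set class constructor, then clear the list and add everything back.
2. Run 2 pointers and remove the duplicates manually with 2 nested for loops, as we used to do for arrays in C.
3. Write an anonymous Comparator class for the bean, do a Collections.sort, and then run 2 pointers over the sorted list to remove duplicates in a forward pass.

My requirement was to remove almost 1 million duplicates from almost 5 million objects.
After many trials I ended up with the third option, which I feel is the most efficient and effective way: it finished within seconds, whereas the other two options took almost 10 to 15 minutes.
In my tests the first and second options scaled very poorly: as the number of objects increased, the time taken to remove the duplicates grew far faster than the list itself.

So in the end the third option is the best.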

    import java.util.ArrayList;
    import java.util.HashSet;

    class Person {
        public int age;
        public String name;

        public int hashCode() {
            // System.out.println("In hashcode");
            int hashcode = 0;
            hashcode = age * 20;
            hashcode += name.hashCode();
            System.out.println("In hashcode : " + hashcode);
            return hashcode;
        }

        public boolean equals(Object obj) {
            if (obj instanceof Person) {
                Person pp = (Person) obj;
                boolean flag = (pp.name.equals(this.name) && pp.age == this.age);
                System.out.println(pp);
                System.out.println(pp.name + " " + this.name);
                System.out.println(pp.age + " " + this.age);
                System.out.println("In equals : " + flag);
                return flag;
            } else {
                System.out.println("In equals : false");
                return false;
            }
        }

        public void setAge(int age) { this.age = age; }
        public int getAge() { return age; }
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }

        public String toString() {
            return "[ " + name + ", " + age + " ]";
        }
    }

    class ListRemoveDuplicateObject {
        public static void main(String[] args) {
            ArrayList<Person> al = new ArrayList<Person>();

            Person person = new Person();
            person.setName("Neelesh");
            person.setAge(26);
            al.add(person);

            person = new Person();
            person.setName("Hitesh");
            person.setAge(16);
            al.add(person);

            person = new Person();
            person.setName("jyoti");
            person.setAge(27);
            al.add(person);

            person = new Person();
            person.setName("Neelesh");
            person.setAge(60);
            al.add(person);

            person = new Person();
            person.setName("Hitesh");
            person.setAge(16);
            al.add(person);

            person = new Person();
            person.setName("Mohan");
            person.setAge(56);
            al.add(person);

            person = new Person();
            person.setName("Hitesh");
            person.setAge(16);
            al.add(person);

            System.out.println(al);

            // Copy into a HashSet to drop duplicates, then back into the list
            HashSet<Person> al1 = new HashSet<Person>();
            al1.addAll(al);
            al.clear();
            al.addAll(al1);

            System.out.println(al);
        }
    }

Output

    [[ Neelesh, 26 ], [ Hitesh, 16 ], [ jyoti, 27 ], [ Neelesh, 60 ], [ Hitesh, 16 ], [ Mohan, 56 ], [ Hitesh, 16 ]]
    In hashcode : -801018364
    In hashcode : -2133141913
    In hashcode : 101608849
    In hashcode : -801017684
    In hashcode : -2133141913
    [ Hitesh, 16 ]
    Hitesh Hitesh
    16 16
    In equals : true
    In hashcode : 74522099
    In hashcode : -2133141913
    [ Hitesh, 16 ]
    Hitesh Hitesh
    16 16
    In equals : true
    [[ Neelesh, 60 ], [ Neelesh, 26 ], [ Mohan, 56 ], [ jyoti, 27 ], [ Hitesh, 16 ]]
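The listing above demonstrates the HashSet route (option 1), while the answer recommends option 3 without showing it. A minimal sketch of that third option might look like the following. It is not the author's code: it uses a Java 8 Comparator.comparing chain where the answer mentions an anonymous Comparator class, and it assumes a simplified Person whose name is never null. SortDedup and BY_NAME_THEN_AGE are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for the Person bean; no equals()/hashCode() needed here.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class SortDedup {
    // Two persons count as duplicates when this comparator returns 0.
    static final Comparator<Person> BY_NAME_THEN_AGE =
        Comparator.comparing((Person p) -> p.name).thenComparingInt(p -> p.age);

    static List<Person> dedupe(List<Person> people) {
        // Sort a copy so that duplicates end up next to each other.
        List<Person> sorted = new ArrayList<>(people);
        Collections.sort(sorted, BY_NAME_THEN_AGE);

        // Single forward pass: keep an element only if it differs from its predecessor.
        List<Person> result = new ArrayList<>();
        Person previous = null;
        for (Person current : sorted) {
            if (previous == null || BY_NAME_THEN_AGE.compare(previous, current) != 0) {
                result.add(current);
            }
            previous = current;
        }
        return result;
    }
}
```

The sort costs O(n log n) and the pass is linear, but the result comes back in sorted order rather than in the original list order.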

This is one way of removing duplicate objects.

The Blog class should look like this, or something similar, as a proper POJO

    import java.util.Objects;

    public class Blog {
        private String title;
        private String author;
        private String url;
        private String description;

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public String getAuthor() { return author; }
        public void setAuthor(String author) { this.author = author; }
        public String getUrl() { return url; }
        public void setUrl(String url) { this.url = url; }
        public String getDescription() { return description; }
        public void setDescription(String description) { this.description = description; }

        @Override
        public boolean equals(Object obj) {
            // instanceof guards against null and foreign types before the cast
            if (!(obj instanceof Blog)) {
                return false;
            }
            Blog blog = (Blog) obj;
            return Objects.equals(title, blog.title)
                && Objects.equals(author, blog.author)
                && Objects.equals(url, blog.url)
                && Objects.equals(description, blog.description);
        }

        @Override
        public int hashCode() {
            // hash-based sets only find duplicates if equal objects share a hash code
            return Objects.hash(title, author, url, description);
        }
    }

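And use it like this to remove duplicate objects. The key data structure here is Set, specifically LinkedHashSet. It removes the duplicates and also keeps the insertion order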

    Blog blog1 = new Blog();
    blog1.setTitle("Game of Thrones");
    blog1.setAuthor("HBO");
    blog1.setDescription("The best TV show in the US");
    blog1.setUrl("www.hbonow.com/gameofthrones");

    Blog blog2 = new Blog();
    blog2.setTitle("Game of Thrones");
    blog2.setAuthor("HBO");
    blog2.setDescription("The best TV show in the US");
    blog2.setUrl("www.hbonow.com/gameofthrones");

    Blog blog3 = new Blog();
    blog3.setTitle("Ray Donovan");
    blog3.setAuthor("Showtime");
    blog3.setDescription("The second best TV show in the US");
    blog3.setUrl("www.showtime.com/raydonovan");

    ArrayList<Blog> listOfBlogs = new ArrayList<>();
    listOfBlogs.add(blog1);
    listOfBlogs.add(blog2);
    listOfBlogs.add(blog3);

    // LinkedHashSet drops duplicates and keeps insertion order
    Set<Blog> setOfBlogs = new LinkedHashSet<>(listOfBlogs);
    listOfBlogs.clear();
    listOfBlogs.addAll(setOfBlogs);

    // print the title of each remaining blog
    for (int i = 0; i < listOfBlogs.size(); i++) {
        System.out.println(listOfBlogs.get(i).getTitle());
    }

Running this should print

    Game of Thrones
    Ray Donovan

The second object is removed because it is a duplicate of the first one.

Use this code

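    public List<Blog> removeDuplicates(List<Blog> list) {
        // Set<Blog> set1 = new LinkedHashSet<Blog>(list);
        // A TreeSet treats two elements as duplicates when the comparator returns 0.
        Set<Blog> set = new TreeSet<Blog>(new Comparator<Blog>() {
            @Override
            public int compare(Blog o1, Blog o2) {
                // Return a real ordering (negative, zero or positive), not just 0 or 1;
                // otherwise the tree cannot locate existing elements reliably.
                return o1.getId().compareToIgnoreCase(o2.getId());
                // To include more fields, compare them when the ids are equal,
                // e.g. o1.getName().compareToIgnoreCase(o2.getName())
            }
        });
        set.addAll(list);
        // Note: the result is ordered by id, not by original list position.
        final List<Blog> newList = new ArrayList<Blog>(set);
        return newList;
    }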

First, override the equals() method:

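    // requires: import java.util.Objects;
    @Override
    public boolean equals(Object obj) {
        // instanceof is false for null, so no separate null check is needed
        if (!(obj instanceof MyObject)) {
            return false;
        }
        MyObject other = (MyObject) obj;
        // compare String contents with equals(), not references with ==
        return Objects.equals(getTitle(), other.getTitle())
            && Objects.equals(getAuthor(), other.getAuthor())
            && Objects.equals(getURL(), other.getURL())
            && Objects.equals(getDescription(), other.getDescription());
    }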

Then use:

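    List<MyObject> list = new ArrayList<MyObject>(); // fill this with your objects
    // Index-based loops: removing inside a for-each loop throws ConcurrentModificationException
    for (int i = 0; i < list.size(); i++) {
        // walk backwards so removals do not shift the indices still to be visited
        for (int j = list.size() - 1; j > i; j--) {
            if (list.get(i).equals(list.get(j))) {
                list.remove(j); // drop the later duplicate, keep the first occurrence
            }
        }
    }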

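Create a new class that wraps your Blog object and provides the equality/hashcode methods you need. For maximum efficiency I would add two static methods on the wrapper, one to convert a list of Blogs -> a list of BlogWrappers and the other to convert a list of BlogWrappers -> a list of Blogs. Then you would:

1. Convert your blog list to a blog wrapper list
2. Add your blog wrapper list to a HashSet
3. Get the condensed blog wrapper list back out of the HashSet
4. Convert the blog wrapper list to a blog list

The code for the Blog Wrapper would be something like this: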

    import java.util.ArrayList;
    import java.util.List;

    public class BlogWrapper {

        public static List<Blog> unwrappedList(List<BlogWrapper> blogWrapperList) {
            if (blogWrapperList == null)
                return new ArrayList<Blog>(0);
            List<Blog> blogList = new ArrayList<Blog>(blogWrapperList.size());
            for (BlogWrapper bW : blogWrapperList) {
                blogList.add(bW.getBlog());
            }
            return blogList;
        }

        public static List<BlogWrapper> wrappedList(List<Blog> blogList) {
            if (blogList == null)
                return new ArrayList<BlogWrapper>(0);
            List<BlogWrapper> blogWrapperList = new ArrayList<BlogWrapper>(blogList.size());
            for (Blog b : blogList) {
                blogWrapperList.add(new BlogWrapper(b));
            }
            return blogWrapperList;
        }

        private Blog blog = null;

        public BlogWrapper() {
            super();
        }

        public BlogWrapper(Blog aBlog) {
            super();
            setBlog(aBlog);
        }

        public boolean equals(Object other) {
            // Your equality logic here
            return super.equals(other);
        }

        public Blog getBlog() {
            return blog;
        }

        public int hashCode() {
            // Your hashcode logic here
            return super.hashCode();
        }

        public void setBlog(Blog blog) {
            this.blog = blog;
        }
    }

You could use it like this:

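    // blogList stands for your own list of Blog objects
    List<BlogWrapper> myBlogWrappers = BlogWrapper.wrappedList(blogList);
    Set<BlogWrapper> noDupWrapSet = new HashSet<BlogWrapper>(myBlogWrappers);
    List<BlogWrapper> noDupWrapList = new ArrayList<BlogWrapper>(noDupWrapSet);
    List<Blog> noDupList = BlogWrapper.unwrappedList(noDupWrapList);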

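Obviously you can make the above code more efficient, in particular by making the wrap and unwrap methods on BlogWrapper take collections instead of lists.

An alternative to wrapping the Blog class would be to use a byte code manipulation library such as BCEL to actually change the equals and hashCode methods of Blog. Of course, that could have unintended consequences for the rest of your code if it relies on the original equals/hashCode behaviour.

The simplest and most effective way is to let Eclipse generate and override the equals and hashCode methods. When prompted, just select the attributes to be checked for duplicates and you should be all set.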
