Searching for matches in two arrays with no extra memory

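I had an interview with Amazon the other day, and one of the questions they asked me was the following.

Given 2 integer arrays containing any number of positive and negative elements, find the numbers that appear in both arrays.

I was able to solve this quite easily with HashMaps, which gives O(n) time complexity, but unfortunately that also costs O(n) space. It can also be done with no extra memory by comparing every element of one array against every element of the other, but that would be O(n^2).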
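For reference, the O(n^2) approach with no extra memory could look roughly like the sketch below. The class name `BruteForceMatch` and the `IntConsumer` callback are illustrative choices (not part of the interview code); the callback is used so that no result list has to be allocated.

```java
import java.util.function.IntConsumer;

// Sketch only: names are hypothetical, chosen for illustration.
final class BruteForceMatch {

    // Reports each value that occurs in both arrays exactly once.
    // Uses only loop variables, at the cost of O(n^2) comparisons.
    static void forEachMatch(int[] a, int[] b, IntConsumer report) {
        for (int i = 0; i < a.length; i++) {
            // Skip values of a that already occurred at a smaller index,
            // so duplicates in a are not reported twice.
            if (contains(a, i, a[i])) {
                continue;
            }
            if (contains(b, b.length, a[i])) {
                report.accept(a[i]);
            }
        }
    }

    // Linear scan of array[0 .. end-1] for value.
    private static boolean contains(int[] array, int end, int value) {
        for (int k = 0; k < end; k++) {
            if (array[k] == value) {
                return true;
            }
        }
        return false;
    }
}
```

Calling `BruteForceMatch.forEachMatch(a, b, System.out::println)` prints the common values in the order they first appear in `a`.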

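After I finished explaining the HashMap approach, the interviewer asked whether I could think of an approach that runs in O(n) time but uses no extra memory. I couldn't think of anything on the fly, and I have not been able to find a solution since. Is there a way to find these values in linear time without using extra memory?

Note: I posted this question on CareerCup, but nobody there seemed to grasp that I must not use extra space and that it has to be O(n) time.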

Here is the code I used during the interview. It works, but it is not O(1) space.

    import java.util.*;

    public class ArrayFun {
        public static void main(String[] args) {
            int[] a = {1, 2, 3, 4};
            int[] b = {2, 5, 6, 7, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2};

            ArrayList<Integer> matches = ArrayFun.findMatches(a, b);
            for (int i = 0; i < matches.size(); ++i) {
                System.out.println(matches.get(i));
            }
        }

        public static ArrayList<Integer> findMatches(int[] a, int[] b) {
            // Maps each value of a to 0 (not yet reported) or 1 (already reported).
            HashMap<Integer, Integer> map = new HashMap<>();
            ArrayList<Integer> matches = new ArrayList<>();

            for (int i = 0; i < a.length; ++i) {
                map.put(a[i], 0);
            }

            for (int i = 0; i < b.length; ++i) {
                if (map.get(b[i]) != null && map.get(b[i]) == 0) {
                    map.put(b[i], 1);
                    matches.add(b[i]);
                }
            }

            return matches;
        }
    }

This code finds the following matches (it prints them in the order they first appear in b, i.e. 2, 3, 1):

1,2,3

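EDIT: When I say "no extra space" and "O(1)", I am using the two interchangeably. By no extra space I mean that small placeholder variables are fine, but allocating new arrays is not.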

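There is no O(1) space method for finding the intersection of two unsorted sets in O(n) time.

For data types with an unbounded range, the minimum sorting cost is O(n ln n).

For data types with a limited range, radix sort can be done in place in O(n ln n' n'') time, where n is the size of the data, n' is the number of values that can be represented, and n'' relates to the cost of checking whether two values fall in the same radix group. The n'' time cost can be dropped in exchange for an O(ln n) space cost.

In the special case of 32-bit integers, n' is 2^32 and n'' is 1, so ln n' is a constant and this collapses to O(n), giving a winning solution for record sets in the billions.

For integers of unbounded size, n' and n'' rule out an O(n) time solution via radix sort.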

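The key is to sort both arrays in place. I searched for "in-place radix sort" and found In-Place Radix Sort. I believe the problem is solvable, at least for Java int[], by applying those ideas to sort each array bit by bit and then doing the obvious scan.

Incidentally, I think the correct output for the problem in the question's code is 1,2,3.

Here is my implementation, based on the answers to the referenced question: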

    public class ArrayMatch {

        public static void main(String[] args) {
            int[] a = { 4, 1, 2, 3, 4 };
            int[] b = { 2, 5, 6, 7, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2 };
            System.out.print("Original problem");
            printMatches(a, b);
            System.out.println();

            int[] a1 = { 4, 1, -1234, 2, 3, 4, Integer.MIN_VALUE };
            int[] b1 = { -1234, 2, 5, 6, 7, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2,
                    Integer.MIN_VALUE, Integer.MAX_VALUE };
            System.out.print("With negatives");
            printMatches(a1, b1);
            System.out.println();
        }

        // Print all matching elements between the two arrays.
        private static void printMatches(int[] a, int[] b) {
            if (a.length == 0 || b.length == 0) {
                return;
            }
            sort(a);
            sort(b);
            int i = 0;
            int j = 0;
            while (true) {
                while (a[i] < b[j]) {
                    i++;
                    if (i == a.length) {
                        return;
                    }
                }
                while (a[i] > b[j]) {
                    j++;
                    if (j == b.length) {
                        return;
                    }
                }
                if (a[i] == b[j]) {
                    System.out.print(" " + a[i]);
                    do {
                        i++;
                    } while (i < a.length && a[i - 1] == a[i]);
                    do {
                        j++;
                    } while (j < b.length && b[j - 1] == b[j]);
                }
                if (i == a.length || j == b.length) {
                    return;
                }
            }
        }

        // In place radix sort.
        private static void sort(int[] in) {
            // Flip the sign bit to regularize the sort order
            flipBit(in, 31);
            sort(in, 0, in.length, 31);
            // Flip the sign bit back to restore 2's complement
            flipBit(in, 31);
        }

        /**
         * Sort a subarray, elements start through end-1 of in, according to the
         * values in firstBit through 0.
         *
         * @param in
         * @param start
         * @param end
         * @param firstBit
         */
        private static void sort(int[] in, int start, int end, int firstBit) {
            if (start == end) {
                return;
            }
            int mask = 1 << firstBit;
            int zeroCount = 0;
            for (int i = start; i < end; i++) {
                if ((in[i] & mask) == 0) {
                    zeroCount++;
                }
            }

            int elements = end - start;
            int nextZeroIndex = start;
            int nextOneIndex = start + zeroCount;
            int split = nextOneIndex;

            if (zeroCount > 0 && zeroCount < elements) {
                while (nextZeroIndex < split) {
                    if ((in[nextZeroIndex] & mask) != 0) {
                        // Found a one bit in the zero area, look for its partner
                        // in the one area
                        while ((in[nextOneIndex] & mask) != 0) {
                            nextOneIndex++;
                        }
                        int temp = in[nextZeroIndex];
                        in[nextZeroIndex] = in[nextOneIndex];
                        in[nextOneIndex] = temp;
                        nextOneIndex++;
                    }
                    nextZeroIndex++;
                }
            }

            if (firstBit > 0) {
                sort(in, start, split, firstBit - 1);
                sort(in, split, end, firstBit - 1);
            }
        }

        private static void flipBit(int[] in, int bitNo) {
            int mask = 1 << bitNo;
            for (int i = 0; i < in.length; i++) {
                in[i] ^= mask;
            }
        }
    }

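One possible answer is similar to the HashMap solution, IF you know that the integers fall within a very small window. It would be similar to this: http://en.wikipedia.org/wiki/Bucket_sort

Basically, if the integers are guaranteed to lie within a window of some constant size (e.g. they are all in 1-1000), then you can do it in constant space by incrementing the cell whose index equals the number you are looking at. This is exactly the same as the HashMap solution, except that you do not need to account for all possible integers the way a HashMap would, and that is what saves the space. If this is unclear, let me know in the comments and I will explain further.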
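A minimal sketch of that idea, assuming every value lies in a known window. The bounds `MIN` and `MAX` and the class name `BoundedRangeMatch` are hypothetical, and the sketch uses boolean flags instead of counters because only presence matters here.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: MIN and MAX are an assumed value window (1..1000 as in the text).
final class BoundedRangeMatch {
    static final int MIN = 1;
    static final int MAX = 1000;

    static List<Integer> findMatches(int[] a, int[] b) {
        // Fixed-size table: its size depends on the window, not on the array lengths.
        boolean[] seenInA = new boolean[MAX - MIN + 1];
        for (int value : a) {
            seenInA[value - MIN] = true;
        }

        List<Integer> matches = new ArrayList<>();
        for (int value : b) {
            if (seenInA[value - MIN]) {
                seenInA[value - MIN] = false; // report each match only once
                matches.add(value);
            }
        }
        return matches;
    }
}
```

The table has a constant size regardless of n, but the returned list still grows with the number of matches, and any value outside the assumed window would throw an `ArrayIndexOutOfBoundsException`.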

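I believe this can be done in place with O(1) extra space. I rely on the additional assumption that the elements of the arrays are mutable as well as swappable, but I believe that with careful accounting the mutability assumption could be removed for this particular problem.

The basic idea is to do in-place hashing. In-place hashing can be implemented by partitioning the array around a suitable percentile, say the 90th, using the O(n) median-of-medians selection algorithm. This divides the array into a small portion (about 10%) and a large portion (about 90%) whose elements are distinguishable from each other (less than the partition element, or not). You can then hash from the 10% portion into the 90% portion by swapping. This hashing can be used to detect duplicates. It is O(n) for each pass that processes 10% of the array, so doing it 10 times is still O(n). I described this in much more detail, though with some hand-waving I mean to correct one day, over at this related question.

For this particular problem, you need to do in-place hashing 3 times. First on each individual array, to remove duplicates. Then on a wrapper representing the combined arrays (if the index is less than the length of array 1, index into array 1, otherwise index into array 2), to report duplicates.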
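The wrapper over the combined arrays could be sketched as below. This shows only the index mapping, not the in-place hashing itself, and the class name `CombinedView` and its methods are hypothetical.

```java
// Hypothetical helper: presents two arrays as a single index space.
// It stores only two references, so it adds O(1) extra memory.
final class CombinedView {
    private final int[] first;
    private final int[] second;

    CombinedView(int[] first, int[] second) {
        this.first = first;
        this.second = second;
    }

    int length() {
        return first.length + second.length;
    }

    // Indices below first.length go to the first array, the rest to the second.
    int get(int index) {
        return index < first.length ? first[index] : second[index - first.length];
    }

    void set(int index, int value) {
        if (index < first.length) {
            first[index] = value;
        } else {
            second[index - first.length] = value;
        }
    }

    // Swapping is the only mutation the in-place scheme needs.
    void swap(int i, int j) {
        int temp = get(i);
        set(i, get(j));
        set(j, temp);
    }
}
```

An algorithm written against `get`, `set` and `swap` can then treat the two arrays as one without copying them into a new array.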
