What is the most efficient way of getting the intersection of k sorted arrays?

6

Given k sorted arrays what is the most efficient way of getting the intersection of these lists

Ex INPUT: [[1,3,5,7], [1,1,3,5,7], [1,4,7,9]] Output: [1,7]

There is a way to get the union of k sorted arrays based on what I read in the Elements of programming interviews book in nlogk time. I was wondering if there is a way to do something similar for the intersection as well

## merge sorted arrays in nlogk time [ regular appending and merging is nlogn time ]
import heapq
def mergeArys(srtd_arys):
    heap = []
    srtd_iters = [iter(x) for x in srtd_arys]
    
    # put the first element from each srtd array onto the heap
    for idx, it in enumerate(srtd_iters):
        elem = next(it, None)
        if elem:
            heapq.heappush(heap, (elem, idx))
    
    res = []
 
    # collect results in nlogK time
    while heap:
        elem, ary = heapq.heappop(heap)
        it = srtd_iters[ary]
        res.append(elem)
        nxt = next(it, None)
        if nxt:
            heapq.heappush(heap, (nxt, ary))

EDIT: obviously this is an algorithm question that I am trying to solve so I cannot use any of the inbuilt functions like set intersection etc

Share
Improve this question
6
  • You can still apply the priority queue approach if you make the observation that an element that appears in the intersection of all k arrays must appear at least k times in a row. Figuring out how to efficiently determine that the >=k consecutive elements have been seen in each array is left as an exercise. – wLui155 May 13 at 21:02
  • 1
    Are the numbers small? within [0, 127] perhaps? – Neil May 13 at 21:37
  • There seems to be no need for a priority queue - you could maintain an index per array, and just keep track of the largest value at any current index of any array, plus a counter for how many arrays that value has been seen in. This should come out at O(n) time, but the simple one-liner using sets is also O(n), so the only asymptotic difference is in auxiliary space. At least in Python, the solution using sets is certainly going to be faster because the work is done by code written in C. – kaya3 May 13 at 21:46
  • 1
    I can read what k is: What exactly is n? – greybeard May 14 at 4:17
  • 1
    What exactly is the intersection of [lists]? In [[1, 3, 3, 5, 7, 7, 9], [2, 2, 3, 3, 5, 5, 7, 7]], is the intersection [3, 3, 5, 7, 7] or [3, 5, 7]? – greybeard May 14 at 4:24

Comments

Popular posts from this blog

Meaning of `{}` for return expression

Get current scroll position of ScrollView in React Native

Chart JS +ng2-charts not working on Angular+2