Here’s the scenario: you have two lists of integers, and you have a sum. Write an algorithm – in Perl, of course – to find which two items when summed give you a specified number. If no pairing exists, don’t return anything. If a pairing does exist, return the first pair you encounter. Remember, you could get two very large input lists.

### Brainstorming

So when I first saw this problem, I wanted to do a nested loop. I *really* wanted to do a nested loop. However, I’ve been playing with algorithms a lot lately so it’s fresh in my brain how nested loops aren’t always the best way to do things. Here’s what we *know*:

- We have two arrays of integers, passed in by reference
- We have one sum coming in
- We don’t know how big the lists are, so we really need to scale

Here’s what I came up with:

- For each
`X`

in the shortest of the two lists,`Y = sum - X`

- Now that you have
`Y`

, do a binary search on the longest of the two lists for`Y`

- If you find
`Y`

in that longest list, return`(X, Y)`

We know that a binary search is `O(logn)`

, and we know that iterating over the shortest list is going to be `O(n)`

with n being the number of items in the shortest list. When we look at this then we see that we have `O(nlogn)`

.

### The Code

Here’s the code:

[gist id=”6556132″ file=”find_pairs_having_sum.pl”]

### Possible Improvements

There are a couple of ways that I think we could improve this algorithm:

- We may have a really big list, and we may want all of the items which fit, not just the first one we find. For that we could just append everything to an array and then return the whole array.
- You could check to see if the sum is smaller than the first element of both input arrays to save time on lookups that are too small
- You could also check the sum of the last elements in each of the arrays to see if it is less than the number coming in, which could help to save you time for numbers which are too long
- You could use a hash table to cache the values returned by the binary search to speed up that part as you iterate through the loop

That’s all I have for now, though that’s almost certainly not an exhaustive list.

### Big Fat Disclaimer

I am not an expert on algorithms, so it’s entirely possible that I screwed up on my big-O analysis. If you notice a mistake I made, please politely correct me. Don’t forget Wheaton’s Law.

### The Conclusion

This was a fun one, and I think I’m getting better at these small problems. I think this one was cool because I think the answer I came up with is pretty quick, and then the possible improvements I identified could make it that much faster. Let me know if you can think of additional improvements.

Please also note that I developed this entire program using koding.com. If you think you’d like to use koding.com, check it out. If you click here and sign up, it gives me more disk space to write code with, which enables me to do more of this. I apologize in advance, but there is no way to stop me.

Your O() analysis does not include sorting one of the lists for the binary search.

If both lists are sorted, you may get some improvement by determining which elements in each array already exceed the sum, and which elements are too big/small given the min/max in the other array.

-QM

I’m sending the lists in sorted, and you’re right, I could gain improvement by stopping when the sums get too big.