Bucketing Swift arrays in place

Bucketing Swift arrays in place

An interesting problem

A while back, I had a large array of structs representing jobs, which I needed to hand off to different services depending upon various tests. I wanted to minimise memory usage so I had a few requirements:

  1. Don't allocate new arrays (they're slow to allocate and eat memory)
  2. Avoid dispatching the items off one at a time

This naturally leads me to the combination of reordering the Array in place and sending off slices (Array<Element>.SubSequence) to the different services. Those slices are views into the original storage so they are very space-efficient.

Partition(by:)

There's a little known method on Array called partition(by:):

Reorders the elements of the collection such that all the elements that match the given predicate are after all the elements that don’t match.

Sounds ideal, so in theory we should be able to call this each time we want to gather a set of jobs. Let's give that a go with a simple array of Ints and collect all the 4s into a slice.

var array = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
let foursIndex = array.partition { $0 == 4 }
let fours = array[foursIndex...]

// array == [1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 4, 4]
// foursIndex == 9
// fours = [4, 4, 4]

So far so good, now let's try and get the 3s together:

var remainder = array[..<foursIndex]
let threesIndex = remainder.partition { $0 == 3 }
let threes = remainder[threesIndex...]

// threesIndex == 6
// threes = [3, 3, 3]

This looks good too, but wait - there's a problem:

// array == [1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 4, 4]

Huh, array hasn't changed. remainder is an Array.SubSequence and it seems that calling partition on it will trigger copy on write.

A workaround

Fortunately, there's a workaround via withUnsafeMutableBufferPointer:

var array = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
array.withUnsafeMutableBufferPointer { buffer in
    let foursIndex = buffer.partition { $0 == 4 }
    var remainder = buffer[..<foursIndex]
    
    let threesIndex = remainder.partition { $0 == 3}
}

// array == [1, 2, 1, 2, 1, 2, 3, 3, 3, 4, 4, 4]

When directly accessing the underlying storage, we ensure that we bypass the copy on write mechanics.

But is this really the correct approach?

I should probably finish up by saying that if you have an array of items that you want to sort in place, you're probably better off just using Array.sort as you'll traverse the Array the minimum number of times. The drawback is, you won't know the index offsets to each bucket of items within that Array so that will be another O(n) operation to discover them.

If you don't really care about allocations, a far less verbose method would be to use the Dictionary(grouping:by:) initialiser. In our increasingly contrived example, that would look like:

let grouped = Dictionary(grouping: array) { $0 }
let fours = grouped[4]
let threes = grouped[3]
Show Comments