diff --git a/CHANGELOG.md b/CHANGELOG.md
index b070cf5a..41be2b43 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -8,7 +8,15 @@ This project follows semantic versioning.
 
 ## [Unreleased]
 
-*No new changes.*
+### Additions
+
+One new addition to the list of algorithms:
+
+- `merge(_:_:)` eagerly merges two sorted sequences with the same element type
+  and sorting criterion into a collection that is likewise sorted. When using
+  lazy sequences, mergers can be done lazily with `lazilyMerge(_:_:)`. Treating
+  the sorted sequences as multi-sets, a parameter can be provided to return the
+  sequences' union, intersection, *etc.* instead. ([#184])
 
 ---
 
@@ -308,6 +316,7 @@ This changelog's format is based on [Keep a Changelog](https://keepachangelog.co
 [#130]: https://github.com/apple/swift-algorithms/pull/130
 [#138]: https://github.com/apple/swift-algorithms/pull/138
 [#162]: https://github.com/apple/swift-algorithms/pull/162
+[#184]: https://github.com/apple/swift-algorithms/pull/184
diff --git a/Guides/Merge.md b/Guides/Merge.md
new file mode 100644
index 00000000..080bbc5b
--- /dev/null
+++ b/Guides/Merge.md
@@ -0,0 +1,130 @@
+# Merge
+
+[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/Merge.swift) |
+ [Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/MergeTests.swift)]
+
+A function returning the sorted merger of two already sorted sequences.
+
+```swift
+let source1 = "acegg", source2 = "bdfgh"
+print(merge(source1, source2))  // Prints "abcdefgggh"
+
+// Is equivalent to:
+print(String((source1 + source2).sorted()))
+```
+
+A sorted list may be used to implement a set. To aid this, `merge` supports
+generating results that are subsets of a full merger, based on standard set
+operations.
+
+```swift
+print(merge(source1, source2, keeping: .union))  // "abcdefggh"
+print(merge(source1, source2, keeping: .intersection))  // "g"
+```
+
+## Detailed Design
+
+By default, the `merge` function takes two sequences with a common `Element`
+type that conforms to `Comparable`, and returns an `Array`:
+
+```swift
+public func merge<Base1: Sequence, Base2: Sequence>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum
+) -> [Base1.Element]
+where Base1.Element == Base2.Element, Base2.Element: Comparable
+```
+
+The optional third parameter adjusts the result to exclude elements that would
+not match said parameter's set operation, based on shared and/or disjoint
+element values. For `Element` types that do not conform to `Comparable`, and/or
+when the sequences use a sort order other than `<`, an ordering predicate can be
+supplied as a fourth parameter (implemented as an overload):
+
+```swift
+public func merge<Base1: Sequence, Base2: Sequence>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum,
+  along areInIncreasingOrder: (Base1.Element, Base2.Element) throws -> Bool
+) rethrows -> [Base1.Element] where Base1.Element == Base2.Element
+```
+
+Filtering by set operations is represented by the `SetOperation` enumeration
+type. Use `.sum` for a standard merger.
+For some given element value *x*, its
+multiplicity in the merged result is based on the operation chosen and the
+value's multiplicities in the operands:
+
+| Operation | Case | Multiplicity of *x* in the Result |
+| --------- | ---- | --------------------------------- |
+| ∅ | `none` | 0 |
+| *First* \\ *Second* | `firstWithoutSecond` | max(*m*₁(x) - *m*₂(x), 0) |
+| *Second* \\ *First* | `secondWithoutFirst` | max(*m*₂(x) - *m*₁(x), 0) |
+| *First* ⊖ *Second* | `symmetricDifference` | \|*m*₁(x) - *m*₂(x)\| |
+| *First* ∩ *Second* | `intersection` | min(*m*₁(x), *m*₂(x)) |
+| *First* | `first` | *m*₁(x) |
+| *Second* | `second` | *m*₂(x) |
+| *First* ∪ *Second* | `union` | max(*m*₁(x), *m*₂(x)) |
+| *First* + *Second* | `sum` | *m*₁(x) + *m*₂(x) |
+
+Equivalent elements preserve their relative order.
+
+When shared element values are read, which source has their copy passed through
+depends on the operation. For `.sum`, all the equivalent elements from the first
+sequence are vended before any from the second sequence. For `.second`, the copy
+from the second sequence is used. For `.intersection`, `.first`, and `.union`,
+the copy from the first sequence is used.
+
+If the two source sequences share a type, and said type conforms to
+`RangeReplaceableCollection`, then `merge` will return that type instead.
+
+```swift
+public func merge<Base: RangeReplaceableCollection>(
+  _ first: Base, _ second: Base, keeping operation: SetOperation = .sum
+) -> Base where Base.Element: Comparable
+
+public func merge<Base: RangeReplaceableCollection>(
+  _ first: Base, _ second: Base, keeping operation: SetOperation = .sum,
+  along areInIncreasingOrder: (Base.Element, Base.Element) throws -> Bool
+) rethrows -> Base
+```
+
+All versions of `merge` compute the merger eagerly during the function call. If
+the result needs to be lazily generated, use the `lazilyMerge` function, which
+returns a custom lazy sequence. However, the ordering predicate must be a
+non-throwing function.
+Omitting the predicate defaults to ordering with the `<` operator.
+
+```swift
+public func lazilyMerge<Base1: LazySequenceProtocol, Base2: LazySequenceProtocol>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum
+) -> Merged2Sequence<Base1.Elements, Base2.Elements>
+where Base1.Element == Base2.Element, Base2.Element: Comparable
+
+public func lazilyMerge<Base1: LazySequenceProtocol, Base2: LazySequenceProtocol>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum,
+  along areInIncreasingOrder: (Base1.Element, Base2.Element) -> Bool
+) -> Merged2Sequence<Base1.Elements, Base2.Elements>
+where Base1.Element == Base2.Element
+```
+
+Variant functions with higher arities are not provided since many set
+operations, besides set-sum, are poorly defined for three or more operands.
+
+### Complexity
+
+The `merge` function runs in O(*m* + *n*) time, where *m* and *n* are the
+lengths of the source sequences. The `lazilyMerge` function returns its proxy in
+O(1) time, but carries out the entire operation in the same time as the eager
+version.
+
+### Comparison with other languages
+
+**C++:** For general merging, you can call either the `std::merge` or
+`std::ranges::merge` functions with two pairs of iterators, or the
+`std::ranges::merge` function with two ranges. For applying non-degenerate set
+operations, separate functions are provided instead of a filtering parameter.
+Given two pairs of iterators, you can call `std::set_difference`,
+`std::set_intersection`, `std::set_symmetric_difference`, `std::set_union`,
+`std::ranges::set_difference`, `std::ranges::set_intersection`,
+`std::ranges::set_symmetric_difference`, or `std::ranges::set_union`. Given two
+ranges, you can call `std::ranges::set_difference`,
+`std::ranges::set_intersection`, `std::ranges::set_symmetric_difference`, or
+`std::ranges::set_union`.
diff --git a/README.md b/README.md
index 270055fb..b963e830 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ Read more about the package, and the intent behind it, in the [announcement on s
 - [`cycled()`, `cycled(times:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Cycle.md): Repeats the elements of a collection forever or a set number of times.
 - [`joined(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Joined.md): Concatenate sequences of sequences, using an element or sequence as a separator, or using a closure to generate each separator.
 - [`product(_:_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Product.md): Iterates over all the pairs of two collections; equivalent to nested `for`-`in` loops.
+- [`merge(_:_:)`, `lazilyMerge(_:_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Merge.md): Splices two sorted sequences into a new sorted sequence.
 
 #### Subsetting operations
 
diff --git a/Sources/Algorithms/Merge.swift b/Sources/Algorithms/Merge.swift
new file mode 100644
index 00000000..1baf6663
--- /dev/null
+++ b/Sources/Algorithms/Merge.swift
@@ -0,0 +1,713 @@
+//===----------------------------------------------------------------------===//
+//
+// This source file is part of the Swift Algorithms open source project
+//
+// Copyright (c) 2022 Apple Inc. and the Swift project authors
+// Licensed under Apache License v2.0 with Runtime Library Exception
+//
+// See https://swift.org/LICENSE.txt for license information
+//
+//===----------------------------------------------------------------------===//
+
+//==============================================================================
+// MARK: SetOperation
+//==============================================================================
+
+/// Binary (multi-)set operations, using combinations of keeping or removing
+/// shared and/or disjoint elements.
+public enum SetOperation: UInt, CaseIterable {
+
+  /// No elements from either set are preserved.
+  case none
+  /// The elements from the first set that are not shared with the second.
+  case firstWithoutSecond
+  /// The elements from the second set that are not shared with the first.
+  case secondWithoutFirst
+  /// The elements from either set that are not shared with the other.
+  case symmetricDifference
+  /// The elements shared by both sets.
+  case intersection
+  /// The elements of the first set.
+  case first
+  /// The elements of the second set.
+  case second
+  /// The elements of both sets, consolidating shared ones.
+  case union
+  /// The elements of both sets, preserving both copies of shared ones.
+  case sum = 0b1111
+
+}
+
+extension SetOperation {
+
+  /// Whether elements only in the first set are included in the operation.
+  @inlinable public var usesExclusivesFromFirst: Bool { rawValue & 0b0001 != 0 }
+  /// Whether elements only in the second set are included in the operation.
+  @inlinable public var usesExclusivesFromSecond: Bool {rawValue & 0b0010 != 0}
+  /// Whether elements that are shared between both sets are included in the
+  /// operation.
+  @inlinable public var usesShared: Bool { rawValue & 0b0100 != 0 }
+  /// Whether both copies of elements shared between both sets are included in
+  /// the operation.
+  @inlinable public var duplicatesShared: Bool { rawValue & 0b1000 != 0 }
+
+  /// Creates an operation with the given combination of keeping or removing
+  /// shared and/or disjoint elements.
+  ///
+  /// - Warning: Keeping everything always results in `.union`, not `.sum`. The
+  ///   latter is not reachable from this initializer.
+  ///
+  /// - Parameters:
+  ///   - keepExclusivesToFirst: whether elements that are only in the first set
+  ///     are preserved.
+  ///   - keepExclusivesToSecond: whether elements that are only in the second
+  ///     set are preserved.
+  ///   - keepShared: whether elements that are shared between the sets are
+  ///     preserved.
+  /// - Postcondition: `.usesExclusivesFromFirst == keepExclusivesToFirst`,
+  ///   `.usesExclusivesFromSecond == keepExclusivesToSecond`,
+  ///   `.usesShared == keepShared`, `.duplicatesShared == false`.
+  @inlinable
+  public init(
+    keepExclusivesToFirst: Bool, keepExclusivesToSecond: Bool, keepShared: Bool
+  ) {
+    let k1, k2, ks: UInt
+    k1 = keepExclusivesToFirst ? 0b0001 : 0
+    k2 = keepExclusivesToSecond ? 0b0010 : 0
+    ks = keepShared ? 0b0100 : 0
+    self.init(rawValue: k1 | k2 | ks)!
+  }
+
+}
+
+//==============================================================================
+// MARK: - merge
+//==============================================================================
+
+/// Merges the two given sorted sequences into a sorted array, but retaining
+/// only the given subset of elements from the merger.
+///
+/// When shared elements are copied, the source sequence depends on `operation`.
+///
+/// - For `.intersection`, `.first`, or `.union`, `first` is the source.
+/// - For `.second`, `second` is the source.
+/// - For `.sum`, all of the elements from `first` are used first, followed by
+///   the ones from `second`.
+///
+/// Elements from the same source preserve their relative order.
+///
+/// - Precondition: Both `first` and `second` are finite, and both are
+///   considered sorted.
+///
+/// - Parameters:
+///   - first: The first sequence to be spliced together.
+///   - second: The second sequence to be spliced together.
+///   - operation: Which set operation to apply when generating the returned
+///     object, treating the source sequences as multi-sets. If omitted, `.sum`
+///     is used as the default, resulting in a conventional full merger.
+/// - Returns: A sorted array of the merger, excluding the elements banned by
+///   the set operation.
+///
+/// - Complexity: O(*m* + *n*), where *m* and *n* are the lengths of the two
+///   sequences.
+@inlinable
+public func merge<Base1: Sequence, Base2: Sequence>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum
+) -> [Base1.Element]
+where Base1.Element == Base2.Element, Base2.Element: Comparable {
+  merge(first, second, keeping: operation, along: <)
+}
+
+/// Merges the two given sorted collections into a new sorted collection, but
+/// retaining only the given subset of elements from the merger.
+///
+/// When shared elements are copied, the source sequence depends on `operation`.
+///
+/// - For `.intersection`, `.first`, or `.union`, `first` is the source.
+/// - For `.second`, `second` is the source.
+/// - For `.sum`, all of the elements from `first` are used first, followed by
+///   the ones from `second`.
+///
+/// Elements from the same source preserve their relative order.
+///
+/// - Precondition: Both `first` and `second` are considered sorted.
+///
+/// - Parameters:
+///   - first: The first collection to be spliced together.
+///   - second: The second collection to be spliced together.
+///   - operation: Which set operation to apply when generating the returned
+///     object, treating the source collections as multi-sets. If omitted,
+///     `.sum` is used as the default, resulting in a conventional full merger.
+/// - Returns: A sorted collection of the merger, excluding the elements banned
+///   by the set operation.
+///
+/// - Complexity: O(*m* + *n*), where *m* and *n* are the lengths of the two
+///   collections.
+@inlinable
+public func merge<Base: RangeReplaceableCollection>(
+  _ first: Base, _ second: Base, keeping operation: SetOperation = .sum
+) -> Base where Base.Element: Comparable {
+  merge(first, second, keeping: operation, along: <)
+}
+
+/// Merges the two given sequences, each sorted using the given predicate as the
+/// comparison between elements, into a sorted array, but retaining only the
+/// given subset of elements from the merger.
+///
+/// The predicate must be a *strict weak ordering* over the elements.
+/// That is, for any elements `a`, `b`, and `c`, the following conditions must
+/// hold:
+///
+/// - `areInIncreasingOrder(a, a)` is always `false`. (Irreflexivity)
+/// - If `areInIncreasingOrder(a, b)` and `areInIncreasingOrder(b, c)` are both
+///   `true`, then `areInIncreasingOrder(a, c)` is also `true`. (Transitive
+///   comparability)
+/// - Two elements are *incomparable* if neither is ordered before the other
+///   according to the predicate. If `a` and `b` are incomparable, and `b` and
+///   `c` are incomparable, then `a` and `c` are also incomparable. (Transitive
+///   incomparability)
+///
+/// When shared elements are copied, the source sequence depends on `operation`.
+///
+/// - For `.intersection`, `.first`, or `.union`, `first` is the source.
+/// - For `.second`, `second` is the source.
+/// - For `.sum`, all of the elements from `first` are used first, followed by
+///   the ones from `second`.
+///
+/// Elements from the same source preserve their relative order.
+///
+/// - Precondition: Both `first` and `second` are finite, and both are
+///   considered sorted according to `areInIncreasingOrder`.
+///
+/// - Parameters:
+///   - first: The first sequence to be spliced together.
+///   - second: The second sequence to be spliced together.
+///   - operation: Which set operation to apply when generating the returned
+///     object, treating the source sequences as multi-sets. If omitted, `.sum`
+///     is used as the default, resulting in a conventional full merger.
+///   - areInIncreasingOrder: A predicate that returns `true` if its first
+///     argument should be ordered before its second argument; otherwise,
+///     `false`.
+/// - Returns: A sorted array of the merger, excluding the elements banned by
+///   the set operation.
+///
+/// - Complexity: O(*m* + *n*), where *m* and *n* are the lengths of the two
+///   sequences.
+@inlinable
+public func merge<Base1: Sequence, Base2: Sequence>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum,
+  along areInIncreasingOrder: (Base1.Element, Base2.Element) throws -> Bool
+) rethrows -> [Base1.Element] where Base1.Element == Base2.Element {
+  try merge(first, second, into: Array.self, keeping: operation,
+            along: areInIncreasingOrder)
+}
+
+/// Merges the two given collections, each sorted using the given predicate as
+/// the comparison between elements, into a sorted collection, but retaining
+/// only the given subset of elements from the merger.
+///
+/// The predicate must be a *strict weak ordering* over the elements. That is,
+/// for any elements `a`, `b`, and `c`, the following conditions must hold:
+///
+/// - `areInIncreasingOrder(a, a)` is always `false`. (Irreflexivity)
+/// - If `areInIncreasingOrder(a, b)` and `areInIncreasingOrder(b, c)` are both
+///   `true`, then `areInIncreasingOrder(a, c)` is also `true`. (Transitive
+///   comparability)
+/// - Two elements are *incomparable* if neither is ordered before the other
+///   according to the predicate. If `a` and `b` are incomparable, and `b` and
+///   `c` are incomparable, then `a` and `c` are also incomparable. (Transitive
+///   incomparability)
+///
+/// When shared elements are copied, the source sequence depends on `operation`.
+///
+/// - For `.intersection`, `.first`, or `.union`, `first` is the source.
+/// - For `.second`, `second` is the source.
+/// - For `.sum`, all of the elements from `first` are used first, followed by
+///   the ones from `second`.
+///
+/// Elements from the same source preserve their relative order.
+///
+/// - Precondition: Both `first` and `second` are considered sorted according to
+///   `areInIncreasingOrder`.
+///
+/// - Parameters:
+///   - first: The first collection to be spliced together.
+///   - second: The second collection to be spliced together.
+///   - operation: Which set operation to apply when generating the returned
+///     object, treating the source collections as multi-sets. If omitted,
+///     `.sum` is used as the default, resulting in a conventional full merger.
+///   - areInIncreasingOrder: A predicate that returns `true` if its first
+///     argument should be ordered before its second argument; otherwise,
+///     `false`.
+/// - Returns: A sorted collection of the merger, excluding the elements banned
+///   by the set operation.
+///
+/// - Complexity: O(*m* + *n*), where *m* and *n* are the lengths of the two
+///   collections.
+@inlinable
+public func merge<Base: RangeReplaceableCollection>(
+  _ first: Base, _ second: Base, keeping operation: SetOperation = .sum,
+  along areInIncreasingOrder: (Base.Element, Base.Element) throws -> Bool
+) rethrows -> Base {
+  try merge(first, second, into: Base.self, keeping: operation,
+            along: areInIncreasingOrder)
+}
+
+//==============================================================================
+// MARK: merge, Implementation
+//==============================================================================
+
+/// Merges the two given sequences, each sorted using the given predicate as the
+/// comparison between elements, into a sorted collection of the given type, but
+/// retaining only the given subset of elements from the merger.
+///
+/// The predicate must be a *strict weak ordering* over the elements. That is,
+/// for any elements `a`, `b`, and `c`, the following conditions must hold:
+///
+/// - `areInIncreasingOrder(a, a)` is always `false`. (Irreflexivity)
+/// - If `areInIncreasingOrder(a, b)` and `areInIncreasingOrder(b, c)` are both
+///   `true`, then `areInIncreasingOrder(a, c)` is also `true`. (Transitive
+///   comparability)
+/// - Two elements are *incomparable* if neither is ordered before the other
+///   according to the predicate. If `a` and `b` are incomparable, and `b` and
+///   `c` are incomparable, then `a` and `c` are also incomparable.
+///   (Transitive incomparability)
+///
+/// When shared elements are copied, the source sequence depends on `operation`.
+///
+/// - For `.intersection`, `.first`, or `.union`, `first` is the source.
+/// - For `.second`, `second` is the source.
+/// - For `.sum`, all of the elements from `first` are used first, followed by
+///   the ones from `second`.
+///
+/// Elements from the same source preserve their relative order.
+///
+/// - Precondition: Both `first` and `second` are finite, and both are
+///   considered sorted according to `areInIncreasingOrder`.
+///
+/// - Parameters:
+///   - first: The first sequence to be spliced together.
+///   - second: The second sequence to be spliced together.
+///   - type: A metatype specifier for the returned object's type.
+///   - operation: Which set operation to apply when generating the returned
+///     object, treating the source sequences as multi-sets. Use `.sum` for a
+///     conventional merger.
+///   - areInIncreasingOrder: A predicate that returns `true` if its first
+///     argument should be ordered before its second argument; otherwise,
+///     `false`.
+/// - Returns: A sorted collection of the merger, excluding the elements banned
+///   by the set operation.
+///
+/// - Complexity: O(*m* + *n*), where *m* and *n* are the lengths of the two
+///   sequences.
+@usableFromInline
+internal func merge<
+  Base1: Sequence, Base2: Sequence, Result: RangeReplaceableCollection
+>(
+  _ first: Base1, _ second: Base2, into type: Result.Type,
+  keeping operation: SetOperation,
+  along areInIncreasingOrder: (Base1.Element, Base2.Element) throws -> Bool
+) rethrows -> Result
+where Base1.Element == Base2.Element, Base2.Element == Result.Element {
+  var result = Result()
+  result.reserveCapacity(combinedUnderestimatedCount(first.underestimatedCount,
+                                                     second.underestimatedCount,
+                                                     keeping: operation))
+  try withoutActuallyEscaping(areInIncreasingOrder) { predicate in
+    var iterator = Merged2Iterator(with: first.makeIterator(),
+                                   and: second.makeIterator(),
+                                   keeping: operation, along: predicate)
+    while let element = try iterator.throwingNext() {
+      result.append(element)
+    }
+  }
+  return result
+}
+
+/// Returns the worst case `underestimatedCount` for the given sequence counts
+/// and the set operation combining them.
+///
+/// Since the actual elements cannot be read, operations that would require
+/// reading the elements first will report the worst-case count instead.
+fileprivate func combinedUnderestimatedCount(
+  _ first: Int, _ second: Int, keeping operation: SetOperation
+) -> Int {
+  switch operation {
+  case .none:
+    return 0
+  case .firstWithoutSecond:
+    return max(first - second, 0)
+  case .secondWithoutFirst:
+    return max(second - first, 0)
+  case .symmetricDifference:
+    return abs(first - second)
+  case .intersection:
+    return 0
+  case .first:
+    return first
+  case .second:
+    return second
+  case .union:
+    return max(first, second)
+  case .sum:
+    let (sum, didOverflow) = first.addingReportingOverflow(second)
+    return didOverflow ?
+      .max : sum
+  }
+}
+
+//==============================================================================
+// MARK: - lazilyMerge
+//==============================================================================
+
+/// Lazily merges the two given sorted lazy sequences into a new sorted lazy
+/// sequence, where only the given subset of merged elements is retained.
+///
+/// When shared elements are copied, the source sequence depends on `operation`.
+///
+/// - For `.intersection`, `.first`, or `.union`, `first` is the source.
+/// - For `.second`, `second` is the source.
+/// - For `.sum`, all of the elements from `first` are used first, followed by
+///   the ones from `second`.
+///
+/// Elements from the same source preserve their relative order.
+///
+/// - Precondition: Both `first` and `second` are considered sorted.
+///
+/// - Parameters:
+///   - first: The first sequence to be spliced together.
+///   - second: The second sequence to be spliced together.
+///   - operation: Which set operation to apply when generating the returned
+///     object, treating the source sequences as multi-sets. If omitted, `.sum`
+///     is used as the default, resulting in a conventional full merger.
+/// - Returns: A lazy sequence of the sorted merger, excluding the elements
+///   banned by the set operation.
+///
+/// - Complexity: O(1), but generating the actual sequence will work in O(*m* +
+///   *n*) time, where *m* and *n* are the lengths of the two sequences.
+@inlinable
+public func lazilyMerge<
+  Base1: LazySequenceProtocol, Base2: LazySequenceProtocol
+>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum
+) -> Merged2Sequence<Base1.Elements, Base2.Elements>
+where Base1.Element == Base2.Element, Base2.Element: Comparable {
+  lazilyMerge(first, second, keeping: operation, along: <)
+}
+
+/// Lazily merges the two given lazy sequences, each sorted using the given
+/// predicate as the comparison between elements, into a new sorted lazy
+/// sequence, where only the given subset of merged elements is retained.
+///
+/// The predicate must be a *strict weak ordering* over the elements. That is,
+/// for any elements `a`, `b`, and `c`, the following conditions must hold:
+///
+/// - `areInIncreasingOrder(a, a)` is always `false`. (Irreflexivity)
+/// - If `areInIncreasingOrder(a, b)` and `areInIncreasingOrder(b, c)` are both
+///   `true`, then `areInIncreasingOrder(a, c)` is also `true`. (Transitive
+///   comparability)
+/// - Two elements are *incomparable* if neither is ordered before the other
+///   according to the predicate. If `a` and `b` are incomparable, and `b` and
+///   `c` are incomparable, then `a` and `c` are also incomparable. (Transitive
+///   incomparability)
+///
+/// When shared elements are copied, the source sequence depends on `operation`.
+///
+/// - For `.intersection`, `.first`, or `.union`, `first` is the source.
+/// - For `.second`, `second` is the source.
+/// - For `.sum`, all of the elements from `first` are used first, followed by
+///   the ones from `second`.
+///
+/// Elements from the same source preserve their relative order.
+///
+/// - Precondition: Both `first` and `second` are considered sorted according to
+///   `areInIncreasingOrder`.
+///
+/// - Parameters:
+///   - first: The first sequence to be spliced together.
+///   - second: The second sequence to be spliced together.
+///   - operation: Which set operation to apply when generating the returned
+///     object, treating the source sequences as multi-sets. If omitted, `.sum`
+///     is used as the default, resulting in a conventional full merger.
+///   - areInIncreasingOrder: A predicate that returns `true` if its first
+///     argument should be ordered before its second argument; otherwise,
+///     `false`.
+/// - Returns: A lazy sequence of the sorted merger, excluding the elements
+///   banned by the set operation.
+///
+/// - Complexity: O(1), but generating the actual sequence will work in O(*m* +
+///   *n*) time, where *m* and *n* are the lengths of the two sequences.
+@inlinable
+public func lazilyMerge<
+  Base1: LazySequenceProtocol, Base2: LazySequenceProtocol
+>(
+  _ first: Base1, _ second: Base2, keeping operation: SetOperation = .sum,
+  along areInIncreasingOrder: @escaping (Base1.Element, Base2.Element) -> Bool
+) -> Merged2Sequence<Base1.Elements, Base2.Elements>
+where Base1.Element == Base2.Element {
+  Merged2Sequence(with: first.elements, and: second.elements,
+                  keeping: operation, along: areInIncreasingOrder)
+}
+
+//==============================================================================
+// MARK: - Merged2Sequence
+//==============================================================================
+
+/// A sequence vending the sorted merger of its source sequences.
+public struct Merged2Sequence<Base1: Sequence, Base2: Sequence>
+where Base1.Element == Base2.Element {
+
+  /// The first sequence to merge.
+  let base1: Base1
+  /// The second sequence to merge.
+  let base2: Base2
+  /// The set operation filtering out elements from the merger.
+  let operation: SetOperation
+  /// The ordering predicate.
+  let areInIncreasingOrder: (Base1.Element, Base2.Element) -> Bool
+
+  /// Creates a sequence-merging sequence from the given parameters.
+  @usableFromInline
+  init(with base1: Base1, and base2: Base2, keeping operation: SetOperation,
+       along areInIncreasingOrder: @escaping (Base1.Element, Base2.Element)
+       -> Bool) {
+    self.base1 = base1
+    self.base2 = base2
+    self.operation = operation
+    self.areInIncreasingOrder = areInIncreasingOrder
+  }
+
+}
+
+extension Merged2Sequence: LazySequenceProtocol {
+
+  public typealias Element = Base1.Element
+  public typealias Iterator = Merged2Iterator<Base1.Iterator, Base2.Iterator>
+
+  public func makeIterator() -> Iterator {
+    return Merged2Iterator(with: base1.makeIterator(),
+                           and: base2.makeIterator(), keeping: operation,
+                           along: areInIncreasingOrder)
+  }
+  public var underestimatedCount: Int {
+    combinedUnderestimatedCount(base1.underestimatedCount,
+                                base2.underestimatedCount, keeping: operation)
+  }
+  public func withContiguousStorageIfAvailable<R>(
+    _ body: (UnsafeBufferPointer<Element>) throws -> R
+  ) rethrows -> R? {
+    switch operation {
+    case .none:
+      return try body(UnsafeBufferPointer(start: nil, count: 0))
+    case .first:
+      return try base1.withContiguousStorageIfAvailable(body)
+    case .second:
+      return try base2.withContiguousStorageIfAvailable(body)
+    default:
+      // The other cases may alternate elements from both sequences, take only
+      // some elements from a given sequence, or both. These prevent using
+      // either of the two potentially available memory blocks.
+      return nil
+    }
+  }
+
+  public func _customContainsEquatableElement(_ element: Element) -> Bool?
+  {
+    switch operation {
+    case .none:
+      return false
+    case .first:
+      return base1._customContainsEquatableElement(element)
+    case .second:
+      return base2._customContainsEquatableElement(element)
+    case .sum:
+      switch (base1._customContainsEquatableElement(element),
+              base2._customContainsEquatableElement(element)) {
+      case (_, .some(true)), (.some(true), _):
+        return true
+      case (.some(false), .some(false)):
+        return false
+      case (.none, _), (_, .none):
+        return nil
+      }
+    default:
+      // An element cannot be checked for inclusion without reading both
+      // sequences; and also depends on the operation, the ordering predicate,
+      // and if said predicate is compatible with `==`. All of these prevent
+      // confirmation.
+      return nil
+    }
+  }
+
+}
+
+//==============================================================================
+// MARK: - Merged2Iterator
+//==============================================================================
+
+/// An iterator vending the sorted merger of its source iterators.
+public struct Merged2Iterator<Base1: IteratorProtocol, Base2: IteratorProtocol>
+where Base1.Element == Base2.Element {
+
+  /// The first iterator to merge.
+  var base1: Base1
+  /// The second iterator to merge.
+  var base2: Base2
+  /// The ordering predicate.
+  let areInIncreasingOrder: (Base1.Element, Base2.Element) throws -> Bool
+
+  /// Whether to stop reading from `base1`.
+  var didFinish1: Bool
+  /// Whether to stop reading from `base2`.
+  var didFinish2: Bool
+  /// The last element read but unused from `base1`.
+  var previous1: Base1.Element?
+  /// The last element read but unused from `base2`.
+  var previous2: Base2.Element?
+
+  /// Handler for elements exclusive to `base1`, including any trailing ones.
+  let exclusiveHandler1: (Base1.Element) -> Base1.Element?
+  /// Handler for elements exclusive to `base2`, including any trailing ones.
+  let exclusiveHandler2: (Base2.Element) -> Base2.Element?
+  /// Copy handler for the shared elements.
+  ///
+  /// There's no `dequeue1` because it's always `true`.
+  /// There's no parameter for the element from `base2` because the operations
+  /// that need it (`.second` and `.sum`) end up on code paths that skip calls
+  /// to this handler.
+  let sharedHandler: (Base1.Element) -> (Base1.Element?, dequeue2: Bool)
+
+  /// Creates an iterator-merging iterator from the given parameters.
+  @usableFromInline
+  init(with base1: Base1, and base2: Base2, keeping operation: SetOperation,
+       along areInIncreasingOrder: @escaping (Base1.Element, Base2.Element)
+       throws -> Bool) {
+    // Retain sources and predicate.
+    self.base1 = base1
+    self.base2 = base2
+    self.areInIncreasingOrder = areInIncreasingOrder
+
+    // Pre-ignore certain sources.
+    switch operation {
+    case .none:
+      didFinish1 = true
+      didFinish2 = true
+    case .first:
+      didFinish1 = false
+      didFinish2 = true
+    case .second:
+      didFinish1 = true
+      didFinish2 = false
+    default:
+      didFinish1 = false
+      didFinish2 = false
+    }
+
+    // Set policy for each grouping class.
+    if operation.usesExclusivesFromFirst {
+      exclusiveHandler1 = { e in return e }
+    } else {
+      exclusiveHandler1 = { _ in return nil }
+    }
+    if operation.usesExclusivesFromSecond {
+      exclusiveHandler2 = { e in return e }
+    } else {
+      exclusiveHandler2 = { _ in return nil }
+    }
+    if operation.usesShared {
+      sharedHandler = { e in return (e, !operation.duplicatesShared) }
+    } else {
+      sharedHandler = { _ in return (nil, true) }
+    }
+  }
+
+}
+
+extension Merged2Iterator: IteratorProtocol {
+
+  public mutating func next() -> Base1.Element? {
+    return try! throwingNext()
+  }
+
+  /// Advances to the next element and returns it, or `nil` if no next element
+  /// exists, but could throw in the process.
+  internal mutating func throwingNext() throws -> Base2.Element? {
+    switch (didFinish1, didFinish2) {
+    case (false, false):
+      repeat {
+        // Read the latest elements of each iterator as needed.
+        previous1 = previous1 ?? base1.next()
+        previous2 = previous2 ??
base2.next() + + // Remove any elements that actually got read or ignored to prepare for + // the next loop. + var didUse1 = false, didUse2 = false + defer { + didFinish1 = previous1 == nil + didFinish2 = previous2 == nil + if didUse1 { + previous1 = nil + } + if didUse2 { + previous2 = nil + } + } + + // Compare the latest elements for vending order (or skip). + check: switch (previous1, previous2) { + case let (first?, second?): + var handledElement: Base2.Element? + if try areInIncreasingOrder(first, second) { + // Exclusive to first + didUse1 = true + handledElement = exclusiveHandler1(first) + } else if try areInIncreasingOrder(second, first) { + // Exclusive to second + didUse2 = true + handledElement = exclusiveHandler2(second) + } else { + // Shared + didUse1 = true + (handledElement, didUse2) = sharedHandler(first) + } + if let returnedElement = handledElement { + return returnedElement + } else { + break check + } + case (let first?, nil): + // Start draining the first iterator, or wrap up operations if + // elements exclusive to that iterator aren't supported. + didUse1 = true + previous1 = exclusiveHandler1(first) + return previous1 + case (nil, let second?): + // Start draining the second iterator, or wrap up operations if + // elements exclusive to that iterator aren't supported. + didUse2 = true + previous2 = exclusiveHandler2(second) + return previous2 + case (nil, nil): + return nil + } + } while !didFinish1 && !didFinish2 + + // At least one of the iterators got exhausted or permanently skipped + // while looking for a qualifying element. Shift to one of the other + // top-level cases to handle it. + return try throwingNext() + case (false, true): + // Drain the first iterator. + previous1 = previous1 ?? base1.next() + didFinish1 = previous1 == nil + defer { previous1 = nil } + return previous1 + case (true, false): + // Drain the second iterator. + previous2 = previous2 ?? 
base2.next() + didFinish2 = previous2 == nil + defer { previous2 = nil } + return previous2 + case (true, true): + // Both iterators exhausted/ignored + return nil + } + } + +} diff --git a/Tests/SwiftAlgorithmsTests/MergeTests.swift b/Tests/SwiftAlgorithmsTests/MergeTests.swift new file mode 100644 index 00000000..f203e54b --- /dev/null +++ b/Tests/SwiftAlgorithmsTests/MergeTests.swift @@ -0,0 +1,235 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Algorithms open source project +// +// Copyright (c) 2022 Apple Inc. and the Swift project authors +// Licensed under Apache License v2.0 with Runtime Library Exception +// +// See https://swift.org/LICENSE.txt for license information +// +//===----------------------------------------------------------------------===// + +import XCTest +import Algorithms + +/// Tests for the `merge` and `lazilyMerge` functions, including the +/// `SetOperation` support type. +final class MergeTests: XCTestCase { + /// Check the members of `SetOperation`. 
+ func testSetOperation() { + // Case iteration and value + XCTAssertEqualSequences(SetOperation.allCases, [ + .none, .firstWithoutSecond, .secondWithoutFirst, .symmetricDifference, + .intersection, .first, .second, .union, .sum + ]) + XCTAssertEqualSequences(SetOperation.allCases.map(\.rawValue), + [0, 1, 2, 3, 4, 5, 6, 7, 15]) + + // Subset confirmation + XCTAssertEqualSequences( + SetOperation.allCases.map(\.usesExclusivesFromFirst), + [false, true, false, true, false, true, false, true, true] + ) + XCTAssertEqualSequences( + SetOperation.allCases.map(\.usesExclusivesFromSecond), + [false, false, true, true, false, false, true, true, true] + ) + XCTAssertEqualSequences( + SetOperation.allCases.map(\.usesShared), + [false, false, false, false, true, true, true, true, true] + ) + XCTAssertEqualSequences( + SetOperation.allCases.map(\.duplicatesShared), + [false, false, false, false, false, false, false, false, true] + ) + + // Initializer + XCTAssertEqual(.none, SetOperation(keepExclusivesToFirst: false, + keepExclusivesToSecond: false, + keepShared: false)) + XCTAssertEqual(.firstWithoutSecond, SetOperation( + keepExclusivesToFirst: true, keepExclusivesToSecond: false, + keepShared: false)) + XCTAssertEqual(.secondWithoutFirst, SetOperation( + keepExclusivesToFirst: false, keepExclusivesToSecond: true, + keepShared: false)) + XCTAssertEqual(.symmetricDifference, SetOperation( + keepExclusivesToFirst: true, keepExclusivesToSecond: true, + keepShared: false)) + XCTAssertEqual(.intersection, SetOperation(keepExclusivesToFirst: false, + keepExclusivesToSecond: false, + keepShared: true)) + XCTAssertEqual(.first, SetOperation(keepExclusivesToFirst: true, + keepExclusivesToSecond: false, + keepShared: true)) + XCTAssertEqual(.second, SetOperation(keepExclusivesToFirst: false, + keepExclusivesToSecond: true, + keepShared: true)) + XCTAssertEqual(.union, SetOperation(keepExclusivesToFirst: true, + keepExclusivesToSecond: true, + keepShared: true)) + } + + /// Check the 
eager versions of merging. + func testEagerMerge() { + // Same (collection) type + let first = "acegg", second = "bdfgh", sum = "abcdefgggh" + XCTAssertEqual(merge(first, second), sum) + XCTAssertEqual(merge([1, 2, 4, 5], [3, 6, 8, 9]), [1, 2, 3, 4, 5, 6, 8, 9]) + + // Different sequence types + XCTAssertEqual(merge(first[...], second), Array(sum)) + + // Various set operations + XCTAssertEqual(merge(first, second, keeping: .none), "") + XCTAssertEqual(merge(first, second, keeping: .firstWithoutSecond), "aceg") + XCTAssertEqual(merge(first, second, keeping: .secondWithoutFirst), "bdfh") + XCTAssertEqual(merge(first,second,keeping: .symmetricDifference),"abcdefgh") + XCTAssertEqual(merge(first, second, keeping: .intersection), "g") + XCTAssertEqual(merge(first, second, keeping: .first), first) + XCTAssertEqual(merge(first, second, keeping: .second), second) + XCTAssertEqual(merge(first, second, keeping: .union), "abcdefggh") + XCTAssertEqual(merge(first, second, keeping: .sum), sum) + + // Flip which sequence gets exhausted first. + XCTAssertEqual(merge(second, first, keeping: .none), "") + XCTAssertEqual(merge(second, first, keeping: .firstWithoutSecond), "bdfh") + XCTAssertEqual(merge(second, first, keeping: .secondWithoutFirst), "aceg") + XCTAssertEqual(merge(second,first,keeping: .symmetricDifference),"abcdefgh") + XCTAssertEqual(merge(second, first, keeping: .intersection), "g") + XCTAssertEqual(merge(second, first, keeping: .first), second) + XCTAssertEqual(merge(second, first, keeping: .second), first) + XCTAssertEqual(merge(second, first, keeping: .union), "abcdefggh") + XCTAssertEqual(merge(second, first, keeping: .sum), sum) + + // Custom check when both sequences end at the same time + XCTAssertEqual(merge("", ""), "") + } + + /// Check the estimated length for (lazy) merging. 
+ func testMergerUnderestimatedCount() { + // Set up + let array1 = [0, 2, 3, 4, 4, 7], array2 = [-3, 0, 1, 6, 7, 7, 10] + let lazyMergers = SetOperation.allCases.map { + lazilyMerge(array1.lazy, array2.lazy, keeping: $0) + } + XCTAssertEqualSequences(lazyMergers.map(Array.init), [ + [], [2, 3, 4, 4], [-3, 1, 6, 7, 10], [-3, 1, 2, 3, 4, 4, 6, 7, 10], + [0, 7], array1, array2, [-3, 0, 1, 2, 3, 4, 4, 6, 7, 7, 10], + [-3, 0, 0, 1, 2, 3, 4, 4, 6, 7, 7, 7, 10] + ]) + + // Finite estimates + XCTAssertEqualSequences(lazyMergers.map(\.underestimatedCount), + [0, 0, 1, 1, 0, 6, 7, 7, 13]) + + // Over-sized estimates + let big = lazilyMerge(repeatElement(1.0, count: .max).lazy, + repeatElement(2.0, count: .max).lazy) + XCTAssertEqual(big.underestimatedCount, .max) + } + + /// Check accessing memory to the elements of a (lazy) merger. + func testMergerMemoryBlocks() { + // Set up, using sequence type(s) with internal storage + let array1 = [0, 2, 3, 4, 4, 7], array2 = [-3, 0, 1, 6, 7, 7, 10] + let lazyMergers = SetOperation.allCases.map { + lazilyMerge(array1.lazy, array2.lazy, keeping: $0) + } + + // Only the degenerate cases can support a single memory block. + XCTAssertEqualSequences(lazyMergers.map({ merger in + return merger.withContiguousStorageIfAvailable { buffer in + buffer.baseAddress == nil + } + }), [true, nil, nil, nil, nil, false, false, nil, nil]) + } + + /// Check searching for an element within a (lazy) merger, for `contains`. + func testMergerEasyContainmentSearch() { + // Set up, using sequence type(s) with custom `contains` search + // (Guarantee sorted order in a `Set` by using at most one element.) 
+ let set1: Set = [5], set2: Set = [6] + let setMergers = SetOperation.allCases.map { + lazilyMerge(set1.lazy, set2.lazy, keeping: $0) + } + XCTAssertEqualSequences(setMergers.map(Array.init), [ + [], [5], [6], [5, 6], [], [5], [6], [5, 6], [5, 6] + ]) + + // One total miss, and one match per operand + XCTAssertEqualSequences(setMergers.map({ merger in + return merger._customContainsEquatableElement(4) + }), [false, nil, nil, nil, nil, false, false, nil, false]) + XCTAssertEqualSequences(setMergers.map({ merger in + return merger._customContainsEquatableElement(5) + }), [false, nil, nil, nil, nil, true, false, nil, true]) + XCTAssertEqualSequences(setMergers.map({ merger in + return merger._customContainsEquatableElement(6) + }), [false, nil, nil, nil, nil, false, true, nil, true]) + } + + /// Check searching for an element within a (lazy) merger, for `contains`, + /// when only one operand supports custom search. + func testMergerHardContainmentSearch() { + // Set up + let array = [-3, 0, 1, 6, 7, 7, 10], set: Set = [6] + let mixedMerger1 = SetOperation.allCases.map { + lazilyMerge(array.lazy, set.lazy, keeping: $0) + } + XCTAssertEqualSequences(mixedMerger1.map(Array.init), [ + [], [-3, 0, 1, 7, 7, 10], [], [-3, 0, 1, 7, 7, 10], [6], + [-3, 0, 1, 6, 7, 7, 10], [6], [-3, 0, 1, 6, 7, 7, 10], + [-3, 0, 1, 6, 6, 7, 7, 10] + ]) + + // One total miss, and one match + XCTAssertEqualSequences(mixedMerger1.map({ merger in + return merger._customContainsEquatableElement(4) + }), [false, nil, nil, nil, nil, nil, false, nil, nil]) + XCTAssertEqualSequences(mixedMerger1.map({ merger in + return merger._customContainsEquatableElement(6) + }), [false, nil, nil, nil, nil, nil, true, nil, true]) + + // Repeat the tests, but flip the operand order. 
+ let mixedMerger2 = SetOperation.allCases.map { + lazilyMerge(set.lazy, array.lazy, keeping: $0) + } + XCTAssertEqualSequences(mixedMerger2.map(Array.init), [ + [], [], [-3, 0, 1, 7, 7, 10], [-3, 0, 1, 7, 7, 10], [6], + [6], [-3, 0, 1, 6, 7, 7, 10], [-3, 0, 1, 6, 7, 7, 10], + [-3, 0, 1, 6, 6, 7, 7, 10] + ]) + + XCTAssertEqualSequences(mixedMerger2.map({ merger in + return merger._customContainsEquatableElement(4) + }), [false, nil, nil, nil, nil, false, nil, nil, nil]) + XCTAssertEqualSequences(mixedMerger2.map({ merger in + return merger._customContainsEquatableElement(6) + }), [false, nil, nil, nil, nil, true, nil, nil, true]) + } + + /// Check using a custom predicate, especially one that accesses only some of + /// each element's data. + func testCustomPredicate() { + struct Pair: Hashable { + let value: Int + let flag: Bool + } + func compare(_ a: Pair, _ b: Pair) -> Bool { + return a.value < b.value + } + + let p0 = Pair(value: 0, flag: true), p1a = Pair(value: 1, flag: true) + let p1b = Pair(value: 1, flag: false), p2 = Pair(value: 2, flag: false) + let list1 = [p0, p1a], list2 = [p1b, p2] + XCTAssertEqualSequences(SetOperation.allCases.map { + merge(list1, list2, keeping: $0, along: compare) + }, [[], [p0], [p2], [p0, p2], [p1a], [p0, p1a], [p1b, p2], [p0, p1a, p2], + [p0, p1a, p1b, p2]]) + XCTAssertEqualSequences(SetOperation.allCases.map { + merge(list2, list1, keeping: $0, along: compare) + }, [[], [p2], [p0], [p0, p2], [p1b], [p1b, p2], [p0, p1a], [p0, p1b, p2], + [p0, p1b, p1a, p2]]) + } +}
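As a reading aid for the three-way comparison in `throwingNext()` (exclusive to first, exclusive to second, or shared), the following is a minimal eager sketch of the same classification over two sorted arrays. It is not the PR's implementation: `SketchOperation` and `sketchMerge` are hypothetical names introduced here for illustration, only four of the nine operations are modeled, and elements are assumed to be `Comparable` and already sorted by `<`.

```swift
/// Hypothetical stand-in for a subset of the PR's `SetOperation` cases.
enum SketchOperation {
  case sum, union, intersection, symmetricDifference
}

/// Eagerly merges two arrays that are already sorted by `<`.
/// Illustration only; not part of the library.
func sketchMerge<T: Comparable>(
  _ a: [T], _ b: [T], keeping op: SketchOperation = .sum
) -> [T] {
  var result: [T] = []
  var i = 0, j = 0
  // Every operation modeled here except `.intersection` keeps exclusives.
  let keepsExclusives = op != .intersection

  while i < a.count, j < b.count {
    if a[i] < b[j] {
      // Exclusive to the first source.
      if keepsExclusives { result.append(a[i]) }
      i += 1
    } else if b[j] < a[i] {
      // Exclusive to the second source.
      if keepsExclusives { result.append(b[j]) }
      j += 1
    } else {
      // Shared: neither element orders before the other.
      switch op {
      case .sum:
        // Keep the first source's copy and leave the second's in place,
        // so it is emitted on a later pass (shared values get duplicated).
        result.append(a[i])
        i += 1
      case .union, .intersection:
        // Keep one copy and consume both.
        result.append(a[i])
        i += 1
        j += 1
      case .symmetricDifference:
        // Drop the matched pair.
        i += 1
        j += 1
      }
    }
  }

  // Drain whichever source still has trailing (exclusive) elements.
  if keepsExclusives {
    result.append(contentsOf: a[i...])
    result.append(contentsOf: b[j...])
  }
  return result
}

let first = Array("acegg"), second = Array("bdfgh")
print(String(sketchMerge(first, second)))                          // "abcdefgggh"
print(String(sketchMerge(first, second, keeping: .union)))         // "abcdefggh"
print(String(sketchMerge(first, second, keeping: .intersection)))  // "g"
```

The `.sum` branch mirrors the `sharedHandler` policy of returning the element from the first source while not dequeuing the second, which is why shared values appear once per occurrence in either input.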