diff --git a/proto/substrait/algebra.proto b/proto/substrait/algebra.proto
index eed7bcc79..efeff6852 100644
--- a/proto/substrait/algebra.proto
+++ b/proto/substrait/algebra.proto
@@ -333,9 +333,11 @@ message SetRel {
enum SetOp {
SET_OP_UNSPECIFIED = 0;
SET_OP_MINUS_PRIMARY = 1;
+ SET_OP_MINUS_PRIMARY_ALL = 7;
SET_OP_MINUS_MULTISET = 2;
SET_OP_INTERSECTION_PRIMARY = 3;
SET_OP_INTERSECTION_MULTISET = 4;
+ SET_OP_INTERSECTION_MULTISET_ALL = 8;
SET_OP_UNION_DISTINCT = 5;
SET_OP_UNION_ALL = 6;
}
diff --git a/site/docs/relations/logical_relations.md b/site/docs/relations/logical_relations.md
index d71b57d2f..ebee1acc4 100644
--- a/site/docs/relations/logical_relations.md
+++ b/site/docs/relations/logical_relations.md
@@ -268,14 +268,23 @@ The set operation encompasses several set-level operations that support combinin
The set operation type determines both the records that are emitted and the type of the output record.
-| Property | Description | Output Nullability
-| ----------------------- | ------------------------------------------------------------------------------------------------------------- | ----------------------------- |
-| Minus (Primary) | Returns all records from the primary input excluding any matching records from secondary inputs. | The same as the primary input.
-| Minus (Multiset) | Returns all records from the primary input excluding any records that are included in *all* secondary inputs. | The same as the primary input.
-| Intersection (Primary) | Returns all records from the primary input that match at least one record from *any* secondary inputs. | If a field is nullable in the primary input and in any of the secondary inputs, it is nullable in the output.
-| Intersection (Multiset) | Returns all records from the primary input that match at least one record from *all* secondary inputs. | If a field is required in any of the inputs, it is required in the output.
-| Union Distinct | Returns all the records from each set, removing any rows that are duplicated (within or across sets). | If a field is nullable in any of the inputs, it is nullable in the output.
-| Union All | Returns all records from each set, allowing duplicates. | If a field is nullable in any of the inputs, it is nullable in the output. |
+For some set operations, whether a record appears in the output, and how many times, depends on how often it occurs across all inputs. In the following table:
+* m: the number of times a record occurs in the primary input (p)
+* n1: the number of times a record occurs in the 1st secondary input (s1)
+* n2: the number of times a record occurs in the 2nd secondary input (s2)
+* ...
+* n: the number of times a record occurs in the nth secondary input
+
+| Operation | Description | Examples | Output Nullability
+|-----------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| -----------------------------
+| Minus (Primary) | Returns all records from the primary input excluding any matching records from secondary inputs, removing duplicates.<br/>Each value is treated as a unique member of the set, so duplicates in the primary input don’t affect the result.<br/>This operation maps to SQL EXCEPT DISTINCT. | MINUS<br/>p: {1, 2, 2, 3, 3, 3, 4}<br/>s1: {1, 2}<br/>s2: {3}<br/>YIELDS<br/>{4} | The same as the primary input.
+| Minus (Primary All) | Returns all records from the primary input excluding any matching records from secondary inputs.<br/>For each specific record returned, the output contains max(0, m - sum(n1, n2, …, n)) copies.<br/>This operation maps to SQL EXCEPT ALL. | MINUS ALL<br/>p: {1, 2, 2, 3, 3, 3, 3}<br/>s1: {1, 2, 3, 4}<br/>s2: {3}<br/>YIELDS<br/>{2, 3, 3} | The same as the primary input.
+| Minus (Multiset) | Returns all records from the primary input excluding any records that are included in *all* secondary inputs.<br/>This operation does not have a direct SQL mapping. | MINUS MULTISET<br/>p: {1, 2, 3, 4}<br/>s1: {1, 2}<br/>s2: {1, 2, 3}<br/>YIELDS<br/>{3, 4} | The same as the primary input.
+| Intersection (Primary) | Returns all records from the primary input that are present in any secondary input, removing duplicates.<br/>This operation does not have a direct SQL mapping. | INTERSECT<br/>p: {1, 2, 2, 3, 3, 3, 4}<br/>s1: {1, 2, 3, 5}<br/>s2: {2, 3, 6}<br/>YIELDS<br/>{1, 2, 3} | If a field is nullable in the primary input and in any of the secondary inputs, it is nullable in the output.
+| Intersection (Multiset) | Returns all records from the primary input that match at least one record from *all* secondary inputs.<br/>This operation maps to SQL INTERSECT DISTINCT. | INTERSECT MULTISET<br/>p: {1, 2, 3, 4}<br/>s1: {2, 3}<br/>s2: {3, 4}<br/>YIELDS<br/>{3} | If a field is required in any of the inputs, it is required in the output.
+| Intersection (Multiset All) | Returns all records from the primary input that are present in every secondary input.<br/>For each specific record returned, the output contains min(m, n1, n2, …, n) copies.<br/>This operation maps to SQL INTERSECT ALL. | INTERSECT ALL<br/>p: {1, 2, 2, 3, 3, 3, 4}<br/>s1: {1, 2, 3, 3, 5}<br/>s2: {2, 3, 3, 6}<br/>YIELDS<br/>{2, 3, 3} | If a field is required in any of the inputs, it is required in the output.
+| Union Distinct | Returns all records from each set, removing duplicates.<br/>This operation maps to SQL UNION DISTINCT. | UNION<br/>p: {1, 2, 2, 3, 3, 3, 4}<br/>s1: {2, 3, 5}<br/>s2: {1, 6}<br/>YIELDS<br/>{1, 2, 3, 4, 5, 6} | If a field is nullable in any of the inputs, it is nullable in the output.
+| Union All | Returns all records from all inputs.<br/>For each specific record returned, the output contains (m + n1 + n2 + … + n) copies.<br/>This operation maps to SQL UNION ALL. | UNION ALL<br/>p: {1, 2, 2, 3, 3, 3, 4}<br/>s1: {2, 3, 5}<br/>s2: {1, 6}<br/>YIELDS<br/>{1, 2, 2, 3, 3, 3, 4, 2, 3, 5, 1, 6} | If a field is nullable in any of the inputs, it is nullable in the output.
Note that for set operations, NULL matches NULL. That is
```
@@ -294,14 +303,16 @@ Input 3: (R, N, R, N, R, N, R, N) Secondary Input
The output type is as follows for the various operations
-| Property | Output Type
-| ----------------------- | -----------------------------------------------------------------------------------------------------
-| Minus (Primary) | (R, R, R, R, N, N, N, N)
-| Minus (Multiset) | (R, R, R, R, N, N, N, N)
-| Intersection (Primary) | (R, R, R, R, R, N, N, N)
-| Intersection (Multiset) | (R, R, R, R, R, R, R, N)
-| Union Distinct | (R, N, N, N, N, N, N, N)
-| Union All | (R, N, N, N, N, N, N, N)
+| Operation | Output Type
+|-----------------------------| -----------------------------------------------------------------------------------------------------
+| Minus (Primary) | (R, R, R, R, N, N, N, N)
+| Minus (Primary All) | (R, R, R, R, N, N, N, N)
+| Minus (Multiset) | (R, R, R, R, N, N, N, N)
+| Intersection (Primary) | (R, R, R, R, R, N, N, N)
+| Intersection (Multiset) | (R, R, R, R, R, R, R, N)
+| Intersection (Multiset All) | (R, R, R, R, R, R, R, N)
+| Union Distinct | (R, N, N, N, N, N, N, N)
+| Union All | (R, N, N, N, N, N, N, N)
=== "SetRel Message"
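
Review note on the multiplicity rules: the copy counts in the new operations table (max(0, m - sum(n1, n2, …, n)), min(m, n1, n2, …, n), and m + n1 + n2 + … + n) can be checked against the documented examples with a small model. The following is a minimal sketch, assuming records are hashable values and each input is a plain list. The function `set_op` and its string operation names are hypothetical and are not Substrait identifiers. The table does not state how Minus (Multiset) treats duplicates in the primary input, so the sketch assumes they are kept. It also assumes at least one secondary input.

```python
from collections import Counter


def set_op(op, primary, *secondaries):
    """Model the documented set operations on lists of hashable records.

    Hypothetical helper for illustration only. Returns a sorted list so
    results can be compared without regard to output order.
    """
    m = Counter(primary)                       # m: occurrences in the primary input
    ns = [Counter(s) for s in secondaries]     # n1, n2, ...: occurrences per secondary
    total = sum(ns, Counter())                 # n1 + n2 + ... per record
    in_any = set().union(*ns)                  # records present in any secondary
    in_all = set(ns[0]).intersection(*ns[1:]) if ns else set()

    if op == "minus_primary":                  # SQL EXCEPT DISTINCT
        out = Counter({r: 1 for r in m if r not in in_any})
    elif op == "minus_primary_all":            # SQL EXCEPT ALL
        out = m - total                        # Counter subtraction drops counts <= 0
    elif op == "minus_multiset":               # assumption: primary duplicates kept
        out = Counter({r: c for r, c in m.items() if r not in in_all})
    elif op == "intersection_primary":
        out = Counter({r: 1 for r in m if r in in_any})
    elif op == "intersection_multiset":        # SQL INTERSECT DISTINCT
        out = Counter({r: 1 for r in m if r in in_all})
    elif op == "intersection_multiset_all":    # SQL INTERSECT ALL
        out = Counter({r: min(c, *(n[r] for n in ns))
                       for r, c in m.items() if r in in_all})
    elif op == "union_distinct":               # SQL UNION DISTINCT
        out = Counter({r: 1 for r in set(m) | in_any})
    elif op == "union_all":                    # SQL UNION ALL
        out = m + total
    else:
        raise ValueError(f"unknown operation: {op}")
    return sorted(out.elements())


# Inputs taken from the MINUS ALL and INTERSECT ALL examples in the table.
minus_all = set_op("minus_primary_all", [1, 2, 2, 3, 3, 3, 3], [1, 2, 3, 4], [3])
intersect_all = set_op("intersection_multiset_all",
                       [1, 2, 2, 3, 3, 3, 4], [1, 2, 3, 3, 5], [2, 3, 3, 6])
print(minus_all)      # [2, 3, 3]
print(intersect_all)  # [2, 3, 3]
```

Running the remaining examples from the table through the same function reproduces their documented results up to ordering, which suggests the examples and the formulas agree with each other.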