-
Notifications
You must be signed in to change notification settings - Fork 340
Set operations of Excel inter row data (intersection, union, difference)
In the workplace, it is common to encounter tasks that require merging and comparing data from two or more spreadsheets, where duplicate data needs to be picked out. This is essentially a conventional set operation, but Excel cannot directly support it and requires a series of functions to implement it, which is cumbersome and not easy to understand.
Here is a useful plugin that can directly perform set operations.
This plugin is called SPL, which excels in handling various complex Excel operations and is used to assist Excel calculations, doubling the work efficiency in seconds.
For example, there are product names and salesperson names in the top ten sales rankings for January and February:
We use SPL to calculate intersection, union, and difference based on key columns and entire row data respectively.
Using the Product Name as the key column, identify the product data that entered the top ten for both January and February (just list the salesperson's name for January).
The operation is simple, enter the following code in the blank cell:
=spl("=\[E(?1),E(?2)\].merge@oi(ProductName)",Jan!A1:B11,Feb!A1:B11)
As shown in the figure:
Then ctrl-enter returns the result directly, obtaining the intersection of product names in two sheets.
Code explanation: =spl() indicates calling the SPL plugin, ?1 and ?2 represents the data parameters involved in the calculation, here referring to Jan!A1:B11 and Feb!A1:B11. Product Name is the specified key column, and [].merge() represents merging the tables within [], the letter after the symbol @ represents the method of merging, i represents merging after intersection, and o represents regarding the key column ProductName to be unordered. Therefore, the meaning of the entire code means to intersect the tables of January and February according to ProductName.
Similarly, to implement union and difference, you only need to change the letter option after @.
Using the Product Name as the key column, identify the product data that entered the top ten once or more in January and February (just list the salesperson's name for January).
To calculate union, just change the letter after @ to u, and the code is as follows:
=spl("=\[E(?1),E(?2)\].merge@ou(ProductName)",Jan!A1:B11,Feb!A1:B11)
Identify product sales data that entered the top ten in January but did not enter the top ten in February.
The difference is represented by the letter d
=spl("=\[E(?1),E(?2)\].merge@od(ProductName)",Jan!A1:C11,Feb!B1: B11)
If there are two or more key columns, simply add them in parentheses. For example, merge by the intersection of the ProductName and Name fields: merge@oi(ProductName,Name)
Of course, the merge function can also implement merging based on whether the entire row of data is the same, simply removing the field names in parentheses, such as Example 2 below.
Identify the products and salesperson data that both entered the top ten in January and February.
=spl("=\[E(?1),E(?2)\].merge@oi()",Jan!A1:B11,Feb!A1:B11)
Identify data on products and salespeople who have entered the top ten once or more.
=spl("=\[E(?1),E(?2)\].merge@ou()",Jan!A1:B11,Feb!A1:B11)
Identify the products and salespeople data that entered the top ten in January but did not enter the top ten in February.
=spl("=\[E(?1),E(?2)\].merge@od()",Jan!A1:B11,Feb!A1:B11)
Sometimes, there may be situations where multiple spreadsheets are merged and compared, such as the sales data of top10Sales.xlsx for multiple months, and the number of months may increase over time.
We now need to perform set operations on data from multiple months, and of course, we can continue to use the writing method of [E(?1),E(?2),E(?3),……].merge() in the above examples. However, when there are many sheets, it is still a bit troublesome. At this point, we can operate more efficiently in the SPL ide.
Identify the list of products that have all entered the top ten in recent months.
Enter the following code in the IDE:
A | |
---|---|
1 | =file("top10Sales.xlsx").xlsopen() |
2 | =A1.(A1.xlsimport@t(;stname)).merge@oi(ProductName) |
A1 represents reading an Excel file and returning the names, number of rows and columns of all sheets in the file:
A2 opens all data in the sheets based on the sheet name stname, and then calculates the intersection
Similarly, you can calculate union and difference.
Identify the products that have made it to the top ten once or more:
A | |
---|---|
1 | =file("top10Sales.xlsx").xlsopen() |
2 | =A1.(A1.xlsimport@t(;stname)).merge@ou(ProductName) |
Identify the list of products that made it to the top ten in January but did not make it to the top ten in other months:
A | |
---|---|
1 | =file("top10Sales.xlsx").xlsopen() |
2 | =A1.(A1.xlsimport@t(;stname)).merge@od(ProductName) |
For complex set operations in Excel, with the help of SPL, all can be done with a merge() function!!!
In SPL, there are many flexible data processing functions that can handle various complex Excel operations, and the syntax is also simple and very user-friendly.
And SPL also comes with rich reference cases esProc Desktop and Excel Processing , where 90% of Excel problems in the workplace can be solved. The code in the book can be basically copied and then used with slight modifications.
SPL download address: esProc Desktop Download
Plugin Installation Method: SPL XLL Installation and Configuration
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code