-
Notifications
You must be signed in to change notification settings - Fork 339
Comparison of SQL and SPL:Static Transposition
【Abstract】
Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.
A static transposition can define the data structure after data is transposed beforehand and the structure does not change accordingly as data is changed.
As the name shows, row-to-column transposition aims to convert row values into column names to transfer rows to columns. In practice, PIVOT is often used after the grouping and aggregation, which gets unique values from the row values under the to-be-transposed columns and then arrange the row values into column names. The role of PIVOT is to group and put the aggregates in a specific column into multiple columns for an intuitive representation.
【Example 1】Based on the following scores table, find the highest score of each subject in each class and present the results in columns. Below is part of the source table:
CLASS | STUDENTID | SUBJECT | SCORE |
---|---|---|---|
1 | 1 | English | 84 |
1 | 1 | Math | 77 |
1 | 1 | PE | 69 |
1 | 2 | English | 81 |
1 | 2 | Math | 80 |
… | … | … | … |
Below is the result of expected layout:
CLASS | MAX_MATH | MAX_ENGLISH | MAX_PE |
---|---|---|---|
1 | 97 | 96 | 97 |
2 | 97 | 96 | 97 |
… | … | … | … |
SQL solution:
SQL PIVOT is used to achieve row-to-column transposition and column-to-row transposition. But only the relatively new versions of certain database products support this method. ORACLE, for instance, supports it from the 11g version, and some databases, such as MYSQL, still uses the subquery to do the grouping and aggregation and then a left join to achieve the transposition and have not given any support to it. In this case, we use ORACLE 11g to write the SQL query:
SELECT *
FROM (
SELECT
CLASS, SUBJECT, SCORE
FROM SCORES
)
PIVOT (
MAX(SCORE) FOR SUBJECT
IN (
'Math' AS MAX_MATH,
'English' AS MAX_ENGLISH,
'PE' AS MAX_PE
)
)
SPL solution:
SPL offers A.pivot() function to perform transposition, row-to-column by default.
A | |
---|---|
1 | =T("Scores.csv") |
2 | =A1.groups(CLASS,SUBJECT; max(SCORE):MAX_SCORE) |
3 | =A2.pivot(CLASS; SUBJECT, MAX_SCORE; "Math":"MAX_MATH", "English":"MAX_ENGLISH", "PE":"MAX_PE") |
A1: Import Scores table from the source file.
A2: Group A1’s table by class and subject and calculate the highest score of each subject in each class.
A3: A.pivot() function transposes the row-wise highest scores into column names.
SPL supports retrieving a data table from the database, too. A1 in the above script can be modified as:
A | |
---|---|
1 | =connect("db").query("SELECT * FROM SCORES") |
Contrary to row-to-column transposition, the column-to-row transposition converts each field into a new row, where the new field value comes from the original field name or alias, and transforms the original field values into field values of a new field.
【Example 2】Based on the following Olympic medal table, generate a new table recording information for each type of medal. Below is part of the source table:
Game | Nation | Gold | Silver | Copper |
---|---|---|---|---|
30 | USA | 46 | 29 | 29 |
30 | China | 38 | 27 | 23 |
30 | UK | 29 | 17 | 19 |
30 | Russia | 24 | 26 | 32 |
30 | Korea | 13 | 8 | 7 |
… | … | … | … | … |
Below is the result of expected layout:
GAME | NATION | MEDAL_TYPE | MEDALS |
---|---|---|---|
30 | USA | GOLD | 46 |
30 | USA | SILVER | 29 |
30 | USA | COPPER | 29 |
30 | China | GOLD | 38 |
30 | China | SILVER | 27 |
… | … | … | … |
SQL solution:
SPL uses UNPIVOT function to perform a column-to-row transposition:
SELECT *
FROM OLYMPIC
UNPIVOT (
MEDALS FOR MEDAL_TYPE IN (
GOLD,SILVER,COPPER
)
)
SPL solution:
SPL A.pivot() function works with @r option to perform a column-to-row transposition:
A | |
---|---|
1 | =T("Olympic.txt") |
2 | =A1.pivot@r(GAME,NATION; MEDAL_TYPE, MEDALS; GOLD, SILVER, COPPER) |
A1: Import Olympic medal table.
A2: A.pivot@r() function transposes columns GOLD, SILVER and COPPER into new rows.
【Example 3】Based on the following channel-based sales table, generate a new table storing information by date. Below is part of the source table:
YEAR | MONTH | ONLINE | STORE |
---|---|---|---|
2020 | 1 | 2440 | 3746.2 |
2020 | 2 | 1863.4 | 448.0 |
2020 | 3 | 1813.0 | 624.8 |
2020 | 4 | 670.8 | 2464.8 |
2020 | 5 | 3730.0 | 724.5 |
… | … | … | … |
Below is the result of expected layout:
CATEGORY | 1 | 2 | 3 | … |
---|---|---|---|---|
ONLINE | 2440 | 1863.4 | 1813.0 | … |
STORE | 3746.2 | 448.0 | 624.8 | … |
SQL solution:
We need to perform both row-to-column transposition and column-to-row transposition to get this done. First, we perform column-to-row transposition to transform channel types into values under CATEGORY field:
YEAR | MONTH | CATEGORY | AMOUNT |
---|---|---|---|
2020 | 1 | ONLINE | 2440 |
2020 | 1 | STORE | 3746.2 |
2020 | 2 | ONLINE | 1863.4 |
2020 | 2 | STORE | 448.0 |
… | … | … | … |
Then we perform row-to-column transposition to convert MONTH values into column names. The complete SQL query is as follows:
SELECT *
FROM (
SELECT *
FROM MONTH_SALES
UNPIVOT (
AMOUNT FOR CATEGORY IN (
"ONLINE",STORE
)
)
WHERE YEAR=2020
)
PIVOT (
MAX(AMOUNT) FOR MONTH
IN (
1 AS "1",2 AS "2",2 AS "3",
4 AS "4",5 AS "5",6 AS "6",
7 AS "7",8 AS "8",9 AS "9",
10 AS "10",11 AS "11",12 AS "12"
)
)
SPL does not support using a non-constant expression as PIVOT/UNPIVOT value, so all months need to be enumerated for the row-to-column transposition.
SPL solution:
According to the natural logic, SPL handles the task using A.pivot@r() and A.pivot() respectively for column-to-row transposition and row-to-column transposition:
A | |
---|---|
1 | =T("MonthSales.csv").select(YEAR:2020) |
2 | =A1.pivot@r(YEAR,MONTH; CATEGORY, AMOUNT) |
3 | =A2.pivot(CATEGORY; MONTH, AMOUNT) |
A1: Import MonthSales table and select records of the year 2020.
A2: A.pivot@r() performs column-to-row transposition to convert channel types into values of CATEGORY field.
A3: A.pivot () performs row-to-column transposition to transform MONTH field values into column names.
Both SQL and SPL handle simple transposition scenarios well. The issue is that the real-world situations cannot always dealt with using the mode of grouping & aggregation plus PIVOT. In the subsequent essays in our transposition series, we will introduce how the two languages handle complicated static transpositions and dynamic transpositions.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code