Skip to content

Comparison of SQL and SPL:Static Transposition

esProcSPL edited this page May 6, 2024 · 1 revision

【Abstract】

Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.


A static transposition can define the data structure after data is transposed beforehand and the structure does not change accordingly as data is changed.

1. Row-to-column transposition

As the name shows, row-to-column transposition aims to convert row values into column names to transfer rows to columns. In practice, PIVOT is often used after the grouping and aggregation, which gets unique values from the row values under the to-be-transposed columns and then arrange the row values into column names. The role of PIVOT is to group and put the aggregates in a specific column into multiple columns for an intuitive representation.

【Example 1】Based on the following scores table, find the highest score of each subject in each class and present the results in columns. Below is part of the source table:

CLASS STUDENTID SUBJECT SCORE
1 1 English 84
1 1 Math 77
1 1 PE 69
1 2 English 81
1 2 Math 80

Below is the result of expected layout:

CLASS MAX_MATH MAX_ENGLISH MAX_PE
1 97 96 97
2 97 96 97

SQL solution:

SQL PIVOT is used to achieve row-to-column transposition and column-to-row transposition. But only the relatively new versions of certain database products support this method. ORACLE, for instance, supports it from the 11g version, and some databases, such as MYSQL, still uses the subquery to do the grouping and aggregation and then a left join to achieve the transposition and have not given any support to it. In this case, we use ORACLE 11g to write the SQL query:

SELECT * 
FROM (
    SELECT 
        CLASS, SUBJECT, SCORE 
    FROM SCORES
)
PIVOT (
    MAX(SCORE) FOR SUBJECT 
    IN (
        'Math' AS MAX_MATH,
        'English' AS MAX_ENGLISH,
        'PE' AS MAX_PE
    )
)

SPL solution:

SPL offers A.pivot() function to perform transposition, row-to-column by default.

A
1 =T("Scores.csv")
2 =A1.groups(CLASS,SUBJECT; max(SCORE):MAX_SCORE)
3 =A2.pivot(CLASS; SUBJECT, MAX_SCORE; "Math":"MAX_MATH", "English":"MAX_ENGLISH", "PE":"MAX_PE")

A1: Import Scores table from the source file.

A2: Group A1’s table by class and subject and calculate the highest score of each subject in each class.

A3: A.pivot() function transposes the row-wise highest scores into column names.

SPL supports retrieving a data table from the database, too. A1 in the above script can be modified as:

A
1 =connect("db").query("SELECT * FROM SCORES")

2. Column-to-row transposition

Contrary to row-to-column transposition, the column-to-row transposition converts each field into a new row, where the new field value comes from the original field name or alias, and transforms the original field values into field values of a new field.

【Example 2】Based on the following Olympic medal table, generate a new table recording information for each type of medal. Below is part of the source table:

Game Nation Gold Silver Copper
30 USA 46 29 29
30 China 38 27 23
30 UK 29 17 19
30 Russia 24 26 32
30 Korea 13 8 7

Below is the result of expected layout:

GAME NATION MEDAL_TYPE MEDALS
30 USA GOLD 46
30 USA SILVER 29
30 USA COPPER 29
30 China GOLD 38
30 China SILVER 27

SQL solution:

SPL uses UNPIVOT function to perform a column-to-row transposition:

SELECT * 
FROM OLYMPIC
UNPIVOT (
    MEDALS FOR MEDAL_TYPE IN (
        GOLD,SILVER,COPPER
    )
)

SPL solution:

SPL A.pivot() function works with @r option to perform a column-to-row transposition:

A
1 =T("Olympic.txt")
2 =A1.pivot@r(GAME,NATION; MEDAL_TYPE, MEDALS; GOLD, SILVER, COPPER)

A1: Import Olympic medal table.

A2: A.pivot@r() function transposes columns GOLD, SILVER and COPPER into new rows.

3. Scenarios containing both types of transpositions

【Example 3】Based on the following channel-based sales table, generate a new table storing information by date. Below is part of the source table:

YEAR MONTH ONLINE STORE
2020 1 2440 3746.2
2020 2 1863.4 448.0
2020 3 1813.0 624.8
2020 4 670.8 2464.8
2020 5 3730.0 724.5

Below is the result of expected layout:

CATEGORY 1 2 3
ONLINE 2440 1863.4 1813.0
STORE 3746.2 448.0 624.8

SQL solution:

We need to perform both row-to-column transposition and column-to-row transposition to get this done. First, we perform column-to-row transposition to transform channel types into values under CATEGORY field:

YEAR MONTH CATEGORY AMOUNT
2020 1 ONLINE 2440
2020 1 STORE 3746.2
2020 2 ONLINE 1863.4
2020 2 STORE 448.0

Then we perform row-to-column transposition to convert MONTH values into column names. The complete SQL query is as follows:

SELECT * 
FROM (
    SELECT * 
    FROM MONTH_SALES
    UNPIVOT (
        AMOUNT FOR CATEGORY IN (
        "ONLINE",STORE
        )
    )
    WHERE YEAR=2020
)
PIVOT (
    MAX(AMOUNT) FOR MONTH 
    IN (
        1 AS "1",2 AS "2",2 AS "3",
        4 AS "4",5 AS "5",6 AS "6",
        7 AS "7",8 AS "8",9 AS "9",
        10 AS "10",11 AS "11",12 AS "12"
    )
)

SPL does not support using a non-constant expression as PIVOT/UNPIVOT value, so all months need to be enumerated for the row-to-column transposition.

SPL solution:

According to the natural logic, SPL handles the task using A.pivot@r() and A.pivot() respectively for column-to-row transposition and row-to-column transposition:

A
1 =T("MonthSales.csv").select(YEAR:2020)
2 =A1.pivot@r(YEAR,MONTH; CATEGORY, AMOUNT)
3 =A2.pivot(CATEGORY; MONTH, AMOUNT)

A1: Import MonthSales table and select records of the year 2020.

A2: A.pivot@r() performs column-to-row transposition to convert channel types into values of CATEGORY field.

A3: A.pivot () performs row-to-column transposition to transform MONTH field values into column names.

Both SQL and SPL handle simple transposition scenarios well. The issue is that the real-world situations cannot always dealt with using the mode of grouping & aggregation plus PIVOT. In the subsequent essays in our transposition series, we will introduce how the two languages handle complicated static transpositions and dynamic transpositions.

Clone this wiki locally