Skip to content

Comparison of SQL and SPL:Complicated Static Transposition

esProcSPL edited this page May 6, 2024 · 2 revisions

【Abstract】 Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.


Let’s take a look at how SQL and SPL handle complicated static transposition scenarios.

1. Multirow-to-Multirow transposition

【Example 1】Based on the following punch-in data table, generate a new table recording the daily activities for each employee. Each person corresponds to seven records per day:

PER_CODE IN_OUT DATE TIME TYPE
1110263 1 2013-10-11 09:17:14 In
1110263 6 2013-10-11 11:37:00 Break
1110263 5 2013-10-11 11:38:21 Return
1110263 0 2013-10-11 11:43:21 NULL
1110263 6 2013-10-11 13:21:30 Break
1110263 5 2013-10-11 14:25:58 Return
1110263 2 2013-10-11 18:28:55 Out

Below is the result of the expected layout:

PER_CODE DATE IN OUT BREAK RETURN
1110263 2013-10-11 9:17:14 18:28:55 11:37:00 11:38:21
1110263 2013-10-11 9:17:14 18:28:55 13:21:30 14:25:58

SQL solution:

Here multiple records should be combined to do the calculation and return multiple records according to the specified structure. We sort records every 7 rows by PER_CODE and DATE, get TIME field values from rows numbered [1,7,2,3] in turn from each group and make them first group of values under IN, OUT, BREAK, and RETURN fields, and then retrieve TIME field values from rows numbered [1,7,5,6] in turn from each group and make them the second groups of values under these fields. With databases that do not support PIVOT/UNPIVOT, SQL has the following query:

WITH CTE1 AS (
    SELECT 
        PER_CODE,IN_OUT,"DATE","TIME",TYPE, 
        MOD(ROWNUM-1,7)+1 GROUP_ORDER 
    FROM DAILY_TIME 
    ORDER BY PER_CODE,"DATE","TIME"
)
SELECT * FROM (
    SELECT 
        T_IN.PER_CODE,T_IN."DATE",T_IN."IN",T_OUT."OUT",
        T_BREAK.BREAK,T_RETURN."RETURN" 
    FROM (
        SELECT PER_CODE,"DATE","TIME" "IN"
        FROM CTE1 
        WHERE GROUP_ORDER=1
    ) T_IN
    LEFT JOIN (
        SELECT PER_CODE,"DATE","TIME" "OUT"
        FROM CTE1 
        WHERE GROUP_ORDER=7
    ) T_OUT
    ON T_IN.PER_CODE=T_OUT.PER_CODE 
        AND T_IN."DATE"=T_OUT."DATE"
    LEFT JOIN (
        SELECT PER_CODE,"DATE","TIME" BREAK
        FROM CTE1 
        WHERE GROUP_ORDER=2
    ) T_BREAK
    ON T_IN.PER_CODE=T_BREAK.PER_CODE 
        AND T_IN."DATE"=T_BREAK."DATE"
    LEFT JOIN (
        SELECT PER_CODE,"DATE","TIME" "RETURN"
        FROM CTE1 
        WHERE GROUP_ORDER=3
    ) T_RETURN
    ON T_IN.PER_CODE=T_RETURN.PER_CODE 
        AND T_IN."DATE"=T_RETURN."DATE"
    UNION ALL (
        SELECT 
        T_IN.PER_CODE,T_IN."DATE",T_IN."IN",T_OUT."OUT",
        T_BREAK.BREAK,T_RETURN."RETURN" 
        FROM (
        SELECT PER_CODE,"DATE","TIME" "IN"
        FROM CTE1 
        WHERE GROUP_ORDER=1
    ) T_IN
    LEFT JOIN (
        SELECT PER_CODE,"DATE","TIME" "OUT"
        FROM CTE1 
        WHERE GROUP_ORDER=7
    ) T_OUT
    ON T_IN.PER_CODE=T_OUT.PER_CODE 
        AND T_IN."DATE"=T_OUT."DATE"
    LEFT JOIN (
        SELECT PER_CODE,"DATE","TIME" BREAK
        FROM CTE1 
        WHERE GROUP_ORDER=5
    ) T_BREAK
    ON T_IN.PER_CODE=T_BREAK.PER_CODE 
        AND T_IN."DATE"=T_BREAK."DATE"
    LEFT JOIN (
        SELECT PER_CODE,"DATE","TIME" "RETURN"
        FROM CTE1 
        WHERE GROUP_ORDER=6
    ) T_RETURN
    ON T_IN.PER_CODE=T_RETURN.PER_CODE 
        AND T_IN."DATE"=T_RETURN."DATE"
    )
)
ORDER BY PER_CODE,"DATE",BREAK

The query is hard to understand. Take ORACLE 11g as an example, we use PIVOT to simplify the LEFT JOIN part. Then the SQL query is as follows:

WITH CTE1 AS (
    SELECT 
        PER_CODE,IN_OUT,"DATE","TIME",TYPE,
        MOD(ROWNUM-1,7)+1 GROUP_ORDER 
    FROM DAILY_TIME 
    ORDER BY PER_CODE,"DATE","TIME"
)
SELECT * 
FROM (
    SELECT * 
    FROM (
        SELECT 
        PER_CODE,"DATE","TIME",GROUP_ORDER
        FROM CTE1
        WHERE GROUP_ORDER IN (1,7,2,3)
    )
    PIVOT(
        MIN("TIME") FOR GROUP_ORDER 
        IN (1 AS "IN",7 AS "OUT",2 AS BREAK,3 AS "RETURN")
    )
    UNION ALL
    (
        SELECT * 
        FROM (
        SELECT 
            PER_CODE,"DATE","TIME",GROUP_ORDER
        FROM CTE1
        WHERE GROUP_ORDER IN (1,7,5,6)
        )
        PIVOT(
        MIN("TIME") FOR GROUP_ORDER 
        IN (1 AS "IN",7 AS "OUT",5 AS BREAK,6 AS "RETURN")
        )
    )
)
ORDER BY PER_CODE,"DATE",BREAK

SPL solution:

Though the table structure after transposition is definite, it is still complicated to get this done using A.pivot() function. As an alternative, we create the target data structure instead and then populate data to it.

A
1 =create(PER_CODE,DATE,IN,OUT,BREAK,RETURN)
2 =T("DailyTime.txt").sort(PER_CODE,DATE,TIME)
3 =A2.group((#-1)\7).(~([1,7,2,3,1,7,5,6]))
4 >A1.record(A3.conj([~.PER_CODE,~.DATE]|~.(TIME).m([1,2,3,4])|[~.PER_CODE,~.DATE]|~.(TIME).m([5,6,7,8])))

A1: Create an empty table according to the specified data structure.

A2: Import the daily punch-in records and sort them by employee code and date.

A3: Group A2’s records every seven rows and, for each group, return records by retrieving values in the order of [1,7,2,3,1,7,5,6].

A4: Concatenate all returned records according to the target order and insert them to A3’s table.

Let’s examine the SPL solution. First we create an empty table according to the target structure, sort original records by grouping them every seven rows, and in each group, get time values from rows [1,7,2,3,1,7,5,6] in turn for the future two records, and finally, populate values to the empty table in sequence. The SQL query is extremely complicated though it uses PIVOT function. PIVOT is not a suitable method for handling this case. Yet SQL has problems in dealing with the task using SPL’s way. SQL’s grouping operation cannot return the post-grouping subsets, which makes it unable to perform further computations on the subset as SPL does. A SQL set is unordered. The language can generate sequence numbers for members in each group through row numbers, but it is inconvenient for it to access members in turn using multiple sequence numbers. SPL, however, can do that.

2. Inter-row calculation during row-to-column transposition

【Example 2】Based on the following user payment detail table, generate a new table storing the payable amount per month for each user in the year 2014. Below is part of the source table:

ID CUSTOMERID NAME UNPAID ORDER_DATE
112101 C013 CA 12800 2014/02/21
112102 C013 CA 3500 2014/06/15
112103 C013 CA 2600 2015/03/21

Below is the result of the following layout:

NAME 1 2 3 4 5 6 7 8 9 10 11 12
CA 12800 12800 12800 12800 3500 3500 3500 3500 3500 3500 3500

SQL solution:

It is impossible for SQL to implement the general method of creating the target table structure and then populating data to the structure. The language needs to work out a solution for each case. In the last step, PIVOT function can be used to transpose month values into column names. Before that, we need to prepare the original data according to the following format:

NAME ORDER_MONTH UNPAID
CA 1
CA 2 12800
CA 3 12800
CA 4 12800
CA 5 12800
CA 6 3500
CA 7 3500

Yet in the original table, not each month has records. What we expect is a table recording the monthly payable amounts for each user in the year 2014. We perform a cross product on the list of unique customers who have the payment records and the months from January to December to generate a table of the target structure. Then we left join this table with the user payment detail table to get the monthly payable amount for each user. Now the table is as follows:

NAME ORDER_MONTH UNPAID
CA 1
CA 2 12800
CA 3
CA 4
CA 5
CA 6 3500
CA 7

There is more to do as we expect to populate January’s payable amount, 12,800, to months from March to May, and for customer CA, we want to insert June’s payable amount 3500 to months after July. To do those, we can use the subquery, where we select the previous records containing the payable amounts for a customer if the current month does not have a payable amount and get the one with the latest month. Below is the complete SQL query:

WITH CTE1 AS(
    SELECT T1.NAME,T1.ORDER_MONTH,T2.UNPAID 
    FROM (
        SELECT * 
        FROM (
        SELECT DISTINCT CUSTOMERID, NAME 
        FROM PAYMENT 
        WHERE EXTRACT (YEAR FROM ORDER_DATE)=2014
        )
        CROSS JOIN (
        SELECT 1 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 2 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 3 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 4 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 5 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 6 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 7 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 8 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 9 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 10 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 11 ORDER_MONTH FROM DUAL
        UNION ALL SELECT 12 ORDER_MONTH FROM DUAL
        )
    ) T1
    LEFT JOIN (
        SELECT 
        CUSTOMERID, NAME, 
        EXTRACT (MONTH FROM ORDER_DATE) ORDER_MONTH, UNPAID 
        FROM PAYMENT
        WHERE EXTRACT (YEAR FROM ORDER_DATE)=2014
    ) T2
    ON T1.NAME=T2.NAME AND 
        T1.ORDER_MONTH=T2.ORDER_MONTH
    ORDER BY NAME,ORDER_MONTH
),
CTE2 AS (
    SELECT 
        T1.NAME,T1.ORDER_MONTH,
        NVL(T1.UNPAID, 
        (
            SELECT 
                MIN(UNPAID) KEEP (DENSE_RANK FIRST ORDER BY ORDER_MONTH DESC) 
            FROM CTE1 T2
            WHERE T1.NAME=T2.NAME AND
                T2.UNPAID>0 AND
                T2.ORDER_MONTH<T1.ORDER_MONTH
        )
        ) UNPAID
    FROM CTE1 T1
    ORDER BY T1.NAME,T1.ORDER_MONTH
)
SELECT * 
FROM CTE2
PIVOT(
    MIN(UNPAID) FOR ORDER_MONTH 
    IN (
        1 AS "1",2 AS "2",2 AS "3",
        4 AS "4",5 AS "5",6 AS "6",
        7 AS "7",8 AS "8",9 AS "9",
        10 AS "10",11 AS "11",12 AS "12"
    )
)

The SQL query is too complicated to understand even though we have given a detailed explanation. Now let’s look at how SPL handles the task:

SPL solution:

It is complicated to do this with A.pivot() function. According to the logic in the previous example, we create a target data structure and then populate data to it.

A
1 =create(NAME,${12.string()})
2 =T("Payment1.txt").select(year(ORDER_DATE)==2014).group(CUSTOMERID)
3 >A2.((m12=12.(null),~.(m12(month(ORDER_DATE))=UNPAID), m12.(~=ifn(~,~[-1])),A1.record(NAME|m12)))

A1: Create an empty table of the target data structure.

A2: Import the user payment table containing records of the year 2014 and sort it by customer ID.

A3: Loop through each group and for each member in a group calculate the payable amount per month, and then insert the results to A1’s table along with customer names.

The SPL script is concise and, more importantly, clear in logic. For a complicated static transposition task, we can first create the target data structure and then populate data to it. As the SPL set is ordered, it is convenient to perform an inter-row calculation.

3. Present data in horizontal column groups

Presenting data in horizontal column groups is often used for data visualization, where data having same type of attributes is displayed in groups of columns for convenient viewing. Let’s see how SQL and SPL handle this type of transposition through an example.

【Example 3】List names and salaries of employees whose get paid over 10,000 in both sales department and R&D department. Below is part of the employee table:

ID NAME SURNAME STATE DEPT SALARY
1 Rebecca Moore California R&D 7000
2 Ashley Wilson New York Finance 11000
3 Rachel Johnson New Mexico Sales 9000
4 Emily Smith Texas HR 7000
5 Ashley Smith Texas R&D 16000

Below is result of the expected layout:

SALESNAME SALARY RDNAME SALARY
Madeline 15000 Ashley 16000
Jacob 12000 Jacob 16000
Andrew 12000 Ryan 13000

SQL solution:

First, we perform conditional filtering according to the condition that a record has the department value of sales or R&D and the salary value above 10,000. To display data in groups of columns, we need to get the rows numbers for members in each group and then perform a full join by matching row numbers. Below is the SQL query:

SELECT
    T1.NAME SALESNAME, T1.SALARY, T2.NAME RDNAME, T2.SALARY
FROM (
    SELECT
        NAME,SALARY, ROW_NUMBER()OVER (ORDER BY SALARY DESC) NO
    FROM EMPLOYEE
    WHERE DEPT='Sales' AND SALARY >10000
) T1 
FULL JOIN (
    SELECT
        NAME,SALARY, ROW_NUMBER()OVER (ORDER BY SALARY DESC) NO
    FROM EMPLOYEE
    WHERE DEPT='R&D' AND SALARY >10000
) T2
ON T1.NO=T2.NO

SPL solution:

As it handles dynamic transpositions, SPL handles this type of transposition by creating the target structure first and then populating data to it. Below is the SPL script:

A
1 =T("Employee.csv").select(SALARY >10000).sort@z(SALARY)
2 =A1.select(DEPT:"Sales")
3 =A1.select(DEPT:"R&D")
4 =create('SALESNAME',SALARY,'RDNAME', SALARY)
5 =A4.paste(A2.(NAME),A2.(SALARY),A3.(NAME),A3.(SALARY))

A1: Import Employee data table, select records where salaries are above 10,000, and sort them by salary in descending order.

A2: Get records of sales department.

A3: Get records of R&D department.

A4: Create an empty table of the target data structure.

A5: Use A.paste() function to paste result values to corresponding columns.

We can see that SQL’s static transposition methods PIVOT and UNPIVOT have limited applications and are supported only by certain database products. The SQL query will often be too complicated when it tries to handle certain complex static transposition scenarios. Moreover, SQL lacks a standard method of dealing with them.

SPL provides a flexible and adaptable method for handling various complicated transposition tasks. More importantly, the language has a clear and stable logic. It will first create the target data structure and then populated the calculated result to the table.

In the next essay in the transposition series, we will introduce how SQL and SPL handle dynamic transpositions.

Clone this wiki locally