-
Notifications
You must be signed in to change notification settings - Fork 339
Comparison of SQL and SPL:Dynamic Transposition
Data transposition aims to convert queried data into a specified format to display using front-end applications, such as reporting tools. There are row-to-column transposition, column-to-row transposition, and more complicated dynamic transposition. This essay focuses on solutions and basic principles of SQL and SPL, the two commonly used programming languages, in handling transposition scenarios, and tries to find the convenient and efficient way for you through sample programs in SQL and SPL.
Dynamic row-to-column transposition cannot specify to-be-generated fields in advance but can only define them on an ad hoc basis according to the original fields.
【Example 1】Based on the following employee table, calculate the average salary in different regions for each department. Below is part of the source table:
ID | NAME | SURNAME | STATE | DEPT | SALARY |
---|---|---|---|---|---|
1 | Rebecca | Moore | California | R&D | 7000 |
2 | Ashley | Wilson | New York | Finance | 11000 |
3 | Rachel | Johnson | New Mexico | Sales | 9000 |
4 | Emily | Smith | Texas | HR | 7000 |
5 | Ashley | Smith | Texas | R&D | 16000 |
… | … | … | … | … | … |
Below is the result of the expected layout:
DEPT | California | Colorado | Florida | … |
---|---|---|---|---|
Administration | 9333.333 | … | ||
Finance | 8000 | 5000 | 10000 | … |
HR | 10000 | 7000 | … | |
… | … | … | … | … |
SQL solution:
Field names in the result set are extracted from STATE field values in the original table. As SQL does not allow using a non-constant expression as the PIVOT/UNPIVOT value, PIVOT cannot be used to handle the dynamic row-to-column transposition. SQL also does not support returning a dynamic data structure for achieving the dynamic row-to-column transposition.
To deal with this task using the database, we can turn to the stored procedure to create a dynamic SQL statement. Since what we focus on is the comparison of SQL and SPL, we just skip the stored procedure solution.
SPL solution:
This row-to-column transposition gets its target fields from the original table. SPL offers A.pivot() function to do this. When no target fields are specified, the function will automatically extract names for the target fields.
A | |
---|---|
1 | =T("Employee.csv") |
2 | =A1.groups(STATE,DEPT;avg(SALARY):AVG_SALARY) |
3 | =A2.pivot(DEPT; STATE, AVG_SALARY) |
A1: Import the Employee table.
A2: Group the table by STATE and DEPT to calculate the average salary in each state for each department.
A3: A.pivot() function is used to perform the row-to-column transposition. It will automatically extract names for the target fields if they are not specified.
【Example 2】Based on the following income details table, get the income information of each employee with categories generated automatically. Below is part of the source table:
NAME | SOURCE | INCOME |
---|---|---|
David | Salary | 8000 |
David | Bonus | 15000 |
Daniel | Salary | 9000 |
Andrew | Shares | 26000 |
Andrew | Sales | 23000 |
… | … | … |
Below is the result of the expected layout:
NAME | SOURCE1 | INCOME1 | SOURCE2 | INCOME2 | … |
---|---|---|---|---|---|
Andrew | Shares | 26000 | Sales | 23000 | … |
Daniel | Salary | 9000 | … | ||
David | Salary | 8000 | Bonus | 15000 | … |
Robert | Bonus | 13000 | … | ||
… | … | … | … | … | … |
SQL solution:
In this case, column names need to be dynamically generated through calculation according to the original field values. The number of columns, even the column names, cannot be defined in advance. SQL cannot implement this type of dynamic row-to-column transposition.
SPL solution:
The target fields are not extracted from a specific field dynamically but need to be calculated dynamically. Here A.pivot() function is not suitable. We can first generate the target data structure according to the group with the most income categories and then populate data into it.
A | |
---|---|
1 | =T("Income.txt").group(NAME) |
2 | =A1.max(~.len()) |
3 | =create(NAME, ${A2.("SOURCE"/~/", INCOME"/~).concat@c()}) |
4 | >A1.(A3.record(~.NAME | ~.conj([SOURCE, INCOME]))) |
A1: Import the Income table and group it by name.
A2: Get the largest number of members among the groups, which is the largest number of income categories.
A3: Create an empty table by generating column names dynamically according to A3’s result.
A4: Loop through each group to populate the names, income sources, and amounts to A3’s empty table.
【Example 3】Based on the following ORDERS table, ORDER_DETAIL table, and PRODUCT table, generate a table recording information of product purchased each day by each customer in the year 2014. The relationship between ORDERS table and ORDER_DETAIL table is one to many. Each order corresponds to multiple order detail records. The relationship between ORDER_DETAIL table and PRODUCT table is many to one. The ID field of ORDER_DETAIL table points to the ID field of PRODUCT table. Below are parts of the three tables: ORDERS:
ID | CUSTOMERID | EMPLOYEEID | ORDER_DATE | ARRIVAL_DATE |
---|---|---|---|---|
10248 | VINET | 5 | 2012/07/04 | 2012/08/01 |
10249 | TOMSP | 6 | 2012/07/05 | 2012/08/16 |
10250 | HANAR | 4 | 2012/07/08 | 2012/08/05 |
10251 | VICTE | 3 | 2012/07/08 | 2012/08/05 |
10252 | SUPRD | 4 | 2012/07/09 | 2012/08/06 |
… | … | … | … | … |
ORDER_DETAIL:
ID | ORDER_NUMBER | PRODUCTID | PRICE | COUNT | DISCOUNT |
---|---|---|---|---|---|
10814 | 1 | 48 | 102.0 | 8 | 0.15 |
10814 | 2 | 48 | 102.0 | 8 | 0.15 |
10814 | 3 | 48 | 306.0 | 24 | 0.15 |
10814 | 4 | 48 | 102.0 | 8 | 0.15 |
10814 | 5 | 48 | 204.0 | 16 | 0.15 |
… | … | … | … | … | … |
PRODUCT:
ID | NAME | SUPPLIERID | CATEGORY |
---|---|---|---|
1 | Apple Juice | 2 | 1 |
2 | Milk | 1 | 1 |
3 | Tomato sauce | 1 | 2 |
4 | Salt | 2 | 2 |
5 | Sesame oil | 2 | 2 |
… | … | … | … |
Below is result of the expected layout
ORDER_DATE | CUSTOMERID | PRODUCT1 | COUNT1 | PRODUCT2 | COUNT2 | … |
---|---|---|---|---|---|---|
2014/1/5 | VICTE | Corn flakes | 8 | |||
2014/1/23 | CONSH | Chicken | 3 | Cake | 9 | |
… | … | … | … | … | … | … |
SQL solution:
SQL cannot implement this type of dynamic row-to-column transposition, too.
SPL solution:
Join up the three source tables through their associated relationships and do the subsequent calculations using the methods for handling the previous two examples.
A | |
---|---|
1 | =T("Orders.txt").select(year(ORDER_DATE)==2014) |
2 | =T("Product.txt") |
3 | =T("OrderDetail.txt").switch(PRODUCTID,A2:ID).group(ID) |
4 | =join(A1:ORDERS,ID;A3:DETAIL,ID) |
5 | =create(DATE,CUSTOMERID,${A4.max(DETAIL.len()).("PRODUCT"/~/","/"COUNT"/~).concat@c()}) |
6 | >A4.run(A5.record([ORDERS.ORDER_DATE,ORDERS.CUSTOMERID]|DETAIL.([PRODUCTID.NAME,COUNT]).conj())) |
A1: Import Orders table and select records of the year 2014.
A2: Import Product table.
A3: Import OrderDetail table and join it with Product table through PRODUCTID field; replace PRODCUTID field values with corresponding Product records; and then group the joining result by order ID.
A4: Join up Orders table with OrderDetail table through order ID.
A5: Generate target data structure according to the group containing the most OrderDetail records and create an empty table.
A6: Loop through A4’s table to populate data into A5’s empty table in turn.
Actually, this case is essentially the same as the previous two cases. It just has one more step to join up the three tables together. The steps after that are the same.
Through the three essays about the comparison of SQL and SPL in handling transposition, we can see that SQL’s static transposition methods PIVOT and UNPIVOT have limited applications and are supported only by certain database products. The SQL query will often be too complicated when it tries to handle certain complex static transposition scenarios. Moreover, SQL lacks a standard method of dealing with them. For dynamic transpositions, SQL just cannot achieve them. Usually, it turns to the more roundabout way, like the stored procedure, to generate the dynamic SQL query.
SPL provides flexible and adaptable method for handling various complicated transposition tasks. More importantly, the language has a clear and stable logic. It will first create the target data structure and then populated the calculated result to the table.
When query is complex, SQL statement becomes even more complicated. Temporary tables and nested queries are methods it often uses. This makes it hard to write and maintain the SQL statement. SPL, by contrast, can write the logic according to the natural way of thinking and generate concise code step by step.
The SPL-driven esProc is a professional data computation engine. It is based on ordered sets and provides a complete set of set-oriented operations, making it the combination of JAVA advantages and SQL merits. Transpositions become easy and simple with SPL.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code