-
Notifications
You must be signed in to change notification settings - Fork 339
Positioning function and positioning calculation
Filtering is a very basic operation, which is to find a subset of members from a set that meet certain conditions. There are WHERE and HAVING in SQL to implement filtering operations (strictly speaking, due to that SQL lacks discreteness, the result of WHERE operation is a replica of the subset of the original data table). SPL also has a select function to implement operations similar to WHERE in SQL (SPL that supports discreteness returns the true subset).
For unordered sets, the filtering operation only needs to find members that meet the conditions, and there is no need for any other related actions. But when considering ordered sets, we sometimes also care about the positions of these members that meet the conditions.
For example, the closing price table of a certain stock sorted by date, we want to know how many days it took for the stock price to rise to over 100 dollar.
It is meaningless to use filtering operation to find records with stock prices exceeding 100 dollar (i.e. members that meet the conditions). What we are actually concerned about is the position of the first member in the ordered set that satisfies the condition.
SQL would be quite cumbersome in dealing with such tasks, as it cannot directly find it through filtering. Instead, it is necessary to create a sequence number and calculate the minimum value of the corresponding sequence number for records with a closing price of over 100 dollar. The thinking is a bit “convoluted”.
SELECT MIN(NO)
FROM (SELECT ROWNUMBER() OVER ( ORDER BY TradeDate ) NO, ClosingPrice FROM Stock)
WHERE ClosingPrice>100
SPL enhances support for ordered sets and provides a pselect function to return the positions of members that meet the conditions.
Stock.select( CloseingPrice > 100 ) // Records with closing price greater than 100
Stock.pselect@a( ClosingPirce > 100 ) // The set of sequence numbers for records with a closing price greater than 100
Stock.pselect( ClosingPirce > 100 ) // The sequence number of the first record with a closing price greater than 100
The earlier task is very easy to implement.
This function that returns the position of certain members in an ordered set is called a positioning function in SPL, and correspondingly, the function that returns a subset of members is called a selection function, which is the filtering function. In fact, in SPL, the selection function is defined based on the positioning function, which means there is pselect function first and then select function, and the pselect function is more fundamental. All sets in SPL are ordered, and the position of members is a very basic information.
These positions have business significance. Let’s modify the earlier calculation requirements to calculate the increase in stock price when it rises above 100 dollar for the first time.
This task involves ordered computation and unconventional aggregation, and implementing it in SQL would be much more convoluted:
WITH T AS (
SELECT ROWNUMBER() OVER ( ORDER BY TradeDate ) NO, ClosingPrice,
ClosingPrice - LAG(ClosingPrice) OVER ( ORDER BY TradeDate) Rising
FROM Stock)
SELECT Rising FROM T WHERE NO = ( SELECT MIN(NO) FROM T WHERE ClosingPrice>100 )
This requires calculating the daily increase, and the intermediate table generated by CTE syntax needs to be traversed twice.
If the set is ordered, we only need to find the position of the first record with a closing price greater than 100 dollar, and then subtract the closing price of its previous record. This is a natural idea. Writing it in SPL is very simple:
p=Stock.pselect( ClosingPrice > 100 )
return Stock(p).ClosingPrice - stock(p-1).ClosingPrice
The second statement here uses positioning information to extract the closing price from the table, requiring two references to the data table. We have mentioned that SPL's Lambda syntax can easily implement adjacent references, but Lambda syntax can only be used in operations of looping sets and is not suitable for situations where a specific position is calculated.
For this purpose, SPL provides the positioning calculation function calc, which can use Lambda syntax for specified positions in ordered sets:
p=Stock.pselect( ClosingPrice > 100 )
return Stock.calc(p,ClosingPrice - ClosingPrice[-1) )
This way, the readability is better.
As we mentioned earlier, unconventional aggregation may return the records where the maximum or minimum value are located in the data table, which is equivalent to returning members or subsets of a set (there may be multiple records with the maximum or minimum value). It can also be considered as a selection function to some extent. So, should there also be a corresponding positioning function?
Yes, there are also pmax and pmin functions in SPL that are used to return the position of the maximum and minimum values in an ordered set. Similarly, the maxp and minp functions are also defined based on pmax and pmin.
Change the earlier task, now we want to calculate the increase in stock price on the highest day. It is easy to write using this positioning function:
p=Stock.pmax( ClosingPrice )
return Stock.calc(p,ClosingPrice - ClosingPrice[-1) )
SQL processing this task is still quite cumbersome (assuming the highest stock price is unique, otherwise it would be even more troublesome):
WITH T AS (
SELECT ClosingPrice,
ClosingPrice - LAG(ClosingPrice) OVER ( ORDER BY TradeDate) Rising
FROM Stock)
SELECT Rising FROM T WHERE ClosingPrice = ( SELECT MAX(ClosingPrice) FROM T )
The top function that returns the member set can also be understood as a selection function, and there is also a corresponding positioning function ptop.
Calculate the average increase in stock price for the highest three days:
p=Stock.ptop( -3; ClosingPrice )
return Stock.calc(p,ClosingPrice - ClosingPrice[-1) ).avg()
We won't write SQL here, and you can think for yourself.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code