How to Call a Python Program from SPL
Though esProc is a powerful computing engine, it is not good at handling machine learning algorithms, while Python excels at them. So esProc offers the YM external library for calling Python programs from esProc SPL programs.
We’ll illustrate how to call a Python program from SPL in three parts:
1. Standards and requirements for Python module development;
2. Interface calls using ym_exec;
3. Using a model-building algorithm module.
The diagram shows the relationships between the SPL program, the interface, and the Python program:
The SPL program calls the ym_exec interface to pass parameters to the Python apply() interface. apply() then runs the Python program and returns the result to SPL.
A. The def apply(ls) interface runs the Python program and returns the result to the SPL program.
B. The list parameter ls works the same way as the argv parameter of Java's entry method void main(String argv[]).
C. The return value is a list of DataFrames, which can be viewed in SPL.
D. Below is a sample Python module (demo.py):
```python
import pandas as pd
import sys

def apply(lists):
    cols = ["value"]
    ls = []
    for x in lists:
        ls.append("{}".format(x))  # convert every parameter to a string
    df = pd.DataFrame(ls, columns=cols)
    lls = []
    lls.append(df)
    return lls

if __name__ == "__main__":
    res = apply(sys.argv[1:])
    print('res={}'.format(res))
```
Execution: python demo.py "AAA" "BBB" 1000
Output:
```
res=[  value
0   AAA
1   BBB
2  1000]
```
apply() converts each passed-in parameter to a string and appends it to the list ls. It then wraps ls in a DataFrame and places that DataFrame in the return list lls. Test apply() in Python first to make sure it works, and then call it from the SPL program.
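Checking the return value, and not just the printed output, makes the Python-side test stricter. The sketch below is a minimal harness that assumes ym_exec hands apply() a single list of parameters, as described above. It calls apply() that way and verifies that the result is a non-empty list of DataFrames. check_apply is a hypothetical helper, not part of esProc.

```python
import pandas as pd

def apply(lists):
    # Same logic as demo.py: one string row per passed-in parameter.
    df = pd.DataFrame(["{}".format(x) for x in lists], columns=["value"])
    return [df]

def check_apply(apply_fn, *params):
    """Call apply_fn with one list of parameters, the way ym_exec is
    described to do, and check that it returns a non-empty list of DataFrames."""
    result = apply_fn(list(params))
    if not isinstance(result, list) or not result:
        raise TypeError("apply() must return a non-empty list")
    for i, item in enumerate(result):
        if not isinstance(item, pd.DataFrame):
            raise TypeError("item {} is {}, not a DataFrame".format(i, type(item).__name__))
    return result

res = check_apply(apply, "AAA", "BBB", 1000)
print(res[0]["value"].tolist())  # ['AAA', 'BBB', '1000']
```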
Note: the DataFrame is returned in msgpack format, which requires all data in a column to be of the same type. Otherwise msgpack serialization fails and SPL won’t receive the DataFrame.
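A quick pre-return check can catch the mixed-type problem before SPL ever sees it. This is a standard-library sketch, and mixed_type_columns is a hypothetical name. It works on a mapping from column name to a list of values, which you could build with DataFrame.to_dict("list"), and reports every column that holds more than one Python type. It also shows why demo.py formats each parameter with "{}".format(x): after formatting, every value in the column is a str.

```python
def mixed_type_columns(columns):
    """Return {column: sorted type names} for each column whose values
    have more than one Python type. `columns` maps name -> list of values."""
    mixed = {}
    for name, values in columns.items():
        types = {type(v).__name__ for v in values}
        if len(types) > 1:
            mixed[name] = sorted(types)
    return mixed

raw = {"value": [False, 12345, 123.45, "aaa 123"]}
print(mixed_type_columns(raw))      # {'value': ['bool', 'float', 'int', 'str']}
as_text = {"value": ["{}".format(v) for v in raw["value"]]}
print(mixed_type_columns(as_text))  # {}
```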
Format: ym_exec(pyfile, p1,p2,…)
The esProc interface function runs the py file with the passed-in parameters p1, p2, and so on. The number of parameters varies according to what the apply() interface expects.
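When you are unsure what a particular ym_exec() call delivers to Python, apply() can report each parameter's position and Python type. The helper below is a hypothetical debugging sketch; describe_params is not part of esProc. It returns plain rows in which each column holds a single type, so it respects the same-type-per-column rule noted above. Inside apply() you would wrap the rows as pd.DataFrame(rows, columns=["pos", "type", "value"]) and return [df].

```python
def describe_params(lists):
    """Return one (position, type name, text) row per element of `lists`,
    the list that apply() receives for ym_exec(pyfile, p1, p2, ...)."""
    return [(i + 1, type(x).__name__, "{}".format(x)) for i, x in enumerate(lists)]

print(describe_params([False, 12345, "aaa 123"]))
# [(1, 'bool', 'False'), (2, 'int', '12345'), (3, 'str', 'aaa 123')]
```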
This interface works with the esProc external library pythonCli, which connects to the Python program through userconfig.xml. Its configuration is explained later.
A. Install Python:
Download Python 3 and install it in, for example, c:\Program Files\raqsoft\yimming\Python37.
B. Install esProc external library:
By default the external library is installed in esProc\extlib\pythonCli. Then select pythonCli on the Select external libraries tab.
C. Configure parameters:
Configure parameters in userconfig.xml under esProc’s external library directory (esProc\extlib\pythonCli\userconfig.xml):
Parameter | Value (example) | Description |
---|---|---|
sAppHome | C:\Program Files\raqsoft\yimming | Application directory |
sPythonHome | c:\Program Files\raqsoft\yimming\Python37\python.exe | Path of the Python executable |
sPythonHost | localhost | Host name or IP address of the Python service |
iPythonScriptPort | 8512 | Port number of the Python service |
Here the application is the Python service-side application.
After all configuration is done, restart esProc to use the ym_exec() interface.
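If ym_exec() cannot reach Python, one quick diagnostic is to test whether anything is listening on the configured host and port. This sketch assumes that the pythonCli service accepts TCP connections on sPythonHost:iPythonScriptPort once it is running. The article does not describe the protocol, so the check only tests reachability. python_service_reachable is a hypothetical helper.

```python
import socket

def python_service_reachable(host="localhost", port=8512, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.
    Defaults mirror the sPythonHost / iPythonScriptPort example values."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

print(python_service_reachable())  # True only if the service is up on localhost:8512
```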
To call demo.py, for example:
A | |
---|---|
1 | =ym_env() |
2 | =ym_exec("d:/demo.py", false, 12345, 10737418240, 123.45, decimal(1234567890123456), "aaa 123") |
3 | >ym_close(A1) |
Result:
value | |
---|---|
1 | False |
2 | 12345 |
3 | 10737418240 |
4 | 123.45 |
5 | 1234567890123456 |
6 | aaa 123 |
To call a Python Partial Least Squares (PLS) algorithm from SPL (esProc doesn’t offer PLS itself), first install the Yimming External Library. A configuration guide can be found in SPL Smart Modeling and Scoring.
The PLS algorithm takes many parameters. To keep calls simple, we use this invocation format:
ym_exec(pyfile, data, jsonstr)
The SPL program calls and executes pyfile. data is the table sequence on which the model is built, and the algorithm’s many parameters are written as a JSON string passed in jsonstr. The parameter names must match those that apply() in pyfile reads, so that they can be parsed correctly.
data: the data on which the model is built and scored, either a table sequence or the name of a data file with column headers. It must include the column holding the target variable (target).
jsonstr: JSON strings. For example:
{target:0,n_components:3,deflation_mode:'regression',
mode:'A',norm_y_weights:False,
scale:False,algorithm:'nipals',
max_iter:500,tol:0.000001,copy:True}
target is required. It specifies the column holding the target variable, either by column index (such as 0) or by column name.
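jsonstr is not strict JSON: keys are unquoted, strings use single quotes, and booleans are written False/True. The module below handles this with demjson plus string replacements. As an alternative using only the standard library, the sketch below quotes the bare keys with a regex and then evaluates the result with ast.literal_eval. It keeps the case of string values, which the lower() call in pls_demo.py does not, and it yields real booleans. parse_jsonstr is a hypothetical name, and this is a swapped-in technique, not the author's code.

```python
import ast
import re

# A bare key right after "{" or "," (whitespace/newlines allowed), e.g. ",mode:".
_BARE_KEY = re.compile(r"([{,]\s*)([A-Za-z_]\w*)\s*:")

def parse_jsonstr(jsonstr):
    """Parse an SPL-style parameter string such as
    "{target:0,mode:'A',scale:False}" into a dict.

    Assumes bare identifier keys and Python-literal values (numbers,
    quoted strings, True/False/None). A string value that itself
    contains ",name:" would be mis-quoted by this simple regex.
    """
    quoted = _BARE_KEY.sub(r"\1'\2':", jsonstr).strip()
    params = ast.literal_eval(quoted)
    if not isinstance(params, dict):
        raise ValueError("jsonstr must describe an object")
    if "target" not in params:
        raise ValueError("param target is not set")
    return params

params = parse_jsonstr("{target:0,n_components:3,deflation_mode:'regression',"
                       " mode:'A',norm_y_weights:False }")
print(params["mode"], params["norm_y_weights"])  # A False
```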
SPL script (pls_demo.dfx):
A | B | |
---|---|---|
1 | =ym_env() | |
2 | ="d:/script/pls_demo.py" | |
3 | =file("d:/script/data_test.csv").import@cqt() | //Data file |
4 | {target:0,n_components:3,deflation_mode:'regression', mode:'A',norm_y_weights:False } | //The first column is the target variable; parameters are written in JSON format |
5 | =ym_exec(A2, A3, A4) | |
6 | >ym_close(A1) | |
The data file (data_test.csv), whose first column is the target variable:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
181.6 | -0.00182 | -0.00796 | -0.00748 | -0.00286 | 0.004846 | 0.015545 | 0.028104 | 0.039865 | 0.046408 |
154.5 | -0.00102 | -0.00789 | -0.00795 | -0.00361 | 0.004065 | 0.015055 | 0.028321 | 0.041063 | 0.048227 |
195 | 0.001206 | -0.00464 | -0.00404 | 0.000681 | 0.008794 | 0.020834 | 0.036321 | 0.051656 | 0.059063 |
150.8 | -0.00154 | -0.00802 | -0.00768 | -0.0028 | 0.00554 | 0.01712 | 0.03072 | 0.043453 | 0.050239 |
… |
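The file layout above has a header row, the target in column 0, and the features after it. You can split it into a target vector and feature rows with nothing more than the csv module. split_target is a hypothetical helper that mirrors how pls_demo.py resolves target: an int is a column index and anything else is a column name. Because this file's headers are the strings "0" to "9", target:0 (index) and target:'0' (name) pick the same column here.

```python
import csv
import io

def split_target(csv_text, target):
    """Split CSV text with a header row into (feature_rows, target_values).
    `target` is a column index (int) or a column name (str); values become floats."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    idx = target if isinstance(target, int) else header.index(target)
    features, y = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        y.append(values[idx])
        features.append(values[:idx] + values[idx + 1:])
    return features, y

sample = "0,1,2\n181.6,-0.00182,-0.00796\n154.5,-0.00102,-0.00789\n"
X, y = split_target(sample, 0)
print(y)  # [181.6, 154.5]
```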
A sample Python algorithm module (pls_demo.py):
```python
# Note: scipy.linalg.pinv2 was removed in SciPy 1.9; use scipy.linalg.pinv on newer SciPy.
from scipy.linalg import pinv2
import numpy as np
import pandas as pd
import demjson

# algorithm class pls_demo:
class pls_demo():
    # . . . . . . . (body elided)
    pass

# interface implementation
def apply(lists):
    if len(lists) < 2:
        return None
    data = lists[0]  # data parameter
    val = lists[1]   # jsonstr string parameter
    if isinstance(data, str):
        data = pd.read_csv(data)  # a file name was passed instead of a table sequence
    # 1. Handle special values in JSON strings
    # (lower() lowercases the whole string, including string values such as 'A')
    val = val.lower().replace("false", "'False'")
    val = val.replace("true", "'True'")
    val = val.replace("none", "'None'")
    dic = demjson.decode(val)
    if 'target' not in dic:
        print("param target is not set")
        return None
    # 2. Handle parameter *target*, which is either a column index or a column name
    targ = dic['target']
    if isinstance(targ, int):
        colname = data.columns[targ]
    else:
        colname = targ
    Y = data[colname]
    X = data.drop(colname, axis=1)
    # 3. Handle model building parameters, using defaults for those not passed in
    n_components = dic.get('n_components', 15)
    deflation_mode = dic.get('deflation_mode', "regression")
    mode = dic.get('mode', "A")
    # …….
    # 4. Load algorithm module
    pls_model = pls_demo(n_components,
                         deflation_mode,
                         mode, …)
    # Training data
    pls_model.fit(X, Y)
    # Scoring
    y_pred = pls_model.predict(X)
    # 5. Append return value
    f = ["value"]
    df = pd.DataFrame(y_pred, columns=f)
    lls = []
    lls.append(df)
    return lls

# 6. Test
if __name__ == '__main__':
    ls = []
    ls.append("a2ef764c53ec1fbc_X.new.csv")
    val = "{target:0,n_components:3,deflation_mode:'regression'," \
          " mode:'a',norm_y_weights:False," \
          " scale:False,algorithm:'nipals'," \
          " max_iter:500,tol:0.000001,copy:True}"
    ls.append(val)
    print(apply(ls))
```
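The body of the pls_demo class is elided above. For a sense of what the fit/predict step does, here is a minimal numpy sketch of PLS1 (a single target column, as in data_test.csv) fitted with NIPALS. It is not the author's pls_demo class. It centers but does not scale columns, similar to scale:False, and it ignores deflation_mode, mode, norm_y_weights, max_iter and tol; with a single target, the NIPALS inner loop converges in one step anyway. pls1_fit and pls1_predict are hypothetical names. scikit-learn's PLSRegression is a maintained alternative.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 (one target) with NIPALS; return (coef, x_mean, y_mean).
    n_components should not exceed the rank of the centered X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean        # centered copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                       # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xk @ w                          # score vector
        tt = t @ t
        p = Xk.T @ t / tt                   # X loading
        qk = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)            # deflate X
        yk = yk - qk * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
```

Inside apply(), X.to_numpy() and Y.to_numpy() from step 2 could be passed to pls1_fit. The predictions would then be wrapped in pd.DataFrame(y_pred, columns=["value"]) as in step 5.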