How to Call a Python Program from SPL

 

Although esProc is a powerful computing engine, it is not well suited to machine learning algorithms, an area where Python excels. esProc therefore offers the YM external library, which lets an esProc SPL program call a Python program.

We’ll illustrate how to call a Python program from SPL in three parts:

1. Standards and requirements in Python module development;

2. Interface call using ym_exec;

3. Uses of model building algorithm module.

The diagram shows relationships between the SPL program, the interface and the Python program:

[Figure: relationships between the SPL program, the ym_exec interface, and the Python program]


The SPL program calls the ym_exec interface, which passes the parameters to the Python apply() interface. apply() then executes the Python program and returns the result to SPL.

1. Standards and requirements in Python module development

A. The def apply(ls) interface is the entry point: it executes the Python program and returns the result to the SPL program.

B. The list-type parameter ls works the same way as the parameter argv in the Java entry method void main(String argv[]).
C. The return value is a DataFrame stored in a list-type variable; it can be viewed in SPL.
D. Below is a sample Python module (demo.py):

import sys

import pandas as pd


def apply(lists):
    # Convert every passed-in parameter to a string.
    cols = ["value"]
    ls = []
    for x in lists:
        ls.append("{}".format(x))

    # Put the strings in a one-column DataFrame and return it inside a list.
    df = pd.DataFrame(ls, columns=cols)
    lls = []
    lls.append(df)
    return lls


if __name__ == "__main__":
    res = apply(sys.argv[1:])
    print('res={}'.format(res))

Execution: python demo.py "AAA" "BBB" 1000
Output:
res=[   value
0    AAA
1    BBB
2   1000]

The apply() interface adds each passed-in parameter to the list ls, builds a DataFrame from ls, and places that DataFrame in the list lls, which is returned. Test apply() in Python first to make sure it works correctly; after that, it can be called from the SPL program.


Note: The DataFrame is returned in msgpack format. This requires all data in the same column to be of the same type; otherwise msgpack serialization fails and SPL won't receive the DataFrame.
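The same-type requirement can be checked before the DataFrame is returned. The sketch below is only an illustration: the helper name unify_column_types is hypothetical and not part of the external library. It casts any column whose values have more than one Python type to strings, so that each column ends up uniform.

```python
import pandas as pd


def unify_column_types(df):
    """Return a copy of df in which every mixed-type column is cast to str."""
    out = df.copy()
    for col in out.columns:
        # Collect the distinct Python types present in this column.
        kinds = {type(v) for v in out[col]}
        if len(kinds) > 1:
            out[col] = out[col].map(str)
    return out


# "value" mixes str and int; "n" is already uniform and is left unchanged.
df = pd.DataFrame({"value": ["AAA", "BBB", 1000], "n": [1, 2, 3]})
fixed = unify_column_types(df)
print(fixed["value"].tolist())
```

Casting to strings is the bluntest possible fix; when a column is meant to be numeric, converting the stray values to numbers instead may be the better choice.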

 

2. Interface call using ym_exec
    Format: ym_exec(pyfile, p1,p2,…)

This esProc interface function executes the py file with the passed-in parameters p1, p2, and so on. The number of parameters varies according to what the apply() interface in that file expects.
   This interface works with the esProc external library pythonCli. The external library connects to a Python program through userconfig.xml, whose configuration is explained below.

A. Install Python:
Download Python 3 (version 3.7 in this example) and install it in, for example, c:\Program Files\raqsoft\yimming\Python37.
B. Install the esProc external library:
By default the external library is installed in esProc\extlib\pythonCli. After installation, select pythonCli on the Select external libraries tab.

C. Configure parameters:

Configure the parameters in userconfig.xml under esProc's external library directory (esProc\extlib\pythonCli\userconfig.xml):

| Parameter | Value | Description |
|---|---|---|
| sAppHome | C:\Program Files\raqsoft\yimming | Application directory |
| sPythonHome | c:\Program Files\raqsoft\yimming\Python37\python.exe | Python executable file |
| sPythonHost | localhost | IP address |
| iPythonScriptPort | 8512 | Port number |

The application is the Python service-side application:
[Figure: the Python service-side application]

After all configuration is done, restart esProc to employ the ym_exec() interface.
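Before calling ym_exec(), it can help to confirm that something is listening on the configured host and port. The following sketch assumes the Python service accepts plain TCP connections on sPythonHost:iPythonScriptPort (localhost:8512 in the table above); the helper name is_listening is hypothetical and not part of esProc.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not listening".
        return False


if __name__ == "__main__":
    # Values taken from the configuration table; adjust to your own setup.
    print(is_listening("localhost", 8512))
```

A False result only shows that the port is not reachable; it does not explain why, so the service log is still the place to look for the cause.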

For example, to call demo.py:

|   | A |
|---|---|
| 1 | =ym_env() |
| 2 | =ym_exec("d:/demo.py", false, 12345, 10737418240, 123.45, decimal(1234567890123456), "aaa 123") |
| 3 | >ym_close(A1) |

Result:

|   | value |
|---|---|
| 1 | False |
| 2 | 12345 |
| 3 | 10737418240 |
| 4 | 123.45 |
| 5 | 1234567890123456 |
| 6 | aaa 123 |
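The result column can be reproduced locally without esProc. The sketch below assumes that the SPL arguments arrive in Python as bool, int, int, float, Decimal, and str respectively; that mapping is an assumption made for illustration, not documented behavior of ym_exec. It applies the same "{}".format(x) conversion that demo.py uses.

```python
from decimal import Decimal

# Assumed Python-side equivalents of the SPL arguments passed to ym_exec.
args = [False, 12345, 10737418240, 123.45,
        Decimal("1234567890123456"), "aaa 123"]

# demo.py turns every parameter into a string before building the DataFrame.
values = ["{}".format(x) for x in args]

for i, v in enumerate(values, start=1):
    print(i, v)
```

Because every value is converted to a string, the returned column holds a single type, which is what the msgpack serialization requires.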

 

3. Uses of model building algorithm module

To call a Python Partial Least Squares (PLS) algorithm, which esProc doesn't offer, from SPL, first install the Yimming External Library. The configuration guide can be found in SPL Smart Modeling and Scoring.

The PLS algorithm takes many parameters. To keep the call convenient, we fix the invocation format:

ym_exec(pyfile, data, jsonstr)

The SPL program executes pyfile; data is the table sequence on which the model is built; the algorithm's many parameters are written as a JSON string and passed as jsonstr. Make sure the parameters match those handled in pyfile's apply() interface so that they are parsed correctly.

 

data: the name of a data file on which scoring is to be performed, or a table sequence that has column headers. It must include the column that holds the target variable (target).

jsonstr: a JSON string. For example:
{target:0,n_components:3,deflation_mode:'regression',
 mode:'A',norm_y_weights:False,
 scale:False,algorithm:'nipals',
 max_iter:500,tol:0.000001,copy:True}
target is required; it specifies the column holding the target variable.
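Note that this parameter string is not strict JSON: the keys are unquoted and it uses the literals True and False, so Python's standard json module rejects it. As one alternative to a lenient decoder, the sketch below quotes the bare keys with a regular expression and evaluates the result with ast.literal_eval. The function name parse_params is hypothetical, and the regular expression assumes that no string value contains a comma followed by text of the form word: inside it.

```python
import ast
import re


def parse_params(jsonstr):
    """Parse a {key:value,...} string with bare keys into a dict."""
    # Turn  {target:0,mode:'A'}  into  {"target":0,"mode":'A'}.
    quoted = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', jsonstr)
    # literal_eval only accepts literals, so no code in the string is executed.
    params = ast.literal_eval(quoted)
    if "target" not in params:
        raise ValueError("param target is not set")
    return params


jsonstr = """{target:0,n_components:3,deflation_mode:'regression',
              mode:'A',norm_y_weights:False,
              scale:False,algorithm:'nipals',
              max_iter:500,tol:0.000001,copy:True}"""
params = parse_params(jsonstr)
print(params["target"], params["mode"], params["copy"])
```

Unlike lowercasing the whole string before decoding, this approach keeps the case of string values such as 'A' and yields real booleans rather than the strings 'True' and 'False'.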

    

SPL script (pls_demo.dfx):

|   | A | B |
|---|---|---|
| 1 | =ym_env() | |
| 2 | ="d:/script/pls_demo.py" | |
| 3 | =file("d:/script/data_test.csv").import@cqt() | //Data file |
| 4 | {target:0,n_components:3,deflation_mode:'regression',mode:'A',norm_y_weights:False} | //The first column is the target variable; parameters are written in JSON format |
| 5 | =ym_exec(A2, A3, A4) | |
| 6 | >ym_close(A1) | |


 

The data file (data_test.csv), in which the first column is the target variable:

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 181.6 | -0.00182 | -0.00796 | -0.00748 | -0.00286 | 0.004846 | 0.015545 | 0.028104 | 0.039865 | 0.046408 |
| 154.5 | -0.00102 | -0.00789 | -0.00795 | -0.00361 | 0.004065 | 0.015055 | 0.028321 | 0.041063 | 0.048227 |
| 195 | 0.001206 | -0.00464 | -0.00404 | 0.000681 | 0.008794 | 0.020834 | 0.036321 | 0.051656 | 0.059063 |
| 150.8 | -0.00154 | -0.00802 | -0.00768 | -0.0028 | 0.00554 | 0.01712 | 0.03072 | 0.043453 | 0.050239 |
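To see how the target column is separated from the features, the sketch below loads the four rows shown above from an in-memory string instead of d:/script/data_test.csv and splits them by column position or column name. The helper name split_target is hypothetical.

```python
import io

import pandas as pd

# The sample rows of data_test.csv, with the header row 0..9.
CSV_TEXT = """0,1,2,3,4,5,6,7,8,9
181.6,-0.00182,-0.00796,-0.00748,-0.00286,0.004846,0.015545,0.028104,0.039865,0.046408
154.5,-0.00102,-0.00789,-0.00795,-0.00361,0.004065,0.015055,0.028321,0.041063,0.048227
195,0.001206,-0.00464,-0.00404,0.000681,0.008794,0.020834,0.036321,0.051656,0.059063
150.8,-0.00154,-0.00802,-0.00768,-0.0028,0.00554,0.01712,0.03072,0.043453,0.050239
"""


def split_target(data, target):
    """Split a DataFrame into (X, Y); target is a column position or a name."""
    colname = data.columns[target] if type(target) is int else target
    return data.drop(colname, axis=1), data[colname]


data = pd.read_csv(io.StringIO(CSV_TEXT))
X, Y = split_target(data, 0)
print(X.shape, Y.tolist())
```

pandas reads the header row as the strings "0" to "9", so target:0 (a position) and target:'0' (a name) select the same column here.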










 

A sample Python algorithm module (pls_demo.py):

from scipy.linalg import pinv2  # removed in SciPy 1.9; newer versions provide scipy.linalg.pinv
import numpy as np
import pandas as pd
import demjson


# algorithm class pls_demo
class pls_demo():
    . . . . . . .
    pass


# interface implementation
def apply(lists):
    if len(lists) < 2:
        return None

    data = lists[0]  # data parameter
    val = lists[1]   # jsonstr string parameter
    if isinstance(data, str):
        # A string is treated as the path of a CSV file.
        data = pd.read_csv(data)

    # 1. Handle special values in JSON strings
    # Note: lower() also lowercases string values such as mode:'A'.
    #print(val)
    val = val.lower().replace("false", "'False'")
    val = val.replace("true", "'True'")
    val = val.replace("none", "'None'")
    dic = demjson.decode(val)
    if 'target' not in dic:
        print("param target is not set")
        return None

    # 2. Handle parameter target, which is either a column index or a column name
    targ = dic['target']
    if type(targ) is int:
        colname = data.columns.tolist()[targ]
    else:
        colname = targ
    Y = data[colname]
    X = data.drop(colname, axis=1)

    # 3. Handle model building parameters; set defaults for those without passed-in values
    n_components = dic.get('n_components', 15)
    deflation_mode = dic.get('deflation_mode', "regression")
    mode = dic.get('mode', "A")
    …….

    # 4. Load algorithm module
    #print("n_components={}".format(n_components))
    pls_model = pls_demo(n_components,
                         deflation_mode,
                         mode,…)

    # Training data
    pls_model.fit(X, Y)

    # Scoring
    y_pred = pls_model.predict(X)

    # 5. Append return value
    f = ["value"]
    df = pd.DataFrame(y_pred, columns=f)
    #print(y_pred)
    lls = []
    lls.append(df)
    return lls


# 6. Test
if __name__ == '__main__':
    ls = []
    ls.append("a2ef764c53ec1fbc_X.new.csv")
    val = "{target:0,n_components:3,deflation_mode:'regression'," \
          "mode:'a',norm_y_weights:False," \
          "scale:False,algorithm:'nipals'," \
          "max_iter:500,tol:0.000001,copy:True}"
    ls.append(val)
    apply(ls)
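Step 3 of the module assigns a default to each parameter that was not passed in. Another way to express the same idea is to merge the parsed dictionary over a dictionary of defaults. The sketch below includes only the three defaults that appear in the listing, since the remaining ones are elided there; the name with_defaults is hypothetical.

```python
# Defaults taken from the listing; the other parameters are elided there.
DEFAULTS = {
    "n_components": 15,
    "deflation_mode": "regression",
    "mode": "A",
}


def with_defaults(params, defaults=DEFAULTS):
    """Return a new dict in which passed-in values override the defaults."""
    merged = dict(defaults)
    merged.update(params)
    return merged


settings = with_defaults({"target": 0, "n_components": 3})
print(settings)
```

Keeping the defaults in one dictionary makes it easy to see at a glance which values apply when the caller passes only target.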