teradataml gives Python users access to a collection of analytic functions that reside on Teradata Vantage, so they can perform analytics on Teradata Vantage with no SQL coding. In addition, the teradataml library provides functions for scaling data manipulation and transformation, data filtering and sub-setting, and it can be used in conjunction with other open-source Python libraries.
For community support, please visit the Teradata Community.
For Teradata customer support, please visit Teradata Support.
Copyright 2024, Teradata. All Rights Reserved.
-
- teradataml no longer supports setting the `auth_token` using `set_config_params()`. Users should use `set_auth_token()` to set the token.
-
- New Function
  - `alias()` - Creates a DataFrame with alias name.
- New Properties
  - `db_object_name` - Get the underlying database object name, on which DataFrame is created.
-
- New Function
  - `alias()` - Creates a GeoDataFrame with alias name.
-
- Arithmetic Functions
  - `DataFrameColumn.isnan()` - Function evaluates expression to determine if the floating-point argument is a NaN (Not-a-Number) value.
  - `DataFrameColumn.isinf()` - Function evaluates expression to determine if the floating-point argument is an infinite number.
  - `DataFrameColumn.isfinite()` - Function evaluates expression to determine if it is a finite floating value.
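The three checks above partition floating-point values into NaN, infinite, and finite. The following is a minimal plain-Python sketch of that classification using the standard `math` module; it does not call teradataml, and the helper name `classify` is hypothetical, introduced only for illustration.

```python
import math

def classify(value: float) -> str:
    """Label a float as 'nan', 'inf', or 'finite' (hypothetical helper)."""
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf"
    return "finite"

samples = [1.5, float("nan"), float("inf"), float("-inf")]
labels = [classify(v) for v in samples]
print(labels)
```

Every float falls into exactly one of the three categories, which is why a value that is neither NaN nor infinite can be treated as finite.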
-
- FeatureStore Components
  - Feature - Represents a feature which is used in ML modeling.
  - Entity - Represents the columns which serve as the unique identifier for the data used in ML modeling.
  - DataSource - Represents the source of data.
  - FeatureGroup - Collection of Feature, Entity and DataSource.
    - Methods
      - `apply()` - Adds Feature, Entity, DataSource to a FeatureGroup.
      - `from_DataFrame()` - Creates a FeatureGroup from teradataml DataFrame.
      - `from_query()` - Creates a FeatureGroup using a SQL query.
      - `remove()` - Removes Feature, Entity, or DataSource from a FeatureGroup.
      - `reset_labels()` - Removes the labels assigned to the FeatureGroup that were set using `set_labels()`.
      - `set_labels()` - Sets the Features as labels for a FeatureGroup.
    - Properties
      - `features` - Get the features of a FeatureGroup.
      - `labels` - Get the labels of a FeatureGroup.
- FeatureStore
  - Methods
    - `apply()` - Adds Feature, Entity, DataSource, FeatureGroup to FeatureStore.
    - `archive_data_source()` - Archives a specified DataSource from a FeatureStore.
    - `archive_entity()` - Archives a specified Entity from a FeatureStore.
    - `archive_feature()` - Archives a specified Feature from a FeatureStore.
    - `archive_feature_group()` - Archives a specified FeatureGroup from a FeatureStore. Method archives underlying Feature, Entity, DataSource also.
    - `delete_data_source()` - Deletes an archived DataSource.
    - `delete_entity()` - Deletes an archived Entity.
    - `delete_feature()` - Deletes an archived Feature.
    - `delete_feature_group()` - Deletes an archived FeatureGroup.
    - `get_data_source()` - Get the DataSources associated with FeatureStore.
    - `get_dataset()` - Get the teradataml DataFrame based on Features, Entities and DataSource from FeatureGroup.
    - `get_entity()` - Get the Entity associated with FeatureStore.
    - `get_feature()` - Get the Feature associated with FeatureStore.
    - `get_feature_group()` - Get the FeatureGroup associated with FeatureStore.
    - `list_data_sources()` - List DataSources.
    - `list_entities()` - List Entities.
    - `list_feature_groups()` - List FeatureGroups.
    - `list_features()` - List Features.
    - `list_repos()` - List available repos which are configured for FeatureStore.
    - `repair()` - Repairs the underlying FeatureStore schema on database.
    - `set_features_active()` - Marks the Features as active.
    - `set_features_inactive()` - Marks the Features as inactive.
    - `setup()` - Sets up the FeatureStore for a repo.
  - Property
    - `repo` - Property for FeatureStore repo.
    - `grant` - Property to grant access on FeatureStore to a user.
    - `revoke` - Property to revoke access on FeatureStore from a user.
-
- `Image2Matrix()` - Converts an image into a matrix.
-
-
- New Analytics Database Analytic Functions:
  - `CFilter()`
  - `NaiveBayes()`
  - `TDNaiveBayesPredict()`
  - `Shap()`
  - `SMOTE()`
-
- New Unbounded Array Framework (UAF) Functions:
  - `CopyArt()`
-
-
- Vantage File Management Functions
  - `list_files()` - List the installed files in Database.
-
- teradataml adds support for the lightGBM package through the `OpensourceML` (`OpenML`) feature. The following functionality is added in the current release:
  - `td_lightgbm` - Interface object to run lightgbm functions and classes through Teradata Vantage. Example usage below:
    ```python
    from teradataml import td_lightgbm, DataFrame

    df_train = DataFrame("multi_model_classification")

    feature_columns = ["col1", "col2", "col3", "col4"]
    label_columns = ["label"]
    part_columns = ["partition_column_1", "partition_column_2"]

    df_x = df_train.select(feature_columns)
    df_y = df_train.select(label_columns)

    # Dataset creation.
    # Single model case.
    obj_s = td_lightgbm.Dataset(df_x, df_y, silent=True, free_raw_data=False)

    # Multi model case.
    obj_m = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=part_columns)
    obj_m_v = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=part_columns)

    ## Model training.
    # Single model case.
    opt = td_lightgbm.train(params={}, train_set=obj_s, num_boost_round=30)

    opt.predict(data=df_x, num_iteration=20, pred_contrib=True)

    # Multi model case.
    # `rec` was undefined in the original snippet; an empty dict is assumed
    # here as the container that record_evaluation() fills.
    rec = {}
    opt = td_lightgbm.train(params={}, train_set=obj_m, num_boost_round=30,
                            callbacks=[td_lightgbm.record_evaluation(rec)],
                            valid_sets=[obj_m_v, obj_m_v])

    # Passing `label` argument to get it returned in output DataFrame.
    opt.predict(data=df_x, label=df_y, num_iteration=20)
    ```
  - Added support for accessing scikit-learn APIs using exposed interface object `td_lightgbm`.

  Refer to the Teradata Python Package User Guide for more details of this feature, arguments, usage, examples and supportability in Vantage.
-
- `register()` - Registers a user defined function (UDF).
- `call_udf()` - Calls a registered user defined function (UDF) and returns ColumnExpression.
- `list_udfs()` - Lists all the UDFs registered using the `register()` function.
- `deregister()` - Deregisters a user defined function (UDF).
-
- Configuration Options
  - `table_operator` - Specifies the name of table operator.
-
-
-
- `set_auth_token()` - Added `base_url` parameter which accepts the CCP URL. `ues_url` will be deprecated in the future and users will need to specify `base_url` instead.
-
- `join()`
  - Now supports compound ColumnExpression having more than one binary operator in `on` argument.
  - Now supports ColumnExpression containing FunctionExpression(s) in `on` argument.
  - Self-join now expects aliased DataFrame in `other` argument.
-
- `join()`
  - Now supports compound ColumnExpression having more than one binary operator in `on` argument.
  - Now supports ColumnExpression containing FunctionExpression(s) in `on` argument.
  - Self-join now expects aliased DataFrame in `other` argument.
-
- `SAX()` - Default value added for `window_size` and `output_frequency`.
- `DickeyFuller()`
  - Supports TDAnalyticResult as input.
  - Default value added for `max_lags`.
  - Removed parameter `drift_trend_formula`.
  - Updated permitted values for `algorithm`.
-
- `AutoML`, `AutoRegressor` and `AutoClassifier`
  - Now supports DECIMAL datatype as input.
-
- `TextParser()`
  - Argument name `covert_to_lowercase` changed to `convert_to_lowercase`.
-
-
- `db_list_tables()` now returns correct results when '%' is used.
-
teradataml will no longer be supported with SQLAlchemy < 2.0.
-
teradataml no longer shows the warnings from Vantage by default.
- Users should set `display.suppress_vantage_runtime_warnings` to `False` to display warnings.
-
-
- New Analytics Database Analytic Functions:
  - `TFIDF()`
  - `Pivoting()`
  - `UnPivoting()`
- New Unbounded Array Framework (UAF) Functions:
  - `AutoArima()`
  - `DWT()`
  - `DWT2D()`
  - `FilterFactory1d()`
  - `IDWT()`
  - `IDWT2D()`
  - `IQR()`
  - `Matrix2Image()`
  - `SAX()`
  - `WindowDFFT()`
-
- `udf()` - Creates a user defined function (UDF) and returns ColumnExpression.
- `set_session_param()` is added to set the database session parameters.
- `unset_session_param()` is added to unset database session parameters.
-
- `materialize()` - Persists DataFrame into database for current session.
- `create_temp_view()` - Creates a temporary view for session on the DataFrame.
-
- Date Time Functions
  - `DataFrameColumn.to_timestamp()` - Converts string or integer value to a TIMESTAMP data type or TIMESTAMP WITH TIME ZONE data type.
  - `DataFrameColumn.extract()` - Extracts date component to a numeric value.
  - `DataFrameColumn.to_interval()` - Converts a numeric value or string value into an INTERVAL_DAY_TO_SECOND or INTERVAL_YEAR_TO_MONTH value.
- String Functions
  - `DataFrameColumn.parse_url()` - Extracts a part from a URL.
- Arithmetic Functions
  - `DataFrameColumn.log` - Returns the logarithm value of the column with respect to 'base'.
-
- New methods added for `AutoML()`, `AutoRegressor()` and `AutoClassifier()`:
  - `evaluate()` - Performs evaluation on the data using the best model or the model of the user's choice from the leaderboard.
  - `load()` - Loads the saved model from database.
  - `deploy()` - Saves the trained model inside database.
  - `remove_saved_model()` - Removes the saved model in database.
  - `model_hyperparameters()` - Returns the hyperparameters of fitted or loaded models.
-
-
-
- `AutoML()`, `AutoRegressor()`
  - New performance metrics added for task type regression i.e., "MAPE", "MPE", "ME", "EV", "MPD" and "MGD".
- `AutoML()`, `AutoRegressor()` and `AutoClassifier`
  - New arguments added: `volatile`, `persist`.
  - `predict()` - Data input is now mandatory for generating predictions. Default model evaluation is now removed.
-
- `DataFrameColumn.cast()`: Accepts 2 new arguments `format` and `timezone`.
- `DataFrame.assign()`: Accepts ColumnExpressions returned by `udf()`.
- `set_config_params()`
  - Following arguments will be deprecated in the future:
    - `ues_url`
    - `auth_token`
-
- `to_pandas()` - Function returns the pandas DataFrame with Decimal column types as float instead of object. If the user wants the datatype to be object, set argument `coerce_float` to `False`.
-
- `list_td_reserved_keywords()` - Accepts a list of strings as argument.
-
- `ACF()` - `round_results` parameter removed as it was used for internal testing.
- `BreuschGodfrey()` - Added default value 0.05 for parameter `significance_level`.
- `GoldfeldQuandt()`
  - Removed parameters `weights` and `formula`. Replaced parameter `orig_regr_paramcnt` with `const_term`. Changed description for parameter `algorithm`. Please refer to the documentation for more details.
  - Note: This will break backward compatibility.
- `HoltWintersForecaster()` - Default value of parameter `seasonal_periods` removed.
- `IDFFT2()` - Removed parameter `output_fmt_row_major` as it is used for internal testing.
- `Resample()` - Added parameter `output_fmt_index_style`.
-
-
- KNN `predict()` function can now predict on test data which does not contain target column.
- Metrics functions are supported on the Lake system.
- The following OpensourceML functions from different sklearn modules in single model case are fixed.
  - `sklearn.ensemble`:
    - ExtraTreesClassifier - `apply()`
    - ExtraTreesRegressor - `apply()`
    - RandomForestClassifier - `apply()`
    - RandomForestRegressor - `apply()`
  - `sklearn.impute`:
    - SimpleImputer - `transform()`, `fit_transform()`, `inverse_transform()`
    - MissingIndicator - `transform()`, `fit_transform()`
  - `sklearn.kernel_approximation`:
    - Nystroem - `transform()`, `fit_transform()`
    - PolynomialCountSketch - `transform()`, `fit_transform()`
    - RBFSampler - `transform()`, `fit_transform()`
  - `sklearn.neighbors`:
    - KNeighborsTransformer - `transform()`, `fit_transform()`
    - RadiusNeighborsTransformer - `transform()`, `fit_transform()`
  - `sklearn.preprocessing`:
    - KernelCenterer - `transform()`
    - OneHotEncoder - `transform()`, `inverse_transform()`
- The following OpensourceML functions from different sklearn modules in multi model case are fixed.
  - `sklearn.feature_selection`:
    - SelectFpr - `transform()`, `fit_transform()`, `inverse_transform()`
    - SelectFdr - `transform()`, `fit_transform()`, `inverse_transform()`
    - SelectFromModel - `transform()`, `fit_transform()`, `inverse_transform()`
    - SelectFwe - `transform()`, `fit_transform()`, `inverse_transform()`
    - RFECV - `transform()`, `fit_transform()`, `inverse_transform()`
  - `sklearn.cluster`:
    - Birch - `transform()`, `fit_transform()`
- OpensourceML returns teradataml objects for model attributes and functions instead of sklearn objects so that the user can perform further operations like `score()`, `predict()` etc. on top of the returned objects.
- AutoML `predict()` function now generates correct ROC-AUC value for positive class.
- `deploy()` method of `Script` and `Apply` classes retries model deployment if there are any intermittent network issues.
-
teradataml no longer supports Python versions less than 3.8.
-
-
- `set_auth_token()` - teradataml now supports authentication via PAT in addition to OAuth 2.0 Device Authorization Grant (formerly known as the Device Flow).
  - It accepts UES URL, Personal Access Token (PAT) and Private Key file generated from VantageCloud Lake Console, and optional arguments `username` and `expiration_time` in seconds.
-
-
-
- `ANOVA()`
  - New arguments added: `group_name_column`, `group_value_name`, `group_names`, `num_groups` for data containing group values and group names.
- `FTest()`
  - New arguments added: `sample_name_column`, `sample_name_value`, `first_sample_name`, `second_sample_name`.
- `GLM()`
  - Supports stepwise regression and accepts new arguments `stepwise_direction`, `max_steps_num` and `initial_stepwise_columns`.
  - New arguments added: `attribute_data`, `parameter_data`, `iteration_mode` and `partition_column`.
- `GetFutileColumns()`
  - Arguments `category_summary_column` and `threshold_value` are now optional.
- `KMeans()`
  - New argument added: `initialcentroids_method`.
- `NonLinearCombineFit()`
  - Argument `result_column` is now optional.
- `ROC()`
  - Argument `positive_class` is now optional.
- `SVMPredict()`
  - New argument added: `model_type`.
- `ScaleFit()`
  - New arguments added: `ignoreinvalid_locationscale`, `unused_attributes`, `attribute_name_column`, `attribute_value_column`.
  - Arguments `attribute_name_column`, `attribute_value_column` and `target_attributes` are supported for sparse input.
  - Arguments `attribute_data`, `parameter_data` and `partition_column` are supported for partitioning.
- `ScaleTransform()`
  - New arguments added: `attribute_name_column` and `attribute_value_column` support for sparse input.
- `TDGLMPredict()`
  - New arguments added: `family` and `partition_column`.
- `XGBoost()`
  - New argument `base_score` is added for initial prediction value for all data points.
- `XGBoostPredict()`
  - New argument `detailed` is added for detailed information of each prediction.
- `ZTest()`
  - New arguments added: `sample_name_column`, `sample_value_column`, `first_sample_name` and `second_sample_name`.
-
- `AutoML()`, `AutoRegressor()` and `AutoClassifier()`
  - New argument `max_models` is added as an early stopping criterion to limit the maximum number of models to be trained.
-
- `DataFrame.agg()`
  - Accepts ColumnExpressions and list of ColumnExpressions as arguments.
-
- Data Transfer Utility
  - `fastload()` - Improved error and warning table handling with the following new arguments:
    - `err_staging_db`
    - `err_tbl_name`
    - `warn_tbl_name`
    - `err_tbl_1_suffix`
    - `err_tbl_2_suffix`
  - `fastload()` - Change in behaviour of `save_errors` argument. When `save_errors` is set to `True`, error information will be available in two persistent tables `ERR_1` and `ERR_2`. When `save_errors` is set to `False`, error information will be available in a single pandas DataFrame.
- Garbage collector location is now configurable. User can set `configure.local_storage` to a desired location.
-
-
- UAF functions now work if the database name has special characters.
- OpensourceML can now read and process NULL/nan values.
- Boolean values output will now be returned as VARBYTE column with 0 or 1 values in OpensourceML.
- Fixed bug for `Apply`'s `deploy()`.
- Fixed an issue with volatile table creation: the table is now created in the right database, i.e., user's spool space, regardless of the temp database specified.
- `ColumnTransformer` function now processes its arguments in the order they are passed.
-
-
- `OpenML` dynamically exposes opensource packages through Teradata Vantage. `OpenML` provides an interface object through which exposed classes and functions of opensource packages can be accessed with the same syntax and arguments. The following functionality is added in the current release:
  - `td_sklearn` - Interface object to run scikit-learn functions and classes through Teradata Vantage. Example usage below:
    ```python
    from teradataml import td_sklearn, DataFrame

    df_train = DataFrame("multi_model_classification")

    feature_columns = ["col1", "col2", "col3", "col4"]
    label_columns = ["label"]
    part_columns = ["partition_column_1", "partition_column_2"]

    linear_svc = td_sklearn.LinearSVC()
    ```
  - `OpenML` is supported in both Teradata Vantage Enterprise and Teradata Vantage Lake.
  - Argument Support:
    - Use of X and y arguments - Scikit-learn users are familiar with using `X` and `y` as argument names which take data as pandas DataFrames, numpy arrays or lists etc. However, in OpenML, we pass teradataml DataFrames for arguments `X` and `y`.
      ```python
      df_x = df_train.select(feature_columns)
      df_y = df_train.select(label_columns)

      linear_svc = linear_svc.fit(X=df_x, y=df_y)
      ```
    - Additional support for data, feature_columns, label_columns and group_columns arguments - Apart from traditional arguments, OpenML supports additional arguments - `data`, `feature_columns`, `label_columns` and `group_columns`. These are used as alternatives to `X`, `y` and `groups`.
      ```python
      linear_svc = linear_svc.fit(data=df_train, feature_columns=feature_columns, label_columns=label_columns)
      ```
  - Support for classification and regression metrics - Metrics functions for classification and regression in `sklearn.metrics` module are supported. Other metrics functions' support will be added in future releases.
  - Distributed Modeling and partition_columns argument support - Existing scikit-learn supports only single model generation. However, OpenML supports both single model use case and distributed (multi) model use case. For this, the user has to additionally pass the `partition_columns` argument to the existing `fit()`, `predict()` or any other function to be run. This will generate multiple models for multiple partitions, using the data in the corresponding partition.
    ```python
    df_x_1 = df_train.select(feature_columns + part_columns)
    linear_svc = linear_svc.fit(X=df_x_1, y=df_y, partition_columns=part_columns)
    ```
  - Support for load and deploy models - OpenML provides additional support for saving (deploying) the trained models. These models can be loaded later to perform operations like prediction, score etc. The following functions are provided by OpenML:
    - `<obj>.deploy()` - Used to deploy/save the model created and/or trained by OpenML.
    - `td_sklearn.deploy()` - Used to deploy/save the model created and/or trained outside teradataml.
    - `td_sklearn.load()` - Used to load the saved models.

  Refer to the Teradata Python Package User Guide for more details of this feature, arguments, usage, examples and supportability in both VantageCloud Enterprise and VantageCloud Lake.
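The distributed (multi) model case described above produces one model per partition, each trained only on the rows of its own partition. The following plain-Python sketch illustrates that idea without teradataml or scikit-learn: the "model" is just the mean of the label, and the helper `fit_per_partition` and the sample rows are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; column names mirror the example above.
rows = [
    {"partition_column_1": "a", "label": 10.0},
    {"partition_column_1": "a", "label": 20.0},
    {"partition_column_1": "b", "label": 5.0},
]

def fit_per_partition(rows, partition_column, label_column):
    """Fit one trivial 'model' (the label mean) per partition value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_column]].append(row[label_column])
    # Each partition gets its own model, built only from its own rows.
    return {key: mean(values) for key, values in groups.items()}

models = fit_per_partition(rows, "partition_column_1", "label")
print(models)
```

In the single model case there would be exactly one model fitted on all rows; here the number of models equals the number of distinct partition values.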
AutoML is an approach to automate the process of building, training, and validating machine learning models. It involves automation of various aspects of the machine learning workflow, such as feature exploration, feature engineering, data preparation, model training and evaluation for a given dataset. The teradataml AutoML feature offers best model identification, model leaderboard generation, parallel execution, early stopping, model evaluation, model prediction, live logging, and customization of the default process.
- `AutoML` - AutoML is a generic algorithm that supports all three tasks, i.e. 'Regression', 'Binary Classification' and 'Multiclass Classification'.
  - Methods of AutoML
    - `__init__()` - Instantiate an object of AutoML with given parameters.
    - `fit()` - Perform fit on specified data and target column.
    - `leaderboard()` - Get the leaderboard for the AutoML. Presents diverse models, feature selection method, and performance metrics.
    - `leader()` - Show best performing model and its details such as feature selection method, and performance metrics.
    - `predict()` - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
    - `generate_custom_config()` - Generate custom config JSON file required for customized run of AutoML.
- `AutoRegressor` - AutoRegressor is a special purpose AutoML feature to run regression specific tasks.
  - Methods of AutoRegressor
    - `__init__()` - Instantiate an object of AutoRegressor with given parameters.
    - `fit()` - Perform fit on specified data and target column.
    - `leaderboard()` - Get the leaderboard for the AutoRegressor. Presents diverse models, feature selection method, and performance metrics.
    - `leader()` - Show best performing model and its details such as feature selection method, and performance metrics.
    - `predict()` - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
    - `generate_custom_config()` - Generate custom config JSON file required for customized run of AutoRegressor.
- `AutoClassifier` - AutoClassifier is a special purpose AutoML feature to run classification specific tasks.
  - Methods of AutoClassifier
    - `__init__()` - Instantiate an object of AutoClassifier with given parameters.
    - `fit()` - Perform fit on specified data and target column.
    - `leaderboard()` - Get the leaderboard for the AutoClassifier. Presents diverse models, feature selection method, and performance metrics.
    - `leader()` - Show best performing model and its details such as feature selection method, and performance metrics.
    - `predict()` - Perform prediction on the data using the best model or the model of the user's choice from the leaderboard.
    - `generate_custom_config()` - Generate custom config JSON file required for customized run of AutoClassifier.
-
- `fillna` - Replace the null values in a column with the value specified.
- Data Manipulation
  - `cube()` - Analyzes data by grouping it into multiple dimensions.
  - `rollup()` - Analyzes a set of data across a single dimension with more than one level of detail.
  - `replace()` - Replaces the values for columns.
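The difference between `cube()` and `rollup()` comes down to which grouping sets are aggregated. The sketch below enumerates them in plain Python, following the standard SQL meaning of CUBE and ROLLUP; it does not call teradataml, and the helper names `rollup_sets` and `cube_sets` are hypothetical.

```python
from itertools import combinations

def rollup_sets(columns):
    """ROLLUP: hierarchical prefixes of the column list, down to the grand total."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

def cube_sets(columns):
    """CUBE: every combination of the columns, down to the grand total."""
    sets = []
    for n in range(len(columns), -1, -1):
        sets.extend(combinations(columns, n))
    return sets

columns = ["region", "city"]
print(rollup_sets(columns))
print(cube_sets(columns))
```

For n grouping columns, ROLLUP yields n + 1 grouping sets while CUBE yields 2^n, which is why CUBE output grows much faster as columns are added.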
-
- `deploy()` - Function deploys the model, generated after `execute_script()`, in database or user environment in lake. The function is available in both Script and Apply.
-
- `fillna` - Replaces every occurrence of null value in column with the value specified.
-
-
- Date Time Functions
  - `DataFrameColumn.week_start()` - Returns the first date or timestamp of the week that begins immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.week_begin()` - It is an alias for `DataFrameColumn.week_start()` function.
  - `DataFrameColumn.week_end()` - Returns the last date or timestamp of the week that ends immediately after the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.month_start()` - Returns the first date or timestamp of the month that begins immediately before the specified date or timestamp value in a column or as a literal.
  - `DataFrameColumn.month_begin()` - It is an alias for `DataFrameColumn.month_start()` function.
  - `DataFrameColumn.month_end()` - Returns the last date or timestamp of the month that ends immediately after the specified date or timestamp value in a column or as a literal.
  - `DataFrameColumn.year_start()` - Returns the first date or timestamp of the year that begins immediately before the specified date or timestamp value in a column or as a literal.
  - `DataFrameColumn.year_begin()` - It is an alias for `DataFrameColumn.year_start()` function.
  - `DataFrameColumn.year_end()` - Returns the last date or timestamp of the year that ends immediately after the specified date or timestamp value in a column or as a literal.
  - `DataFrameColumn.quarter_start()` - Returns the first date or timestamp of the quarter that begins immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.quarter_begin()` - It is an alias for `DataFrameColumn.quarter_start()` function.
  - `DataFrameColumn.quarter_end()` - Returns the last date or timestamp of the quarter that ends immediately after the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.last_sunday()` - Returns the date or timestamp of Sunday that falls immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.last_monday()` - Returns the date or timestamp of Monday that falls immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.last_tuesday()` - Returns the date or timestamp of Tuesday that falls immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.last_wednesday()` - Returns the date or timestamp of Wednesday that falls immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.last_thursday()` - Returns the date or timestamp of Thursday that falls immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.last_friday()` - Returns the date or timestamp of Friday that falls immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.last_saturday()` - Returns the date or timestamp of Saturday that falls immediately before the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.day_of_week()` - Returns the number of days from the beginning of the week to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.day_of_month()` - Returns the number of days from the beginning of the month to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.day_of_year()` - Returns the number of days from the beginning of the year to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.day_of_calendar()` - Returns the number of days from the beginning of the business calendar to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.week_of_month()` - Returns the number of weeks from the beginning of the month to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.week_of_quarter()` - Returns the number of weeks from the beginning of the quarter to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.week_of_year()` - Returns the number of weeks from the beginning of the year to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.week_of_calendar()` - Returns the number of weeks from the beginning of the calendar to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.month_of_year()` - Returns the number of months from the beginning of the year to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.month_of_calendar()` - Returns the number of months from the beginning of the calendar to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.month_of_quarter()` - Returns the number of months from the beginning of the quarter to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.quarter_of_year()` - Returns the number of quarters from the beginning of the year to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.quarter_of_calendar()` - Returns the number of quarters from the beginning of the calendar to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.year_of_calendar()` - Returns the year of the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.day_occurrence_of_month()` - Returns the nth occurrence of the weekday in the month for the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.year()` - Returns the integer value for year in the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.month()` - Returns the integer value for month in the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.hour()` - Returns the integer value for hour in the specified timestamp value in a column as a literal.
  - `DataFrameColumn.minute()` - Returns the integer value for minute in the specified timestamp value in a column as a literal.
  - `DataFrameColumn.second()` - Returns the integer value for seconds in the specified timestamp value in a column as a literal.
  - `DataFrameColumn.week()` - Returns the number of weeks from the beginning of the year to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.next_day()` - Returns the date of the first weekday specified as 'day_value' that is later than the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.months_between()` - Returns the number of months between the specified date or timestamp value in a column as a literal and the date or timestamp value in the argument.
  - `DataFrameColumn.add_months()` - Adds an integer number of months to the specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.oadd_months()` - Adds an integer number of months, date or timestamp value in specified date or timestamp value in a column as a literal.
  - `DataFrameColumn.to_date()` - Function converts a string-like representation of a DATE or PERIOD type to Date type.
- String Functions
  - `DataFrameColumn.concat()` - Function to concatenate the columns with a separator.
  - `DataFrameColumn.like()` - Function to match the string pattern. String match is case sensitive.
  - `DataFrameColumn.ilike()` - Function to match the string pattern. String match is not case sensitive.
  - `DataFrameColumn.substr()` - Returns the substring from a string column.
  - `DataFrameColumn.startswith()` - Function to check if the column value starts with the specified value or not.
  - `DataFrameColumn.endswith()` - Function to check if the column value ends with the specified value or not.
  - `DataFrameColumn.format()` - Function to format the values in column based on formatter.
  - `DataFrameColumn.to_char()` - Function converts numeric type or datetype to character type.
  - `DataFrameColumn.trim()` - Function trims the string values in the column.
- Regular Arithmetic Functions
  - `DataFrameColumn.cbrt()` - Computes the cube root of values in the column.
  - `DataFrameColumn.hex()` - Computes the Hexadecimal from decimal for the values in the column.
  - `DataFrameColumn.hypot()` - Computes the hypotenuse for the values between two columns.
  - `DataFrameColumn.unhex()` - Computes the decimal from Hexadecimal for the values in the column.
- Bit Byte Manipulation Functions
  - `DataFrameColumn.from_byte()` - Encodes a sequence of bits into a sequence of characters.
- Comparison Functions
  - `DataFrameColumn.greatest()` - Returns the greatest values from columns.
  - `DataFrameColumn.least()` - Returns the least values from columns.
- Behaviour of `DataFrameColumn.replace()` is changed.
- Behaviour of `DataFrameColumn.to_byte()` is changed. It now decodes a sequence of characters in a given encoding into a sequence of bits.
- Behaviour of `DataFrameColumn.trunc()` is changed. It now accepts Date type columns.
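To make the period-boundary functions in the Date Time list above concrete, the following plain-Python sketch computes month, quarter and year boundaries for a date. It assumes ordinary Gregorian calendar conventions and does not call teradataml; results from the database may differ where a business calendar is configured. The helper names mirror the listed functions but are standalone, hypothetical implementations.

```python
from datetime import date, timedelta

def month_start(d: date) -> date:
    """First day of the month containing d."""
    return d.replace(day=1)

def month_end(d: date) -> date:
    """Last day of the month containing d."""
    # Day 28 exists in every month; adding 4 days always lands in the next month.
    first_of_next = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    return first_of_next - timedelta(days=1)

def quarter_start(d: date) -> date:
    """First day of the calendar quarter containing d."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)

def year_start(d: date) -> date:
    """First day of the year containing d."""
    return date(d.year, 1, 1)

sample = date(2024, 2, 15)
print(month_start(sample), month_end(sample), quarter_start(sample), year_start(sample))
```

The `month_end` helper handles leap years and December without special cases because it derives the answer from the first day of the following month.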
-
- Argument `url_encode` is no longer used in `create_context()` and is deprecated.
  - Important notes
    - Users do not need to encode the password even if it contains special characters.
    - Pass the password to the `create_context()` function argument `password` as it is, without changing special characters.
- `fillna()` in VAL transformation allows replacing NULL values with an empty string.
-
- Support for following deprecated functionality is removed:
- ML Engine functions
- STO and APPLY sandbox feature support for testing the script.
- sandbox_container_utils is removed. Following methods can no longer be used:
setup_sandbox_env()
copy_files_from_container()
cleanup_sandbox_env()
- Model Cataloging APIs can no longer be used:
describe_model()
delete_model()
list_models()
publish_model()
retrieve_model()
save_model()
DataFrame.join()
- Arguments lsuffix and rsuffix now add suffixes to new column names for join operation.
DataFrame.describe()
- New argument columns is added to generate statistics on only those columns instead of all applicable columns.
DataFrame.groupby()
- Supports CUBE and ROLLUP with additional optional argument option.
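As a reminder of what CUBE and ROLLUP mean in SQL, the sketch below lists the grouping sets each one expands to. It is plain Python for illustration only: `rollup_sets` and `cube_sets` are hypothetical helpers, and how the option argument selects between the two modes is not shown here.

```python
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(a, b) -> (a, b), (a,), (): progressively drop the last column."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(a, b) -> every subset of the columns, largest first."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


print(rollup_sets(["region", "product"]))
# [('region', 'product'), ('region',), ()]
print(cube_sets(["region", "product"]))
# [('region', 'product'), ('region',), ('product',), ()]
```

ROLLUP yields n + 1 grouping sets for n columns, while CUBE yields 2^n, so CUBE output grows much faster as columns are added.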
DataFrame.column.window()
- Supports ColumnExpressions for partition_columns and order_columns arguments.
DataFrame.column.contains()
- Allows ColumnExpressions for pattern argument.
DataFrame.window()
- Supports ColumnExpressions for partition_columns and order_columns arguments.
-
-
- Manage all user environments.
create_env():
- New argument conda_env is added to create a conda environment.
list_user_envs():
- User can list conda environment(s) by using filter with new argument conda_env.
- Conda environment(s) can be managed using APIs for installing, updating, removing files/libraries.
-
-
columns argument for FillNa function is made optional.
-
-
ColumnExpression.nulls_first()
- Displays NULL values first.
ColumnExpression.nulls_last()
- Displays NULL values last.
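The ordering these two expressions describe can be illustrated in plain Python, with None standing in for NULL. This is a sketch of the sort semantics only; `sort_nulls_first` and `sort_nulls_last` are hypothetical helpers that assume numeric values.

```python
def sort_nulls_first(values):
    # False sorts before True, so entries where "is not None" is False lead.
    return sorted(values, key=lambda v: (v is not None, 0 if v is None else v))


def sort_nulls_last(values):
    # Entries where "is None" is True are pushed to the end.
    return sorted(values, key=lambda v: (v is None, 0 if v is None else v))


data = [3, None, 1, None, 2]
print(sort_nulls_first(data))  # [None, None, 1, 2, 3]
print(sort_nulls_last(data))   # [1, 2, 3, None, None]
```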
Bit Byte Manipulation Functions
DataFrameColumn.bit_and()
- Returns the logical AND operation on the bits from the column and corresponding bits from the argument.
DataFrameColumn.bit_get()
- Gets the bit specified by input argument from the column and returns either 0 or 1 to indicate the value of that bit.
DataFrameColumn.bit_or()
- Returns the logical OR operation on the bits from the column and corresponding bits from the argument.
DataFrameColumn.bit_xor()
- Returns the bitwise XOR operation on the binary representation of the column and corresponding bits from the argument.
DataFrameColumn.bitand()
- It is an alias for DataFrameColumn.bit_and() function.
DataFrameColumn.bitnot()
- Returns a bitwise complement on the binary representation of the column.
DataFrameColumn.bitor()
- It is an alias for DataFrameColumn.bit_or() function.
DataFrameColumn.bitwise_not()
- It is an alias for DataFrameColumn.bitnot() function.
DataFrameColumn.bitwiseNOT()
- It is an alias for DataFrameColumn.bitnot() function.
DataFrameColumn.bitxor()
- It is an alias for DataFrameColumn.bit_xor() function.
DataFrameColumn.countset()
- Returns the count of the binary bits within the column that are either set to 1 or set to 0, depending on the input argument value.
DataFrameColumn.getbit()
- It is an alias for DataFrameColumn.bit_get() function.
DataFrameColumn.rotateleft()
- Returns an expression rotated to the left by the specified number of bits, with the most significant bits wrapping around to the right.
DataFrameColumn.rotateright()
- Returns an expression rotated to the right by the specified number of bits, with the least significant bits wrapping around to the left.
DataFrameColumn.setbit()
- Sets the value of the bit specified by input argument to the value of column.
DataFrameColumn.shiftleft()
- Returns the expression when value in column is shifted by the specified number of bits to the left.
DataFrameColumn.shiftright()
- Returns the expression when column expression is shifted by the specified number of bits to the right.
DataFrameColumn.subbitstr()
- Extracts a bit substring from the column expression based on the specified bit position.
DataFrameColumn.to_byte()
- Converts a numeric data type to the Vantage byte representation (byte value) of the column expression value.
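A few of these bit operations are sketched below in plain Python, to make the wrap-around and counting behaviour concrete. The helpers are hypothetical and assume an unsigned 8-bit value (`WIDTH = 8`); in Vantage the width follows the column's data type and signed values are handled by the database, which this sketch does not model.

```python
WIDTH = 8                    # assumed bit width for this illustration
MASK = (1 << WIDTH) - 1      # 0b11111111


def get_bit(value, position):
    """Return the bit (0 or 1) at the given position, counting from 0."""
    return (value >> position) & 1


def set_bit(value, position, bit=1):
    """Return value with the bit at position set to 1 (default) or 0."""
    if bit:
        return (value | (1 << position)) & MASK
    return value & ~(1 << position) & MASK


def count_set(value, bit=1):
    """Count bits equal to 1 (default) or equal to 0 within WIDTH bits."""
    ones = bin(value & MASK).count("1")
    return ones if bit else WIDTH - ones


def rotate_left(value, n):
    """Rotate left; the most significant bits wrap around to the right."""
    n %= WIDTH
    return ((value << n) | (value >> (WIDTH - n))) & MASK


def rotate_right(value, n):
    """Rotate right; the least significant bits wrap around to the left."""
    n %= WIDTH
    return ((value >> n) | (value << (WIDTH - n))) & MASK


x = 0b10010110
print(format(rotate_left(x, 3), "08b"))   # 10110100
print(format(rotate_right(x, 3), "08b"))  # 11010010
print(count_set(0b1011), count_set(0b1011, 0))  # 3 5
```

Unlike a rotation, a plain shift discards the bits that move past the end instead of wrapping them around.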
-
Regular Expression Functions
DataFrameColumn.regexp_instr()
- Searches string value in column for a match to value specified in argument.
DataFrameColumn.regexp_replace()
- Replaces the portions of the string value in a column that match the specified regex string with the replace string.
DataFrameColumn.regexp_similar()
- Compares value in column to value in argument and returns integer value.
DataFrameColumn.regexp_substr()
- Extracts a substring from column that matches a regular expression specified in the input argument.
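The four operations map onto familiar regular-expression primitives, sketched here with Python's `re` module. This is an analogy, not the teradataml implementation: the helper functions are hypothetical, Python's regex dialect differs from the database's, and the return conventions used (1-based position, 1/0 for a whole-string match) are assumptions for the illustration.

```python
import re


def regexp_instr(text, pattern):
    """1-based position of the first match, or 0 when there is none."""
    match = re.search(pattern, text)
    return match.start() + 1 if match else 0


def regexp_replace(text, pattern, replacement):
    """Replace every portion of text that matches pattern."""
    return re.sub(pattern, replacement, text)


def regexp_similar(text, pattern):
    """1 when the whole string matches the pattern, else 0."""
    return 1 if re.fullmatch(pattern, text) else 0


def regexp_substr(text, pattern):
    """First substring that matches pattern, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


print(regexp_instr("order-4711", r"\d+"))        # 7
print(regexp_replace("a1b22", r"\d+", "#"))      # a#b#
print(regexp_similar("4711", r"\d+"))            # 1
print(regexp_substr("order-4711", r"\d+"))       # 4711
```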
-
-
- Manage all user environments.
create_env():
- User can create one or more user environments using newly added argument template by providing specifications in a template JSON file. New feature allows user to create complete user environment, including file and library installation, in a single function call.
- UserEnv Class – Manage individual user environment.
- Properties:
models
- Supports listing of models in user environment.
- Methods:
install_model()
- Install a model in user environment.
uninstall_model()
- Uninstall a model from user environment.
snapshot()
- Take the snapshot of the user environment.
-
- New Functions
DataRobotPredict()
- Score the data in Vantage using the model trained externally in DataRobot and stored in Vantage.
-
DataFrame.describe()
- Method now accepts an argument statistics, which specifies the aggregate operation to be performed.
DataFrame.sort()
- Method now accepts ColumnExpressions as well.
- Enables sorting using NULLS FIRST and NULLS LAST.
- view_log() downloads the Apply query logs based on query id.
- Arguments which accept floating numbers will also accept integers for Analytics Database Analytic Functions.
- Argument ignore_nulls added to DataFrame.plot() to ignore the null values while plotting the data.
DataFrame.sample()
- Method supports column stratification.
-
- DataFrameColumn.cast() accepts all teradatasqlalchemy types.
- Minor bug fix related to DataFrame.merge().
-
-
Hyperparameter tuning is an optimization method to determine the optimal set of hyperparameters for the given dataset and learning model. teradataml hyperparameter tuning feature offers best model identification, parallel execution, early stopping feature, best data identification, model evaluation, model prediction, live logging, input data hyper-parameterization, input data sampling, numerous scoring functions, hyper-parameterization for non-model trainer functions.
GridSearch
GridSearch is an exhaustive search algorithm that covers all possible parameter values to identify optimal hyperparameters.
- Methods of GridSearch
__init__()
- Instantiate an object of GridSearch for given model function and parameters.
evaluate()
- Function to perform evaluation on the given teradataml DataFrame using default model.
fit()
- Function to perform hyperparameter-tuning for given hyperparameters and model on teradataml DataFrame.
get_error_log()
- Useful to get the error log if model execution failed, using the model identifier.
get_input_data()
- Useful to get the input data using the data identifier, when input data is also parameterized.
get_model()
- Returns the trained model for the given model identifier.
get_parameter_grid()
- Returns the hyperparameter space used for hyperparameter optimization.
is_running()
- Returns the execution status of hyperparameter tuning.
predict()
- Function to perform prediction on the given teradataml DataFrame using default model.
set_model()
- Function to update the default model.
- Properties of GridSearch
best_data_id
- Returns the best data identifier used for model training.
best_model
- Returns the best trained model.
best_model_id
- Returns the identifier for best model.
best_params_
- Returns the best set of hyperparameters.
best_sampled_data_
- Returns the best sampled data used to train the best model.
best_score_
- Returns the best trained model score.
model_stats
- Returns the model evaluation reports.
models
- Returns the metadata of all the models.
RandomSearch
RandomSearch algorithm performs random sampling on hyperparameter space to identify optimal hyperparameters.
- Methods of RandomSearch
__init__()
- Instantiate an object of RandomSearch for given model function and parameters.
evaluate()
- Function to perform evaluation on the given teradataml DataFrame using default model.
fit()
- Function to perform hyperparameter-tuning for given hyperparameters and model on teradataml DataFrame.
get_error_log()
- Useful to get the error log if model execution failed, using the model identifier.
get_input_data()
- Useful to get the input data using the data identifier, when input data is also parameterized.
get_model()
- Returns the trained model for the given model identifier.
get_parameter_grid()
- Returns the hyperparameter space used for hyperparameter optimization.
is_running()
- Returns the execution status of hyperparameter tuning.
predict()
- Function to perform prediction on the given teradataml DataFrame using default model.
set_model()
- Function to update the default model.
- Properties of RandomSearch
best_data_id
- Returns the best data identifier used for model training.
best_model
- Returns the best trained model.
best_model_id
- Returns the identifier for best model.
best_params_
- Returns the best set of hyperparameters.
best_sampled_data_
- Returns the best sampled data used to train the best model.
best_score_
- Returns the best trained model score.
model_stats
- Returns the model evaluation reports.
models
- Returns the metadata of all the models.
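The difference between the two search strategies can be sketched in plain Python. This is a conceptual illustration only, not the teradataml classes: `grid`, `random_sample`, `best` and `toy_score` are hypothetical names, and `toy_score` stands in for training and evaluating a model.

```python
import itertools
import random


def grid(space):
    """Every combination of the listed values (exhaustive search)."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def random_sample(space, n_iter, seed=0):
    """n_iter distinct combinations drawn at random from the same space."""
    candidates = grid(space)
    rng = random.Random(seed)
    return rng.sample(candidates, min(n_iter, len(candidates)))


def best(candidates, score):
    """Pick the candidate with the highest score."""
    return max(candidates, key=score)


def toy_score(params):
    # Hypothetical stand-in for model training plus evaluation.
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


space = {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1]}
print(len(grid(space)))                 # 6
print(best(grid(space), toy_score))     # {'max_depth': 5, 'learning_rate': 0.1}
print(len(random_sample(space, 3)))     # 3
```

Exhaustive search always evaluates the full product of values, whereas random sampling caps the number of trained models at n_iter and may therefore miss the best combination.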
-
teradataml has separate functions to generate a model, predict, transform and evaluate. Previously, all these functions had to be invoked individually, i.e., predict(), evaluate(), transform() could not be invoked using the model trainer function output. With this enhancement, users can invoke these functions as methods of the model trainer function. Below is the list of functions updated with this enhancement:
- Analytics Database Analytic Functions
BincodeFit()
- Supports transform() method.
DecisionForest()
- Supports predict(), evaluate() methods.
Fit()
- Supports transform() method.
GLM()
- Supports predict(), evaluate() methods.
GLMPerSegment()
- Supports predict(), evaluate() methods.
KMeans()
- Supports predict() method.
KNN()
- Supports predict(), evaluate() methods.
NaiveBayesTextClassifierTrainer()
- Supports predict(), evaluate() methods.
NonLinearCombineFit()
- Supports transform() method.
OneClassSVM()
- Supports predict() method.
OneHotEncodingFit()
- Supports transform() method.
OrdinalEncodingFit()
- Supports transform() method.
OutlierFilterFit()
- Supports transform() method.
PolynomialFeaturesFit()
- Supports transform() method.
RandomProjectionFit()
- Supports transform() method.
RowNormalizeFit()
- Supports transform() method.
ScaleFit()
- Supports transform() method.
SimpleImputeFit()
- Supports transform() method.
SVM()
- Supports predict(), evaluate() methods.
TargetEncodingFit()
- Supports transform() method.
XGBoost()
- Supports predict(), evaluate() methods.
- Time Series Analytic (UAF) Functions
ArimaEstimate()
- Supports forecast(), validate() methods.
DFFT()
- Supports convolve(), inverse() methods.
IDFFT()
- Supports inverse() method.
DFFT2()
- Supports convolve(), inverse() methods.
IDFFT2()
- Supports inverse() method.
DIFF()
- Supports inverse() method.
UNDIFF()
- Supports inverse() method.
SeasonalNormalize()
- Supports inverse() method.
-
- New Functions
DataFrame.plot()
- Generates the below type of plots on teradataml DataFrame.
  - line - Generates line plot.
  - bar - Generates bar plot.
  - scatter - Generates scatter plot.
  - corr - Generates correlation plot.
  - wiggle - Generates a wiggle plot.
  - mesh - Generates a mesh plot.
DataFrame.itertuples()
- Iterates over teradataml DataFrame rows as namedtuples or list.
-
- New Functions
GeoDataFrame.plot()
- Generates the below type of plots on teradataml GeoDataFrame.
  - line - Generates line plot.
  - bar - Generates bar plot.
  - scatter - Generates scatter plot.
  - corr - Generates correlation plot.
  - wiggle - Generates a wiggle plot.
  - mesh - Generates a mesh plot.
  - geometry - Generates plot on geospatial data.
-
Plot:
Axis
- Generates the axis for plot.
Figure
- Generates the figure for plot.
subplots
- Helps in generating multiple plots on a single Figure.
-
Bring Your Own Model (BYOM) Function:
DataikuPredict
- Score the data in Vantage using the model trained externally in Dataiku UI and stored in Vantage.
-
async_run_status()
- Function to check the status of asynchronous run(s) using unique run id(s).
- Regular Arithmetic Functions
DataFrameColumn.abs()
- Computes the absolute value.
DataFrameColumn.ceil()
- Returns the ceiling value of the column.
DataFrameColumn.ceiling()
- It is an alias for DataFrameColumn.ceil() function.
DataFrameColumn.degrees()
- Converts radians value from the column to degrees.
DataFrameColumn.exp()
- Raises e (the base of natural logarithms) to the power of the value in the column, where e = 2.71828182845905.
DataFrameColumn.floor()
- Returns the largest integer equal to or less than the value in the column.
DataFrameColumn.ln()
- Computes the natural logarithm of values in column.
DataFrameColumn.log10()
- Computes the base 10 logarithm.
DataFrameColumn.mod()
- Returns the modulus of the column.
DataFrameColumn.pmod()
- It is an alias for DataFrameColumn.mod() function.
DataFrameColumn.nullifzero()
- Converts data from zero to null to avoid problems with division by zero.
DataFrameColumn.pow()
- Computes the power of the column raised to expression or constant.
DataFrameColumn.power()
- It is an alias for DataFrameColumn.pow() function.
DataFrameColumn.radians()
- Converts degree value from the column to radians.
DataFrameColumn.round()
- Returns the rounded off value.
DataFrameColumn.sign()
- Returns the sign.
DataFrameColumn.signum()
- It is an alias for DataFrameColumn.sign() function.
DataFrameColumn.sqrt()
- Computes the square root of values in the column.
DataFrameColumn.trunc()
- Provides the truncated value of columns.
DataFrameColumn.width_bucket()
- Returns the number of the partition to which column is assigned.
DataFrameColumn.zeroifnull()
- Converts data from null to zero to avoid problems with null.
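Three of the less self-explanatory entries above, nullifzero(), zeroifnull() and width_bucket(), can be sketched in plain Python with None standing in for NULL. The functions are hypothetical stand-ins that follow the usual SQL definitions; `width_bucket` here assumes `lower < upper` and equal-width buckets, with 0 and `count + 1` reserved for values outside the range.

```python
def nullifzero(x):
    """Turn 0 into None so a later division can be skipped instead of failing."""
    return None if x == 0 else x


def zeroifnull(x):
    """Turn None into 0 so arithmetic does not propagate a missing value."""
    return 0 if x is None else x


def safe_ratio(numerator, denominator):
    """Division that yields None rather than raising on a zero denominator."""
    d = nullifzero(denominator)
    return None if d is None else numerator / d


def width_bucket(value, lower, upper, count):
    """Number of the equal-width partition of [lower, upper) holding value."""
    if value < lower:
        return 0
    if value >= upper:
        return count + 1
    return int((value - lower) * count / (upper - lower)) + 1


print(safe_ratio(10, 0))               # None
print(zeroifnull(None) + 5)            # 5
print(width_bucket(5, 0, 10, 4))       # 3
print(width_bucket(12, 0, 10, 4))      # 5
```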
- Trigonometric Functions
DataFrameColumn.acos()
- Returns the arc-cosine value.
DataFrameColumn.asin()
- Returns the arc-sine value.
DataFrameColumn.atan()
- Returns the arc-tangent value.
DataFrameColumn.atan2()
- Returns the arc-tangent value based on x and y coordinates.
DataFrameColumn.cos()
- Returns the cosine value.
DataFrameColumn.sin()
- Returns the sine value.
DataFrameColumn.tan()
- Returns the tangent value.
- Hyperbolic Functions
DataFrameColumn.acosh()
- Returns the inverse hyperbolic cosine value.
DataFrameColumn.asinh()
- Returns the inverse hyperbolic sine value.
DataFrameColumn.atanh()
- Returns the inverse hyperbolic tangent value.
DataFrameColumn.cosh()
- Returns the hyperbolic cosine value.
DataFrameColumn.sinh()
- Returns the hyperbolic sine value.
DataFrameColumn.tanh()
- Returns the hyperbolic tangent value.
- String Functions
DataFrameColumn.ascii()
- Returns the decimal representation of the first character in column.
DataFrameColumn.char2hexint()
- Returns the hexadecimal representation for a character string in a column.
DataFrameColumn.chr()
- Returns the Latin ASCII character of a given numeric code value in column.
DataFrameColumn.char()
- It is an alias for DataFrameColumn.chr() function.
DataFrameColumn.character_length()
- Returns the number of characters in the column.
DataFrameColumn.char_length()
- It is an alias for DataFrameColumn.character_length() function.
DataFrameColumn.edit_distance()
- Returns the minimum number of edit operations required to transform string in a column into string specified in argument.
DataFrameColumn.index()
- Returns the position of a string in a column where string specified in argument starts.
DataFrameColumn.initcap()
- Modifies a string column and returns the string with the first character of each word in uppercase.
DataFrameColumn.instr()
- Searches the string in a column for occurrences of search string passed as argument.
DataFrameColumn.lcase()
- Returns a character string identical to string values in column, with all uppercase letters replaced with their lowercase equivalents.
DataFrameColumn.left()
- Truncates string in a column to a specified number of characters desired from the left side of the string.
DataFrameColumn.length()
- It is an alias for DataFrameColumn.character_length() function.
DataFrameColumn.levenshtein()
- It is an alias for DataFrameColumn.edit_distance() function.
DataFrameColumn.locate()
- Returns the position of the first occurrence of a string in a column within string in argument.
DataFrameColumn.lower()
- It is an alias for DataFrameColumn.lcase() function.
DataFrameColumn.lpad()
- Returns the string in a column padded to the left with the characters specified in argument so that the resulting string has length specified in argument.
DataFrameColumn.ltrim()
- Returns the string in a column, with its left-most characters removed up to the first character that is not in the string specified in argument.
DataFrameColumn.ngram()
- Returns the number of n-gram matches between string in a column, and string specified in argument.
DataFrameColumn.nvp()
- Extracts the value of a name-value pair where the name in the pair matches the name and the number of the occurrence specified.
DataFrameColumn.oreplace()
- Replaces every occurrence of search string in the column.
DataFrameColumn.otranslate()
- Returns string in a column with every occurrence of each character in string in argument replaced with the corresponding character in another argument.
DataFrameColumn.replace()
- It is an alias for DataFrameColumn.oreplace() function.
DataFrameColumn.reverse()
- Returns the reverse of string in column.
DataFrameColumn.right()
- Truncates input string to a specified number of characters desired from the right side of the string.
DataFrameColumn.rpad()
- Returns the string in a column padded to the right with the characters specified in argument so the resulting string has length specified in argument.
DataFrameColumn.rtrim()
- Returns the string in column, with its right-most characters removed up to the first character that is not in the string specified in argument.
DataFrameColumn.soundex()
- Returns a character string that represents the Soundex code for string in a column.
DataFrameColumn.string_cs()
- Returns a heuristically derived integer value that can be used to determine which KANJI1-compatible client character set was used to encode string in a column.
DataFrameColumn.translate()
- It is an alias for DataFrameColumn.otranslate() function.
DataFrameColumn.upper()
- Returns a character string with all lowercase letters in a column replaced with their uppercase equivalents.
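To make two of the string functions above concrete, the sketch below implements an edit distance and a left pad in plain Python. Both helpers are hypothetical illustrations: `edit_distance` is the classic Levenshtein distance with unit costs for insert, delete and substitute (the database function may weight operations differently or count transpositions), and `lpad` assumes that a string longer than the target length is truncated.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete from a
                            curr[j - 1] + 1,     # insert into a
                            prev[j - 1] + cost)) # substitute or keep
        prev = curr
    return prev[-1]


def lpad(text, length, fill=" "):
    """Pad text on the left with fill until it is exactly length characters."""
    if len(text) >= length:
        return text[:length]
    pad = (fill * length)[: length - len(text)]
    return pad + text


print(edit_distance("kitten", "sitting"))  # 3
print(lpad("42", 5, "0"))                  # 00042
```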
-
- Configuration Options
configure.indb_install_location
Specifies the installation location of In-DB Python package.
-
-
- Open Analytics Framework (OpenAF) APIs:
set_auth_token()
- set_auth_token() does not accept username and password anymore. Instead, the function opens a browser session and the user should authenticate in the browser.
- After token expiry, teradataml will open a browser and the user needs to authenticate again.
- If the client machine does not have a browser, then the user should copy the URL posted by teradataml and authenticate themselves.
- Security fixes - auth_token is not set or retrieved from the configure option anymore.
- Manage all user environments.
create_env()
- Supports creation of R environment.
remove_env()
- Supports removal of remote R environment.
remove_all_envs()
- Supports removal of all remote R environments.
- remove_env() and remove_all_envs() support asynchronous call.
- UserEnv Class – Supports managing of R remote environments.
- Properties:
libs
- Supports listing of libraries in R remote environment.
- Methods:
install_lib()
- Supports installing of libraries in remote R environment.
uninstall_lib()
- Supports uninstalling of libraries in remote R environment.
update_lib()
- Supports updating of libraries in remote R environment.
- Unbounded Array Framework (UAF) Functions:
ArimaEstimate()
- Added support for CSS algorithm via algorithm argument.
-
- Installation location of In-DB 2.0.0 package is changed. Script() will now work with both 2.0.0 and previous version.
-
-
teradataml is now compatible with SQLAlchemy 2.0.X
- Important notes when user has sqlalchemy version >= 2.0:
  - Users will not be able to run the execute() method on the SQLAlchemy engine object returned by the get_context() and create_context() teradataml functions, because SQLAlchemy has removed support for the execute() method on the engine object. In user scripts where get_context().execute() or create_context().execute() is used, Teradata recommends replacing those calls with either the execute_sql() function exposed by teradataml or the exec_driver_sql() method on the Connection object returned by the get_connection() function in teradataml.
  - get_connection().execute() now accepts only an executable sqlalchemy object. Refer to sqlalchemy.engine.base.execute() for more details.
  - In such cases, Teradata recommends using either the execute_sql() function exposed by teradataml or the exec_driver_sql() method on the Connection object returned by the get_connection() function in teradataml.
-
New utility function execute_sql() is added to execute the SQL.
-
Extending compatibility for MAC with ARM processors.
-
Added support for floor division (//) between two teradataml DataFrame Columns.
-
Analytics Database Analytic Functions:
GLMPerSegment()
GLMPredictPerSegment()
OneClassSVM()
OneClassSVMPredict()
SVM()
SVMPredict()
TargetEncodingFit()
TargetEncodingTransform()
TrainTestSplit()
WordEmbeddings()
XGBoost()
XGBoostPredict()
-
- Display Options
display.geometry_column_length
Option to display the default length of geometry column in GeoDataFrame.
-
set_auth_token()
- Function can generate the client id automatically based on org_id when the user does not specify it.
- Analytics Database Analytic Functions:
ColumnTransformer()
- Does not allow list values for arguments onehotencoding_fit_data and ordinalencoding_fit_data.
OrdinalEncodingFit()
- New arguments added: category_data, target_column_names, categories_column, ordinal_values_column.
- Allows the list of values for arguments target_column, start_value, default_value.
OneHotEncodingFit()
- New arguments added: category_data, approach, target_columns, categories_column, category_counts.
- Allows the list of values for arguments target_column, other_column.
-
- DataFrame.sample() method output is now deterministic.
- copy_to_sql() now preserves the rows of the table even when the view content is copied to the same table name.
- list_user_envs() does not raise warning when no user environments found.
-
-
- DataFrame.join
  - New arguments lprefix and rprefix added.
  - Behavior of arguments lsuffix and rsuffix will be changed in future, use new arguments instead.
  - New and old affix arguments can now be used independently.
- Analytic functions can be imported regardless of context creation. Import after create context constraint is now removed.
- ReadNOS and WriteNOS now accept dictionary value for authorization and row_format arguments.
- WriteNOS supports writing CSV files to external store.
- Following model cataloging APIs will be deprecated in future:
  - describe_model
  - delete_model
  - list_models
  - publish_model
  - retrieve_model
  - save_model
-
- copy_to_sql() bug related to NaT value has been fixed.
- Tooltip on PyCharm IDE now points to SQLE.
- value argument of FillNa(), a Vantage Analytic Library function, supports special characters.
- case function accepts DataFrame column as value in whens argument.
-
-
- New Functions
set_auth_token()
- Sets the JWT token automatically for using OpenAF APIs.
-
- Display Options
display.suppress_vantage_runtime_warnings
Suppresses the VantageRuntimeWarning raised by teradataml, when set to True.
-
- SimpleImputeFit function arguments stats_columns and stats are made optional.
- New argument table_format is added to ReadNOS().
- Argument full_scan is changed to scan_pct in ReadNOS().
-
- Minor bug fix related to read_csv.
- APPLY and DataFrame.apply() support hash by and local order by.
- Output column names are changed for DataFrame.dtypes and DataFrame.tdtypes.
-
-
-
- New Functions
DataFrame.pivot()
- Rotate data from rows into columns to create easy-to-read DataFrames.
DataFrame.unpivot()
- Rotate data from columns into rows to create easy-to-read DataFrames.
DataFrame.drop_duplicate()
- Drop duplicate rows from teradataml DataFrame.
- New properties
DataFrame.is_art
- Check whether teradataml DataFrame is created on an Analytic Result Table (ART) or not.
-
-
New Functions
- New Functions Supported on Database Versions: 17.20.x.x
- MODEL PREPARATION AND PARAMETER ESTIMATION functions:
ACF()
ArimaEstimate()
ArimaValidate()
DIFF()
LinearRegr()
MultivarRegr()
PACF()
PowerTransform()
SeasonalNormalize()
Smoothma()
UNDIFF()
Unnormalize()
- SERIES FORECASTING functions:
ArimaForecast()
DTW()
HoltWintersForecaster()
MAMean()
SimpleExp()
- DATA PREPARATION functions:
BinaryMatrixOp()
BinarySeriesOp()
GenseriesFormula()
MatrixMultiply()
Resample()
- DIAGNOSTIC STATISTICAL TEST functions:
BreuschGodfrey()
BreuschPaganGodfrey()
CumulPeriodogram()
DickeyFuller()
DurbinWatson()
FitMetrics()
GoldfeldQuandt()
Portman()
SelectionCriteria()
SignifPeriodicities()
SignifResidmean()
WhitesGeneral()
- TEMPORAL AND SPATIAL functions:
Convolve()
Convolve2()
DFFT()
DFFT2()
DFFT2Conv()
DFFTConv()
GenseriesSinusoids()
IDFFT()
IDFFT2()
LineSpec()
PowerSpec()
- GENERAL UTILITY functions:
ExtractResults()
InputValidator()
MInfo()
SInfo()
TrackingOp()
-
New Features: Inputs to Unbounded Array Framework (UAF) functions
TDAnalyticResult()
- Allows preparing the output generated by UAF functions so that it can be passed as input to UAF functions.
TDGenSeries()
- Allows generating a series that can be passed to a UAF function.
TDMatrix()
- Represents a Matrix in time series, that can be created from a teradataml DataFrame.
TDSeries()
- Represents a Series in time series, that can be created from a teradataml DataFrame.
-
-
- Native Object Store (NOS) functions support authorization by specifying authorization object.
- display_analytic_functions() categorizes the analytic functions based on function type.
- ColumnTransformer accepts multiple values for arguments nonlinearcombine_fit_data, onehotencoding_fit_data, ordinalencoding_fit_data.
-
- Redundant warnings thrown by teradataml are suppressed.
- OpenAF is supported when context is created with JWT token.
- New argument match_column_order added to copy_to_sql, which allows DataFrame loading with any column order.
- copy_to_sql updated to map data type timezone(tzinfo) to TIMESTAMP(timezone=True), instead of VARCHAR.
- Improved performance for DataFrame.sum and DataFrameColumn.sum functions.
-
-
-
- New Functions
-
ANOVA()
ClassificationEvaluator()
ColumnTransformer()
DecisionForest()
GLM()
GetFutileColumns()
KMeans()
KMeansPredict()
NaiveBayesTextClassifierTrainer()
NonLinearCombineFit()
NonLinearCombineTransform()
OrdinalEncodingFit()
OrdinalEncodingTransform()
RandomProjectionComponents()
RandomProjectionFit()
RandomProjectionTransform()
RegressionEvaluator()
ROC()
SentimentExtractor()
Silhouette()
TDGLMPredict()
TextParser()
VectorDistance()
-
- Updates
- display_analytic_functions() categorizes the analytic functions based on function type.
- Users can provide range value for columns argument.
-
- Manage all user environments.
list_base_envs()
- List the available Python base versions.
create_env()
- Create a new user environment.
get_env()
- Get existing user environment.
list_user_envs()
- List the available user environments.
remove_env()
- Delete user environment.
remove_all_envs()
- Delete all the user environments.
- UserEnv Class – Manage individual user environment.
- Properties
files
- Get files in user environment.
libs
- Get libraries in user environment.
- Methods
install_file()
- Install a file in user environment.
remove_file()
- Remove a file in user environment.
install_lib()
- Install a library in user environment.
update_lib()
- Update a library in user environment.
uninstall_lib()
- Uninstall a library in user environment.
status()
- Check the status of
  - file installation
  - library installation
  - library update
  - library uninstallation
refresh()
- Refresh the environment details in local client.
- Apply Class – Execute a user script on VantageCloud Lake.
__init__()
- Instantiate an object of apply for script execution.
install_file()
- Install a file in user environment.
remove_file()
- Remove a file in user environment.
set_data()
- Reset data and related arguments.
execute_script()
- Executes Python script.
-
- New Functions
DataFrame.apply()
- Execute a user defined Python function on VantageCloud Lake.
-
- New Functions
ONNXPredict()
- Score using model trained externally on ONNX and stored in Vantage.
-
- New Functions
  - set_config_params() - New API to set all config params in one go.
- New Configuration Options
  - For Open Analytics support.
    - ues_url – User Environment Service URL for VantageCloud Lake.
    - auth_token – Authentication token to connect to VantageCloud Lake.
    - certificate_file – Path to a CA_BUNDLE file or directory with certificates of trusted CAs.
-
- accumulate argument is working for ScaleTransform().
- Following functions have accumulate argument added on Database Versions: 17.20.x.x
  - ConvertTo()
  - GetRowsWithoutMissingValues()
- OutlierFilterFit() supports multiple output.
- For OutlierFilterFit() function, below arguments are optional in teradataml 17.20.x.x
  - lower_percentile
  - upper_percentile
  - outlier_method
  - replacement_value
  - percentile_method
- Analytics Database analytic functions – In line help, i.e., help() for the functions is available.
-
- Vantage Analytic Library FillNa() function: Now columns argument is required.
- output_responses argument in MLE function DecisionTreePredict() does not allow empty string.
- teradataml closes docker sandbox environment properly.
- Users can create context using JWT token.
-
-
-
list_td_reserved_keywords()
- Validates if the specified string is Teradata reserved keyword or not, else lists down all the Teradata reserved keywords.
-
-
-
- Updates
- Multiple columns can be selected using slice operator ([]).
-
- Updates
- A warning will be raised, when Teradata reserved keyword is used in Script local mode.
-
-
- Numeric overflow issue observed for describe(), sum(), csum(), and mean() has been fixed.
- Error messages are updated for SQLE function arguments accepting multiple datatypes.
- Error messages are updated for the SQLE function arguments volatile and persist when a non-boolean value is provided.
- DataFrame sample() method can handle column names with special characters like space, hyphen, period etc.
- In-DB SQLE functions can be loaded for any locale setting.
create_context()
- A password containing special characters requires URL encoding as per https://docs.microfocus.com/OMi/10.62/Content/OMi/ExtGuide/ExtApps/URL_encoding.html. teradataml has added a fix to take care of the URL encoding of the password while creating a context. Also, a new argument is added to give more control over the URL encoding done at the time of context creation.
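To make the URL-encoding note concrete, here is a minimal sketch of percent-encoding a password using only the Python standard library. It does not call teradataml and does not show the new create_context() argument; the helper name `encode_password` and the sample password are made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_password(password: str) -> str:
    """Percent-encode every reserved character (hypothetical helper)."""
    # safe="" makes quote() encode '/' as well, which it would otherwise keep.
    return quote(password, safe="")


raw = "p@ss:w/rd#1"          # sample password, not a real credential
encoded = encode_password(raw)

print(encoded)               # characters such as '@', ':', '/', '#' become %XX
print(unquote(encoded) == raw)
```

Decoding the encoded string returns the original password, which is why a connection URL can carry it safely.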
-
-
The Geospatial feature in teradataml enables data manipulation, exploration and analysis on tables, views, and queries on Teradata Vantage that contain Geospatial data.
-
- Point
- LineString
- Polygon
- MultiPoint
- MultiLineString
- MultiPolygon
- GeometryCollection
- GeoSequence
-
- Properties
- columns
- dtypes
- geometry
- iloc
- index
- loc
- shape
- size
- tdtypes
- Geospatial Specific Properties
-
- boundary
- centroid
- convex_hull
- coord_dim
- dimension
- geom_type
- is_3D
- is_empty
- is_simple
- is_valid
- max_x
- max_y
- max_z
- min_x
- min_y
- min_z
- srid
-
- x
- y
- z
-
- is_closed_3D
- is_closed
- is_ring
-
- area
- exterior
- perimeter
-
- Methods
__getattr__()
__getitem__()
__init__()
__repr__()
assign()
concat()
count()
drop()
dropna()
filter()
from_query()
from_table()
get()
get_values()
groupby()
head()
info()
join()
keys()
merge()
sample()
select()
set_index()
show_query()
sort()
sort_index()
squeeze()
tail()
to_csv()
to_pandas()
to_sql()
- Geospatial Specific Methods
-
buffer()
contains()
crosses()
difference()
disjoint()
distance()
distance_3D()
envelope()
geom_equals()
intersection()
intersects()
make_2D()
mbb()
mbr()
overlaps()
relates()
set_exterior()
set_srid()
simplify()
sym_difference()
to_binary()
to_text()
touches()
transform()
union()
within()
wkb_geom_to_sql()
wkt_geom_to_sql()
-
spherical_buffer()
spherical_distance()
spheroidal_buffer()
spheroidal_distance()
set_x()
set_y()
set_z()
-
end_point()
length()
length_3D()
line_interpolate_point()
num_points()
point()
start_point()
-
interiors()
num_interior_ring()
point_on_surface()
-
geom_component()
num_geometry()
-
clip()
get_final_timestamp()
get_init_timestamp()
get_link()
get_user_field()
get_user_field_count()
point_heading()
set_link()
speed()
-
intersects_mbb()
mbb_filter()
mbr_filter()
within_mbb()
-
- Properties
-
- Geospatial Specific Properties
-
- boundary
- centroid
- convex_hull
- coord_dim
- dimension
- geom_type
- is_3D
- is_empty
- is_simple
- is_valid
- max_x
- max_y
- max_z
- min_x
- min_y
- min_z
- srid
-
- x
- y
- z
-
- is_closed_3D
- is_closed
- is_ring
-
- area
- exterior
- perimeter
-
- Geospatial Specific Methods
-
buffer()
contains()
crosses()
difference()
disjoint()
distance()
distance_3D()
envelope()
geom_equals()
intersection()
intersects()
make_2D()
mbb()
mbr()
overlaps()
relates()
set_exterior()
set_srid()
simplify()
sym_difference()
to_binary()
to_text()
touches()
transform()
union()
within()
wkb_geom_to_sql()
wkt_geom_to_sql()
-
spherical_buffer()
spherical_distance()
spheroidal_buffer()
spheroidal_distance()
set_x()
set_y()
set_z()
-
end_point()
length()
length_3D()
line_interpolate_point()
num_points()
point()
start_point()
-
interiors()
num_interior_ring()
point_on_surface()
-
geom_component()
num_geometry()
-
clip()
get_final_timestamp()
get_init_timestamp()
get_link()
get_user_field()
get_user_field_count()
point_heading()
set_link()
speed()
-
intersects_mbb()
mbb_filter()
mbr_filter()
within_mbb()
-
- Geospatial Specific Properties
-
-
- New Functions
to_csv()
- New Functions
-
- New Functions
- Newly added SQLE functions are accessible only after establishing the connection to Vantage.
- display_analytic_functions() API displays all the available SQLE Analytic functions based on database version.
-
Antiselect()
Attribution()
DecisionForestPredict()
DecisionTreePredict()
GLMPredict()
MovingAverage()
NaiveBayesPredict()
NaiveBayesTextClassifierPredict()
NGramSplitter()
NPath()
Pack()
Sessionize()
StringSimilarity()
SVMParsePredict()
Unpack()
-
Antiselect()
Attribution()
BincodeFit()
BincodeTransform()
CategoricalSummary()
ChiSq()
ColumnSummary()
ConvertTo()
DecisionForestPredict()
DecisionTreePredict()
GLMPredict()
FillRowId()
FTest()
Fit()
Transform()
GetRowsWithMissingValues()
GetRowsWithoutMissingValues()
MovingAverage()
Histogram()
NaiveBayesPredict()
NaiveBayesTextClassifierPredict()
NGramSplitter()
NPath()
NumApply()
OneHotEncodingFit()
OneHotEncodingTransform()
OutlierFilterFit()
OutlierFilterTransform()
Pack()
PolynomialFeaturesFit()
PolynomialFeaturesTransform()
QQNorm()
RoundColumns()
RowNormalizeFit()
RowNormalizeTransform()
ScaleFit()
ScaleTransform()
Sessionize()
SimpleImputeFit()
SimpleImputeTransform()
StrApply()
StringSimilarity()
SVMParsePredict()
UniVariateStatistics()
Unpack()
WhichMax()
WhichMin()
ZTest()
- New Functions
-
- New Functions
- Data Transfer Utility
read_csv()
- Data Transfer Utility
- New Functions
-
- New Functions
- Table Operators
read_nos()
write_nos()
- Table Operators
- New Functions
-
- New Functions
- Model Cataloging
get_license()
set_byom_catalog()
set_license()
- Model Cataloging
- New Functions
-
-
-
- Data Transfer Utility
  - copy_to_sql() - New argument "chunksize" added to load data in chunks.
  - Following Data Transfer Utility Functions updated to specify the number of Teradata sessions to open for data transfer using "open_sessions" argument:
    - fastexport()
    - fastload()
    - to_pandas()
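The idea behind loading data in chunks can be sketched in plain Python. This is only an illustration of splitting rows into batches of at most `chunksize` rows; it is not how copy_to_sql() is implemented, and the helper name `iter_chunks` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def iter_chunks(rows: Iterable[Tuple], chunksize: int) -> Iterator[List[Tuple]]:
    """Yield successive lists of at most `chunksize` rows (hypothetical helper)."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


rows = [(i, f"name_{i}") for i in range(10)]
chunk_sizes = [len(chunk) for chunk in iter_chunks(rows, 4)]
print(chunk_sizes)
```

A smaller chunk size means more round trips but less data per request, which is the trade-off the "chunksize" argument exposes.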
-
- Following Set Operator Functions updated to work with Geospatial data:
concat()
td_intersect()
td_except()
td_minus()
- Following Set Operator Functions updated to work with Geospatial data:
-
- Model cataloging APIs mentioned below are updated to use session level parameters set by set_byom_catalog() and set_license(), such as table name, schema name and license details respectively.
  - delete_byom()
  - list_byom()
  - retrieve_byom()
  - save_byom()
- view_log() - Allows user to view BYOM logs.
-
-
- CS0733758 - db_python_package_details() function is fixed to support latest STO release for pip and Python aliases used.
- DataFrame print() issue related to "Response Row size is greater than the 1MB allowed maximum." has been fixed to print data with a large number of columns.
- New parameter "chunksize" is added to DataFrame.to_sql() and copy_to_sql() to fix the issue where the function was failing with error - "Request requires too many SPOOL files.". Reducing the chunksize below the default should allow the operation to succeed.
- remove_context() is fixed to remove the active connection from database.
- Support added to specify the number of Teradata data transfer sessions to open for data transfer using fastexport() and fastload() functions.
- DataFrame.to_sql() is fixed to support temporary table when default database differs from the username.
- DataFrame.to_pandas() now by default supports data transfer using the regular method. This change allows data transfer when utility throttles are configured, i.e., when the TASM configuration does not support data export using FastExport.
- save_byom() now notifies if VARCHAR column is trimmed out if data passed to the API is greater than the length of the VARCHAR column.
- Standard error can now be captured for DataFrame.map_row() and DataFrame.map_partition() when executed in LOCAL mode.
- Vantage Analytic Library - Underlying SQL can be retrieved using newly added arguments "gen_sql"/"gen_sql_only" for the functions. Query can be viewed with the help of show_query().
- Documentation example has been fixed for fastexport() to show the correct import statement.
Fixed [CS0733758] db_python_package_details() fails on recent STO release due to changes in pip and python aliases.
-
-
- Bring Your Own Analytics Functions
  The BYOM feature in Vantage provides flexibility to score the data in Vantage using external models using following BYOM functions:
  - H2OPredict() - Score using model trained externally in H2O and stored in Vantage.
  - PMMLPredict() - Score using model trained externally in PMML and stored in Vantage.
- BYOM Model Catalog APIs
  - save_byom() - Save externally trained models in Teradata Vantage.
  - delete_byom() - Delete a model from the user specified table in Teradata Vantage.
  - list_byom() - List models.
  - retrieve_byom() - Function to retrieve a saved model.
- Vantage Analytic Library Functions
  - New Functions
    - XmlToHtmlReport() - Transforms XML output of VAL functions to HTML.
-
- DataFrame.window() - Generates Window object on a teradataml DataFrame to run window aggregate functions.
- DataFrame.csum() - Returns column-wise cumulative sum for rows in the partition of the dataframe.
- DataFrame.mavg() - Returns moving average for the current row and the preceding rows.
- DataFrame.mdiff() - Returns moving difference for the current row and the preceding rows.
- DataFrame.mlinreg() - Returns moving linear regression for the current row and the preceding rows.
- DataFrame.msum() - Returns moving sum for the current row and the preceding rows.
- Regular Aggregate Functions
  - DataFrame.corr() - Returns the Sample Pearson product moment correlation coefficient.
  - DataFrame.covar_pop() - Returns the population covariance.
  - DataFrame.covar_samp() - Returns the sample covariance.
  - DataFrame.regr_avgx() - Returns the mean of the independent variable.
  - DataFrame.regr_avgy() - Returns the mean of the dependent variable.
  - DataFrame.regr_count() - Returns the count of the dependent and independent variable arguments.
  - DataFrame.regr_intercept() - Returns the intercept of the univariate linear regression line.
  - DataFrame.regr_r2() - Returns the coefficient of determination.
  - DataFrame.regr_slope() - Returns the slope of the univariate linear regression line.
  - DataFrame.regr_sxx() - Returns the sum of the squares of the independent variable expression.
  - DataFrame.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
  - DataFrame.regr_syy() - Returns the sum of the squares of the dependent variable expression.
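The regression aggregates listed above are related to each other. The plain-Python sketch below shows the usual textbook relationships, assuming the standard SQL definitions in which the sums are taken over deviations from the means and pairs with a missing value are ignored. It does not call teradataml, and the helper name `regr_stats` is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def regr_stats(pairs: Iterable[Tuple[Optional[float], Optional[float]]]) -> Dict[str, float]:
    """Compute regression aggregates for (dependent, independent) pairs."""
    # Pairs with a missing value on either side are skipped (assumed SQL-style behaviour).
    points = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(points)
    avg_y = sum(y for y, _ in points) / n
    avg_x = sum(x for _, x in points) / n
    sxx = sum((x - avg_x) ** 2 for _, x in points)
    syy = sum((y - avg_y) ** 2 for y, _ in points)
    sxy = sum((x - avg_x) * (y - avg_y) for y, x in points)
    slope = sxy / sxx
    return {
        "regr_count": n,
        "regr_avgx": avg_x,
        "regr_avgy": avg_y,
        "regr_sxx": sxx,
        "regr_syy": syy,
        "regr_sxy": sxy,
        "regr_slope": slope,
        "regr_intercept": avg_y - slope * avg_x,
        "regr_r2": (sxy * sxy) / (sxx * syy),
    }


# Points on the line y = 2x + 1, plus one incomplete pair that is ignored.
stats = regr_stats([(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0), (None, 5.0)])
print(stats)
```

Because the sample points lie exactly on a line, the coefficient of determination comes out as 1.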
-
- ColumnExpression.window() - Generates Window object on a teradataml DataFrameColumn to run window aggregate functions.
- ColumnExpression.desc() - Sorts ColumnExpression in descending order.
- ColumnExpression.asc() - Sorts ColumnExpression in ascending order.
- ColumnExpression.distinct() - Removes duplicate values from ColumnExpression.
- Regular Aggregate Functions
  - ColumnExpression.corr() - Returns the Sample Pearson product moment correlation coefficient.
  - ColumnExpression.count() - Returns the column-wise count.
  - ColumnExpression.covar_pop() - Returns the population covariance.
  - ColumnExpression.covar_samp() - Returns the sample covariance.
  - ColumnExpression.kurtosis() - Returns kurtosis value for a column.
  - ColumnExpression.median() - Returns column-wise median value.
  - ColumnExpression.max() - Returns the column-wise max value.
  - ColumnExpression.mean() - Returns the column-wise average value.
  - ColumnExpression.min() - Returns the column-wise min value.
  - ColumnExpression.regr_avgx() - Returns the mean of the independent variable.
  - ColumnExpression.regr_avgy() - Returns the mean of the dependent variable.
  - ColumnExpression.regr_count() - Returns the count of the dependent and independent variable arguments.
  - ColumnExpression.regr_intercept() - Returns the intercept of the univariate linear regression line.
  - ColumnExpression.regr_r2() - Returns the coefficient of determination.
  - ColumnExpression.regr_slope() - Returns the slope of the univariate linear regression line.
  - ColumnExpression.regr_sxx() - Returns the sum of the squares of the independent variable expression.
  - ColumnExpression.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
  - ColumnExpression.regr_syy() - Returns the sum of the squares of the dependent variable expression.
  - ColumnExpression.skew() - Returns skew value for a column.
  - ColumnExpression.std() - Returns the column-wise population/sample standard deviation.
  - ColumnExpression.sum() - Returns the column-wise sum.
  - ColumnExpression.var() - Returns the column-wise population/sample variance.
  - ColumnExpression.percentile() - Returns the column-wise percentile.
-
Following set of Window Aggregate Functions return the results over a specified window which can be of any type:
- Cumulative/Expanding window
- Moving/Rolling window
- Contracting/Remaining window
- Grouping window

Window Aggregate Functions
- Window.corr() - Returns the Sample Pearson product moment correlation coefficient.
- Window.count() - Returns the count.
- Window.covar_pop() - Returns the population covariance.
- Window.covar_samp() - Returns the sample covariance.
- Window.cume_dist() - Returns the cumulative distribution of values.
- Window.dense_rank() - Returns the ordered ranking of all the rows.
- Window.first_value() - Returns the first value of an ordered set of values.
- Window.lag() - Returns data from the row preceding the current row at a specified offset value.
- Window.last_value() - Returns the last value of an ordered set of values.
- Window.lead() - Returns data from the row following the current row at a specified offset value.
- Window.max() - Returns the column-wise max value.
- Window.mean() - Returns the column-wise average value.
- Window.min() - Returns the column-wise min value.
- Window.percent_rank() - Returns the relative rank of all the rows.
- Window.rank() - Returns the rank (1 … n) of all the rows.
- Window.regr_avgx() - Returns the mean of the independent variable.
- Window.regr_avgy() - Returns the mean of the dependent variable.
- Window.regr_count() - Returns the count of the dependent and independent variable arguments.
- Window.regr_intercept() - Returns the intercept of the univariate linear regression line.
- Window.regr_r2() - Returns the coefficient of determination.
- Window.regr_slope() - Returns the slope of the univariate linear regression line.
- Window.regr_sxx() - Returns the sum of the squares of the independent variable expression.
- Window.regr_sxy() - Returns the sum of the products of the independent variable and the dependent variable.
- Window.regr_syy() - Returns the sum of the squares of the dependent variable expression.
- Window.row_number() - Returns the sequential row number.
- Window.std() - Returns the column-wise population/sample standard deviation.
- Window.sum() - Returns the column-wise sum.
- Window.var() - Returns the column-wise population/sample variance.
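The ranking functions in the list above differ mainly in how they treat ties. The sketch below illustrates the conventional SQL semantics in plain Python over a single ascending-ordered partition. It is an assumption-laden illustration rather than the teradataml implementation, and the helper name `rank_rows` is hypothetical.

```python
from typing import Dict, List, Sequence


def rank_rows(values: Sequence[float]) -> List[Dict[str, float]]:
    """Compute ranking columns for one partition ordered ascending."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for position, value in enumerate(ordered, start=1):
        if value != previous:
            rank = position      # rank skips ahead after ties
            dense_rank += 1      # dense_rank never leaves gaps
            previous = value
        result.append({
            "value": value,
            "row_number": position,
            "rank": rank,
            "dense_rank": dense_rank,
            "percent_rank": (rank - 1) / (n - 1) if n > 1 else 0.0,
            "cume_dist": sum(1 for other in ordered if other <= value) / n,
        })
    return result


ranked = rank_rows([20, 10, 30, 20])
for row in ranked:
    print(row)
```

With the tied value 20, rank() leaves a gap before the next distinct value while dense_rank() does not, and row_number() stays strictly sequential.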
-
- New functions
fastexport()
- Exports teradataml DataFrame to Pandas DataFrame using FastExport data transfer protocol.
- New functions
-
- Display Options
  - display.blob_length - Specifies default display length of BLOB column in teradataml DataFrame.
- Configuration Options
  - configure.temp_table_database - Specifies database name for storing the tables created internally.
  - configure.temp_view_database - Specifies database name for storing the views created internally.
  - configure.byom_install_location - Specifies the install location for the BYOM functions.
  - configure.val_install_location - Specifies the install location for the Vantage Analytic Library functions.
-
-
-
- to_pandas()
  - Support added to transfer data to Pandas DataFrame using fastexport protocol improving the performance.
  - Support added for other arguments similar to Pandas read_sql():
    - coerce_float
    - parse_dates
-
- Vantage Analytic Library Functions
  - Support added to accept datetime.date object for literals/values in following transformation functions:
    - FillNa()
    - Binning()
    - OneHotEncoder()
    - LabelEncoder()
  - All transformation functions now support accepting teradatasqlalchemy datatypes as input to "datatype" argument for casting the result.
-
-
- CS0249633 - Support added for teradataml to work with user/database/tablename containing period (.).
- CS0086594 - Use of dbc.tablesvx versus dbc.tablesv in teradatasqlalchemy.
- IPython integration to print the teradataml DataFrames in pretty format.
- teradataml DataFrame APIs now support column names same as that of Teradata reserved keywords.
- Issue has been fixed for duplicate rows being loaded via teradataml fastload() API.
- VAL - Empty string now can be passed as input for recoding values using LabelEncoder.
- teradataml extension with SQLAlchemy functions:
- mod() function is fixed to return correct datatype.
- sum() function is fixed to return correct datatype.
- The new SQLAlchemy 1.4.x release introduced a backward compatibility issue. A fix has been carried out so that teradataml can support the latest SQLAlchemy changes.
- Other minor bug fixes.
Fixed the internal library load issue related to the GCC version discrepancies on CentOS platform.
-
-
- Vantage Analytic Library
teradataml now supports executing analytic functions offered by Vantage Analytic Library.
These functions are available via new 'valib' sub-package of teradataml.
Following functions are added as part of this:
- Association Rules:
Association()
- Descriptive Statistics:
AdaptiveHistogram()
Explore()
Frequency()
Histogram()
Overlaps()
Statistics()
TextAnalyzer()
Values()
- Decision Tree:
DecisionTree()
DecisionTreePredict()
DecisionTreeEvaluator()
- Fast K-Means Clustering:
KMeans()
KMeansPredict()
- Linear Regression:
LinReg()
LinRegPredict()
- Logistic Regression:
LogReg()
LogRegPredict()
LogRegEvaluator()
- Factor Analysis:
PCA()
PCAPredict()
PCAEvaluator()
- Matrix Building:
Matrix()
- Statistical Tests:
BinomialTest()
ChiSquareTest()
KSTest()
ParametricTest()
RankTest()
- Variable Transformation:
Transform()
- Transformation Techniques supported for variable transformation:
  - Binning() - Perform bin coding to replace a continuous numeric column with a categorical one to produce ordinal values.
  - Derive() - Perform free-form transformation using an arithmetic formula.
  - FillNa() - Perform missing value/null replacement transformations.
  - LabelEncoder() - Re-express categorical column values into a new coding scheme.
  - MinMaxScalar() - Rescale data limiting the upper and lower boundaries.
  - OneHotEncoder() - Re-express a categorical data element as one or more numeric data elements, creating a binary numeric field for each categorical data value.
  - Retain() - Copy one or more columns into the final analytic data set.
  - Sigmoid() - Rescale data using sigmoid or s-shaped functions.
  - ZScore() - Rescale data using Z-Score values.
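As a rough illustration of the three rescaling techniques named above, the sketch below uses the common textbook formulas in plain Python. It does not use the valib classes; the function names are hypothetical, and the use of the population standard deviation for the Z-Score is an assumption made for this example only.

```python
import math
from typing import List, Sequence


def min_max_rescale(values: Sequence[float], lower: float = 0.0, upper: float = 1.0) -> List[float]:
    """Linearly map values into [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [lower for _ in values]  # constant column: nothing to spread out
    return [lower + (v - lo) * (upper - lower) / (hi - lo) for v in values]


def z_score_rescale(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation (assumed)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def sigmoid_rescale(values: Sequence[float]) -> List[float]:
    """Apply the logistic function 1 / (1 + e^-x) to every value."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = min_max_rescale(data)
standardized = z_score_rescale(data)
squashed = sigmoid_rescale([-2.0, 0.0, 2.0])
print(scaled, standardized, squashed)
```

Min-max rescaling bounds the output, the Z-Score centres it on zero, and the sigmoid squeezes any real value into the open interval (0, 1).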
- ML Engine Functions (mle)
- Correlation2
- NaiveBayesTextClassifier2
-
- New Functions
  - DataFrame.map_row() - Function to apply a user defined function to each row in the teradataml DataFrame.
  - DataFrame.map_partition() - Function to apply a user defined function to a group or partition of rows in the teradataml DataFrame.
- New Property
  - DataFrame.tdtypes - Get the teradataml DataFrame metadata containing column names and corresponding teradatasqlalchemy types.
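The difference between applying a function per row and per partition, as map_row() and map_partition() are described above, can be pictured with a small plain-Python analogy. Nothing here runs on Vantage or calls teradataml; the helpers `apply_per_row` and `apply_per_partition` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, List

Row = Dict[str, object]


def apply_per_row(rows: List[Row], func: Callable[[Row], Row]) -> List[Row]:
    """Call func once for every row."""
    return [func(row) for row in rows]


def apply_per_partition(rows: List[Row], func: Callable[[List[Row]], List[Row]], key: str) -> List[Row]:
    """Group rows by `key` and call func once per group."""
    ordered = sorted(rows, key=itemgetter(key))  # groupby needs sorted input
    output: List[Row] = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        output.extend(func(list(group)))
    return output


def double_pay(row: Row) -> Row:
    return {**row, "pay": row["pay"] * 2}


def total_pay(partition: List[Row]) -> List[Row]:
    return [{"dept": partition[0]["dept"], "total": sum(r["pay"] for r in partition)}]


rows = [{"dept": "a", "pay": 10}, {"dept": "b", "pay": 30}, {"dept": "a", "pay": 20}]
doubled = apply_per_row(rows, double_pay)
totals = apply_per_partition(rows, total_pay, "dept")
print(doubled)
print(totals)
```

The per-row function sees one row at a time, while the per-partition function sees all rows of a group at once and can therefore aggregate across them.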
- New Functions
-
- New functions
- Database Utility Functions
db_python_package_details()
- Lists the details of Python packages installed on Vantage.
- General Utility Functions
print_options()
view_log()
setup_sandbox_env()
copy_files_from_container()
cleanup_sandbox_env()
- Database Utility Functions
- New functions
-
-
-
- Supports all connection parameters supported by teradatasql.connect().
-
- test_script() can now be executed in 'local' mode, i.e., outside of the sandbox.
- Script.setup_sto_env() is deprecated. Use setup_sandbox_env() function instead.
- Added support for using "quotechar" argument.
-
- Updates
- Visit teradataml User Guide to know more about the updates done to ML Engine analytic functions. Following types of updates are done to several functions:
  - New arguments are added, which are supported only on Vantage Version 1.3.
  - Default value has been updated for a few function arguments.
  - A few arguments that were required are now optional.
- Updates
-
-
-
Model Cataloging - Functionality to catalog model metadata and related information in the Model Catalog.
- save_model() - Save a teradataml Analytic Function model.
- retrieve_model() - Retrieve a saved model.
- list_model() - List accessible models.
- describe_model() - List the details of a model.
- delete_model() - Remove a model from Model Catalog.
- publish_model() - Share a model.
-
Interface offers execution in two modes:
- Test/Debug - to test user scripts locally in a containerized environment.
  Supporting methods:
  - setup_sto_env() - Set up test environment.
  - test_script() - Test user script in containerized environment.
  - set_data() - Set test data parameters.
- In-Database Script Execution - to execute user scripts in database.
  Supporting methods:
  - execute_script() - Execute user script in Vantage.
  - install_file() - Install or replace file in Database.
  - remove_file() - Remove installed file from Database.
  - set_data() - Set test data parameters.
-
- DataFrame.show_query() - Show underlying query for DataFrame.
- Regular Aggregates
  - New functions
    - kurtosis() - Calculate the kurtosis value.
    - skew() - Calculate the skewness of the distribution.
  - Updates
    - New argument distinct is added to following aggregates to exclude duplicate values.
      - count()
      - max()
      - mean()
      - min()
      - sum()
    - std()
      - New argument population is added to calculate the population standard deviation.
    - var()
      - New argument population is added to calculate the population variance.
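The distinction that the population argument selects between can be shown with the Python standard library alone. This sketch does not call teradataml; it only contrasts the sample formulas (divide by n - 1) with the population formulas (divide by n) on made-up data.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Sample statistics divide the squared deviations by n - 1.
sample_var = statistics.variance(values)
sample_std = statistics.stdev(values)

# Population statistics divide the squared deviations by n.
population_var = statistics.pvariance(values)
population_std = statistics.pstdev(values)

print("sample:    ", sample_var, sample_std)
print("population:", population_var, population_std)
```

For the same data the population values are always slightly smaller than the sample values, and the gap shrinks as the number of rows grows.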
- Time Series Aggregates
  - New functions
    - kurtosis() - Calculate the kurtosis value.
    - count() - Get the total number of values.
    - max() - Calculate the maximum value.
    - mean() - Calculate the average value.
    - min() - Calculate the minimum value.
    - percentile() - Calculate the desired percentile.
    - skew() - Calculate the skewness of the distribution.
    - sum() - Calculate the column-wise sum value.
    - std() - Calculate the sample and population standard deviation.
    - var() - Calculate the sample and population variance.
-
- New functions
  - Database Utility Functions
    - db_drop_table()
    - db_drop_view()
    - db_list_tables()
  - Vantage File Management Functions
    - install_file() - Install a file in Database.
    - remove_file() - Remove an installed file from Database.
- Updates
  - create_context()
    - Support added for Stored Password Protection feature.
    - Kerberos authentication bug fix.
    - New argument database added to create_context() API, that allows user to specify connecting database.
-
- New functions
Betweenness
Closeness
FMeasure
FrequentPaths
IdentityMatch
Interpolator
ROC
- Updates
- New methods are added to all analytic functions
show_query()
get_build_time()
get_prediction_type()
get_target_column()
- New properties are added to analytic function's Formula argument
response_column
numeric_columns
categorical_columns
all_columns
- New methods are added to all analytic functions
- New functions
-
Fixed the DataFrame data display corruption issue observed with certain analytic functions.
Compatible with Vantage 1.1.1.
The following ML Engine (teradataml.analytics.mle) functions have new and/or updated arguments to support the Vantage version:
AdaBoostPredict
DecisionForestPredict
DecisionTreePredict
GLMPredict
LDA
NaiveBayesPredict
NaiveBayesTextClassifierPredict
SVMDensePredict
SVMSparse
SVMSparsePredict
XGBoostPredict
-
- DataFrame creation is now quicker, impacting many APIs and Analytic functions.
- Improved performance by reducing the number of intermediate queries issued to Teradata Vantage when not required.
- The number of queries reduced by combining multiple operations into a single step whenever possible and unless the user expects or demands to see the intermediate results.
- The performance improvement is almost proportional to the number of chained and unexecuted operations on a teradataml DataFrame.
- Reduced number of intermediate internal objects created on Vantage.
-
-
- New functions
  - show_versions() - to list the version of teradataml and dependencies installed.
  - fastload() - for high performance data loading of large amounts of data into a table on Vantage. Requires teradatasql version 16.20.0.48 or above.
  - Set operators:
    - concat
    - td_intersect
    - td_except
    - td_minus
  - case() - to help construct SQL CASE based expressions.
- Updates
  - copy_to_sql
    - Added support to copy_to_sql to save multi-level index.
    - Corrected the type mapping for index when being saved.
  - create_context() updated to support 'JWT' logon mechanism.
-
- New functions
NERTrainer
NERExtractor
NEREvaluator
GLML1L2
GLML1L2Predict
- Updates
  - Added support to categorize numeric columns as categorical while using formula - as_categorical() in the teradataml.common.formula module.
-
- Added support to create DataFrame from Volatile and Primary Time Index tables.
- DataFrame.sample() - to sample data.
- DataFrame.index - Property to access index_label of DataFrame.
- Functionality to process Time Series Data
  - Grouping/Resampling time series data:
    - groupby_time()
    - resample()
  - Time Series Aggregates:
    - bottom()
    - count()
    - describe()
    - delta_t()
    - mad()
    - median()
    - mode()
    - first()
    - last()
    - top()
- DataFrame API and method argument validation added.
- DataFrame.info() - Default value for null_counts argument updated from None to False.
- DataFrame.merge() updated to accept column expressions along with column names to on, left_on, right_on arguments.
-
- cast() - to help cast the column to a specified type.
- isin() and ~isin() - to check the presence of values in a column.
-
-
- All the deprecated Analytic functions under the teradataml.analytics module have been removed. Newer versions of the functions are available under the teradataml.analytics.mle and the teradataml.analytics.sqle modules. The modules removed are:
teradataml.analytics.Antiselect
teradataml.analytics.Arima
teradataml.analytics.ArimaPredictor
teradataml.analytics.Attribution
teradataml.analytics.ConfusionMatrix
teradataml.analytics.CoxHazardRatio
teradataml.analytics.CoxPH
teradataml.analytics.CoxSurvival
teradataml.analytics.DecisionForest
teradataml.analytics.DecisionForestEvaluator
teradataml.analytics.DecisionForestPredict
teradataml.analytics.DecisionTree
teradataml.analytics.DecisionTreePredict
teradataml.analytics.GLM
teradataml.analytics.GLMPredict
teradataml.analytics.KMeans
teradataml.analytics.NGrams
teradataml.analytics.NPath
teradataml.analytics.NaiveBayes
teradataml.analytics.NaiveBayesPredict
teradataml.analytics.NaiveBayesTextClassifier
teradataml.analytics.NaiveBayesTextClassifierPredict
teradataml.analytics.Pack
teradataml.analytics.SVMSparse
teradataml.analytics.SVMSparsePredict
teradataml.analytics.SentenceExtractor
teradataml.analytics.Sessionize
teradataml.analytics.TF
teradataml.analytics.TFIDF
teradataml.analytics.TextTagger
teradataml.analytics.TextTokenizer
teradataml.analytics.Unpack
teradataml.analytics.VarMax
- Fixed the garbage collection issue observed with remove_context() when context is created using a SQLAlchemy engine.
- Added 4 new Advanced SQL Engine (was NewSQL Engine) analytic functions supported only on Vantage 1.1: Antiselect, Pack, StringSimilarity, and Unpack.
- Updated the Machine Learning Engine NGrams function to work with Vantage 1.1.
- Python version 3.4.x will no longer be supported. The Python versions supported are 3.5.x, 3.6.x, and 3.7.x.
- A major issue with the usage of the formula argument in analytic functions with Python 3.7 has been fixed, allowing this package to be used with Python 3.7 or later.
- Configurable alias name support for analytic functions has been added.
- Support added to create_context (connect to Teradata Vantage) with different logon mechanisms. Logon mechanisms supported are: 'TD2', 'TDNEGO', 'LDAP' & 'KRB5'.
- copy_to_sql function and DataFrame 'to_sql' methods now provide following additional functionality:
- Create Primary Time Index tables.
- Create set/multiset tables.
- New DataFrame methods are added: 'median', 'var', 'squeeze', 'sort_index', 'concat'.
- DataFrame method 'join' is now updated to make use of ColumnExpressions (df.column_name) for the 'on' clause as opposed to strings.
- Series is supported as a first class object by calling squeeze on DataFrame.
- Methods supported by teradataml Series are: 'head', 'unique', 'name', '__repr__'.
- Binary operations with teradataml Series is not yet supported. Try using Columns from teradataml.DataFrames.
- Sample datasets and commands to load the same have been provided in the function examples.
- New configuration property 'column_casesensitive_handler' has been added. Useful when working with case sensitive columns.
- New support has been added for Linux distributions: Red Hat 7+, Ubuntu 16.04+, CentOS 7+, SLES12+.
- 16.20.00.01 now has over 100 analytic functions. These functions have been organized into their own packages for better control over which engine to execute the analytic function on. Due to these namespace changes, the old analytic functions have been deprecated and will be removed in a future release. See the Deprecations section in the Teradata Python Package User Guide for more information.
- New DataFrame methods shape, iloc, describe, get_values, merge, and tail.
- New Series methods for NA checking (isnull, notnull) and string processing (lower, strip, contains).
teradataml 16.20.00.00 is the first release version. Please refer to the Teradata Python Package User Guide for a list of Limitations and Usage Considerations.
- Python 3.5 or later
Note: 32-bit Python is not supported.
- Windows 7 (64Bit) or later
- macOS 10.9 (64Bit) or later
- Red Hat 7 or later versions
- Ubuntu 16.04 or later versions
- CentOS 7 or later versions
- SLES 12 or later versions
- Teradata Vantage Advanced SQL Engine:
- Advanced SQL Engine 16.20 Feature Update 1 or later
- For a Teradata Vantage system with the ML Engine:
- Teradata Machine Learning Engine 08.00.03.01 or later
Use pip to install the Teradata Python Package for Advanced Analytics.
Platform | Command |
---|---|
macOS/Linux | pip install teradataml |
Windows | py -3 -m pip install teradataml |
When upgrading to a new version of the Teradata Python Package, you may need to use pip install's --no-cache-dir option to force the download of the new version.
Platform | Command |
---|---|
macOS/Linux | pip install --no-cache-dir -U teradataml |
Windows | py -3 -m pip install --no-cache-dir -U teradataml |
Your Python script must import the teradataml package in order to use the Teradata Python Package:
>>> import teradataml as tdml
>>> from teradataml import create_context, remove_context
>>> create_context(host = 'hostname', username = 'user', password = 'password')
>>> df = tdml.DataFrame('iris')
>>> df
SepalLength SepalWidth PetalLength PetalWidth Name
0 5.1 3.8 1.5 0.3 Iris-setosa
1 6.9 3.1 5.1 2.3 Iris-virginica
2 5.1 3.5 1.4 0.3 Iris-setosa
3 5.9 3.0 4.2 1.5 Iris-versicolor
4 6.0 2.9 4.5 1.5 Iris-versicolor
5 5.0 3.5 1.3 0.3 Iris-setosa
6 5.5 2.4 3.8 1.1 Iris-versicolor
7 6.9 3.2 5.7 2.3 Iris-virginica
8 4.4 3.0 1.3 0.2 Iris-setosa
9 5.8 2.7 5.1 1.9 Iris-virginica
>>> df = df.select(['Name', 'SepalLength', 'PetalLength'])
>>> df
Name SepalLength PetalLength
0 Iris-versicolor 6.0 4.5
1 Iris-versicolor 5.5 3.8
2 Iris-virginica 6.9 5.7
3 Iris-setosa 5.1 1.4
4 Iris-setosa 5.1 1.5
5 Iris-virginica 5.8 5.1
6 Iris-virginica 6.9 5.1
7 Iris-setosa 5.1 1.4
8 Iris-virginica 7.7 6.7
9 Iris-setosa 5.0 1.3
>>> df = df[(df.Name == 'Iris-setosa') & (df.PetalLength > 1.5)]
>>> df
Name SepalLength PetalLength
0 Iris-setosa 4.8 1.9
1 Iris-setosa 5.4 1.7
2 Iris-setosa 5.7 1.7
3 Iris-setosa 5.0 1.6
4 Iris-setosa 5.1 1.9
5 Iris-setosa 4.8 1.6
6 Iris-setosa 4.7 1.6
7 Iris-setosa 5.1 1.6
8 Iris-setosa 5.1 1.7
9 Iris-setosa 4.8 1.6
General product information, including installation instructions, is available on the Teradata Documentation website.
Use of the Teradata Python Package is governed by the License Agreement for the Teradata Python Package for Advanced Analytics.
After installation, the LICENSE and LICENSE-3RD-PARTY files are located in the teradataml directory of the Python installation directory.