Added Preprocessing Bundle to ML_Core #25

Open: wants to merge 3 commits into base: master

@vzeufack commented Sep 10, 2020

The current version of the Preprocessing Bundle includes:

  • LabelEncoder
  • OneHotEncoder
  • StandardScaler
  • MinMaxScaler
  • Normaliz
  • Split
  • StratifiedSplit

@richardkchapman (Member) commented

@RogerDev Please review

@lilyclemson (Contributor) left a comment

@vzeufack Good code and testing! There are a few merge conflicts and minor typos. Please resolve.

Preprocessing/LabelEncoder.ecl (outdated, resolved)
Preprocessing/LabelEncoder.ecl (outdated, resolved)
Preprocessing/OneHotEncoder.ecl (outdated, resolved)
SHARED numberLayout := Preprocessing.Types.numberLayout;

/**
* Computes averages and stdevs for each feature in baseData.
Contributor

Please try to avoid shorthand: replace "stdevs" with "standard deviations (stdevs)".
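
For readers unfamiliar with what the documented function computes, here is a minimal sketch of per-feature averages and standard deviations in ECL. It is not the code from this pull request: the record layout, the sample values, and every definition name (`featureRec`, `statsRec`, `featureStats`) are assumptions made for illustration. It relies only on the built-in `AVE`, `VARIANCE`, and `SQRT` functions and a grouped `TABLE`.

```ecl
// Hypothetical layout: one row per (feature number, value) pair.
featureRec := RECORD
  UNSIGNED4 number;  // feature identifier
  REAL8     value;   // observed value for that feature
END;

// Made-up sample data with two features.
baseData := DATASET([{1, 1.0}, {1, 3.0}, {2, 10.0}, {2, 30.0}], featureRec);

// Crosstab record: one output row per feature.
statsRec := RECORD
  baseData.number;
  REAL8 average           := AVE(GROUP, baseData.value);
  // VARIANCE returns the population variance; its square root is the
  // population standard deviation.
  REAL8 standardDeviation := SQRT(VARIANCE(GROUP, baseData.value));
END;

featureStats := TABLE(baseData, statsRec, number);
OUTPUT(featureStats);
```

Spelling out `standardDeviation` in the field name follows the same advice as the comment above about avoiding shorthand.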

RETURN Result;
ENDMACRO;

<<<<<<< HEAD
Contributor

Remove line 59. It looks like a leftover from a merge conflict

$.TestOneHotEncoder.RunOneHotEncoderTests;
$.TestStandardScaler.RunStandardScalerTests;
$.TestMinMaxScaler.RunMinMaxScalerTests;
<<<<<<< HEAD
Contributor

Please resolve the merge conflict

$.TestAreEqualRows.TestDifferentRows();

$.TestCompare.TestEqualData();
<<<<<<< HEAD
Contributor

Please resolve the merge conflict

END;

/**
<<<<<<< HEAD
Contributor

Please resolve the merge conflict

Preprocessing/Types.ecl (resolved)
PROJECT(ROWS(LEFT), XF(LEFT)));

#UNIQUENAME(comparisonResult)
<<<<<<< HEAD
Contributor

Please resolve the merge conflict

@vzeufack force-pushed the master branch 5 times, most recently from e6dff78 to 92ac0e2, September 23, 2020 22:02
@lilyclemson (Contributor) left a comment

Some minor typos need correction. A few descriptions need more details.
Great job! @vzeufack

* </pre>
*/
EXPORT GetMapping(key) := FUNCTIONMACRO
IMPORT Preprocessing.Utils.LabelEncoder;
Contributor

This may break the code. Please use a relative path.

* <p> Data with categorical values replaced by numbers.
*/
EXPORT Encode(dataToEncode, key) := FUNCTIONMACRO
IMPORT Preprocessing.Utils;
Contributor

This may break the code. Please use a relative path.

* <p> Data with categorical values replaced by their original labels.
*/
EXPORT Decode(dataToDecode, encoderKey) := FUNCTIONMACRO
IMPORT Preprocessing.Utils;
Contributor

This may break the code. Please use a relative path.

IMPORT STD;
IMPORT $.Files;

spray := STD.File.SprayDelimited('192.168.56.101',
Contributor

Please abstract the IP address

IMPORT STD;
IMPORT $.Files;

STD.File.SprayDelimited('192.168.56.101',
Contributor

Please abstract the IP address
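
One way to act on this request is to move the address into a single named definition that every spray call reuses. The sketch below is only an illustration: the definition names (`landingZoneIP`, `sourcePath`, `targetCluster`, `logicalFileName`) and all values other than the IP address quoted in the diff are assumptions, and the remaining arguments of the original call are not shown in this thread. The positional form with skipped defaults follows the usual `STD.File.SprayDelimited` calling pattern.

```ecl
IMPORT STD;

// Hypothetical settings, gathered in one place so that test files
// contain no hard-coded address.
landingZoneIP   := '192.168.56.101';                              // address quoted in the diff
sourcePath      := '/var/lib/HPCCSystems/mydropzone/example.csv'; // assumed path
targetCluster   := 'mythor';                                      // assumed cluster name
logicalFileName := '~preprocessing::example';                     // assumed logical file name

// The four skipped positions keep the defaults for maximum record
// size, separator, terminator, and quote.
STD.File.SprayDelimited(landingZoneIP,
                        sourcePath,
                        , , , ,
                        targetCluster,
                        logicalFileName);
```

Because these definitions would normally live in a shared attribute (for example the `Files` module already imported above), changing environments would then require editing one line.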

*
* @param partialKey: same record structure as the key (see below).
* <p> Mapping between feature names and categories.
* Some names are mapped to empty categories such that
Contributor

Please add a description for the case where the names are mapped to non-empty categories.

t_FieldReal := MLC.types.t_FieldReal;

/**
* shifts the values in a range [min, max].
Contributor

Minor typo: capitalize the first word ("Shifts").



/**
* scales the data using the following formula:
Contributor

Typo: capitalize the first word ("Scales").
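
The formula itself is cut off in the diff excerpt above, so the following is only a sketch of the common min-max formulation, `(x - dataMin) / (dataMax - dataMin) * (highBound - lowBound) + lowBound`, and not necessarily what the pull request implements. All names and sample values are hypothetical.

```ecl
valueRec := RECORD
  REAL8 value;
END;

// Made-up sample data for a single feature.
rawData := DATASET([{2.0}, {4.0}, {10.0}], valueRec);

// Target range; names chosen for this sketch only.
REAL8 lowBound  := 0.0;
REAL8 highBound := 1.0;

REAL8 dataMin := MIN(rawData, value);
REAL8 dataMax := MAX(rawData, value);

valueRec rescale(valueRec L) := TRANSFORM
  // Guard against a constant feature, where dataMax = dataMin
  // would otherwise divide by zero.
  SELF.value := IF(dataMax = dataMin,
                   lowBound,
                   (L.value - dataMin) / (dataMax - dataMin)
                     * (highBound - lowBound) + lowBound);
END;

scaled := PROJECT(rawData, rescale(LEFT));
OUTPUT(scaled);
```

Mapping a constant feature to `lowBound` is one possible convention; the actual scaler may handle that case differently.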

END;

/**
* Produces a mapping between numbers when encoded to numbers when decoded.
Contributor

Please rephrase this line for better description of the function.

END;

/**
* Determines y stats from full data.
Contributor

Please explain y stats

@gravesee commented Apr 8, 2021

Is this branch going to be merged? Analytics would like to use the preprocessing module to prepare data for deep learning training. I have cloned @vzeufack's repository but run into this error when trying to import the module:

[image: screenshot of the error, not included]

Not sure if merging the branch would resolve this issue or not. Please advise!

@lilyclemson (Contributor) commented

@Zelazny7 If the ML_Core bundle was previously installed, it may conflict with the downloaded ML_Core bundle. Renaming the downloaded bundle should solve the issue. Please let me know if anything is unclear.

4 participants