English | 中文

Federated Machine Learning

FederatedML implements many common machine learning algorithms for federated learning. All modules are developed in a decoupled, modular way to improve scalability. Specifically, we provide:

  1. Federated Statistics: PSI, Union, Pearson Correlation, etc.

  2. Federated Feature Engineering: Feature Sampling, Feature Binning, Feature Selection, etc.

  3. Federated Machine Learning Algorithms: LR, GBDT, DNN, and Transfer Learning, in both heterogeneous and homogeneous settings.

  4. Model Evaluation: Binary, Multiclass, and Regression Evaluation; Local vs. Federated Comparison.

  5. Secure Protocol: Multiple security protocols for secure multi-party computation and interaction between participants.

Figure 1: Federated Machine Learning Framework

Algorithm List

This component is typically the first component of a modeling task. It transforms user-uploaded data into Instance objects that the following components can use.

  • Corresponding module name: DataIO

  • Data Input: DTable, values are raw data.

  • Data Output: Transformed DTable, values are data instances defined in federatedml/feature/instance.py

Computes the intersection of two parties' data sets without leaking information about the data outside the intersection. Mainly used in hetero scenarios.

  • Corresponding module name: Intersection

  • Data Input: DTable

  • Data Output: DTable whose keys occur in both parties.
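
The output contract described above (a table restricted to the keys both parties hold) can be sketched as follows. This is only an illustration under loud assumptions: plain dicts stand in for DTable, the function names and the shared salt are hypothetical, and comparing salted hashes is not a secure intersection protocol and is not presented as what the Intersection module does.

```python
import hashlib

def hashed_ids(table, salt):
    """Map sha256(salt + key) -> key for every key in one party's table."""
    return {hashlib.sha256((salt + str(k)).encode()).hexdigest(): k for k in table}

def intersect(guest_table, host_table, salt="shared-salt"):
    """Return the guest's rows whose keys also appear in the host's table.

    Only hashed keys are compared here. This is an assumption for
    illustration; it does NOT hide the difference set against a
    dictionary attack the way a real private set intersection would.
    """
    guest_hashes = hashed_ids(guest_table, salt)
    host_hashes = hashed_ids(host_table, salt)
    common = guest_hashes.keys() & host_hashes.keys()
    return {guest_hashes[h]: guest_table[guest_hashes[h]] for h in common}

# Hypothetical toy data: key -> feature values.
guest = {"u1": [0.5, 1.0], "u2": [0.1, 0.3], "u3": [0.9, 0.2]}
host = {"u2": [7], "u3": [8], "u4": [9]}
guest_common = intersect(guest, host)
```

Each party ends up with only the rows for the shared keys, so later hetero components can assume the two tables are aligned.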

Samples data in a federated manner so that its distribution becomes balanced in each party. This module supports both federated and standalone versions.

  • Corresponding module name: FederatedSample

  • Data Input: DTable

  • Data Output: The sampled data. Both random and stratified sampling are supported.
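
The difference between the two sampling modes mentioned above can be sketched in plain Python. The function names, the list-of-tuples data layout, and the `label_of` parameter are assumptions for illustration, not the FederatedSample interface.

```python
import random
from collections import defaultdict

def random_sample(rows, fraction, seed=0):
    """Draw a simple random sample containing `fraction` of the rows."""
    rng = random.Random(seed)
    return rng.sample(rows, round(len(rows) * fraction))

def stratified_sample(rows, fraction, label_of, seed=0):
    """Sample `fraction` of the rows from each label group separately,
    so the label proportions of the input are preserved."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        sampled.extend(rng.sample(group, round(len(group) * fraction)))
    return sampled

# Hypothetical data: 20 positive and 80 negative rows as (id, label).
rows = [(f"id{i}", 1 if i < 20 else 0) for i in range(100)]
subset = stratified_sample(rows, 0.5, label_of=lambda r: r[1])
positives = sum(label for _, label in subset)
```

With stratified sampling the 20/80 label ratio carries over to the sample exactly, whereas a simple random sample only matches it on average.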

Module for feature scaling and standardization.

  • Corresponding module name: FeatureScale

  • Data Input: DTable, whose values are instances.

  • Data Output: Transformed DTable.

  • Model Output: Transform factors like min/max, mean/std.
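
As a rough sketch of what "transform factors" means here, the snippet below fits min/max and mean/std on one column and then applies them. It works on a plain list rather than a DTable, and every function name is hypothetical rather than taken from FeatureScale.

```python
import statistics

def fit_min_max(column):
    """Factors for min-max scaling."""
    return {"min": min(column), "max": max(column)}

def apply_min_max(column, factors):
    """Map values into [0, 1]; a constant column maps to 0.0."""
    span = factors["max"] - factors["min"]
    return [(x - factors["min"]) / span if span else 0.0 for x in column]

def fit_standard(column):
    """Factors for standardization (population standard deviation)."""
    return {"mean": statistics.fmean(column), "std": statistics.pstdev(column)}

def apply_standard(column, factors):
    """Shift to zero mean and unit variance; guard against std == 0."""
    std = factors["std"] or 1.0
    return [(x - factors["mean"]) / std for x in column]

ages = [20.0, 30.0, 40.0]
min_max_factors = fit_min_max(ages)
scaled = apply_min_max(ages, min_max_factors)
standard_factors = fit_standard(ages)
standardized = apply_standard(ages, standard_factors)
```

Keeping the fitted factors as a separate output is what lets the same transformation be replayed on prediction data later.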

Bins the input data, calculates each column's IV and WOE, and transforms the data according to the binning information.

  • Corresponding module name: HeteroFeatureBinning

  • Data Input: DTable with labels (y) on the guest side and without labels on the host side.

  • Data Output: Transformed DTable.

  • Model Output: IV/WOE, split points, event counts, non-event counts, etc. for each column.
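
The relationship between the per-bin event counts and the IV/WOE values listed above can be sketched for a single column. Two assumptions are made: WOE is taken as ln(event share / non-event share) (sign conventions differ between references, though IV is the same either way), and every bin holds at least one event and one non-event. The function name is hypothetical.

```python
import math

def woe_iv(event_counts, non_event_counts):
    """Compute per-bin WOE and the column's total IV from bin counts.

    Assumes every bin has at least one event and one non-event;
    zero counts would need smoothing, which is omitted here.
    """
    event_total = sum(event_counts)
    non_event_total = sum(non_event_counts)
    woes, iv = [], 0.0
    for events, non_events in zip(event_counts, non_event_counts):
        event_rate = events / event_total
        non_event_rate = non_events / non_event_total
        woe = math.log(event_rate / non_event_rate)
        woes.append(woe)
        iv += (event_rate - non_event_rate) * woe
    return woes, iv

# Hypothetical column split into three bins.
woes, iv = woe_iv(event_counts=[10, 30, 60], non_event_counts=[40, 40, 20])
```

In the hetero setting only the guest holds the labels, so this plaintext calculation describes the guest's own columns; how the host's columns are handled is outside this sketch.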

Transforms a column into one-hot format.

  • Corresponding module name: OneHotEncoder
  • Data Input: Input DTable.
  • Data Output: Transformed DTable with new headers.
  • Model Output: Map from the original header and feature values to the new header.
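
A minimal sketch of how a header-to-new-header map could drive the transformation. The `column_value` naming scheme, the list-of-lists layout, and both function names are assumptions for illustration, not the OneHotEncoder module's actual format.

```python
def fit_one_hot(header, rows, columns):
    """Build {column: {value: new_header_name}} for the chosen columns."""
    mapping = {}
    for col in columns:
        idx = header.index(col)
        values = sorted({row[idx] for row in rows})
        mapping[col] = {v: f"{col}_{v}" for v in values}
    return mapping

def transform_one_hot(header, rows, mapping):
    """Replace each mapped column by one 0/1 column per observed value."""
    new_header = []
    for col in header:
        new_header.extend(mapping[col].values() if col in mapping else [col])
    new_rows = []
    for row in rows:
        out = []
        for col, value in zip(header, row):
            if col in mapping:
                out.extend(1 if value == v else 0 for v in mapping[col])
            else:
                out.append(value)
        new_rows.append(out)
    return new_header, new_rows

header = ["age", "city"]
rows = [[30, "bj"], [41, "sz"], [25, "bj"]]
mapping = fit_one_hot(header, rows, columns=["city"])
new_header, new_rows = transform_one_hot(header, rows, mapping)
```

Columns that are not in the map pass through unchanged, which is why the output needs a new header rather than reusing the old one.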

Provides 5 types of filters. Each filter selects columns according to the user's configuration.

  • Corresponding module name: HeteroFeatureSelection
  • Data Input: Input DTable.
  • Model Input: If IV filters are used, a hetero_binning model is required.
  • Data Output: Transformed DTable with new headers and filtered data instances.
  • Model Output: Whether each column is kept or filtered out.

Combine multiple data tables into one.

  • Corresponding module name: Union
  • Data Input: Input DTable(s).
  • Data Output: One DTable with the combined values from the input DTables.
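
Combining tables raises the question of what to do when a key appears in more than one input. The sketch below uses dicts in place of DTables and keeps the first value seen for a duplicate key; that policy and the function name are assumptions for illustration, not documented Union behavior.

```python
def union(*tables):
    """Merge several key -> value tables into one.

    Assumption: when a key occurs in more than one table, the value
    from the earliest table wins.
    """
    combined = {}
    for table in tables:
        for key, value in table.items():
            combined.setdefault(key, value)
    return combined

# Hypothetical inputs with one overlapping key ("u2").
first = {"u1": [1.0], "u2": [2.0]}
second = {"u2": [9.0], "u3": [3.0]}
merged = union(first, second)
```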

Builds a hetero logistic regression model across multiple parties.

  • Corresponding module name: HeteroLR
  • Data Input: Input DTable.
  • Model Output: Logistic Regression model.

Wrapper that runs an sklearn Logistic Regression model on local data.

  • Corresponding module name: LocalBaseline
  • Data Input: Input DTable.
  • Model Output: Logistic Regression.

Builds a hetero linear regression model across multiple parties.

  • Corresponding module name: HeteroLinR
  • Data Input: Input DTable.
  • Model Output: Linear Regression model.

Builds a hetero Poisson regression model across multiple parties.

  • Corresponding module name: HeteroPoisson
  • Data Input: Input DTable.
  • Model Output: Poisson Regression model.

Builds a homo logistic regression model across multiple parties.

  • Corresponding module name: HomoLR
  • Data Input: Input DTable.
  • Model Output: Logistic Regression model.

Builds a homo neural network model across multiple parties.

  • Corresponding module name: HomoNN
  • Data Input: Input DTable.
  • Model Output: Neural Network model.

Builds a hetero secure boosting model across multiple parties.

  • Corresponding module name: HeteroSecureBoost
  • Data Input: DTable, values are instances.
  • Model Output: SecureBoost model, consisting of model-meta and model-param.

Outputs the model evaluation metrics for the user.

  • Corresponding module name: Evaluation

Calculates the Pearson correlation between features held by different parties.

  • Corresponding module name: HeteroPearson
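
For reference, the quantity being computed can be sketched in plaintext: once both columns are standardized, the Pearson correlation is the mean of their element-wise products. The snippet assumes the two feature columns are already aligned on the same samples and sit in one process, and it does not attempt the secure computation that a real hetero setting would require. All names are hypothetical.

```python
import math

def standardize(column):
    """Shift to zero mean and scale to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]

def pearson(x, y):
    """Pearson correlation as the mean product of standardized values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Hypothetical aligned features, one held by each party.
guest_feature = [1.0, 2.0, 3.0, 4.0]
host_feature = [2.0, 4.0, 6.0, 8.5]
r = pearson(guest_feature, host_feature)
```

Because each party can standardize its own column locally, the only cross-party step is the dot product, which is the part a secure protocol would need to protect.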

Builds a hetero neural network model.

  • Corresponding module name: HeteroNN
  • Data Input: Input DTable.
  • Model Output: Hetero neural network model.

Secure Protocol

  • Paillier
  • Affine Homomorphic Encryption
  • IterativeAffine Homomorphic Encryption
  • SPDZ

  • RSA
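
To show why an additively homomorphic scheme such as Paillier is useful in this setting, here is a toy sketch of textbook Paillier with generator g = n + 1. The primes are tiny and insecure, the function names are hypothetical, and nothing here reflects the interface of the library's own Paillier implementation.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key generation with fixed small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """c = (n+1)^m * r^n mod n^2 with a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def mul_const(n, c, k):
    """Raising a ciphertext to k multiplies the plaintext by k."""
    return pow(c, k, n * n)

n, priv = keygen()
total = decrypt(n, priv, add(n, encrypt(n, 15), encrypt(n, 27)))
scaled_value = decrypt(n, priv, mul_const(n, encrypt(n, 7), 6))
```

A party holding only the public modulus can therefore sum encrypted values or scale them by known constants without ever seeing the plaintexts, as long as the results stay below n.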