TensorFlow Decision Forests #53

Open
szilard opened this issue Jun 4, 2021 · 11 comments

szilard commented Jun 4, 2021

https://blog.tensorflow.org/2021/05/introducing-tensorflow-decision-forests.html

szilard commented Jun 4, 2021

docker run --rm  -ti continuumio/anaconda3 /bin/bash

pip install tensorflow_decision_forests

ipython

szilard commented Jun 4, 2021

import tensorflow_decision_forests as tfdf

import numpy as np
import pandas as pd
import tensorflow as tf

from sklearn import metrics


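# Airline delay data: 1M training rows plus a separate test set
d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")
d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

# Encode the target: "Y" -> 1, anything else -> 0
d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)
d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)


# Convert the pandas DataFrames to tf.data datasets for the Keras API
dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")


# 100 trees of depth 10, learning rate (shrinkage) 0.1
md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1)
# %time is an IPython magic, so run this in ipython/Jupyter
%time md.fit(x=dtf_train)

# Score the test set (predicted probability of a delay) and compute AUC
y_pred = md.predict(dtf_test)
print(metrics.roc_auc_score(d_test["dep_delayed_15min"], y_pred))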

szilard commented Jun 4, 2021

m5.2xlarge (8 cores)

In [1]: import tensorflow_decision_forests as tfdf
2021-06-04 16:39:24.583254: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-06-04 16:39:24.583295: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

In [2]:

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: import tensorflow as tf

In [5]:

In [5]: from sklearn import metrics

In [6]:

In [6]:

In [6]: d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")

In [7]: d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

In [8]:

In [8]: d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)

In [9]: d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)

In [10]:

In [10]:

In [10]: dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
2021-06-04 16:39:32.461417: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-06-04 16:39:32.461464: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-06-04 16:39:32.461493: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (78cd809fe258): /proc/driver/nvidia/version does not exist
2021-06-04 16:39:32.461787: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

In [11]: dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")

In [12]:

In [12]:

In [12]: md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1)

In [13]: %time md.fit(x=dtf_train)
2021-06-04 16:39:36.183058: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-06-04 16:39:36.204576: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2499980000 Hz
15625/15625 [==============================] - 15s 780us/step
[INFO kernel.cc:746] Start Yggdrasil model training
[INFO kernel.cc:747] Collect training examples
[INFO kernel.cc:392] Number of batches: 15625
[INFO kernel.cc:393] Number of examples: 1000000
[INFO data_spec_inference.cc:289] 3 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Dest (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO data_spec_inference.cc:289] 2 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Origin (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO kernel.cc:769] Dataset:
Number of records: 1000000
Number of columns: 9

Number of columns by type:
        CATEGORICAL: 7 (77.7778%)
        NUMERICAL: 2 (22.2222%)

Columns:

CATEGORICAL: 7 (77.7778%)
        0: "DayOfWeek" CATEGORICAL has-dict vocab-size:8 zero-ood-items most-frequent:"c-5" 147674 (14.7674%)
        1: "DayofMonth" CATEGORICAL has-dict vocab-size:32 zero-ood-items most-frequent:"c-17" 33733 (3.3733%)
        3: "Dest" CATEGORICAL has-dict vocab-size:290 num-oods:3 (0.0003%) most-frequent:"ATL" 58247 (5.8247%)
        5: "Month" CATEGORICAL has-dict vocab-size:13 zero-ood-items most-frequent:"c-8" 88344 (8.8344%)
        6: "Origin" CATEGORICAL has-dict vocab-size:290 num-oods:2 (0.0002%) most-frequent:"ATL" 58796 (5.8796%)
        7: "UniqueCarrier" CATEGORICAL has-dict vocab-size:23 zero-ood-items most-frequent:"WN" 150937 (15.0937%)
        8: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item

NUMERICAL: 2 (22.2222%)
        2: "DepTime" NUMERICAL mean:1343.12 min:1 max:2615 sd:476.663
        4: "Distance" NUMERICAL mean:728.805 min:21 max:4962 sd:574.475

Terminology:
        nas: Number of non-available (i.e. missing) values.
        ood: Out of dictionary.
        manually-defined: Attribute which type is manually defined by the user i.e. the type was not automatically inferred.
        tokenized: The attribute value is obtained through tokenization.
        has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
        vocab-size: Number of unique values.

[INFO kernel.cc:772] Configure learner
[WARNING gradient_boosted_trees.cc:1532] Subsample hyperparameter given but sampling method does not match.
[WARNING gradient_boosted_trees.cc:1545] GOSS alpha hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1554] GOSS beta hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1566] SelGB ratio hyperparameter given but SelGB is disabled.
[INFO kernel.cc:797] Training config:
learner: "GRADIENT_BOOSTED_TREES"
features: "DayOfWeek"
features: "DayofMonth"
features: "DepTime"
features: "Dest"
features: "Distance"
features: "Month"
features: "Origin"
features: "UniqueCarrier"
label: "__LABEL"
task: CLASSIFICATION
[yggdrasil_decision_forests.model.gradient_boosted_trees.proto.gradient_boosted_trees_config] {
  num_trees: 100
  decision_tree {
    max_depth: 10
    min_examples: 5
    in_split_min_examples_check: true
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    num_candidate_attributes_ratio: -1
    axis_aligned_split {
    }
  }
  shrinkage: 0.1
  validation_set_ratio: 0.1
  early_stopping: VALIDATION_LOSS_INCREASE
  early_stopping_num_trees_look_ahead: 30
  l2_regularization: 0
  lambda_loss: 1
  mart {
  }
  adapt_subsample_for_maximum_training_duration: false
  l1_regularization: 0
  use_hessian_gain: false
  l2_regularization_categorical: 1
}

[INFO kernel.cc:800] Deployment config:

[INFO kernel.cc:837] Train model
[INFO gradient_boosted_trees.cc:480] Default loss set to BINOMIAL_LOG_LIKELIHOOD
[INFO gradient_boosted_trees.cc:1358]   num-trees:1 train-loss:0.952696 train-accuracy:0.806957 valid-loss:0.954296 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:2 train-loss:0.930766 train-accuracy:0.806957 valid-loss:0.935331 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:28 train-loss:0.759611 train-accuracy:0.837750 valid-loss:0.817667 valid-accuracy:0.827559
[INFO gradient_boosted_trees.cc:1360]   num-trees:56 train-loss:0.695891 train-accuracy:0.852399 valid-loss:0.791661 valid-accuracy:0.834203
[INFO gradient_boosted_trees.cc:1360]   num-trees:86 train-loss:0.650648 train-accuracy:0.864072 valid-loss:0.772629 valid-accuracy:0.838609
[INFO gradient_boosted_trees.cc:1358]   num-trees:100 train-loss:0.633199 train-accuracy:0.868094 valid-loss:0.766061 valid-accuracy:0.839678
[INFO gradient_boosted_trees.cc:319] Truncates the model to 100 tree(s) i.e. 100  iteration(s).
[INFO gradient_boosted_trees.cc:348] Final model valid-loss:0.766061 valid-accuracy:0.839678
[INFO kernel.cc:856] Export model in log directory: /tmp/tmpkhka91x3
[INFO kernel.cc:864] Save model in resources
[INFO kernel.cc:929] Loading model from path
[INFO decision_forest.cc:590] Model loaded with 100 root(s), 93196 node(s), and 8 input feature(s).
[INFO abstract_model.cc:876] Engine "GradientBoostedTreesGeneric" built
[INFO kernel.cc:797] Use fast generic engine
CPU times: user 3min 55s, sys: 6.88 s, total: 4min 2s
Wall time: 2min 6s
Out[13]: <tensorflow.python.keras.callbacks.History at 0x7f88aaaca0d0>

In [14]:

In [14]: y_pred = md.predict(dtf_test)

In [15]: print(metrics.roc_auc_score(d_test["dep_delayed_15min"], y_pred))
0.7612733258837148
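
The data_spec_inference lines in the training log above show how categorical columns get their dictionaries: values seen fewer than min_value_count=5 times are pruned and treated as out of dictionary (OOD). That is consistent with Dest keeping 289 values and reporting vocab-size:290, assuming the extra slot is the OOD item. A minimal pure-Python sketch of that rule; build_vocab and encode are hypothetical helpers, and both the OOD item sitting at index 0 and the max_number_of_unique_values cap keeping the most frequent values are assumptions, not the Yggdrasil implementation:

```python
from collections import Counter

OOD = "<OOD>"  # assumed name and position (index 0) of the out-of-dictionary item


def build_vocab(values, min_value_count=5, max_unique=2000):
    """Hypothetical sketch: keep frequent values, send the rest to OOD."""
    counts = Counter(values)
    # Most frequent first; drop values below the minimum count.
    kept = [v for v, c in counts.most_common() if c >= min_value_count]
    kept = kept[:max_unique]  # assumed: the cap keeps the most frequent values
    pruned = len(counts) - len(kept)
    vocab = [OOD] + kept  # vocab-size counts the OOD slot as well
    index = {v: i for i, v in enumerate(vocab)}
    return vocab, index, pruned


def encode(values, index):
    # Pruned or never-seen values map to the OOD index 0.
    return [index.get(v, 0) for v in values]


airports = ["ATL"] * 6 + ["ORD"] * 5 + ["XYZ"] * 2 + ["ABC"]
vocab, index, pruned = build_vocab(airports)
print(len(vocab), pruned)                    # 3 2
print(encode(["ATL", "XYZ", "ORD"], index))  # [1, 0, 2]
```

Here ATL and ORD survive, XYZ and ABC fall below the threshold, and the vocabulary size is the two kept values plus the OOD slot.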

szilard commented Jun 4, 2021

Summary (m5.2xlarge, 8 cores):

Wall time: 2min 6s
AUC: 0.7612733258837148
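
Since all the comparisons here are by test AUC, it may help to recall what metrics.roc_auc_score computes for a binary label: the probability that a randomly chosen delayed flight is scored above a randomly chosen on-time one, with ties counting half. A minimal stdlib sketch of that rank-based (Mann-Whitney) formulation; roc_auc is a hypothetical helper for illustration, not scikit-learn's implementation:

```python
def roc_auc(y_true, y_score):
    """Binary ROC AUC via the Mann-Whitney U statistic (average ranks for ties)."""
    pairs = sorted(zip(y_score, y_true))  # sort examples by score
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U counts (positive, negative) pairs ordered correctly, ties as 0.5.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

If the predictions come back as a column array, as TF-DF's md.predict output typically does, flatten it (for example with y_pred.ravel()) before passing it to a pure-Python function like this one.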

szilard commented Jun 4, 2021

For comparison, XGBoost on the same m5.2xlarge:

5.696 s (training time)
0.7478858 (AUC)

(XGBoost is roughly 20x faster)
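
The "20x" follows from the two wall times: TF-DF's "Wall time: 2min 6s" is 126 s, against 5.696 for XGBoost (read here as seconds, which is consistent with the ~20x figure). A small sketch that converts IPython %time strings to seconds and computes the ratio; to_seconds is a hypothetical helper that only handles the "min" and "s" forms appearing in this thread:

```python
import re


def to_seconds(text):
    """Convert an IPython %time duration like '2min 6s' or '5.696 s' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(min|s)\b", text):
        total += float(value) * (60 if unit == "min" else 1)
    return total


tfdf_wall = to_seconds("2min 6s")  # TF-DF, m5.2xlarge
xgb_wall = 5.696                   # XGBoost, same instance (assumed seconds)
print(tfdf_wall, round(tfdf_wall / xgb_wall, 1))  # 126.0 22.1
```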

szilard commented Jun 4, 2021

GPU:

p3.2xlarge

nvidia-docker run -it --rm tensorflow/tensorflow:latest-gpu-jupyter bash

pip install tensorflow_decision_forests scikit-learn

ipython
In [1]: import tensorflow_decision_forests as tfdf
2021-06-04 19:08:30.923089: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0

In [2]:

In [2]: import numpy as np

In [3]: import pandas as pd

In [4]: import tensorflow as tf

In [5]:

In [5]: from sklearn import metrics

In [6]:

In [6]:

In [6]: d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-1m.csv")

In [7]: d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

In [8]:

In [8]: d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)

In [9]: d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)

In [10]:

In [10]:

In [10]: dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
2021-06-04 19:08:40.281591: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-06-04 19:08:41.264152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.265175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-06-04 19:08:41.265220: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-04 19:08:41.268516: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-06-04 19:08:41.268583: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-06-04 19:08:41.269670: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-06-04 19:08:41.269984: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-06-04 19:08:41.270925: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-06-04 19:08:41.271691: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-06-04 19:08:41.271938: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-06-04 19:08:41.272066: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.273113: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.274059: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-06-04 19:08:41.274467: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-04 19:08:41.275021: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.275996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2021-06-04 19:08:41.276119: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.277162: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:41.278101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-06-04 19:08:41.278156: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-06-04 19:08:42.672460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-04 19:08:42.672513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2021-06-04 19:08:42.672524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2021-06-04 19:08:42.672786: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:42.673838: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:42.674860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-06-04 19:08:42.675833: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14644 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)

In [11]: dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")

In [12]:

In [12]: md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1)

In [13]: %time md.fit(x=dtf_train)
2021-06-04 19:09:28.430384: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-06-04 19:09:28.452532: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2300020000 Hz
15625/15625 [==============================] - 27s 1ms/step
[INFO kernel.cc:746] Start Yggdrasil model training
[INFO kernel.cc:747] Collect training examples
[INFO kernel.cc:392] Number of batches: 15625
[INFO kernel.cc:393] Number of examples: 1000000
[INFO data_spec_inference.cc:289] 3 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Dest (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO data_spec_inference.cc:289] 2 item(s) have been pruned (i.e. they are considered out of dictionary) for the column Origin (289 item(s) left) because min_value_count=5 and max_number_of_unique_values=2000
[INFO kernel.cc:769] Dataset:
Number of records: 1000000
Number of columns: 9

Number of columns by type:
        CATEGORICAL: 7 (77.7778%)
        NUMERICAL: 2 (22.2222%)

Columns:

CATEGORICAL: 7 (77.7778%)
        0: "DayOfWeek" CATEGORICAL has-dict vocab-size:8 zero-ood-items most-frequent:"c-5" 147674 (14.7674%)
        1: "DayofMonth" CATEGORICAL has-dict vocab-size:32 zero-ood-items most-frequent:"c-17" 33733 (3.3733%)
        3: "Dest" CATEGORICAL has-dict vocab-size:290 num-oods:3 (0.0003%) most-frequent:"ATL" 58247 (5.8247%)
        5: "Month" CATEGORICAL has-dict vocab-size:13 zero-ood-items most-frequent:"c-8" 88344 (8.8344%)
        6: "Origin" CATEGORICAL has-dict vocab-size:290 num-oods:2 (0.0002%) most-frequent:"ATL" 58796 (5.8796%)
        7: "UniqueCarrier" CATEGORICAL has-dict vocab-size:23 zero-ood-items most-frequent:"WN" 150937 (15.0937%)
        8: "__LABEL" CATEGORICAL integerized vocab-size:3 no-ood-item

NUMERICAL: 2 (22.2222%)
        2: "DepTime" NUMERICAL mean:1343.12 min:1 max:2615 sd:476.663
        4: "Distance" NUMERICAL mean:728.805 min:21 max:4962 sd:574.475

Terminology:
        nas: Number of non-available (i.e. missing) values.
        ood: Out of dictionary.
        manually-defined: Attribute which type is manually defined by the user i.e. the type was not automatically inferred.
        tokenized: The attribute value is obtained through tokenization.
        has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
        vocab-size: Number of unique values.

[INFO kernel.cc:772] Configure learner
[WARNING gradient_boosted_trees.cc:1532] Subsample hyperparameter given but sampling method does not match.
[WARNING gradient_boosted_trees.cc:1545] GOSS alpha hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1554] GOSS beta hyperparameter given but GOSS is disabled.
[WARNING gradient_boosted_trees.cc:1566] SelGB ratio hyperparameter given but SelGB is disabled.
[INFO kernel.cc:797] Training config:
learner: "GRADIENT_BOOSTED_TREES"
features: "DayOfWeek"
features: "DayofMonth"
features: "DepTime"
features: "Dest"
features: "Distance"
features: "Month"
features: "Origin"
features: "UniqueCarrier"
label: "__LABEL"
task: CLASSIFICATION
[yggdrasil_decision_forests.model.gradient_boosted_trees.proto.gradient_boosted_trees_config] {
  num_trees: 100
  decision_tree {
    max_depth: 10
    min_examples: 5
    in_split_min_examples_check: true
    missing_value_policy: GLOBAL_IMPUTATION
    allow_na_conditions: false
    categorical_set_greedy_forward {
      sampling: 0.1
      max_num_items: -1
      min_item_frequency: 1
    }
    growing_strategy_local {
    }
    categorical {
      cart {
      }
    }
    num_candidate_attributes_ratio: -1
    axis_aligned_split {
    }
  }
  shrinkage: 0.1
  validation_set_ratio: 0.1
  early_stopping: VALIDATION_LOSS_INCREASE
  early_stopping_num_trees_look_ahead: 30
  l2_regularization: 0
  lambda_loss: 1
  mart {
  }
  adapt_subsample_for_maximum_training_duration: false
  l1_regularization: 0
  use_hessian_gain: false
  l2_regularization_categorical: 1
}

[INFO kernel.cc:800] Deployment config:

[INFO kernel.cc:837] Train model
[INFO gradient_boosted_trees.cc:480] Default loss set to BINOMIAL_LOG_LIKELIHOOD
[INFO gradient_boosted_trees.cc:1358]   num-trees:1 train-loss:0.952696 train-accuracy:0.806957 valid-loss:0.954296 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:2 train-loss:0.930766 train-accuracy:0.806957 valid-loss:0.935331 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:28 train-loss:0.759611 train-accuracy:0.837750 valid-loss:0.817667 valid-accuracy:0.827559
[INFO gradient_boosted_trees.cc:1360]   num-trees:55 train-loss:0.697795 train-accuracy:0.851977 valid-loss:0.792125 valid-accuracy:0.833853
[INFO gradient_boosted_trees.cc:1360]   num-trees:83 train-loss:0.655071 train-accuracy:0.862715 valid-loss:0.774906 valid-accuracy:0.837899
[INFO gradient_boosted_trees.cc:1358]   num-trees:100 train-loss:0.633199 train-accuracy:0.868094 valid-loss:0.766061 valid-accuracy:0.839678
[INFO gradient_boosted_trees.cc:319] Truncates the model to 100 tree(s) i.e. 100  iteration(s).
[INFO gradient_boosted_trees.cc:348] Final model valid-loss:0.766061 valid-accuracy:0.839678
[INFO kernel.cc:856] Export model in log directory: /tmp/tmp4a7ekm_n
[INFO kernel.cc:864] Save model in resources
[INFO kernel.cc:929] Loading model from path
[INFO decision_forest.cc:590] Model loaded with 100 root(s), 93196 node(s), and 8 input feature(s).
[INFO abstract_model.cc:876] Engine "GradientBoostedTreesGeneric" built
[INFO kernel.cc:797] Use fast generic engine
CPU times: user 4min 41s, sys: 8.69 s, total: 4min 50s
Wall time: 2min 22s
Out[13]: <tensorflow.python.keras.callbacks.History at 0x7f6777917048>

szilard commented Jun 4, 2021

Not using GPU?

Running dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min") already allocates about 465 MB on the GPU:

[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 35'C,   0 % |     0 / 16160 MB |
[0] Tesla V100-SXM2-16GB | 36'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % |   465 / 16160 MB | root(463M)

Then md.fit(x=dtf_train) reserves almost all of the GPU memory:

[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 465 / 16160 MB | root(463M)
[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)

It starts something on the GPU (presumably computing dataset statistics), but once tree building begins, the GPU is no longer used:

[INFO gradient_boosted_trees.cc:1358]   num-trees:1 train-loss:0.952696 train-accuracy:0.806957 valid-loss:0.954296 valid-accuracy:0.807567
[INFO gradient_boosted_trees.cc:1360]   num-trees:2 train-loss:0.930766 train-accuracy:0.806957 valid-loss:0.935331 valid-accuracy:0.807567
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   2 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)
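
The monitor samples above (they look like gpustat output) can be summarized mechanically: memory jumps from 465 MB to 15111 of 16160 MB once fit starts, while utilization never goes above 2 %. The memory jump is consistent with TensorFlow's default behavior of reserving most of a visible GPU's memory for the process, rather than with real work on the device. A small sketch that parses such lines; summarize is a hypothetical helper, and its regular expression is an assumption that only covers the exact layout shown here:

```python
import re

# Matches e.g. "37'C,   2 % | 15111 / 16160 MB"
LINE = re.compile(r"(\d+)'C,\s*(\d+) %\s*\|\s*(\d+) / (\d+) MB")


def summarize(lines):
    """Return (max utilization %, max memory used MB, total MB) from monitor lines."""
    util, used, total = [], [], 0
    for line in lines:
        m = LINE.search(line)
        if m:
            util.append(int(m.group(2)))
            used.append(int(m.group(3)))
            total = int(m.group(4))
    return max(util), max(used), total


samples = [
    "[0] Tesla V100-SXM2-16GB | 37'C, 0 % | 465 / 16160 MB | root(463M)",
    "[0] Tesla V100-SXM2-16GB | 37'C, 2 % | 15111 / 16160 MB | root(15109M)",
    "[0] Tesla V100-SXM2-16GB | 37'C,   0 % | 15111 / 16160 MB | root(15109M)",
]
print(summarize(samples))  # (2, 15111, 16160)
```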

Laurae2 commented Jun 4, 2021

szilard commented Jun 4, 2021

Yeah, I was about to post that, quite hilarious.

szilard commented Jun 5, 2021

Added early_stopping="NONE" to prevent early stopping on the smaller training set (100K rows):

import tensorflow_decision_forests as tfdf

import numpy as np
import pandas as pd
import tensorflow as tf

from sklearn import metrics


d_train = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/train-0.1m.csv")
d_test = pd.read_csv("https://s3.amazonaws.com/benchm-ml--main/test.csv")

d_train["dep_delayed_15min"] = np.where(d_train["dep_delayed_15min"]=="Y",1,0)
d_test["dep_delayed_15min"] = np.where(d_test["dep_delayed_15min"]=="Y",1,0)


dtf_train = tfdf.keras.pd_dataframe_to_tf_dataset(d_train, label="dep_delayed_15min")
dtf_test = tfdf.keras.pd_dataframe_to_tf_dataset(d_test, label="dep_delayed_15min")


# early_stopping="NONE": build all 100 trees instead of stopping on validation loss
md = tfdf.keras.GradientBoostedTreesModel(max_depth=10, num_trees=100, shrinkage=0.1, early_stopping="NONE")
%time md.fit(x=dtf_train)

y_pred = md.predict(dtf_test)   
print(metrics.roc_auc_score(d_test["dep_delayed_15min"], y_pred))
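
The training config printed earlier shows what early_stopping="NONE" switches off: by default 10 % of the training rows are held out (validation_set_ratio: 0.1), and VALIDATION_LOSS_INCREASE stops training once the validation loss has not improved for early_stopping_num_trees_look_ahead: 30 trees, after which the model is truncated to its best iteration (the "Truncates the model to ... tree(s)" log line). A minimal sketch of that look-ahead rule over a list of per-tree validation losses; early_stop is a hypothetical helper and the loss curve is made up, so this illustrates the idea rather than Yggdrasil's exact code:

```python
def early_stop(valid_losses, look_ahead=30):
    """Return the number of trees kept under a look-ahead early-stopping rule.

    valid_losses[i] is the validation loss after i + 1 trees.
    """
    best_loss, best_n = float("inf"), 0
    for n, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_loss, best_n = loss, n
        elif n - best_n >= look_ahead:
            break  # no improvement for `look_ahead` trees: stop training
    return best_n  # the model is truncated to the best iteration


# Made-up curve: improves for 40 trees, then slowly gets worse.
losses = [1.0 - 0.005 * i for i in range(40)] + [0.81 + 0.001 * i for i in range(60)]
print(early_stop(losses))  # 40
```

With early_stopping="NONE", neither the stop nor the truncation applies, so all num_trees=100 trees are trained regardless of the validation curve.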

szilard commented Jun 5, 2021

m5.4xlarge (16 cores)

TF-DF:

size   time [s]   AUC
100K         16   0.704
1M          110   0.761
10M        1400   0.774

XGBoost:

size   time [s]   AUC
100K        0.6   0.734
1M          3.5   0.748
10M          35   0.754

LightGBM:

size   time [s]   AUC
100K          2   0.717
1M            4   0.765
10M          20   0.792

How much slower TF-DF trains (ratio of training times):

size   TF-DF/XGBoost   TF-DF/LightGBM
100K   25x             8x
1M     30x             27x
10M    40x             70x
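
The ratios in the last table are the TF-DF training time divided by the other library's training time at each size, rounded. A short sketch that recomputes them from the three tables (values copied from above):

```python
# Training time in seconds on m5.4xlarge, copied from the tables above.
times = {
    "TF-DF":    {"100K": 16,  "1M": 110, "10M": 1400},
    "XGBoost":  {"100K": 0.6, "1M": 3.5, "10M": 35},
    "LightGBM": {"100K": 2,   "1M": 4,   "10M": 20},
}

ratios = {}
for size in ("100K", "1M", "10M"):
    tfdf = times["TF-DF"][size]
    # (TF-DF / XGBoost, TF-DF / LightGBM)
    ratios[size] = (tfdf / times["XGBoost"][size], tfdf / times["LightGBM"][size])
    print(f"{size:>4}  {ratios[size][0]:5.1f}x  {ratios[size][1]:5.1f}x")
```

This gives about 26.7x/8x, 31.4x/27.5x and 40x/70x, in line with the rounded figures in the table.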
