
Seems like an "extend_features" option for CrabNet could be useful for several people #17

Open
sgbaird opened this issue Dec 3, 2021 · 12 comments

Comments

@sgbaird
Collaborator

sgbaird commented Dec 3, 2021

Similar to the CBFV package extend_features option. Does this seem feasible?

Marianne and I are trying to incorporate CGCNN features. Trupti wants to incorporate temperature into the model. Hasan would be able to use his custom mat2vec/robocrystallographer feature vectors.

@anthony-wang
Owner

Hi Sterling, I agree! I can try to work on this in the next few days, but I probably will not be able to finish until after the 9th.

@sgbaird
Collaborator Author

sgbaird commented Dec 7, 2021

That sounds great. Thanks @anthony-wang! Let me know what I can do to help.

@sgbaird
Collaborator Author

sgbaird commented Dec 11, 2021

I forgot that CrabNet has an extend_features flag (just not at the top-level):

def generate_features(df, elem_prop='oliynyk',
                      drop_duplicates=False,
                      extend_features=False,
                      sum_feat=False,
                      mini=False):
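
For reference, usage would look roughly like this (a sketch only; the import path, "temperature" column, and return signature follow the CBFV convention and are illustrative, not confirmed against CrabNet's internal copy):

import pandas as pd
from utils.composition import generate_features  # illustrative import path

# "formula" and "target" are the required columns; "temperature" is an
# extra per-sample feature for extend_features to carry through.
df = pd.DataFrame({
    "formula": ["Al2O3", "SiC"],
    "target": [10.0, 25.0],
    "temperature": [300, 500],
})

# With extend_features=True, the extra column(s) should be appended to the
# composition-based feature vectors in X (following the CBFV convention).
X, y, formulae, skipped = generate_features(df, elem_prop='oliynyk',
                                            extend_features=True)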

@truptimohanty @hasan-sayeed

@sgbaird
Collaborator Author

sgbaird commented Jan 6, 2022

After looking back at this, the generate_features function isn't actually used in the main workflow (i.e. train_crabnet.get_model). @anthony-wang any thoughts on where in the code the extra features could get added?

@sgbaird
Collaborator Author

sgbaird commented Feb 5, 2022

@anthony-wang I'm guessing a quick way to implement extend_features would be directly wherever the element descriptors get loaded, correct? I.e., just patch an extra column into the element descriptors immediately after loading.
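
A minimal sketch of that idea (the file path and column name are hypothetical):

import pandas as pd

# Hypothetical path: CrabNet stores element descriptors as a CSV indexed by
# element symbol (e.g., mat2vec with 200 features per element).
elem_df = pd.read_csv("data/element_properties/mat2vec.csv", index_col=0)

# Patch in an extra descriptor column immediately after loading. Note this
# only covers per-element features; a per-sample state variable such as
# temperature would need to be injected elsewhere in the pipeline.
elem_df["extra_feature"] = 0.0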

@sgbaird
Collaborator Author

sgbaird commented Feb 9, 2022

I took an initial stab at an extend_features flag and will be working with @AndrewFalkowski on fleshing this out along with some other changes. See the extend_features branch of my fork. It should be functional (runs without error), and I think it's passing things through correctly; however, I haven't tested whether it improves performance. A good dataset to try it on would be the VickersHardnessPrediction dataset. See hv_prediction.py.

@truptimohanty @hasan-sayeed do either of you want to give it a try on a dataset with a state variable or additional features?

@fedeotto

Hi! I was just looking at the code and wondering how you implemented the extend_features add-on in the end. I've been pretty interested in this from the beginning, as I'm trying to predict electronic properties with temperature dependence. Does it add an extra entry to the element descriptor (mat2vec)? Many thanks!

@sgbaird
Collaborator Author

sgbaird commented Jun 11, 2022

@fedeotto great question! I worked with @AndrewFalkowski on refactoring CrabNet and implementing an extended-features capability. We had a basic implementation that splices the state variable (e.g., load, temperature) in between the end of the transformer and the beginning of the residual output network. In other words, the information goes through the final neural network without any self-attention.
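
Schematically, it looks something like this (a hedged sketch, not the actual code from the fork; the class name and layer sizes are illustrative):

import torch
import torch.nn as nn

class ExtendedOutputHead(nn.Module):
    # Concatenate per-sample extra features (e.g. temperature, load) to the
    # pooled transformer output before the residual/output network, so the
    # extra features bypass self-attention (illustrative stand-in layers).
    def __init__(self, d_model=512, n_extra=1):
        super().__init__()
        self.out_net = nn.Sequential(
            nn.Linear(d_model + n_extra, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, encoded, extra):
        # encoded: (batch, d_model), pooled transformer output
        # extra:   (batch, n_extra), spliced-in state variables
        return self.out_net(torch.cat([encoded, extra], dim=-1))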

CrabNet vs. XGBoost

... It should be functional (runs without error), and I think it's passing things through correctly; however, I haven't tested whether it improves performance. A good dataset to try it on would be the VickersHardnessPrediction dataset. See hv_prediction.py.

I tried out the basic extend_features functionality on the VickersHardnessPrediction dataset and compared it against xgboost.

In particular, see extend_features_compare.py.

XGBoost did somewhat better on this dataset. It's quite possible there's simply not enough data to justify the transformer architecture, as only something like 300-500 unique compositions are represented in the VickersHardnessPrediction dataset. Or maybe it needed to be trained for longer so the residual network could recognize the importance of the 513th column, etc. We haven't probed further, and I've been dragging my feet on getting my fork of CrabNet integrated into the parent repository.
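
The comparison boils down to fitting both models on the same featurized split and scoring MAE/RMSE. A self-contained sketch of the xgboost side, using synthetic stand-in data (see extend_features_compare.py for the real script):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Synthetic stand-in for CBFV-style features with a 513th "applied load"
# column appended (512 composition features + 1 state variable).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 513))
y = X[:, -1] * 2.0 + rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBRegressor()
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("xgboost MAE: ", mean_absolute_error(y_test, pred))
print("xgboost RMSE:", np.sqrt(mean_squared_error(y_test, pred)))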

RMSE/MAE

# %% append column (e.g. 512 features --> 513 features)
## with applied load
# crabnet MAE: 3.06177
# crabnet RMSE: 5.12390
# xgboost MAE: 2.34908
# xgboost RMSE: 3.81564

## without applied load
# crabnet MAE: 4.42722
# crabnet RMSE: 6.24093
# xgboost MAE: 3.96865
# xgboost RMSE: 5.22576

[Figure] CrabNet with basic extend_features (source)

[Figure] XGBoost (source)

@sgbaird
Collaborator Author

sgbaird commented Jun 11, 2022

Also, @AndrewFalkowski tried out some more sophisticated implementations. IIRC these weren't panning out, and he had some ideas for future work. @AndrewFalkowski, could you shed some additional light here?

@sgbaird
Collaborator Author

sgbaird commented Jan 12, 2023

@lantunes recently put out a codebase and preprint for a version of CrabNet that supports extended features.

@lantunes

@sgbaird thanks for including me on this thread! There appears to be a lot of interest in adding extended-features support to CrabNet. I'd be happy to help where I can. I'm not sure if anyone has tried the approach I adopted, but basically it involves a learned non-linear transformation of the extended-features vector, followed by a tiling operation, which results in a matrix that can be added element-wise to the output of the last Transformer block. In my experiments, this method was far superior to simply concatenating the features to the flattened residual-net input (the other approach I tried). The approach is described in the preprint you refer to.
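
A rough sketch of that idea in PyTorch (illustrative class name and layer sizes, not the preprint's exact code):

import torch
import torch.nn as nn

class TiledFeatureInjection(nn.Module):
    # Learned non-linear transform of the extended-features vector, tiled
    # across the element axis and added element-wise to the output of the
    # last Transformer block.
    def __init__(self, n_extra, d_model):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Linear(n_extra, d_model),
            nn.ReLU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, encoder_out, extra):
        # encoder_out: (batch, n_elements, d_model), last Transformer block output
        # extra:       (batch, n_extra), extended features (e.g. temperature)
        h = self.transform(extra)                  # (batch, d_model)
        h = h.unsqueeze(1).expand_as(encoder_out)  # tile over the element axis
        return encoder_out + h                     # element-wise addition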

@ADWilhelm

@sgbaird
Hello, I hope this is the right place for my question: I am attempting to use the extend_features option, but I always get the same results whether I include extend_features or not. Can you help me?

Here is what I do. The dataset is a pandas dataframe.
Code without extend_features:

from crabnet.crabnet_ import CrabNet  # import path assumed from the sparks-baird crabnet package

cb_absorber_only = CrabNet(mat_prop="pce",
                           model_name='absorber_only',
                           elem_prop='mat2vec',
                           learningcurve=True)
cb_absorber_only.fit(train_df, val_df)

Code with extend_features ("bandgap" is an integer-valued column in the dataframe):

cb_bandgap = CrabNet(mat_prop="pce",
                    model_name='with_bandgap',
                    elem_prop='mat2vec',
                    extend_features=["bandgap"])
cb_bandgap.fit(train_df, val_df)

Both result in exactly the same predictions. What could be the reason? Perhaps I am making some silly mistake in passing the extend_features argument. Thanks in advance!
