Efficient first principles based modeling via machine learning: from simple representations to high entropy materials
This repository accompanies our paper, Efficient first principles based modeling via machine learning: from simple representations to high entropy materials (publisher version, arXiv version), in which we create a large DFT dataset for high entropy materials (HEMs) and evaluate the in-distribution and out-of-distribution performance of machine learning models.
Our DFT dataset encompasses bcc and fcc structures composed of eight elements and covers all possible 2- to 7-component alloy systems formed by them. The dataset used in the paper is publicly available on Zenodo and includes initial and final structures, formation energies, atomic magnetic moments, and charges, among other attributes.
Note: The trajectory data (energies and forces for structures during the DFT relaxations) is not published with this paper; it will be released later with a work on machine learning force fields for HEMs.
No. components | 2 | 3 | 4 | 5 | 6 | 7 | Total |
---|---|---|---|---|---|---|---|
Alloy systems | 28 | 56 | 70 | 56 | 28 | 8 | 246 |
Ordered (2-8 atoms) | 4975 | 22098 | 29494 | 6157 | 3132 | 3719 | 69575 |
SQS (27, 64, or 128 atoms) | 715 | 3302 | 3542 | 4718 | 1183 | 762 | 14222 |
Ordered+SQS | 5690 | 25400 | 33036 | 10875 | 4315 | 4481 | 83797 |
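The "Alloy systems" row can be cross-checked with a short calculation: choosing k of the eight elements gives C(8, k) distinct k-component systems. The snippet below is only an illustrative sketch of that arithmetic and is not part of the released scripts.

```python
from math import comb

N_ELEMENTS = 8  # the dataset is built from eight elements

# Number of distinct k-component alloy systems for k = 2..7
systems_per_k = {k: comb(N_ELEMENTS, k) for k in range(2, 8)}
total_systems = sum(systems_per_k.values())

for k, n in systems_per_k.items():
    print(f"{k}-component systems: {n}")
print(f"Total: {total_systems}")
```

The counts agree with the table: 28, 56, 70, 56, 28, and 8 systems, 246 in total.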
The legend indicates the number of components.
The data on Zenodo include the Matminer features of the initial and final structures and a demo script for training tree-based models. The results in the paper can be reproduced by adapting the demo script to different train-test splits. The codes folder provides the scripts that we used in the paper.
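As a rough illustration of what adapting the train-test split might look like, the sketch below partitions records by a threshold on one attribute, for example training on small structures and testing on larger ones. The helper `split_by_threshold`, the keys `n_atoms`, `n_elements`, and `e_form`, and the toy values are all hypothetical; they are not the names or data used in the Zenodo files or in the demo script.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]


def split_by_threshold(
    records: List[Record], key: str, threshold: float
) -> Tuple[List[Record], List[Record]]:
    """Train on records with records[key] <= threshold, test on the rest."""
    train = [r for r in records if r[key] <= threshold]
    test = [r for r in records if r[key] > threshold]
    return train, test


# Toy records with made-up values, for illustration only.
records = [
    {"n_atoms": 2, "n_elements": 2, "e_form": -0.10},
    {"n_atoms": 4, "n_elements": 3, "e_form": -0.05},
    {"n_atoms": 8, "n_elements": 4, "e_form": 0.02},
    {"n_atoms": 27, "n_elements": 5, "e_form": 0.04},
    {"n_atoms": 64, "n_elements": 6, "e_form": 0.03},
]

# Train on structures with <= 8 atoms, evaluate on larger ones.
train_size, test_size = split_by_threshold(records, "n_atoms", 8)

# Train on binaries and ternaries, evaluate on >= 4 elements.
train_elem, test_elem = split_by_threshold(records, "n_elements", 3)
```

The same helper covers any split defined by a single threshold, so the training and test sets can then be passed to whichever model is being evaluated.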
(a) Normalized error obtained by training on structures with ≤ N atoms and evaluating on structures with > N atoms. (b) ALIGNN prediction on SQSs with > 27 atoms, obtained by training on structures with ≤ 4 atoms. (c) Parity plot of the ALIGNN prediction on SQSs with > 27 atoms, obtained by training on structures with ≤ 8 atoms.
(a) Normalized error obtained by training on structures with ≤ N elements and evaluating on structures with > N elements. (b) Parity plot of the ALIGNN prediction on structures with ≥ 3 elements, obtained by training on binary structures. (c) Parity plot of the ALIGNN prediction on structures with ≥ 4 elements, obtained by training on binary and ternary structures.
(a) Normalized error obtained by training on structures with maxΔc below a given threshold and evaluating on the rest. (b) Predictions on non-equimolar structures (maxΔc > 0) by the ALIGNN model trained on equimolar structures (maxΔc = 0). (c) Predictions on structures with relatively strong deviation from equimolar composition (maxΔc > 0.2) by the ALIGNN model trained on structures with relatively weak deviation from equimolar composition (maxΔc ≤ 0.2). maxΔc is defined as the maximum concentration difference between any two elements in a structure.
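The definition of maxΔc can be made concrete with a small sketch. It assumes that only elements actually present in the structure are counted, and that a structure is given as a plain list of element labels, one per atom. The function name `max_delta_c` and the placeholder labels "A", "B", "C" are hypothetical and are not taken from the paper's scripts.

```python
from collections import Counter
from typing import Sequence


def max_delta_c(species: Sequence[str]) -> float:
    """Largest concentration difference between any two elements present.

    Assumption: elements absent from the structure are ignored, so the
    result is (highest concentration) - (lowest concentration).
    """
    if not species:
        raise ValueError("structure must contain at least one atom")
    counts = Counter(species)
    total = sum(counts.values())
    concentrations = [n / total for n in counts.values()]
    return max(concentrations) - min(concentrations)


# Equimolar binary: both concentrations are 0.5
equimolar = max_delta_c(["A", "B", "A", "B"])

# 3:1 binary: concentrations are 0.75 and 0.25
non_equimolar = max_delta_c(["A", "A", "A", "B"])

# 2:1:1 ternary: concentrations are 0.5, 0.25, 0.25
ternary = max_delta_c(["A", "A", "B", "C"])
```

Under this definition an equimolar structure has maxΔc = 0, and the value grows as the composition moves away from equal concentrations.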