add docs that provide background about encodings #6

Open
kjappelbaum opened this issue Jun 12, 2022 · 1 comment
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers help wanted Extra attention is needed

Comments

@kjappelbaum
Owner

would be nice to link to some papers/further resources

@kjappelbaum kjappelbaum added documentation Improvements or additions to documentation help wanted Extra attention is needed good first issue Good for newcomers labels Jun 12, 2022
@sgbaird

sgbaird commented Jun 18, 2022

Element Mover's Distance

(1) Hargreaves, C. J.; Dyer, M. S.; Gaultois, M. W.; Kurlin, V. A.; Rosseinsky, M. J. The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions. Chem. Mater. 2020, 32 (24), 10610–10620. https://doi.org/10.1021/acs.chemmater.0c03381.

A snippet relevant to scalar elemental featurizers:

We could assign the atomic number as the vector index for each element, then take the difference between indices as a measure of elemental similarity, but this approach loses the natural clustering of chemical properties afforded by the periodic table. An ideal elemental indexing would perfectly capture the chemical trends observed in nature, but ordering the elements in such a manner is problematic. As well as the unclear resolution of how to handle the f-block elements, chemical trends moving down the periodic table tend to be the direct opposite of those moving across. This leads to some elements having greater substitutional feasibility to their diagonal neighbor than their immediate neighbor, making a simple placement of these difficult.

To solve this problem, Pettifor proposed a method of labeling the elemental scale in his seminal paper of 1984 [10], drawn from extensive domain knowledge. These numeric labels may form the basis of a coordinate system allowing us to associate patterns in geometric and physiochemical properties, with extensions to this idea continuing to guide practitioners [11,12]. This concept of labeling was further developed by analyzing the probability that an element can be substituted for another given the same structural framework on 20,500 compounds of the inorganic crystal structure database (ICSD) by Glawe et al. [13] This probability matrix can be reordered to maximize the likelihood that local neighborhoods will contain elements with greater feasibility of stable substitutions, thus possessing inherent chemical similarities [14]. We take the associated indices of this final ordering to give each element its modified Pettifor number.

In this report, we define a composition vector by taking the ratio of each element in a compound assigned to the index of its respective modified Pettifor number. By assuming the sample of the set of feasibly stable compounds (although we know this is not strictly the case [15]), we can see that these indices capture the truly physical similarities between elements from statistical analysis. Using the modified Pettifor scale gives resultant similarities between compounds which align with human judgement but may be substituted with any continuous elemental scale including less equally spaced distributions, for example, Pauling electronegativity.
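For the docs, the composition vector described in this snippet could be illustrated with a short sketch like the one below. It is only an illustration under stated assumptions: the modified Pettifor numbers are not reproduced in this thread, so atomic numbers stand in as the elemental index, and `ELEMENT_INDEX`, `parse_formula` and `composition_vector` are hypothetical names, not part of any library.

```python
import re

# ASSUMPTION: atomic numbers are used as a stand-in elemental scale.
# The paper assigns modified Pettifor numbers instead; swap in that
# mapping to follow the paper more closely.
ELEMENT_INDEX = {"Li": 3, "O": 8, "Na": 11, "Cl": 17, "K": 19, "Ti": 22, "Br": 35}


def parse_formula(formula):
    """Parse a flat formula such as 'TiO2' into {element: amount}.

    Brackets and hydrates are not handled in this sketch.
    """
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


def composition_vector(formula, index=ELEMENT_INDEX):
    """Place each element's fractional amount at its position on the scale."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    vector = [0.0] * (max(index.values()) + 1)
    for symbol, amount in amounts.items():
        vector[index[symbol]] = amount / total
    return vector


tio2 = composition_vector("TiO2")
print(tio2[8], tio2[22])  # fractions of O and Ti
```

Because the scale is passed in as a plain mapping, the same function works for any integer-valued elemental ordering, which matches the snippet's point that the scale can be substituted.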

Another snippet:

Even when using simple atomic numbers as the elemental index, the EMD introduces a significant structure to the UMAP generated clusters, leading to clusters with nontrivial shapes, however without the purity of labels observed when using the modified Pettifor scale (Figure S1). Elemental scales such as Pettifor’s original Mendeleev number [13] and alternate orderings of this scale [33] result in plots with similar cluster shapes and purity to the modified Pettifor scale (Figures S2−S6). An alternative approach to the use of compositional vectors X and Y is the use of recently developed vectors of features which are derived from values of physicochemical properties of the elements present in the composition [34−36].
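To make the atomic-number case concrete, here is a minimal sketch of an earth mover's distance between two composition vectors. It assumes both vectors have unit total mass and live on the same unit-spaced integer scale, in which case the one-dimensional EMD equals the summed absolute difference of the cumulative sums. This is the standard closed form for that special case and not necessarily how the paper's code computes it; `emd_1d` and `dense` are hypothetical helper names.

```python
from itertools import accumulate


def emd_1d(x, y):
    """1D earth mover's distance between two unit-mass vectors.

    Assumes equal length and unit spacing between neighbouring indices.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # On a line, the minimal transport cost is the area between the two CDFs.
    return sum(abs(a - b) for a, b in zip(accumulate(x), accumulate(y)))


def dense(weights, size):
    """Expand {index: fraction} into a dense vector of the given length."""
    vector = [0.0] * size
    for i, w in weights.items():
        vector[i] = w
    return vector


# ASSUMPTION: atomic numbers as the elemental index (Na=11, Cl=17, K=19, Br=35).
nacl = dense({11: 0.5, 17: 0.5}, 36)
kcl = dense({19: 0.5, 17: 0.5}, 36)
nabr = dense({11: 0.5, 35: 0.5}, 36)

print(emd_1d(nacl, kcl))   # 4.0: half the mass moves 8 places (Na -> K)
print(emd_1d(nacl, nabr))  # 9.0: half the mass moves 18 places (Cl -> Br)
```

With atomic numbers the Cl to Br swap looks more than twice as far as the Na to K swap, even though both replace an element with a heavier one from the same group. That gap between index distance and chemical similarity is what the modified Pettifor ordering is meant to reduce. For real use, `scipy.stats.wasserstein_distance` with `u_weights`/`v_weights` should give the same one-dimensional result.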

Composition-based property prediction models

(1) Tian, S. I. P.; Walsh, A.; Ren, Z.; Li, Q.; Buonassisi, T. What Information Is Necessary and Sufficient to Predict Materials Properties Using Machine Learning? 18.
(2) Falkowski, A. R.; Kauwe, S. K.; Sparks, T. D. Optimizing Fractional Compositions to Achieve Extraordinary Properties. Integr Mater Manuf Innov 2021, 10 (4), 689–695. https://doi.org/10.1007/s40192-021-00242-3.
(3) Vasylenko, A.; Antypov, D.; Gusev, V.; Gaultois, M.; Dyer, M.; Rosseinsky, M. Element Selection for Functional Materials Discovery by Integrated Machine Learning of Atomic Contributions to Properties; preprint; In Review, 2022. https://doi.org/10.21203/rs.3.rs-1334648/v1.
(4) Chen, C.; Ong, S. P. AtomSets as a Hierarchical Transfer Learning Framework for Small and Large Materials Datasets. npj Comput Mater 2021, 7 (1), 173. https://doi.org/10.1038/s41524-021-00639-w.
(5) Jha, D.; Choudhary, K.; Tavazza, F.; Liao, W.; Choudhary, A.; Campbell, C.; Agrawal, A. Enhancing Materials Property Prediction by Leveraging Computational and Experimental Data Using Deep Transfer Learning. Nat Commun 2019, 10 (1), 5316. https://doi.org/10.1038/s41467-019-13297-w.
(6) Jha, D.; Ward, L.; Paul, A.; Liao, W.; Choudhary, A.; Wolverton, C.; Agrawal, A. ElemNet: Deep Learning the Chemistry of Materials From Only Elemental Composition. Sci Rep 2018, 8 (1), 17593. https://doi.org/10.1038/s41598-018-35934-y.
(7) Meredig, B.; Agrawal, A.; Kirklin, S.; Saal, J. E.; Doak, J. W.; Thompson, A.; Zhang, K.; Choudhary, A.; Wolverton, C. Combinatorial Screening for New Materials in Unconstrained Composition Space with Machine Learning. Phys. Rev. B 2014, 89 (9), 094104. https://doi.org/10.1103/PhysRevB.89.094104.
(8) Ward, L. A General-Purpose Machine Learning Framework for Predicting. npj Computational Materials 2016, 7.
(9) Gupta, V.; Choudhary, K.; Tavazza, F.; Campbell, C.; Liao, W.; Choudhary, A.; Agrawal, A. Cross-Property Deep Transfer Learning Framework for Enhanced Predictive Analytics on Small Materials Data. Nat Commun 2021, 12 (1), 6595. https://doi.org/10.1038/s41467-021-26921-5.
(10) Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
(11) Goodall, R. E. A.; Lee, A. A. Predicting Materials Properties without Crystal Structure: Deep Representation Learning from Stoichiometry. Nat Commun 2020, 11 (1), 6280. https://doi.org/10.1038/s41467-020-19964-7.
(12) Wang, A. Y.-T.; Kauwe, S. K.; Murdock, R. J.; Sparks, T. D. Compositionally-Restricted Attention-Based Network for Materials Property Predictions. npj Computational Materials 2021, 33. https://doi.org/10.1038/s41524-021-00545-1.
