Lead halide perovskites are a well-known family of functional materials for optoelectronic applications. The γ-phase of CsPbI3 retains favorable optoelectronic characteristics, such as a direct bandgap and high charge-carrier mobility. However, large-scale applications of CsPbI3 are hindered by its polymorphic transition into the undesirable δ-CsPbI3 phase, which has no useful properties. One of many methods for stabilizing the γ-phase is partial substitution of Pb2+ by Cd2+/Zn2+ and of I- by Br-. From a computational and predictive perspective, such chemical modifications dramatically increase the complexity of the corresponding compositional/configurational space (CCS). Because of the size of this space, density functional theory (DFT) calculations for assessing thermodynamic properties are complemented by modern data-driven solutions, e.g. those based on graph neural network (GNN) architectures.
If you use this dataset in your research, please cite us as:
@article{none,
title={none},
author={none},
year={2025},
eprint={none},
archivePrefix={none},
primaryClass={none},
url={none},
}
The dataset contains Cd/Zn- and Br-doped CsPbI3 systems in two polymorphic modifications, together with predictions of their formation energies made by two Allegro GNN models trained on DFT-derived properties. We considered the two limiting cases, namely the Allegro models fine-tuned on the PLS and PHS datasets, because the test RMSE increases monotonically with the fraction of high-symmetry structures in the training data. For each combination of metal dopant (Cd or Zn) and material phase (black γ- or yellow δ-CsPbI3), listed in the table below, we created a distinct CCS.
CsPbI3 phase | Cd dopant | Zn dopant |
---|---|---|
Black | CCS_black_Cd | CCS_black_Zn |
Yellow | CCS_yellow_Cd | CCS_yellow_Zn |

Each CCS dataframe contains the following columns:

Column tag | Content description |
---|---|
CIF_data | Data representing crystal structure in CIF format |
CIF_filename | Structure name (unique within all dataframes) |
Phase | Black/yellow (corresponds to the phase studied) |
Dopant | Cd/Zn (dopant type in the structure) |
Dopant_content | Dopant content (in mol. %) |
Br_content | Br content (in mol. %) |
Space_group_no | Space group number of the doped structure before relaxation |
Space_group_symbol | Space group symbol of the doped structure before relaxation (Hermann-Mauguin notation) |
Weight | Number of symmetrically equivalent structures within the combinatorial composition/configuration space |
Pb_4b_position_substitution | Number of substituted Pb atoms at Wyckoff site 4b in a supercell (relevant for black phase only) |
I_4c_position_substitution | Number of substituted I atoms at Wyckoff site 4c in a supercell (relevant for black phase only) |
I_8d_position_substitution | Number of substituted I atoms at Wyckoff site 8d in a supercell (relevant for black phase only) |
Pb_4c_position_substitution | Number of substituted Pb atoms at Wyckoff site 4c in a supercell (relevant for yellow phase only) |
I1_4c_position_substitution | Number of substituted I atoms at Wyckoff site 4c (type 1) in a supercell (relevant for yellow phase only) |
I2_4c_position_substitution | Number of substituted I atoms at Wyckoff site 4c (type 2) in a supercell (relevant for yellow phase only) |
I3_4c_position_substitution | Number of substituted I atoms at Wyckoff site 4c (type 3) in a supercell (relevant for yellow phase only) |
Formula | Gross chemical formula of the structure in the format {'chemical element symbol': number of atoms in a supercell} |
Atomic_numbers | Atomic numbers of the chemical elements in the structure |
Cell | Cell basis vectors (in angstroms) before relaxation; constant for a given phase |
Pos | Atomic positions (in angstroms) before relaxation (sequence coincides with that of Atomic_numbers) |
Relaxed_cell | Cell basis vectors (in angstroms) after DFT relaxation |
Relaxed_pos | Atomic positions (in angstroms) after DFT relaxation (sequence coincides with that of Atomic_numbers) |
Relaxed_pressure | Pressure (in kbar) for the DFT-relaxed structure |
Relaxed_forces | Atomic forces (in eV/angstrom) for the DFT-relaxed structure (sequence coincides with that of Atomic_numbers) |
Relaxed_energy | Relaxed energy per cell (in eV) for the DFT-relaxed structure |
Relaxed_energy_pa | Relaxed energy per atom (in eV/atom) for the DFT-relaxed structure |
Formation_energy_pa | Formation energy per atom (in eV/atom) for the DFT-relaxed structure |
PHS_train | Boolean flag showing whether the structure is in the PHS training subset |
PHS_val | Boolean flag showing whether the structure is in the PHS validation subset |
PHS_test | Boolean flag showing whether the structure is in the PHS test subset |
PLS_train | Boolean flag showing whether the structure is in the PLS training subset |
PLS_val | Boolean flag showing whether the structure is in the PLS validation subset |
PLS_test | Boolean flag showing whether the structure is in the PLS test subset |
PHS_model_fepa_prediction | Formation energy per atom (in eV/atom) predicted by the GNN model trained on the PHS training subset |
PLS_model_fepa_prediction | Formation energy per atom (in eV/atom) predicted by the GNN model trained on the PLS training subset |
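
The split flags and prediction columns above can be combined to recompute a test RMSE for either model. The following is a minimal sketch, not the repository's own code: it uses plain Python dictionaries in place of dataframe rows, the helper name `rmse` is hypothetical, and every numeric value is invented purely for illustration.

```python
import math

# Toy rows that mimic a few of the documented columns.
# All numbers are invented for illustration; they are NOT taken from the dataset.
records = [
    {"Formation_energy_pa": -1.00, "PHS_model_fepa_prediction": -0.98,
     "PLS_model_fepa_prediction": -1.01, "PHS_test": True, "PLS_test": True},
    {"Formation_energy_pa": -0.90, "PHS_model_fepa_prediction": -0.92,
     "PLS_model_fepa_prediction": -0.87, "PHS_test": True, "PLS_test": True},
    {"Formation_energy_pa": -0.80, "PHS_model_fepa_prediction": -0.70,
     "PLS_model_fepa_prediction": -0.80, "PHS_test": False, "PLS_test": False},
]


def rmse(rows, target, prediction, flag):
    """Root-mean-square error between `prediction` and `target`,
    restricted to rows whose boolean `flag` column is True."""
    selected = [r for r in rows if r[flag]]
    if not selected:
        raise ValueError(f"no rows with {flag} == True")
    squared_errors = [(r[prediction] - r[target]) ** 2 for r in selected]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


phs_rmse = rmse(records, "Formation_energy_pa", "PHS_model_fepa_prediction", "PHS_test")
pls_rmse = rmse(records, "Formation_energy_pa", "PLS_model_fepa_prediction", "PLS_test")
print(f"PHS test RMSE: {phs_rmse:.4f} eV/atom")
print(f"PLS test RMSE: {pls_rmse:.4f} eV/atom")
```

With real data, the same filter-then-average logic would apply to the rows of each CCS dataframe; how the dataframes are loaded depends on the file format in the repository, which is not assumed here.
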
The repository also contains a data_processing.py file with visualization functions and usage examples. It can be used to visualize statistics on the CCSs, energy distributions, and group-subgroup relations.
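
As a rough illustration of what an energy-distribution statistic over a CCS might look like, the sketch below computes a weighted mean and a weighted histogram of Formation_energy_pa. It does not reproduce any function from data_processing.py. It assumes that each row stands for Weight symmetrically equivalent configurations, so the row is counted Weight times; the helper names and all numeric values are hypothetical.

```python
import math
from collections import Counter

# Toy rows; energies and weights are invented for illustration only.
rows = [
    {"Formation_energy_pa": -1.02, "Weight": 1},
    {"Formation_energy_pa": -0.97, "Weight": 8},
    {"Formation_energy_pa": -0.94, "Weight": 4},
    {"Formation_energy_pa": -0.91, "Weight": 2},
]


def weighted_mean(rows, column="Formation_energy_pa", weight="Weight"):
    """Mean of `column` where each row counts `weight` times."""
    total_weight = sum(r[weight] for r in rows)
    return sum(r[column] * r[weight] for r in rows) / total_weight


def weighted_histogram(rows, bin_width, column="Formation_energy_pa", weight="Weight"):
    """Map integer bin index -> summed weight.
    Bin i covers the interval [i * bin_width, (i + 1) * bin_width)."""
    counts = Counter()
    for r in rows:
        counts[math.floor(r[column] / bin_width)] += r[weight]
    return dict(counts)


mean_energy = weighted_mean(rows)
histogram = weighted_histogram(rows, bin_width=0.05)

print(f"weighted mean formation energy: {mean_energy:.4f} eV/atom")
for index in sorted(histogram):
    low = index * 0.05
    print(f"[{low:.2f}, {low + 0.05:.2f}) eV/atom: {histogram[index]} configurations")
```

Keying the histogram by integer bin index rather than by a floating-point bin edge avoids rounding mismatches when bins are compared or merged.
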
Learning Local Equivariant Representations for Large-Scale Atomistic Dynamics (Allegro)