This repo contains source code for CompanyKG (version 2), a large-scale heterogeneous graph developed for fine-grained company similarity quantification and relationship prediction, crucial for applications in the investment industry such as market mapping, competitor analysis, and mergers and acquisitions. CompanyKG comprises 1.17 million companies represented as graph nodes, enriched with company description embeddings, and 51.06 million weighted edges denoting 15 distinct inter-company relations. To facilitate a thorough evaluation of methods for company similarity quantification and relationship prediction, we have created four annotated evaluation tasks: similarity prediction, competitor retrieval, similarity ranking, and edge prediction.
We recommend using Python 3.8. Some dependencies are optional and are needed only if you want to convert the KG to one of the data structures used by these packages:
- DGL: dgl
- iGraph: python-igraph
- PyTorch Geometric (PyG): torch-geometric
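To see which of these optional libraries are present in the active environment, a small check along the following lines may help. It is a minimal sketch that uses only the Python standard library and is not part of the companykg package. The helper name `available_backends` is hypothetical. The import names assume the usual mapping from pip package to module: python-igraph is imported as `igraph` and torch-geometric as `torch_geometric`.

```python
from importlib.util import find_spec

# Import names of the optional graph libraries. The pip package
# python-igraph is imported as "igraph" and torch-geometric as
# "torch_geometric"; dgl keeps the same name.
OPTIONAL_BACKENDS = {
    "DGL": "dgl",
    "iGraph": "igraph",
    "PyG": "torch_geometric",
}


def available_backends():
    """Return a dict mapping each backend label to True if it is importable.

    Hypothetical helper for illustration; not provided by companykg.
    """
    status = {}
    for label, module_name in OPTIONAL_BACKENDS.items():
        try:
            # find_spec locates the module without importing it.
            status[label] = find_spec(module_name) is not None
        except (ImportError, ValueError):
            status[label] = False
    return status


if __name__ == "__main__":
    for label, installed in available_backends().items():
        print(f"{label}: {'installed' if installed else 'not installed'}")
```

Because `find_spec` only locates a module and does not import it, the check stays fast even when heavy libraries such as PyG are installed.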
The companykg Python package provides a data structure to load CompanyKG into memory, convert between different graph representations, and run evaluation of trained embeddings or other company-ranking predictors on the evaluation tasks. To install the companykg package and its Python dependencies, activate a virtual environment (such as Virtualenv or Conda) and run:
pip install -e .
We recommend simply following the tutorial.
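The tutorial covers the package's actual API. Purely as an illustration of what an embedding-based company-ranking predictor might look like, the sketch below ranks candidate companies by cosine similarity to a query company. It does not use companykg at all. The function names, company identifiers, and embedding values are made up for the example, and cosine similarity is an assumed scoring choice rather than the method prescribed by this repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query, candidates):
    """Return candidate ids sorted from most to least similar to the query."""
    scores = {cid: cosine_similarity(query, emb) for cid, emb in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up embeddings; these are NOT taken from CompanyKG.
toy_embeddings = {
    "company_a": [1.0, 0.0, 0.0],
    "company_b": [0.9, 0.1, 0.0],
    "company_c": [0.0, 1.0, 0.0],
    "company_d": [-1.0, 0.0, 0.0],
}

query = toy_embeddings["company_a"]
candidates = {k: v for k, v in toy_embeddings.items() if k != "company_a"}
ranking = rank_candidates(query, candidates)
print(ranking)  # ['company_b', 'company_c', 'company_d']
```

In practice the embeddings would come from a trained model or from the description embeddings shipped with the dataset, and the evaluation itself should be run through the package as described in the tutorial.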
Implementations of various benchmark graph-based learning models are provided in this repository. To use them, install the ckg_benchmarks Python package, along with its dependencies, from the benchmarks subdirectory. First install companykg as above and then run:
cd benchmarks
pip install -e .
Further instructions for using the benchmarks package for model training are provided in the benchmarks README file.
The main benchmarking results on the similarity prediction (SP), similarity ranking (SR), competitor retrieval (CR), and edge prediction (EP) tasks are presented below.
Cite the paper that corresponds to CompanyKG V2:
@inproceedings{cao2024companykg2,
author = {Lele Cao and
Vilhelm von Ehrenheim and
Mark Granroth-Wilding and
Richard Anselmo Stahl and
Drew McCornack and
Armin Catovic and
Dhiana Deva Cavacanti Rocha},
title = {{CompanyKG2: A Large-Scale Heterogeneous Graph for Company Similarity Quantification}},
booktitle = {Proceedings of the 2024 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
year = {2024}
}
Cite the official release of the CompanyKG dataset V2.0 on Zenodo:
@article{companykg_2024_11391315,
author = {Lele Cao and
Vilhelm von Ehrenheim and
Mark Granroth-Wilding and
Richard Anselmo Stahl and
Drew McCornack and
Armin Catovic and
Dhiana Deva Cavacanti Rocha},
title = {{CompanyKG Dataset V2.0: A Large-Scale Heterogeneous Graph for Company Similarity Quantification}},
month = May,
year = 2024,
publisher = {Zenodo},
version = {2.0},
doi = {10.5281/zenodo.11391315},
url = {https://doi.org/10.5281/zenodo.11391315}
}