Improved results by 'augmenting' matrix with fastRP algorithm #1
Comments
Hi! This is an interesting result from FastRP using the msBERT embedding as the initial vector. If you manage to find a place to host this result (with reproduction procedures and utilities), we would be more than happy to link to it from our repo.
So I wrote the algorithm in Python in this repo: https://github.com/Knorreman/fastRP To run msBERT with [1.0, 1.0] weights, run this command in the repo
eval_results_fastRP.zip
Thanks a lot for the additional fastRP results. Good to see competitive results on the SR and SP tasks! I have now referenced your results in the README of our repo. See here: https://github.com/EQTPartners/CompanyKG#external-results
Thank you! :) I hope it is helpful! Now I will try to incorporate the edge weights somehow...
Hello!
I tried using the CompanyKG graph together with the fastRP algorithm: https://arxiv.org/pdf/1908.11512.pdf
I implemented the algorithm in Apache Spark: https://github.com/Knorreman/graphxfastRP/tree/master
Forgive me for the incomplete README etc... :)
Here is the result when using the 512-dimensional msBERT vectors as the initial vectors instead of randomly initialized ones.
{"source": "embed torch.Size([1169931, 512])", "sp_auc": 0.848861754181647, "sr_validation_acc": 0.6195652173913043, "sr_test_acc": 0.6532258064516129, "cr_topk_hit_rate": [0.227659109895952, 0.32893550163287005, 0.4052213868003342, 0.47640123034859877, 0.566618724842409, 0.6384498177261335, 0.7838241436925648, 0.850617072985494]}
I could not get any results for SimCSE and ADA2: because of their large size, I ran into OOM problems on my PC. msBERT took about 8-10 hours to run with my Spark code... You can easily implement the fastRP algorithm in numpy/torch and get much better performance, but I wanted to make the algorithm distributable with Spark! :)
I set alpha1 and alpha2 to 1.0, and I also gave the starting vector a weight of 1.0 in the linear combination.
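For anyone who wants to see what that weighting means in practice, here is a minimal numpy sketch of a FastRP-style propagation that starts from a supplied matrix instead of a random projection. It is not the Spark code from the repo: the function name `propagate_embeddings` and its parameters are made up for illustration, the adjacency matrix is dense (only workable for toy graphs), and it skips the degree-based scaling and any per-iteration normalization that the paper and other implementations may apply.

```python
import numpy as np


def propagate_embeddings(adj, init, weights=(1.0, 1.0), init_weight=1.0):
    """FastRP-style propagation from a given initial matrix (hypothetical helper).

    adj:         dense (n, n) adjacency matrix, unweighted or weighted
    init:        (n, d) initial node vectors (e.g. text embeddings)
    weights:     one weight per propagation step (alpha1, alpha2, ...)
    init_weight: weight of the initial vectors in the final combination
    """
    adj = np.asarray(adj, dtype=np.float64)
    init = np.asarray(init, dtype=np.float64)

    # Row-normalise the adjacency so each step averages over neighbours.
    # Isolated nodes (degree 0) get an all-zero row. This is an assumption
    # of the sketch, not necessarily what the Spark implementation does.
    deg = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

    out = init_weight * init
    current = init
    for w in weights:
        current = trans @ current  # one more hop of neighbourhood averaging
        out = out + w * current    # accumulate the weighted k-hop term
    return out


# Toy example: a 3-node path graph 0 - 1 - 2 with one-hot initial vectors.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
init = np.eye(3)
emb = propagate_embeddings(adj, init, weights=(1.0, 1.0), init_weight=1.0)
```

With all three weights at 1.0, each node ends up with the sum of its own vector, the average of its neighbours' vectors, and the two-hop average. For a graph the size of CompanyKG you would want a sparse matrix rather than a dense one.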
As you can see, 'sp_auc' and 'cr_topk_hit_rate' @50 and @100 are better than the results presented in the paper. However, 'sr_test_acc' is not quite as good.
GraphMAE has similar results on 'cr_topk_hit_rate' but is not as good on 'sp_auc'.
I didn't tune any hyperparameters for fastRP, since I had so much trouble even getting it to work with such a large graph and vector size. So there are potentially even better results to gain from tuning it further!
I hope you find it interesting! :) I can also share the torch matrix I produced if I can figure out a good place to host it.