This is the second project for the Machine Learning course at EPFL. The project was carried out in collaboration with STIP – Chair of Science Technology and Innovation Policy.
Understanding the potential commercial success of patents is vital for research institutions, inventors, and industry stakeholders. This documentation explores the intersection of machine learning techniques and patent commercialization, presenting a tool designed to forecast the likelihood of a patent's success.
We provide a comprehensive examination of the underlying framework and share insights into the information required for accurate predictions. With this methodology, we achieved an accuracy of 0.927 and an F1-score of 0.927.
The authors were not allowed to add the dataframe to the repository, since it is the property of STIP – Chair of Science Technology and Innovation Policy. To obtain the dataframe, please contact [email protected].
The obtained CSV file needs to be added to the `data` folder under the name `modelready_220423.csv`. The folder needs to be created in the repository by the user.
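As a quick sanity check that the file is in the right place, something like the following can be run from the repository root (a minimal sketch; it assumes `pandas` is installed, e.g. via the requirements files):

```python
# Sanity check: the dataframe should be readable from data/modelready_220423.csv
import pandas as pd

df = pd.read_csv("data/modelready_220423.csv")
print(df.shape)             # number of rows and columns
print(df.columns.tolist())  # available features
```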
- Download the necessary requirements, depending on whether the operating system is iOS or Linux/Windows:
  - if working on iOS: `pip install -r requirements_iOS.txt`
  - if working on Linux/Windows: `pip install -r requirements.txt`
- Create the folder named `data` and add the dataframe named `modelready_220423.csv` inside that folder.
- To facilitate reproducibility, we provide here the parameters of BERT after fine-tuning. Please create the folder named `models` and add both trained models to that folder (a loading sketch is shown after this list).
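For reference, below is a minimal sketch of how the checkpoints could be loaded. It assumes the `.pth` files are PyTorch state dicts for a Hugging Face `BertForSequenceClassification` model initialized from `bert-base-uncased` with two labels; the actual architecture and label count used in `main.ipynb` should be followed.

```python
# Minimal loading sketch (assumption: the .pth files are state dicts for a
# Hugging Face BertForSequenceClassification with 2 labels; adapt to the
# architecture actually used in main.ipynb).
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
state_dict = torch.load("models/bert_trained.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode
```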
The repository is organized as follows:

- `utilities.py`: contains different utils used for the project;
- `requirements.txt`: contains the required Python packages;
- `requirements_iOS.txt`: contains the required Python packages for iOS;
- `main.ipynb`: main notebook with the analysis and results;
- `data_expl_for_ethics.ipynb`: notebook used for evaluating possible ethical risks of the project;
- `data`: folder that needs to be created by the user; place the dataframe named `modelready_220423.csv` inside it;
- `models`: folder that needs to be created by the user; place inside it the fine-tuned BERT parameters named `bert_trained_no_ge.pth` and `bert_trained.pth` (these files were downloaded from the provided link).
- Stefano Viel
- Valerio Ardizio
- Malena Mendilaharzu