This project aims to analyze the relationship between the occurrence of refactorings and the merge effort.
This project requires Python, pygit2, libgit2, MySQL and RefactoringMiner. It was tested with the following versions:
python==3.6
pygit2==0.27.0
libgit2==0.27.0
mysql==8.0.23
RefactoringMiner==2.1
First, if you are using Mac OS or Linux, you need to install libgit2. If you have Anaconda installed on your computer, you can simply run:
$ conda install -c conda-forge libgit2
Otherwise, on Mac OS you can install libgit2 using homebrew:
$ brew install libgit2
On Linux, you can build and install libgit2 0.27.0 (the version this project was tested with) from source:
$ wget https://github.com/libgit2/libgit2/archive/v0.27.0.tar.gz
$ tar xzf v0.27.0.tar.gz
$ cd libgit2-0.27.0/
$ cmake .
$ make
$ sudo make install
$ sudo ldconfig
For more information, see http://www.pygit2.org/install.html
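After installing libgit2 and pygit2, a quick check can confirm that the two versions line up. The sketch below assumes that pygit2 exposes `__version__` and `LIBGIT2_VERSION`, and that matching major.minor numbers (as in the tested pair 0.27.0 / 0.27.0) is the relevant compatibility rule. The helper names `same_minor` and `check_pygit2` are hypothetical and are not part of this project.

```python
def same_minor(a, b):
    """Return True when two version strings share the same major.minor part."""
    return a.split(".")[:2] == b.split(".")[:2]


def check_pygit2():
    """Describe the installed pygit2/libgit2 pair, or report that pygit2 is missing."""
    try:
        import pygit2
    except ImportError:
        return "pygit2 is not installed"
    py_v, lib_v = pygit2.__version__, pygit2.LIBGIT2_VERSION
    status = "compatible" if same_minor(py_v, lib_v) else "MISMATCH"
    return "pygit2 {} / libgit2 {}: {}".format(py_v, lib_v, status)


if __name__ == "__main__":
    print(check_pygit2())
```

If the check reports a mismatch, reinstalling pygit2 at the version listed above is the likely fix.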
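To install MySQL on a Debian-based Linux distribution (the commands below use apt), open a terminal and update the package index:
$ sudo apt update
Enter your password and wait for the update to finish. Next, upgrade the installed packages:
$ sudo apt upgrade
Finally, install the MySQL server:
$ sudo apt install mysql-server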
We are using version 2.1 of RefactoringMiner. More details about the tool and installation steps can be found at: https://github.com/tsantalis/RefactoringMiner.
The project repository can be cloned from: https://github.com/gems-uff/refactoring-merge.git
The script for creating the MySQL database can be found in the file "script_database.sql" in the output folder of this project.
Mining cloned repositories must be carried out in three steps: collecting branches, collecting code refactorings, and calculating the merge effort. The script that collects the branches must be the first to be executed. The other scripts are independent and can be executed in any order.
./script_1_colllect_branches.py --database [data_base_name] --repo_path [local_repository_path]
Parameters:
--database = (mandatory) database name.
--repo_path = (mandatory) local path to where the Git project repository was cloned.
--log = (optional) boolean parameter indicating the need to print the execution log.
--retry = (optional) boolean parameter indicating that the execution should be retried. A retry is only applicable when the repository has been updated via git pull.
./script_2_collect_refactorings.py --database [data_base_name] --repo_path [local_repository_path] --refminer_path [refminer_path] --arq_ref_miner [refminer_path_output_file]
Parameters:
--database = (mandatory) database name.
--repo_path = (mandatory) local path to where the Git project repository was cloned.
--refminer_path = (mandatory) path to the RefactoringMiner executable.
--arq_ref_miner = (mandatory) name of the .json file that will store the results returned by RefactoringMiner.
--log = (optional) boolean parameter indicating the need to print the execution log.
--retry = (optional) boolean parameter indicating that the execution should be retried. A retry is applicable when the script execution was interrupted or when RefactoringMiner timed out.
./script_3_calculate_merge_effort.py --log --database [data_base_name] --repo_path [local_repository_path]
Parameters:
--database = (mandatory) database name.
--repo_path = (mandatory) local path to where the Git project repository was cloned.
--log = (optional) boolean parameter indicating the need to print the execution log.
--retry = (optional) boolean parameter indicating that the execution should be retried. A retry is applicable when the script execution was interrupted or when the effort calculation timed out.
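The execution order described above (branches first, then refactorings and merge effort in either order) can be scripted. The following is a minimal sketch, not part of this project: the function names `build_commands` and `run_pipeline` are hypothetical, the script names and flags are copied verbatim from the commands above, and it assumes the scripts are executable from the current directory.

```python
import subprocess


def build_commands(database, repo_path, refminer_path, refminer_output, log=False):
    """Build the three command lines, with the branch collection first."""
    common = ["--database", database, "--repo_path", repo_path]
    if log:
        common.append("--log")
    return [
        ["./script_1_colllect_branches.py"] + common,
        ["./script_2_collect_refactorings.py"] + common
        + ["--refminer_path", refminer_path, "--arq_ref_miner", refminer_output],
        ["./script_3_calculate_merge_effort.py"] + common,
    ]


def run_pipeline(commands):
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(build_commands("my_db", "/path/to/repo", "/path/to/RefactoringMiner", "refactorings.json"))` would run the three steps in sequence; all four argument values here are placeholders.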
To build the dataset used for data mining (extraction of association rules), run the script "extract_merge_commits_score.py":
./extract_merge_commits_score.py --branches --selected_refactorings --datasetname ['datasetname.csv']
Parameters:
--branches = (optional) boolean parameter indicating that refactoring attributes should be split into two branches (b1 and b2). When omitted, the script sums the refactorings of each type across the two branches.
--selected_refactorings = (optional) boolean parameter indicating that only selected refactorings should be computed. When set, the script only considers the 33 types of refactorings considered in this study.
--datasetname = (optional) name of the produced dataset. When omitted, the script saves a csv file named "merge_refactoring_ds.csv" in the "output" folder.
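To make the behavior of these optional flags concrete, the sketch below declares them with argparse. It is an illustration only and may differ from how "extract_merge_commits_score.py" actually parses its arguments; the function name `build_parser` is hypothetical, and the default file name is taken from the description above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags of the extraction script."""
    parser = argparse.ArgumentParser(description="Sketch of the dataset extraction flags")
    # Boolean flags: absent means False, present means True.
    parser.add_argument("--branches", action="store_true",
                        help="split refactoring attributes into branches b1 and b2")
    parser.add_argument("--selected_refactorings", action="store_true",
                        help="consider only the selected refactoring types")
    # Optional value with the documented default file name.
    parser.add_argument("--datasetname", default="merge_refactoring_ds.csv",
                        help="name of the produced csv dataset")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With no arguments, both boolean flags are off and the default dataset name is used; passing `--branches --datasetname my_ds.csv` turns on the branch split and overrides the file name.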
Hidden due to paper submission.
Copyright (c) 2018 Universidade Federal Fluminense (UFF)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.