Random subsets for reproduction #25

Closed
stovecat opened this issue Dec 22, 2023 · 1 comment

Thank you for your wonderful work!

I just ran run_pipeline.py and got a missing-file error:

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/random-api-completion.test.jsonl'

As far as I understand, the random- prefix denotes the randomly sampled subsets used for evaluation, as described in Section 3 of the original paper:

Eventually, a total of 1600 test samples are generated for the line completion dataset.

and

From these candidates, we then randomly select 200 non-repetitive API invocations from each repository, resulting in a total of 1600 test samples for the API invocation completion dataset.
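To make the selection step quoted above concrete, here is a minimal sketch of seeded per-repository sampling. It is not the authors' script: the function name `sample_per_repo`, the input layout (a dict mapping a repository name to its candidate list), and the seed are all assumptions for illustration. With 200 samples per repository, a total of 1600 implies 8 repositories.

```python
import random


def sample_per_repo(candidates_by_repo, per_repo=200, seed=0):
    """Pick `per_repo` distinct candidates from each repository.

    Hypothetical helper: the name, input layout, and seed are assumptions,
    not taken from the repository or the paper.
    """
    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    subset = []
    for repo in sorted(candidates_by_repo):  # stable repository order
        # Drop repeated candidates while preserving their original order.
        unique = list(dict.fromkeys(candidates_by_repo[repo]))
        if len(unique) < per_repo:
            raise ValueError(f"{repo}: only {len(unique)} unique candidates")
        subset.extend((repo, item) for item in rng.sample(unique, per_repo))
    return subset
```

Because the generator is seeded and repositories are visited in sorted order, calling the function twice on the same candidates returns the same subset, which is the property a reproduction needs. The seed actually used for the published subsets is not stated in this thread.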

For the purpose of reproduction, I would like to ask you about the following four subsets in utils.py:

class FilePathBuilder:
    api_completion_benchmark = 'datasets/random-api-completion.test.jsonl'
    random_line_completion_benchmark = 'datasets/random-line-completion.test.jsonl'
    # short version for codegen
    short_api_completion_benchmark = 'datasets/random-api-completion-short-version.test.jsonl'
    short_random_line_completion_benchmark = 'datasets/random-line-completion-short-version.test.jsonl'
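
As a quick check before running run_pipeline.py, the four paths above can be tested for existence. This is a minimal sketch and not part of the repository; `BENCHMARK_PATHS` and `missing_benchmarks` are hypothetical names, and the paths are assumed to be relative to the directory the pipeline is run from.

```python
from pathlib import Path

# The four paths copied from FilePathBuilder in utils.py.
BENCHMARK_PATHS = [
    "datasets/random-api-completion.test.jsonl",
    "datasets/random-line-completion.test.jsonl",
    "datasets/random-api-completion-short-version.test.jsonl",
    "datasets/random-line-completion-short-version.test.jsonl",
]


def missing_benchmarks(root=".", paths=BENCHMARK_PATHS):
    """Return the benchmark paths that do not exist under `root`.

    Hypothetical helper; `root` is assumed to be the repository checkout.
    """
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]


if __name__ == "__main__":
    for path in missing_benchmarks():
        print(f"missing: {path}")
```

Run from the repository root, this lists every absent subset at once instead of stopping at the first FileNotFoundError.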

zfj1998 (Collaborator) commented Apr 8, 2024

Please refer to pull request #20. For permission reasons, I cannot merge the PR.

zfj1998 closed this as completed Apr 8, 2024