Random subsets for reproduction #25

stovecat · 2023-12-22T03:09:35Z

Thank you for your wonderful work!

I just ran run_pipeline.py and got missing file errors:

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/random-api-completion.test.jsonl'

As far as I understand, the random- prefix denotes the randomly sampled subsets used for evaluation, as described in Section 3 of the original paper:

Eventually, a total of 1600 test samples are generated for the line completion dataset.

and

From these candidates, we then randomly select 200 non-repetitive API invocations from each repository, resulting in a total of 1600 test samples for the API invocation completion dataset.
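To make the quoted numbers concrete, here is a minimal sketch of per-repository sampling: 200 unique items from each of 8 repositories gives 1600 samples. The function name `sample_per_repo`, the input layout, and the seed are all hypothetical. The authors' actual sampling procedure and seed are not given in this thread, so this would not regenerate their exact subsets.

```python
import random


def sample_per_repo(candidates_by_repo, k=200, seed=0):
    """Pick k unique candidates from each repository (hypothetical helper).

    candidates_by_repo maps a repository name to a list of candidate items.
    Raises ValueError if a repository has fewer than k unique candidates.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same subset
    subset = []
    for repo in sorted(candidates_by_repo):
        # Drop repeated candidates while keeping their original order.
        unique = list(dict.fromkeys(candidates_by_repo[repo]))
        subset.extend((repo, item) for item in rng.sample(unique, k))
    return subset
```

With 8 repositories and `k=200`, the returned list has 1600 entries, matching the total quoted from the paper.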

For the purpose of reproduction, I would like to ask you about the following four subsets in utils.py:

class FilePathBuilder:
    api_completion_benchmark = 'datasets/random-api-completion.test.jsonl'
    random_line_completion_benchmark = 'datasets/random-line-completion.test.jsonl'
    # short version for codegen
    short_api_completion_benchmark = 'datasets/random-api-completion-short-version.test.jsonl'
    short_random_line_completion_benchmark = 'datasets/random-line-completion-short-version.test.jsonl'
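For anyone hitting the same error, a small sketch like the following can report which of the four files are absent before running the pipeline. The paths are copied from the class above; the helper `find_missing_benchmarks` and its `root` parameter are my own assumptions, not part of the repository.

```python
import os

# Paths copied from FilePathBuilder in utils.py (see above).
BENCHMARK_PATHS = {
    "api_completion_benchmark": "datasets/random-api-completion.test.jsonl",
    "random_line_completion_benchmark": "datasets/random-line-completion.test.jsonl",
    "short_api_completion_benchmark": "datasets/random-api-completion-short-version.test.jsonl",
    "short_random_line_completion_benchmark": "datasets/random-line-completion-short-version.test.jsonl",
}


def find_missing_benchmarks(paths=BENCHMARK_PATHS, root="."):
    """Return the attribute names whose file does not exist under root (hypothetical helper)."""
    return [
        name
        for name, rel_path in paths.items()
        if not os.path.isfile(os.path.join(root, rel_path))
    ]


if __name__ == "__main__":
    for name in find_missing_benchmarks():
        print(f"missing: {name} -> {BENCHMARK_PATHS[name]}")
```

Running it from the repository root lists each missing attribute together with its expected path.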


zfj1998 · 2024-04-08T02:11:30Z

Please refer to pull request #20. For permission reasons, I cannot merge the PR.

zfj1998 closed this as completed Apr 8, 2024
