Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write a test to make sure a new DB is downloaded when an already downloaded DB exists with an old schema #1108

Open
marco-c opened this issue Nov 19, 2019 · 34 comments
Labels
good-first-bug Good for newcomers

Comments

@marco-c
Copy link
Collaborator

marco-c commented Nov 19, 2019

In the download method,

def download(path, support_files_too=False):
, we should handle the case where the DB is already downloaded but is using an older schema version than the current one.

@marco-c marco-c added the good-first-bug Good for newcomers label Nov 19, 2019
@marco-c
Copy link
Collaborator Author

marco-c commented Nov 19, 2019

It's possible that this is already kind of supported, as we check the ETag and the ETag changes when the schema version changes. We need to add a test to confirm.

@Nathan-Wisner
Copy link

Nathan-Wisner commented Nov 30, 2019

For this, would we want to update and download a newer version or just exit the function with false like
if is_old_schema(path): return False

@marco-c
Copy link
Collaborator Author

marco-c commented Dec 1, 2019

We should try to download a new version.

@mohanish2504
Copy link

Hi,
I want to contribute and I am a beginner, I wanted to know is the issue resolved? I am stuck here

if is_old_schema(path): return False

what to do if it is an old schema?

@Luis-gd
Copy link

Luis-gd commented Dec 2, 2019

Can i take this work?

@marco-c
Copy link
Collaborator Author

marco-c commented Dec 2, 2019

I want to contribute and I am a beginner, I wanted to know is the issue resolved? I am stuck here
if is_old_schema(path): return False

No, the issue is still here. If you look at the code, you'll see that is_old_schema is not checking the schema version of the downloaded DB, but the schema version of the remote DB.

what to do if it is an old schema?

Please read all the comments, it is explained above.

@marco-c
Copy link
Collaborator Author

marco-c commented Dec 2, 2019

Can i take this work?

You can work on any issue which is open and has no PR linked to it.
Did you read #1092 or CONTRIBUTING.md?

@avijit1258
Copy link

Hello @marco-c ,

I am thinking to implement a new function download_check_version() similar to download_check_etag() in the bugbug/utils.py which will be called after downloading the current DB. Then return false if that is the case. Am I going to the right direction?

Avijit

@marco-c
Copy link
Collaborator Author

marco-c commented Dec 23, 2019

@avijit1258 the goal is to remove a DB when its schema version is old compared to the current schema. I'm not sure how what you proposed is going to do this, but I'm not sure exactly what you mean. Proposing a PR is the best way to show what you mean ;)

@avijit1258
Copy link

avijit1258 commented Dec 23, 2019

@marco-c How can I reproduce the situation you are telling? Can you give any url to try or way to figure out this? I have installed the repo and trying to figure things out. This is my first time trying to contribute to such a real project.
thanks

@marco-c
Copy link
Collaborator Author

marco-c commented Dec 24, 2019

@avijit1258 the first step is to look at the code I linked at in the first comment, and take a look at all the functions it calls and figure out how it works.

@avijit1258
Copy link

avijit1258 commented Dec 26, 2019

I have three questions.

  1. @marco-c As you suggested I have gone through download(), is_old_schema(), download_check_etag() and extract_file(). I need some clarification. After calling the download_check_etag() function, download() function calls extract_file(). I am wondering where (by which code) download() downloaded the DB.
def download(path, support_files_too=False):
    # If a DB with the current schema is not available yet, we can't download.
    print('Path: ', path)
    if is_old_schema(path):
        return False
    zst_path = f"{path}.zst"

    url = DATABASES[path]["url"]
    try:
        logger.info(f"Downloading {url} to {zst_path}")
        updated = utils.download_check_etag(url, zst_path)

        if updated:
            extract_file(zst_path)
        successful = True
        if support_files_too:
            for support_file in DATABASES[path]["support_files"]:
                successful |= download_support_file(path, support_file)

        return successful
    except requests.exceptions.HTTPError:
        logger.info(f"{url} is not yet available to download", exc_info=True)
        return False
  1. In the first comment of download function, it says

If a DB with the current schema is not available yet, we can't download.

Here current means new or latest? or existing?

  1. Another question is if I am getting the issue right the purpose is to ensure the DB schema is not updated in the server while we are downloading the DB. For that, we are planning to check the DB schema after downloading with the latest schema version of the server?

Thanks

@avijit1258
Copy link

@marco-c I have added a PR with my proposed solution. #1206

@marco-c
Copy link
Collaborator Author

marco-c commented Jan 3, 2020

No, the purpose is to avoid using an already downloaded DB if its schema is not the current schema. The schema of the downloaded DB can be read from the ".version" file, its current schema can be read from the DATABASES dict.
Thinking about this more, I think this is basically already covered by the etag check. So we just need to write a test.

@marco-c marco-c changed the title Remove DBs when the schema of the downloaded DB is older than the current schema Write a test to make sure a new DB is downloaded when an already downloaded DB exists with an old schema Jan 3, 2020
@avijit1258
Copy link

@marco-c Thanks for the clarification. Things are becoming more clear now. By saying test, do you mean checking it in the download function or writing a test case for download function in test file?

@marco-c
Copy link
Collaborator Author

marco-c commented Jan 5, 2020

@marco-c Thanks for the clarification. Things are becoming more clear now. By saying test, do you mean checking it in the download function or writing a test case for download function in test file?

The second: writing a test case.

@abhaykatheria
Copy link
Contributor

Can I just simply assert that new db is not downloaded
assert updated '' new db not downloaded"
And then in test I can write case where the current dB is older but newer can't be downloaded.

@sakshamb2113

This comment was marked as off-topic.

@marco-c

This comment was marked as off-topic.

@tarunjarvis5
Copy link

But isn't line 91 already check if its old schema and therefore if it is then returning false
line88:# Download and extract databases.
line89:def download(path, support_files_too=False):
line90: # If a DB with the current schema is not available yet, we can't download.
line91: if is_old_schema(path):
line92: return False

@marco-c
Copy link
Collaborator Author

marco-c commented Dec 24, 2020

@tarunjarvis5 yes, the code is doing that, but this issue is about adding a test to ensure it works correctly.

@ghost

This comment was marked as off-topic.

@marco-c

This comment was marked as off-topic.

@timalsinab
Copy link

Hello, I am an outreachy applicant. Can I work on it?

@suhaibmujahid
Copy link
Member

Hello, I am an outreachy applicant. Can I work on it?

Yes, please read CONTRIBUTING.md and #1092 for the contributing rules.

@fadebowaley
Copy link
Contributor

I'm interested in this, please confirm if its fine to add a new file for this test. Thanks

@suhaibmujahid
Copy link
Member

I'm interested in this, please confirm if its fine to add a new file for this test. Thanks

As stated in CONTRIBUTING.md, you can work on any issue without asking, as long as there is no open PR linked to it.

@KhadijaKamran
Copy link

Hey @marco-c, Do I need to write assert statements for testing this function?

@suhaibmujahid
Copy link
Member

Hey @marco-c, Do I need to write assert statements for testing this function?

Yes, please read the comments above.

@fadebowaley
Copy link
Contributor

Workng on Bug ID id 1108
I checked the db.py and also i understand the problem . How to write the test file ?. I noticed a tests folder called 'tests' is this where the solution will be? Will I use pytest to run this ?

I also want to discover how to run the test (I have already installed -test-requirements.txt

@suhaibmujahid
Copy link
Member

How to write the test file ?. I noticed a tests folder called 'tests' is this where the solution will be?

The tests directory is where the tests live; you could take inspiration from other files there.

Will I use pytest to run this ?

I also want to discover how to run the test (I have already installed -test-requirements.txt

Yes, you could run all tests using python -m pytest tests/test_*.py.

@Emeka-Onwuepe
Copy link
Contributor

@fadebowaley are you still working on this?

@fadebowaley
Copy link
Contributor

fadebowaley commented Mar 16, 2023

@suhaibmujahid To make sure a new DB is downloaded, I have assertion error while attempting to download the database
Any idea on how to resolve this ?

............
   assert  db.is_different_schema(db_path) 
    assert not db.download(db_path) # the assertion error comes here 

    assert not os.path.exists(db_path)
    assert not os.path.exists(db_path.with_suffix(db_path.suffix + ".zst"))
    assert not os.path.exists(db_path.with_suffix(db_path.suffix + ".zst.etag"))
............................./pytest-35/test_download_newer_schema0/prova.json
=========================================================== short test summary info ============================================================
FAILED tests/test_db.py::test_download_newer_schema - AssertionError: assert False
======================================================== 1 failed, 35 passed in 58.69s =========================================================

............

@fadebowaley
Copy link
Contributor

@fadebowaley are you still working on this?

As stated in CONTRIBUTING.md, you can work on any issue without asking, as long as there is no open PR linked to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good-first-bug Good for newcomers
Projects
None yet
Development

No branches or pull requests