[GEN-1468] overwrite tier1 variable #156

Merged
merged 13 commits into from
Jan 11, 2025
10 changes: 7 additions & 3 deletions scripts/table_updates/config.json
@@ -8,7 +8,7 @@
"BLADDER": "syn26721150",
"NSCLC2": "syn51318735",
"CRC2": "syn52943208",
"RENAL": "syn59474241"
"RENAL": "syn59474241",
"OVARIAN": "syn64042751"
},
"irr":{
@@ -18,10 +18,14 @@
"BLADDER": "syn26721151",
"NSCLC2": "syn51318736",
"CRC2": "syn52943210",
"RENAL": "syn59474249"
"RENAL": "syn59474249",
"OVARIAN": "syn64042773"
},
"main_genie_release_version": "16.6-consortium",
"tier1a_replacement_mapping":{
"patient_characteristics_tier1a_replacement_mapping_table": "syn64331052",
Member

May I ask what these tables are for?

Contributor Author @danlu1 (Jan 7, 2025)

The patient_characteristics_tier1a_replacement_mapping_table and cancer_panel_test_tier1a_replacement_mapping_table track how the original BPC codes are mapped to NAACCR codes. They were added to address Chelsea's request that the script "adds to a log if there is a mismatch between the upload and the replaced value".
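
As an illustration of what such a mapping table enables, here is a minimal sketch of translating uploaded BPC codes into NAACCR codes with pandas. The column names `bpc_code` and `naaccr_code`, the function `map_bpc_to_naaccr`, and the code values are all hypothetical; the actual schema of the Synapse mapping tables is not shown in this PR.

```python
import pandas as pd

# Hypothetical mapping table; real column names and codes are not shown in the PR.
mapping = pd.DataFrame(
    {"bpc_code": ["1", "2", "3"], "naaccr_code": ["01", "02", "98"]}
)


def map_bpc_to_naaccr(values: pd.Series, mapping: pd.DataFrame) -> pd.Series:
    """Translate BPC codes to NAACCR codes; codes without a mapping become NaN."""
    lookup = mapping.set_index("bpc_code")["naaccr_code"]
    return values.map(lookup)


# "9" has no entry in the mapping table, so it maps to NaN.
uploaded = pd.Series(["1", "3", "9"], name="uploaded_code")
mapped = map_bpc_to_naaccr(uploaded, mapping)
```

Once uploaded values are expressed as NAACCR codes, they can be compared directly against the main GENIE values that replace them.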

"cancer_panel_test_tier1a_replacement_mapping_table": "syn64331055"
},
"main_genie_release_version": "17.4-consortium",
Member

@Chelsea-Na do we want to keep this at 17.2?

Contributor @Chelsea-Na (Oct 23, 2024)

Good question! If we are ready to test the output, we should test it on 17.4-consortium. We will eventually also need to test on 17.6-consortium once it's out.

Do we know what happens if a main GENIE value is missing? Or if the sample/patient is missing? And does it add to a log if there is a mismatch between the upload and the replaced value?

Contributor Author @danlu1 (Oct 23, 2024)

@Chelsea-Na This is a hard replacement, so I didn't check for missing GENIE values. I checked with Rixing, and based on the BPC project description we think all BPC samples/patients should be in main GENIE. I'm happy to add a check for sample/patient missingness. Also, it doesn't currently log discrepancies between uploaded and replaced values. I'm happy to add a function for that if it's helpful.

Contributor Author @danlu1 (Dec 4, 2024)

I forgot to address your question: "Do we know what happens if a main GENIE value is missing? Or if the sample/patient is missing?"
If the sample/patient is missing from main GENIE, the values will be filled with NaN.
If the main GENIE value itself is missing, the final dataframe will show the missingness, since we use all main GENIE values to hard replace the BPC values. However, this may not occur in practice: no missingness has been found in the target columns of the sample clinical table or the patient clinical table.
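
The behaviour described above can be reproduced with a left merge. This is only a sketch of one way to get that result, not the PR's actual implementation; `hard_replace`, the column names, and the sample data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical BPC upload and main GENIE tables.
bpc = pd.DataFrame(
    {"sample_id": ["S1", "S2", "S3"], "seq_year": [2015, 2016, 2017]}
)
main_genie = pd.DataFrame(
    {"sample_id": ["S1", "S2"], "seq_year": [2014.0, None]}
)


def hard_replace(
    bpc: pd.DataFrame, main_genie: pd.DataFrame, key: str, column: str
) -> pd.DataFrame:
    """Discard the BPC column and take the main GENIE column instead."""
    return bpc.drop(columns=[column]).merge(
        main_genie[[key, column]], on=key, how="left"
    )


result = hard_replace(bpc, main_genie, "sample_id", "seq_year")
# S1 takes the main GENIE value, S2 is NaN because the GENIE value is missing,
# and S3 is NaN because the sample is absent from main GENIE.
```

Both kinds of missingness end up as NaN in the same column, which is why a separate check for absent samples/patients is useful.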

Contributor Author

@Chelsea-Na Two functions have been added to check for sample/patient missingness and log the discrepancies between uploaded and replaced values.
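
The two functions themselves are not shown in this excerpt. A minimal sketch of what such checks could look like, with hypothetical names (`find_missing_ids`, `log_discrepancies`) and a hypothetical logger name:

```python
import logging

import pandas as pd

logger = logging.getLogger("tier1a_replacement")  # hypothetical logger name


def find_missing_ids(bpc_ids, main_genie_ids) -> list:
    """Return the BPC sample/patient IDs that are absent from main GENIE."""
    return sorted(set(bpc_ids) - set(main_genie_ids))


def log_discrepancies(uploaded: pd.Series, replaced: pd.Series) -> pd.DataFrame:
    """Log and return the rows where uploaded and replaced values differ."""
    both = pd.DataFrame({"uploaded": uploaded, "replaced": replaced})
    # Two missing values count as equal, so they are not reported as mismatches.
    same = (both["uploaded"] == both["replaced"]) | (
        both["uploaded"].isna() & both["replaced"].isna()
    )
    mismatches = both[~same]
    for idx, row in mismatches.iterrows():
        logger.warning(
            "Mismatch for %s: uploaded=%r, replaced=%r",
            idx,
            row["uploaded"],
            row["replaced"],
        )
    return mismatches


missing = find_missing_ids(["P1", "P2", "P3"], ["P1", "P3"])
uploaded = pd.Series(["01", "02", "03"], index=["P1", "P2", "P3"])
replaced = pd.Series(["01", "98", "03"], index=["P1", "P2", "P3"])
mismatches = log_discrepancies(uploaded, replaced)
```

Returning the mismatches as a dataframe, in addition to logging them, makes the check easy to assert on in tests.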

Member

Let's push the version to 17.6-consortium

"main_genie_data_release_files": "syn16804261",
"main_genie_sample_mapping_table": "syn7434273"
}
17 changes: 9 additions & 8 deletions scripts/table_updates/tests/test_utilities.py
danlu1 marked this conversation as resolved.
@@ -4,8 +4,8 @@
import pandas as pd
import pytest
import synapseclient
- from synapseclient import Schema, Table
- from table_updates import utilities
+ from synapseclient import Schema, Table, client
+ from utilities import *


@pytest.fixture(scope="session")
@@ -21,6 +21,7 @@ def table_schema():
column_names=["col1", "col2"],
column_types=["STRING", "INTEGER"],
)

return schema


@@ -63,7 +64,7 @@ def test_download_synapse_table_default_condition(
syn, table_schema, query_return_df, select, query, expected_df
):
syn.tableQuery = MagicMock(return_value=Table(table_schema, query_return_df))
- result = utilities.download_synapse_table(syn, "syn123456", select)
+ result = download_synapse_table(syn, "syn123456", select)

# validate
syn.tableQuery.assert_called_once_with(query)
@@ -92,7 +93,7 @@ def test_download_synapse_table_with_condition(
syn, table_schema, query_return_df, condition, query, expected_df
):
syn.tableQuery = MagicMock(return_value=Table(table_schema, query_return_df))
- result = utilities.download_synapse_table(syn, "syn123456", condition=condition)
+ result = download_synapse_table(syn, "syn123456", condition=condition)

# validate
syn.tableQuery.assert_called_once_with(query)
@@ -126,7 +127,7 @@ def test_download_synapse_table_with_select_and_condition(
syn, table_schema, query_return_df, select, condition, query, expected_df
):
syn.tableQuery = MagicMock(return_value=Table(table_schema, query_return_df))
- result = utilities.download_synapse_table(
+ result = download_synapse_table(
syn, "syn123456", select=select, condition=condition
)

@@ -142,7 +143,7 @@ def test_download_empty_synapse_table_with_condition(
syn.tableQuery = MagicMock(
return_value=Table(table_schema, pd.DataFrame(columns=["col1", "col2"]))
)
- result = utilities.download_synapse_table(syn, "syn123456", condition="col2 = 1")
+ result = download_synapse_table(syn, "syn123456", condition="col2 = 1")

# validate
syn.tableQuery.assert_called_once_with("SELECT * from syn123456 WHERE col2 = 1")
@@ -213,7 +214,7 @@ def test_download_empty_synapse_table_with_condition(
],
)
def test_remove_backslash(input_df, cols, expected_df):
- results = utilities.remove_backslash(input_df, cols)
+ results = remove_backslash(input_df, cols)
pd.testing.assert_frame_equal(results, expected_df)


@@ -227,4 +228,4 @@ def test_remove_backslsh_fail(input_df, cols):
with pytest.raises(
ValueError, match="Invalid column list. Not all columns are in the dataframe."
):
- utilities.remove_backslash(input_df, cols)
+ remove_backslash(input_df, cols)
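
The `assert_called_once_with` checks in these tests pin down the query string that `download_synapse_table` passes to `syn.tableQuery`. The function body is not part of this diff, so the following is only a sketch of a query builder that would satisfy the assertion visible above. `build_table_query` is a hypothetical helper name, not a function from the repository.

```python
from typing import Optional


def build_table_query(
    table_id: str, select: str = "*", condition: Optional[str] = None
) -> str:
    """Assemble a Synapse table query string from its parts."""
    query = f"SELECT {select} from {table_id}"
    if condition:
        # Only append a WHERE clause when a condition is supplied.
        query += f" WHERE {condition}"
    return query


default_query = build_table_query("syn123456")
conditional_query = build_table_query("syn123456", condition="col2 = 1")
```

Keeping the string assembly in a small pure function would let it be tested without mocking the Synapse client at all.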