Error if continuous training data contains null values #428

Merged on Jan 15, 2025 (3 commits)

rwedge commented Jan 14, 2025

Fixes #414

CU-86b2pdt0b

codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.46%. Comparing base (6ed1f19) to head (1ca49d5).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #428      +/-   ##
==========================================
+ Coverage   85.25%   85.46%   +0.20%     
==========================================
  Files          10       11       +1     
  Lines         780      791      +11     
==========================================
+ Hits          665      676      +11     
  Misses        115      115              
Flag         Coverage Δ
integration  85.33% <100.00%> (+0.20%) ⬆️
unit         68.77% <100.00%> (+20.82%) ⬆️

rwedge requested review from frances-h and R-Palazzo on January 14, 2025 16:37
Comment on lines 301 to 318
Setup:
- Create dataframe with a continuous column
- Create numpy array with same data
- Create dataframe with a discrete column
- Create numpy array with a discrete column

Input:
- train_data = 2-dimensional numpy array or a pandas.DataFrame
- discrete_columns = list of strings or integers

Output:
None

Side Effects:
- Raises error if a continuous column contains a null value.

Note:
- could create another function for numpy array

minor: this is an old docstring style we don't use anymore; for new tests, the one-line summary is enough

R-Palazzo left a comment

LGTM!

tests/integration/synthesizer/test_ctgan.py
discrete_columns = ['discrete']

ctgan = CTGAN(epochs=1)
with pytest.raises(InvalidDataError):

Add # Run and Assert
You could also check the error message here
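
A minimal sketch of what this suggestion could look like. It is not the code merged in this PR: `validate_null_data`, the local `InvalidDataError` class, and the error message wording are hypothetical stand-ins, used so the example runs without CTGAN installed. The `match` argument of `pytest.raises` is what checks the message.

```python
import re

import numpy as np
import pandas as pd
import pytest


class InvalidDataError(Exception):
    """Hypothetical stand-in for the error type used in the PR's test."""


def validate_null_data(train_data, discrete_columns=()):
    """Hypothetical helper: reject nulls in continuous (non-discrete) columns."""
    if not isinstance(train_data, pd.DataFrame):
        # For arrays, discrete_columns is assumed to hold integer column indices.
        train_data = pd.DataFrame(train_data)

    continuous = [col for col in train_data.columns if col not in discrete_columns]
    if train_data[continuous].isna().any().any():
        # Assumed wording; the real message raised by the library may differ.
        raise InvalidDataError('Continuous training data contains null values.')


def test_fit_raises_on_null_continuous_data():
    """Nulls in a continuous column raise ``InvalidDataError``."""
    # Setup
    data = pd.DataFrame({
        'continuous': [0.1, np.nan, 0.3],
        'discrete': ['a', 'b', 'c'],
    })
    expected_message = re.escape('Continuous training data contains null values.')

    # Run and Assert
    with pytest.raises(InvalidDataError, match=expected_message):
        validate_null_data(data, discrete_columns=['discrete'])
```

In the real test, the `with` block would wrap the `ctgan.fit(...)` call instead of the stand-in helper, and `expected_message` would be the message the library actually raises.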

rwedge merged commit 42ca6f3 into main on Jan 15, 2025
49 checks passed
rwedge deleted the issue-414-validate-null-values branch on January 15, 2025 17:07

Successfully merging this pull request may close these issues.

Surface error to user during fit if training data contains null values
4 participants