
Fix bugs for conditional sampling #236

Closed

Conversation


@AndresAlgaba AndresAlgaba commented Jul 20, 2022

Hi everyone, this PR fixes issues #169 and #235, which report bugs in sampling from the conditional generator after training, i.e., the sample method of CTGAN. The proposed changes are described and discussed in detail in the issues, but here is a summary:

  • Issue discrete_column_matrix_st from data_sampler class is always 0 #169 concerns the _discrete_column_matrix_st attribute of the DataSampler in CTGAN, which affects the sample_original_condvec and generate_cond_from_condition_column_info methods. Adding self._discrete_column_matrix_st[current_id] = st fixes the issue for sample_original_condvec. To fix generate_cond_from_condition_column_info, I have replaced _discrete_column_matrix_st with _discrete_column_cond_st. The two fixes differ because one constructs a conditional vector while the other selects a conditional vector from the data, which also contains continuous variables and therefore requires different indices. A sketch of the first change is shown after this list.
  • Issue Conditional sampling and cross-entropy loss #235 was only partially fixed by replacing _discrete_column_matrix_st with _discrete_column_cond_st. Sampling was still off because the generator contains batchnorm layers and remained in train mode during inference. Calling self._generator.eval() before sampling fixes this, and for performance I also wrapped the forward pass in with torch.no_grad() (see the second sketch below).
  • I have written test_synthesizer_sampling to test the sampling methods. I noticed that test_log_frequency was failing, but after looking into it in more detail, it seems this test is outdated (Expose log_frequency parameter for conditional sampling #20). During inference, the conditions are always sampled with the empirical frequency (I am not sure whether this is intentional; perhaps a feature request to sample with log frequency would be appropriate). During training, the default is the log frequency, but that is not what the test was assessing. I have therefore changed this test, but it can also be removed.
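
For context, here is a minimal sketch of the first fix. It assumes the DataSampler constructor iterates over the transformer's output_info while tracking the running offset st into the transformed data matrix; the loop is simplified and only the added line is the actual change:

```python
# Simplified sketch of the discrete-column loop in DataSampler.__init__
# (not the full constructor). The substantive change is recording the
# matrix start index, which previously stayed at its initial value of 0.
st = 0               # running offset into the transformed data matrix
current_id = 0       # index of the current discrete column
current_cond_st = 0  # running offset into the conditional vector
for column_info in output_info:
    if is_discrete_column(column_info):
        span_info = column_info[0]
        self._discrete_column_cond_st[current_id] = current_cond_st
        self._discrete_column_matrix_st[current_id] = st  # new line: fix for #169
        self._discrete_column_n_category[current_id] = span_info.dim
        current_cond_st += span_info.dim
        current_id += 1
        st += span_info.dim
    else:
        st += sum(span_info.dim for span_info in column_info)
```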
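
Similarly, a sketch of the inference-time fix in CTGAN's sample method, assuming the method builds batches of noise fakez and decodes them with the generator (condvec handling and batching details are omitted):

```python
# Simplified sketch of the sampling loop in CTGAN.sample (not the actual code).
self._generator.eval()  # put batchnorm layers in inference mode: fix for #235
data = []
for _ in range(steps):
    fakez = torch.normal(mean=mean, std=std)  # noise batch; condvec concatenation omitted
    with torch.no_grad():  # no gradients needed at sampling time
        fake = self._generator(fakez)
        fakeact = self._apply_activate(fake)
    data.append(fakeact.cpu().numpy())
```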
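
For completeness, this is the kind of end-to-end call that test_synthesizer_sampling exercises. A usage sketch, assuming the top-level CTGAN class (CTGANSynthesizer in older releases) and an illustrative toy DataFrame:

```python
import pandas as pd
from ctgan import CTGAN

# Toy data with one continuous and one discrete column (names are illustrative).
data = pd.DataFrame({
    'age': [23, 35, 41, 29, 52] * 20,
    'category': ['a', 'b', 'a', 'c', 'b'] * 20,
})

model = CTGAN(epochs=1)
model.fit(data, discrete_columns=['category'])

# Conditional sampling: request rows where category == 'a'.
samples = model.sample(10, condition_column='category', condition_value='a')
```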

@AndresAlgaba AndresAlgaba requested a review from a team as a code owner July 20, 2022 12:27
@AndresAlgaba AndresAlgaba requested review from pvk-developer and removed request for a team July 20, 2022 12:27
@AndresAlgaba AndresAlgaba marked this pull request as draft July 20, 2022 12:27