Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use MBAR bootstrap error #1077

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 6 additions & 4 deletions openfe/protocols/openmm_utils/multistate_analysis.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import numpy.typing as npt
from openmmtools import multistate
from openff.units import unit, ensure_quantity
from pymbar import MBAR
from pymbar.utils import ParameterError
from openfe.analysis import plotting
from typing import Optional, Union
Expand Down Expand Up @@ -220,13 +221,14 @@
* Allow folks to pass in extra options for bootstrapping etc..
* Add standard test against analyzer.get_free_energy()
"""
mbar = analyzer._create_mbar(u_ln, N_l)

try:
# pymbar 3
DF_ij, dDF_ij = mbar.getFreeEnergyDifferences()
mbar = MBAR(u_ln, N_l, nbootstraps=1000)
DF_ij, dDF_ij = mbar.getFreeEnergyDifferences(compute_uncertainty=True, uncertainty_method="bootstrap")
except AttributeError:
r = mbar.compute_free_energy_differences()
# pymbar 4
mbar = MBAR(u_ln, N_l, solver_protocol="robust", n_bootstraps=1000, bootstrap_solver_protocol="robust")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is most of the cost in the forward & reverse analysis?

Copy link
Collaborator Author

@jthorton jthorton Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah running the bootstrapping on repeat is expensive! One thought on the forward and backward estimates should we be subsampling using g_t calculated for this subset of data? In the industry benchmarking I calculated it 3 ways no subsampling, subsample based on the % of data and subsample using the g_t calculated for the full set of data. https://github.com/OpenFreeEnergy/IndustryBenchmarks2024/blob/fb60d7a971cb5d04787d796b6adcf257d905786a/industry_benchmarks/analysis/1_download_and_extract_data.py#L464-L552

r = mbar.compute_free_energy_differences(compute_uncertainty=True, uncertainty_method="bootstrap")

Check warning on line 231 in openfe/protocols/openmm_utils/multistate_analysis.py

View check run for this annotation

Codecov / codecov/patch

openfe/protocols/openmm_utils/multistate_analysis.py#L230-L231

Added lines #L230 - L231 were not covered by tests
DF_ij = r['Delta_f']
dDF_ij = r['dDelta_f']

Expand Down
17 changes: 10 additions & 7 deletions openfe/tests/protocols/test_openmmutils.py
Original file line number Diff line number Diff line change
Expand Up @@ -255,7 +255,8 @@ def test_free_energies(self, analyzer):
ret_dict = analyzer.unit_results_dict
assert len(ret_dict.items()) == 7
assert pytest.approx(ret_dict['unit_estimate'].m) == -47.9606
assert pytest.approx(ret_dict['unit_estimate_error'].m) == 0.02396789
# more variation when using bootstrap errors so we need a loser tolerance
assert pytest.approx(ret_dict['unit_estimate_error'].m, rel=1e4) == 0.0251
# forward and reverse (since we do this ourselves)
assert_allclose(
ret_dict['forward_and_reverse_energies']['fractions'],
Expand All @@ -269,23 +270,25 @@ def test_free_energies(self, analyzer):
-48.025258, -48.006349, -47.986304, -47.972138, -47.960623]),
rtol=1e-04,
)
# results generated using pymbar3 with 1000 bootstrap iterations
assert_allclose(
ret_dict['forward_and_reverse_energies']['forward_dDGs'].m,
np.array([0.07471 , 0.052914, 0.041508, 0.036613, 0.032827, 0.030489,
0.028154, 0.026529, 0.025284, 0.023968]),
rtol=1e-04,
np.array([0.077645, 0.054695, 0.044680, 0.03947, 0.034822,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating these - the new error values are expected to be different, so we should update things where we can.

0.033443, 0.030793, 0.028777, 0.026683, 0.026199]),
rtol=1e-01,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little bit loose as a tolerance, but I guess it's fine given the bootstraps are stochastic.

)
assert_allclose(
ret_dict['forward_and_reverse_energies']['reverse_DGs'].m,
np.array([-47.823839, -47.833107, -47.845866, -47.858173, -47.883887,
-47.915963, -47.93319, -47.939125, -47.949016, -47.960623]),
rtol=1e-04,
)
# results generated using pymbar3 with 1000 bootstrap iterations
assert_allclose(
ret_dict['forward_and_reverse_energies']['reverse_dDGs'].m,
np.array([0.081209, 0.055975, 0.044693, 0.038691, 0.034603, 0.031894,
0.029417, 0.027082, 0.025316, 0.023968]),
rtol=1e-04,
np.array([0.088335, 0.059483, 0.046254, 0.041504, 0.03877,
0.035495, 0.031981, 0.029707, 0.027095, 0.026296]),
rtol=1e-01,
)

def test_plots(self, analyzer, tmpdir):
Expand Down
Loading