-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IBMQExperiment and FreeformDesign/CombinedExperimentDesign quality-of-life updates #379
base: develop
Are you sure you want to change the base?
Conversation
The goal is to add checkpointing, which will be facilitated by making this properly serializable.
Avoid pickling and don't write all_circuits_needing_data. Should cut on-disk space in half, and circuits can be reinitialized from keys (should also save on circ construction time)
Checkpointing is facilitated by moving this from a dict to a class that inherits from TreeNode and serializes more "pyGSTi-like". All major IMBQExperiment stages (transpile, submit, retrieve results) are accompanied by internal writes of the ibmqexperiment directory, which can be loaded from as checkpoints.
…rimentDesign The thought process is that in many cases, the all_circuits_needing_data for CombinedExperimentDesign is simply a union of subdesign lists. In this case, a user can opt out of saving this and just regenerate it on serialization (thereby saving 2x disk space and save/load time).
@sserita, I got a ping about reviewing this due to being a code owner. From what I can see this doesn't touch code that I have prior experience with. Maybe a change in my team in the CODEOWNERS file is in order? |
pygsti/extras/ibmq/ibmqcore.py
Outdated
self.auxfile_types['batch_results'] = 'pickle' | ||
|
||
if not disable_checkpointing: | ||
if self.checkpoint_path is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're enabling checkpointing by default I think it would be worthwhile implementing a default path here for when the checkpoint_path kwarg isn't set by a user. This something we currently do in the GST protocol checkpointing code, fwiw, in case referencing how it is handled in that part of the code would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that is fair. One reason I did it this way is that the "checkpoint" here is actually a completely valid IBMQExperiment object that can be loaded using from_dir(), so in my experience so far it has been nice to have it match the dirname for write() so that the user doesn't have to change how they load from checkpoint vs a .write()... but it's a good point that a default path could simplify this for users. I'll think about how I want to incorporate that here.
pygsti/protocols/protocol.py
Outdated
@@ -1268,7 +1313,7 @@ def from_edesign(cls, edesign, name): | |||
raise ValueError("Cannot convert a %s to a %s!" % (str(type(edesign)), str(cls))) | |||
|
|||
def __init__(self, sub_designs, all_circuits=None, qubit_labels=None, sub_design_dirs=None, | |||
interleave=False): | |||
interleave=False, skip_writing_all_circuits=False): | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the general idea of having the option to skip writing all_circuits
is alright, but having this be something that is specified as an attribute of a class instance during the constructor stage feels suboptimal. This feels like something that would fit better as an optional flag passed into the write
method. Of course, doing so would mean you'd need to implement an overriding version of write
specific to CombinedExperimentDesign
. Then again it looks like this is the tactic that was taken for the FreeformDesign
, so maybe that isn't so bad?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually tried that first, but it got a little nasty because a) it was doing a lot of manipulation of auxfile_types on the fly, and b) I really want to trigger that option as part of ProtocolData, IBMQExperiment, etc and that would mean plumbing up an option that was only used for CombinedExperimentDesign through everything that could possibly hold an edesign... Any suggestions on avoiding that plumbing are welcome, the class member was my first pass at it but I agree it's pretty awkward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm not sure why I didn't think of this, but I can just check whether all_circuits_needing_data matches the set of all subdesigns... Duh. (Idea came from your assert comment below.)
Great work, @sserita! I posted a couple of code specific comments above, but had a couple of broader comments to go along with them. I think the main thing that this PR is missing is some unit tests. As far as I can tell we currently have no coverage of the IBMQExperiment in any of our test modules. More and more users are reliant on this code so this would be a good opportunity to at least start to fill that gap by adding some tests for the new checkpointing functionality. Thoughts on this? We obviously don't want to have the runners making a bunch of API calls to IBM's servers as part of testing, but I took a look at the code and at first glance it looks like the Relatedly, while we have some coverage on serialization for experiment designs, it looks like presently we don't have have anything for FreeformDesign (we actually only have one test that touched FreeformDesign period, and that is related to a qubit label mapping method), so this could be a good thing to add. Ditto with the new CombinedExperimentDesign serialization option (for skipping |
pygsti/protocols/protocol.py
Outdated
|
||
# Don't save all_circuits_needing_data, it's redundant with aux_info keys | ||
self.auxfile_types['all_circuits_needing_data'] = 'reset' | ||
# Currently not jsonable, but will be fixed in write() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add an assertion or other check here that check this is true? That is, that self.aux_info.keys()
is equivalent to all_circuits_needing_data
? This is definitely true initially upon construction, but it also looks like it is possible for these to become out of sync during the course of standard usage. For example, the base ExperimentDesign
class has the method truncate_to_circuits
which is a wrapper around _truncate_to_circuits_inplace
. Looking at the implementation of _truncate_to_circuits_inplace
this operates directly on all_circuits_needing_data
and we never re-sync this with aux_info
. There are a couple other methods where this happens as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I think probably other things would break if they got out of sync too, but an assert here to check is very easy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First time using the reviewer interface, so didn't realize my other comments probably should have gone in here, sorry!
See above comments for more.
Also includes some QOL changes for debugging tests with VSCode and pytest.
Still needs some debugging, already fixed a few minor issues.
FYI, this is being delayed until 0.9.13 so that I can also incorporate the new IBMQ session workflow, which will eventually become the default way to run on IBMQ. |
Still not fully tested due to CX/ECR conversion.
Also used that to save transpilation in IBMQExperiment.
Pathos is needed to use dill in multiprocessing, which is needed because some Qiskit obj doesn't like to be pickled.
This PR makes several quality-of-life improvements for
IBMQExperiment
, including checkpointing for IBMQ experiments mentioned in #327, as well asFreeformDesign
andCombinedExperimentDesign
serialization.IBMQExperiment
updates:IBMQExperiment
class is changed from a dict to a full class that inherits fromTreeNode
, making it commensurate with serialization of objects such asExperimentDesign
andProtocolData
.IBMQExperiment
objects. The API follows GST checkpointing wherecheckpoint_path
anddisable_checkpointing
can be passed to the constructor (checkpointing is on by default). Ifcheckpoint_path
is provided, thentranspile()
,submit()
, orretrieve_results
will update the on-diskibmqexperiment
when possible.bson
is installed, we can use thejson_util
hook to handle objects such asdatetime.datetime
which are present insubmit_time_calibration_data
andbatch_results
from the IBMQ server. Depending on the availability ofbson
, we can serialize to all text/json files, but fall back to pickle still.FreeformDesign
updates:aux_info
member ofFreeformDesign
is written as JSON instead of pickle. To do this, first we cast the Circuit keys as strings and then write the now-JSONable dictionary to file. We convert back to Circuits on deserialization.all_circuits_needing_data
forFreeformDesign
. This field should match the keys ofaux_info
, and with the size of these designs, it saves quite a lot of space to not double save them (almost 1 GB for some of my SVB use cases!).CombinedExperimentDesign
updates:skip_writing_all_circuits
flag. The intention is that in cases whereall_circuits_needing_data
is simply the union of subdesigns, this can be regenerated on the fly. This saves space on disk and cuts the write/read times for the largest thing being written (cutting my 1 GB down to ~500 MB for my SVB use cases!).I've done my best to make it backwards-compatible; for example, we can still load old pickle-style
IBMQExperiment
directories into the new object. However, theIBMQExperiment
workflow has to change a little to enable checkpointing - a minor pain point that I hope is well worth it for most users.Remaining Tasks
After a round of feedback, here's the current list of additional tasks: