Gce channel errors out when there are no VMs (instances) in Google cloud #381
Comments
While migrating to OpenStack, I lost my development virtual machine. In decisionengine_modules/GCE/transforms/GceFigureOfMerit.py
To
Something like this. But looking at the code again more closely, I am not sure what actually needs to be done when this exception occurs, i.e. when there is no VM. Steve said, "Need to handle the error condition correctly,"
so that the code does not reach these two lines below:
Or should I make it such that the code returns some null value right away the first time this exception occurs:
I cannot tell which, because I do not understand which part of the DE code runs after this file (GceFigureOfMerit.py).
I will have to check. The right behavior is to return a blank GCE_Figure_of_Merit data frame but
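One way to picture the "blank data frame" behavior mentioned above is a zero-row frame that still carries its columns. This is only a minimal sketch: the column names in `FOM_COLUMNS`, the `publish` helper, and the `"GCE_Figure_Of_Merit"` key are assumptions for illustration, since the real output schema is not shown in this thread.

```python
import pandas as pd

# Hypothetical output columns; the real schema is not shown in this thread.
FOM_COLUMNS = ["EntryName", "FigureOfMerit"]


def blank_figure_of_merit() -> pd.DataFrame:
    """Return a zero-row frame that still carries the expected columns."""
    return pd.DataFrame(columns=FOM_COLUMNS)


def publish(figures_of_merit: list) -> dict:
    """Wrap a list of row dicts, falling back to a blank frame when it is empty.

    The dictionary key is an assumed product name, not taken from the module.
    """
    if not figures_of_merit:
        return {"GCE_Figure_Of_Merit": blank_figure_of_merit()}
    return {"GCE_Figure_Of_Merit": pd.DataFrame(figures_of_merit)}
```

A blank frame built this way reports `.empty` as true, yet column lookups on it still succeed, so downstream code that reads its columns would not hit a missing-column error.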
Now that I have a better understanding of the DE source code structure, I will resume working on this issue.
I found this evidence from gpde01:/var/log/decision/gp_Gce.log:
2021-10-27T14:29:34-0500 - root - TaskManager - 22421 - GceFigureOfMerit - ERROR - exception from transform GceFigureOfMerit
  File "/usr/lib/python3.6/site-packages/decisionengine/framework/taskmanager/TaskManager.py", line 422, in run_transform
  File "/usr/lib/python3.6/site-packages/decisionengine_modules/GCE/transforms/GceFigureOfMerit.py", line 51, in transform
  File "/usr/local/lib64/python3.6/site-packages/pandas/core/generic.py", line 5141, in __getattr__
AttributeError: 'DataFrame' object has no attribute 'AvailabilityZone'
I also did the following in order to start using the Gce channel in my testing VM (fermicloud571):
Let me see if I can run the gce channel
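The AttributeError in the log above is what pandas raises for attribute-style access to a column that does not exist. A small sketch of that, assuming the occupancy frame arrives with no columns at all when no instances are running (the frames and the `has_attribute_error` helper here are made up for illustration):

```python
import pandas as pd

# An empty, column-less frame, standing in for what GCE_Occupancy appears to
# look like when no instances are running (an assumption, not confirmed here).
empty_occupancy = pd.DataFrame().fillna(0)

# A frame with the expected columns but zero rows, for comparison.
typed_empty = pd.DataFrame(columns=["AvailabilityZone", "InstanceType", "Occupancy"])


def has_attribute_error(frame: pd.DataFrame, column: str) -> bool:
    """Return True if attribute-style access to `column` raises AttributeError."""
    try:
        getattr(frame, column)
    except AttributeError:
        return True
    return False


print(has_attribute_error(empty_occupancy, "AvailabilityZone"))  # True
print(has_attribute_error(typed_empty, "AvailabilityZone"))      # False
```

Note that `fillna(0)` does not help here: it fills missing values in existing cells and cannot create a column that is absent.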
Okay, I will start debugging GceFigureOfMerit.py tomorrow.
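Okay, finally. I studied decisionengine_modules/GCE/transforms/GceFigureOfMerit.py and decisionengine_modules/GCE/sources/GceOccupancy.py and concluded that a change in GceFigureOfMerit.py should cover the exception (no GCE VMs running).
I tested this change in fermicloud435 (Steve's testing VM), and at least it does not crash with current input. Next: we will have to wait until we have zero VMs in GCE and see whether the other DE instances' GCE channels crash while this instance does not. In the meantime, I will clone the source code, make the changes and push.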
Steve said, "We can get zero VMs in GCE by temporarily killing the squid server and then starting it back up." Could you do it, Steve?
I will do it tomorrow morning and let you know when I have done it.
Steve
On Tuesday, March 22, 2022 3:04 PM, Hyunwoo Kim wrote:
Steve said, "We can get zero VMs in GCE by temporarily killing the squid server and then starting it back up."
Could you do it, Steve?
I have not connected to GCP/GCE for a long time.
Let me know when you have actually done it.
Thanks!
Okay, Steve and I conducted a test today, with Steve deleting the GCP squid VM.
We can get zero VMs in GCE by temporarily killing the squid server and then starting it back up.
Steve
On Tuesday, March 22, 2022 2:37 PM, Hyunwoo Kim wrote:
Okay, finally.
I studied these two files
decisionengine_modules/GCE/transforms/GceFigureOfMerit.py
decisionengine_modules/GCE/sources/GceOccupancy.py
and concluded that the following change in decisionengine_modules/GCE/transforms/GceFigureOfMerit.py
should cover the exception (no GCE VMs running)
def transform(self, data_block):
    self.logger.debug("in GceFigureOfMerit transform")
    performance = self.GCE_Instance_Performance(data_block)
    performance["PricePerformance"] = np.where(
        performance["PerfTtbarTotal"] > 0,
        performance["OnDemandPrice"] / performance["PerfTtbarTotal"],
        sys.float_info.max,
    )
    factory_entries = self.Factory_Entries_GCE(data_block).fillna(0)
    gce_occupancy = self.GCE_Occupancy(data_block).fillna(0)
    figures_of_merit = []
    for i, row in performance.iterrows():
        az = row["AvailabilityZone"]
        it = row["InstanceType"]
        entry_name = row["EntryName"]
<new code>
        try:
            occupancy_df = gce_occupancy[((gce_occupancy.AvailabilityZone == az) &
                                          (gce_occupancy.InstanceType == it))]
        except:
            occupancy = 0
        else:
            occupancy = float(
                occupancy_df["Occupancy"].values[0]) if not occupancy_df.empty else 0
</new code>
<original code>
        occupancy_df = gce_occupancy[((gce_occupancy.AvailabilityZone == az) &
                                      (gce_occupancy.InstanceType == it))]
        occupancy = float(
            occupancy_df["Occupancy"].values[0]) if not occupancy_df.empty else 0
</original code>
        max_allowed = max_idle = idle = 0
I tested this change in fermicloud435 (Steve's testing VM) and
at least it does not crash with current input.
Next:
We will have to wait until we have zero VMs in GCE and see whether the GCE channels of all other DE instances crash
while this instance (fermicloud435) does not.
In the meantime, I will clone the source code, make the changes and push.
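As a possible alternative to the bare `except` in the change above, the lookup could check for the expected columns first, so that unrelated errors are not silently swallowed. This is only a sketch: `lookup_occupancy` is a hypothetical helper, not part of decisionengine_modules, and the sample values are made up.

```python
import pandas as pd


def lookup_occupancy(gce_occupancy: pd.DataFrame, az: str, it: str) -> float:
    """Return the occupancy for (az, it), or 0.0 when no matching row exists."""
    required = {"AvailabilityZone", "InstanceType", "Occupancy"}
    # A frame without these columns (e.g. no GCE VMs running) has nothing to match.
    if not required.issubset(gce_occupancy.columns):
        return 0.0
    match = gce_occupancy[
        (gce_occupancy["AvailabilityZone"] == az)
        & (gce_occupancy["InstanceType"] == it)
    ]
    return float(match["Occupancy"].values[0]) if not match.empty else 0.0


# Illustrative data only; zone and instance type names are placeholders.
sample = pd.DataFrame(
    [{"AvailabilityZone": "zone-a", "InstanceType": "type-1", "Occupancy": 3}]
)
print(lookup_occupancy(sample, "zone-a", "type-1"))          # 3.0
print(lookup_occupancy(pd.DataFrame(), "zone-a", "type-1"))  # 0.0
```

The explicit column check keeps the zero-VM case on the same code path as the "no matching row" case, which makes it easier to test without waiting for GCE to be empty.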
The Gce_Figure_Of_Merit transform errors out when the GCE_Occupancy data frame is a null frame.
This happened for the first time today when GCE was completely empty for once.
Need to handle the error condition correctly.