Restarted host is not picking up newly-deployed assemblies #2636
Comments
@fabiocav -- FYI, this is what we were chatting about earlier. |
@brettsam This issue is relevant to V1 only, right? |
Yes -- v2 exits the process itself rather than signalling for an environment shutdown. |
I'm seeing more and more cases of this -- another just happened where the Health Monitor is trying to shut down the host over and over. We see the log that "Environment shutdown has been triggered," but the PID never changes. Repeat forever. |
I do not think the process ID is supposed to change in this case - the app domain should recycle and that should be sufficient. We're now supposed to log the app domain during host start. Can you check the log to see if the app domain was changing after each HostingEnvironment.Shutdown? cc @pragnagopa |
Yeah, you're right. I see the AppDomain Id changing in this case, yet the site still fails the very first Host Health check, before any functions are executed. I think this may be a separate issue, then. |
Can you send @mathewc a Kusto query so that he can take a look? |
I'm re-titling this as I misunderstood how that API worked. It looks like the shutdown is happening -- but rather than recycling the process, it recycles the AppDomain. But that doesn't explain why we see apps occasionally using old assemblies after an AppDomain recycle. We've had one internal report of this as well, and there are likely others that just don't realize it. There's something fishy going on there. |
I have a suspicion that this is due to VSTS default usage of the old kudu ZIP api. Things to check:
|
Hi @brettsam @paulbatum thanks for your help on this one so far. Was there a resolution? I've seen something similar again today, this time 2 different versions of dlls running at the same time for quite a while after a deployment. I've raised #3119 for it. Much appreciated if you could take a look! |
@mathewc / @fabiocav -- I think we've discussed this in the past (#3119 is another instance). In this case, it was a Web Deploy. I can see the host restarts and recycles the AppDomain, yet the previous file version is loaded. We can't see that in the logs, but the customer is reporting it. Could there be some file caching issue upon restart? |
Going through some old threads and noticed this was waiting for a response. Is this still an issue? |
FYI: We have a customer seeing a similar issue here: Azure/azure-functions-durable-extension#512 |
Beyond the issue described in Azure/azure-functions-durable-extension#512, I think I saw it happen again today in a slightly different way. I deployed updated code via web deploy to a pre-compiled C# function app. The deployment output showed the expected dll & pdb update being pushed and a new json file being uploaded for an added function. However, while trying to test out the new functionality and looking at some trace output in AI, it appeared that my added durable function activities were not being called at all. In the Azure portal the added function name showed up in the list, but when I selected it I got an error about the function name being invalid (I don't have the exact verbiage anymore). Based on my searching, it appeared to be the error raised when the function could not be found in the dll.
In this case restarting the app service didn't do the trick. What I did was redeploy the exact same binaries again, via web deploy the same way, and the output showed what I expected: no updated or new files, but it did the synchronizing step. After this, my new durable function activities were being called and working as expected, and the function was selectable without error in the Azure portal.
I'd be happy to provide any details or ID values regarding the environment this happened in if it would be helpful in debugging.
function invocation from environment (West US): First deployment time: 19 Nov 13:25 PST Update
|
@ghills You mentioned that issuing a restart did not resolve this, but when did you issue the restart? The reason I ask is that I'm looking at our logs and I can't see your process ID changing during this sequence, and a restart should absolutely result in a new process ID. In general I'm having trouble matching the timestamps in my logs up with your timeline. What I see roughly (all times UTC):
2018-11-19 17:57:48.8073066 - The app starts seeing filesystem updates and starts shutting down the current host instance and initializing a new one.
2018-11-19 17:57:51.4153313 - New host starts executing functions.
2018-11-19 17:57:51.5755555 - New host sees another file change come in, triggers same process as above
2018-11-19 17:57:52.4594099 - Second new host starts
2018-11-19 17:57:52.8923307 - this host hits an error
2018-11-19 17:58:23.5163803 - this host emits the error you noted above:
The same host keeps running executing your other functions but not your new one. The fact that two new hosts started so close together is likely to represent a problem. It looks like your deployments are slow enough that you are getting host restarts that land in a partial state. I still can't explain why we failed to load your new function after the last host restart. I suspect there is a very subtle race condition here. My best recommendation right now is to move to run from package deployments. VS has native support for these deployments now and they basically eliminate this type of problem. |
@paulbatum Times in UTC second deployment (no changes to files) The first deployment when the binaries were updated to the version with the new function lines up with what you saw in the logs. Regardless your explanation makes sense given the timeline you provided. Deploying this project has averaged 20-30 seconds to do the deployment so that leaves a large enough window to explain the second new host starting up while it was in progress. I tried restarting the app from the portal well after I deployed the updated binaries the first time but before I re-deployed the binaries in the second one. Something like 20:00 UTC probably. Now that I think about it though, this was after I had hit the error in the portal trying to select the added function and after that error parts of the portal seemed to be failing to load for me so I wonder if I actually triggered the restart to happen or not. Either way it sounds like I will be looking at moving away from web deploy as we've had this happen a couple of times now and I would prefer to avoid whatever issue is causing the behavior. I just need to figure out how to integrate our CD workflow to run from package! Thanks! And thank you to @cgillum for getting me redirected here as I didn't know initially where the issue was stemming from. |
Yeah I don't think your restart went through. I can't find a record of it in our logs. I think this deployment shows that our 500ms "wait for more files" logic needs to be relaxed slightly. I filed a separate issue for that: |
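To illustrate why a short "wait for more files" window can produce several host restarts during one slow deployment, here is a minimal sketch of a debounce timer. It is not the Functions host's actual implementation; the function name `restart_times` and the change times are made up for the example (the 2.75 s gap is only loosely modelled on the two file-change events in the timeline above).

```python
from typing import Iterable, List


def restart_times(change_times: Iterable[float], quiet_period: float) -> List[float]:
    """Return the times at which a debounced restart would fire.

    A restart fires `quiet_period` seconds after a file change, unless
    another change arrives first, in which case the wait starts over.
    Hypothetical model, not the host's real logic.
    """
    fired: List[float] = []
    pending = None  # most recent change that has not yet led to a restart
    for t in sorted(change_times):
        if pending is not None and t - pending >= quiet_period:
            # The quiet period elapsed before this change arrived.
            fired.append(pending + quiet_period)
        pending = t
    if pending is not None:
        fired.append(pending + quiet_period)
    return fired


# Two file changes 2.75 s apart (seconds since the first change).
changes = [0.0, 2.75]
print(restart_times(changes, quiet_period=0.5))  # two separate restarts
print(restart_times(changes, quiet_period=5.0))  # one restart after both changes
```

With a 0.5 s window the second change lands after the first restart has already fired, so the host restarts twice; a longer window would fold both changes into a single restart at the cost of a slower pickup.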
@fabiocav I think the only recent activity on this issue is a different case to what is being described (I confirmed that no restart went through). Perhaps we should close this one? |
I apologize for stirring up an old issue, but I did come across this the last few days and was struggling to figure out why the code in my updated assemblies was not executing. Finally decided to do a manual restart and, voila, after a brief cold start the next runs of my durable function executed with the new code. I'm using the Azure Functions deploy task in Azure DevOps, and below is the output from the last execution:
Is there any additional information I can provide to help in the diagnosis? Or perhaps there is something blatantly obvious in those logs that would explain why it may be occurring? |
As an update to my last comment, this seems to be happening pretty consistently on our end (function deployment requires a manual restart in order for new code to run). Just a heads up that we will probably be submitting a formal support request for this today to see if we can get some traction as perhaps it's something we are doing wrong since no one else appears to be having this issue. |
This may be something I have experienced as well. With Pulumi's addition of .NET Core support to their infrastructure-as-code product I have been experimenting with some of their samples, one of which deploys a basic Azure Functions app... https://github.com/pulumi/examples/tree/master/azure-cs-functions-raw
|
@jimmcslim do you know how |
I'm trying to get a more detailed set of logs out of it, but I believe the steps it follows are:
So I guess I kind of thought that step 3 would be the prompt for Azure Functions to reload the new code. And it seems like sometimes it actually does; if I madly reload the page in the browser I see the new response come through... other times it seems to see the new code for a few requests and then goes back to the old code, in which case I do need to restart the app. I appreciate that putting an explicit restart step in is probably for the best - but just trying to understand if there is a reasonable suggestion for the behaviour I was seeing. So that said, what API should I be calling to trigger the restart? I can click the button in the portal but obviously that's not infrastructure-as-code. |
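On the restart question: one option is the Azure Resource Manager "restart" operation on the site resource (the Azure CLI equivalent is `az functionapp restart --name <app> --resource-group <rg>`). The sketch below only builds the request; the `api-version` value is an assumption that should be checked against the current REST reference, the helper names are hypothetical, and obtaining a bearer token is out of scope.

```python
import urllib.request
from urllib.parse import quote

ARM_ENDPOINT = "https://management.azure.com"


def restart_url(subscription_id: str, resource_group: str, app_name: str,
                api_version: str = "2022-03-01") -> str:
    """Build the ARM URL for restarting a site (api_version is an assumption)."""
    return (
        f"{ARM_ENDPOINT}/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.Web/sites/{quote(app_name, safe='')}"
        f"/restart?api-version={api_version}"
    )


def restart_request(url: str, bearer_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request with an empty body."""
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# Placeholder values; sending would be urllib.request.urlopen(req).
req = restart_request(restart_url("my-sub-id", "my-rg", "my-func-app"), "<token>")
print(req.get_method(), req.full_url)
```

Keeping URL construction separate from sending makes it easy to log or dry-run the call from a deployment pipeline before wiring in real credentials.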
@jimmcslim could you please open a new issue with those details? It would be good to track this separately as it is targeting V2 and the root cause is not related to what this issue is originally tracking. |
Actually I had realised that the Pulumi sample creates a functions app with the v3 preview host. I can remove all the Pulumi stuff from an issue; ultimately I would be publishing a new version of an existing functions app to a zip in Azure Storage, updating the WEBSITE_RUN_FROM_PACKAGE configuration setting of the existing functions app to point at the URL of the new zip, and expecting the host to reload the code from the new URL... without having to do an explicit restart? But maybe that is not supported behaviour? |
Sorry I wasn't clear with my previous answer. If Pulumi is overwriting an existing zip then an explicit restart is required. If it's creating a new zip, then as you said, updating the app settings to point to that new zip should be sufficient. Anyway, as Fabio mentioned, I think you should be discussing this behavior in a different issue to this one. |
Hi, we are experiencing the exact same thing with DevOps releases. Did you find a way to resolve this issue? |
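One way to stay on the "new zip" path described above is to give every package a name derived from its contents, so the WEBSITE_RUN_FROM_PACKAGE value changes on every real deployment. A minimal sketch, assuming a pipeline that uploads the zip itself; both helper names are hypothetical and the upload step is omitted.

```python
import hashlib


def package_blob_name(app_name: str, zip_bytes: bytes) -> str:
    """Derive a blob name from the package contents (hypothetical convention)."""
    digest = hashlib.sha256(zip_bytes).hexdigest()[:12]
    return f"{app_name}-{digest}.zip"


def needs_explicit_restart(current_setting: str, new_setting: str) -> bool:
    """If the setting value is unchanged, the zip was overwritten in place,
    which per the comment above requires an explicit restart."""
    return current_setting == new_setting


old_name = package_blob_name("myapp", b"build 1")
new_name = package_blob_name("myapp", b"build 2")
print(old_name != new_name)                        # different content, different name
print(needs_explicit_restart(old_name, new_name))  # False: the setting changes
```

Content-derived names also make it obvious from the app setting alone which build an app is pointing at.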
@jcd79 Unfortunately no. It stopped occurring for a time, but the issue has resurfaced for us and we are struggling with it again. I haven’t yet had time to open a new issue as requested above by @fabiocav since the original issue in this thread was not having anything to do with V2. If you have an opportunity to provide the details of what you’re seeing in a new issue I can have some of my team members gather details as well. |
Over the last 2 years, I've had several instances of a similar issue where deployed files were actually not being deployed even though VS publish was successful. For me, the solution was to restart the app. The worst thing happened only recently. I made several changes in a single publish and some were deployed while others were not. |
@petro2050 I don't know what the underlying cause was of your partial deployment, but the possibility of this can be avoided completely by using run-from-package: |
My team has seen this occasionally over the last few months and restarting the app usually fixes it. We're using Azure Functions v2, WEBSITE_RUN_FROM_PACKAGE is set to 1, and we're using the Azure DevOps task Azure Functions App Deploy. The ADO Task deployment is set to "Autodetect" and the logs show it's using Zip Deploy and succeeding. We're investigating an issue today that started last night and is behaving like an old version of our code is running, but I'm having trouble verifying that's the case. Restarting the app did not fix it. I'm trying to examine the content of the package that is running but haven't yet figured out how to get a look at it. |
I figured out how to download the bits from the server and verified it did have our latest by comparing timestamps on the dlls. They lined up with our latest deployment from this morning. |
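Timestamps can be misleading if a build re-stamps unchanged files, so comparing content hashes is a stricter check. The sketch below assumes the deployed bits have already been downloaded to a local folder; the function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_hashes(root: str, pattern: str = "*.dll") -> Dict[str, str]:
    """Map each matching file's relative path to its SHA-256 digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob(pattern))
        if p.is_file()
    }


def diff_deployment(built_dir: str, deployed_dir: str,
                    pattern: str = "*.dll") -> Dict[str, List[str]]:
    """Report files that are missing, extra, or different on the server copy."""
    built = file_hashes(built_dir, pattern)
    deployed = file_hashes(deployed_dir, pattern)
    return {
        "missing": sorted(built.keys() - deployed.keys()),
        "extra": sorted(deployed.keys() - built.keys()),
        "changed": sorted(
            name for name in built.keys() & deployed.keys()
            if built[name] != deployed[name]
        ),
    }
```

Calling `diff_deployment("bin/publish", "downloaded")` (both paths are placeholders) returns three empty lists when the server copy matches the build output. Note that this only verifies what is on disk, not which assembly version a running host has loaded.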
I have seen this issue in my functions, where the running code was different from the "successfully" DevOps-deployed zip, |
@paulbatum I'm on V3 and already running from package. As others mentioned, there are occasional differences between what is supposedly deployed and what is actually deployed. I'll be going live this year and worry about this biting me in the future, especially considering multiple active instances. The process to deploy changes to every worker has to be robust. |
@petro2050 Can you give me more details about what you saw when you said "I made several changes in a single publish and some were deployed while others were not." If you are running from a package, you might have had some instances running the older package, and some instances running the newer package, but I'm not aware of any possible condition where only some of the changes in your new package could be active. If you give me an execution ID, region and UTC timestamp, I should be able to determine which package that execution was running on. |
@paulbatum How can I know how many active workers are running in a given time? I normally look at Kudu's Process Explorer tab. Since I'm still in DEV, I only see one active worker (which is fine). I have yet to see two or more. What happened with me is that I have a common helper class that saves records to an Azure Table. I changed the partition key logic to save a That said, I don't have the problem anymore, and I'm not sure if the logs are there to be useful for you - not to mention I did not save the execution ID and other information at the time. You might be right in that another instance was running the old package, but as I said before, I'm not sure if it's possible for me to be assigned multiple workers in DEV with little workload. Can you comment on this? |
@petro2050 Thanks for this additional info. It is possible for multiple workers to be assigned, even when load is low. This sometimes happens when we are running a platform upgrade. We start running your app on another machine so that we can unload it from the existing machine, and we do this with overlap so there is no downtime. I think what was happening here is as you said, an older version of the app running in parallel with the new version. The fact that even after waiting 10 minutes, the old version was still running and you had to do a restart is suspicious. It sounds like something did not work right. But this is different to the problem I was worried about that can happen very occasionally with deployments that are not based on a package, where the app picks up only some of the file changes. |
I see this issue is 4 years old by now, but it still affects me today (runtime version 4.34.2.2). I deployed a new version of the app via the Azure Functions VS Code extension (reports deployment completed). Then I restarted the app from the Azure Portal. Even after that, when testing the function via "Test/Run" in the Azure portal, the old code remained active. Only after 'some time' the request went to a different worker (according to worker id shown in the test/run logs in the Azure portal) and the new code took effect. What is the recommended way to check whether a deployment of an Azure function has actually taken effect for users of the function? |
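There is no answer to that question in this thread. One generic approach, sketched here under stated assumptions, is to have the app report its own build version and poll until several consecutive responses match the version just deployed. The `probe` callable is hypothetical (for example, a function that calls an HTTP-triggered function you add yourself to return a version string); nothing here is a built-in Azure Functions feature.

```python
import time
from typing import Callable


def wait_for_version(probe: Callable[[], str], expected: str,
                     consecutive: int = 5, attempts: int = 60,
                     delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until `consecutive` responses in a row equal `expected`.

    Returns False if that does not happen within `attempts` calls.
    """
    streak = 0
    for _ in range(attempts):
        # Any response from an old worker resets the streak.
        streak = streak + 1 if probe() == expected else 0
        if streak >= consecutive:
            return True
        sleep(delay)
    return False


# Simulated responses: one worker still serves the old build for a while.
responses = iter(["1.0", "1.1", "1.0", "1.1", "1.1", "1.1"])
print(wait_for_version(lambda: next(responses), "1.1", consecutive=3, delay=0))
```

Requiring a streak rather than a single match reduces the chance of declaring success while an older worker is still taking traffic, though it cannot prove that every instance has switched over.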
This just cost us a full day, not understanding why our code changes were not working in the function while deployment was successful. What solved the issue at our end was to use a different GitHub Actions deployment, this one: https://learn.microsoft.com/en-us/azure/azure-functions/functions-how-to-github-actions?tabs=linux%2Cpython&pivots=method-template#example-workflow-configuration-file This deployment yaml is different than the one which is generated through deployment center when you link your GitHub repo. |
Spawning this from #1690, as it is a separate issue.
While investigating #1690, I noticed that a customer was seeing ~20 simultaneous requests to shut down the host, yet the process never recycled. The newly deployed assembly was never loaded, so the site had to be restarted to pick up the changes.
Details on finding site name:
And then for seeing the full timeframe of this occurrence:
You'll see that an environment restart was requested around 11:05, yet the process id never changed. That didn't happen until the next request around 12:05, at which point the host shuts down and the process id changes.
@simonness, FYI