-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
All sensors unavailable for one minute, same time each day. #126
Comments
Let me explain: Envoy_2_Core_Custom_Test was my attempt to propose @briancmpbll 's custom integration with the recent additions by @posixx as the successor of the current HA Core that doesn't support the token mechanism. In that process I had to apply some corrections to the code to pass pretty tight HA Core tests on code layout and components used. As result a minor modification to the code and communications was applied. These may play role here, not sure. I ended the proposal effort with the announcement of the 2023.09 HA release will (most probably) contain an updated Enphase Envoy based on a different enphase communication library as used before and will support the new firmware (but drop support for very old firmware with only html pages). I plan to offer functional changes in the proposal effort to @briancmpbll version as well if deemed beneficial and archive/make read-only that Envoy_2_Core_Custom_Test in the future as I see no benefit in having a second one available. So your issues are probably present in this custom integration current as well, but to be fair we should verify that using @briancmpbll code and not the one on it's way to become new HA Core and is in dead end street now. Having said that, the new HA Core Enphase Envoy using that new communication library may do better, to be seen. |
This triggered me remembering I've seen this before in another custom integration by @vincentwolsink using installer accounts. They increased timeout on getting data from Envoy to 60 seconds. You can try this by going into the /config/custom_components/enphase/envoy folder and change envoy_reader.py. Look for '_async_fetch_with_retry' and then couple of lines down for 'timeout=30' and change it to timeout=60 or so. Then cycle HA. |
@Steve2017 as for Riemann integration you mentioned, mine is doing the same at a late evening drop. It has to wait for new timestamp/value before it picks up again when solar starts producing again. It's still working ok, but it shows gaps because of it. It's how integration works, root issue is no new timestamp/value after the drop of communication. Potential solution, larger timeout or retry on timeout!? |
Would that be the v 0.1.3 version here? I'll try the timeout change. There are four references to timeout=30. Do I change all four or just the first?
and GET attempt is where the error appears in the log???? |
That's where I was hoping the use of 'available' might work - especially if it replaced the earlier 0kw data with 1w for that one attempt. Didn't help. Restarting the Reimann Sum sensor might help, because a HA restart puts the Reimann Sum sensor back to its pre-dropout state. |
@Steve2017 it's the one below
where de debug log for attempt is indeed |
Seen this one? |
No - but that is the same issue and I have seen other attempts to fix it. While the drop-out causes the initial problem, the ongoing problem seems to lie with the Reimann Sum and as Tom_I indicates there were attempts to fix it...unsuccessfully. Tom has made some other suggestions, but none worked for me. I might have to wait until the integration is back in the official version, then try raising an issue in the core. I'm not sure how to implement the last suggestion in that thread, but no-one is saying it works. |
Maybe something like
|
I get "invalid key: " when I try to use this format {{ sensor.envoy_122302045041_inverter_122220009431 | float(default=0) }} The message is: |
You can try such an expression in developers tools / template. Could be some {} are wrongly positioned. That is not my stronghold for sure. |
I tried this in my templates test section
and it yielded the value |
And found this calc to make kwh for wh
so run your input signal through this and then integrate? |
I also get a value in my test template section, but when I use that format as the source for the Reimann Sum sensor, it throws its hands up. I'll wait until after 11pm (10 minutes away) to see if the longer timeout works. Given the 'last_seven_days_energy_production' sensor also drops out, I am guessing the Reimann Sum will still have issues there, but worth a try. |
The longer timeout did not make any difference. It still drops out. I forgot to turn on debug which is not helpful. Why does the integration use the ID and password method to obtain a token? Why not just enter a token directly in the configuration/setup and skip those steps? |
Hi, As I could see, HA will announce a new release of Official Enphase integration? Could you confirm? I have the same disconnection problems, but it not make sense to go deeper if we will have a need integration Could you confirm? |
Yes, not sure which version it will be. All the PR for Enphase changes in core are here (added to development version and here (pending) This is the most important one: 'Refactor enphase_envoy to use pyenphase library' |
to allow for an automatic token refresh if it's expired. |
Too bad. Only option I can think of is adding a retry on timeout. Considering the next collection cycle 1 minute later works (right?) just retrying may work as well. |
Too bad there is not an option to just use the token because it has a 12-month expiry. I understand the reasoning behind regular token updates/checks, but the Enphase website is very slow and non-responsive at times. I wonder if that might be related to the issue. Having a choice between ID/Password and Token would be nice. There is already a routine to check whether the token is still valid. Even better would be having Reimann Sum fixed by the HA dev team. |
The envoy_reader_py looks to already have retries at line 233 in its code???
My debug log shows failure after one attempt, where the code indicates retries occur after a 401, so maybe I am not even getting to a 401. |
I doubt it. Enphase website is only contacted when HA starts. Checking if the token is expired is using the date embedded in the token and comparing it to current time. If current time is beyond token expiration time then and only then it reaches out to Enphase website to get a new token. The 2.10.0 dev-2-core one we tested for some time even stores the token locally so does not need to contact Enphase website when HA starts, but that's not when your issue happens and the issue happened with that one as well. |
Yes, but that's only for the 401 to trigger cookies/token check. You are getting a timeout that takes it to the bottom of the where a test is present for httpx.Transporterror. That should include timeout as well but that doesn't seem to function.
You can try removing httpx.Transport error from that line (leave the :) that should let it retry 3 times on any error and see it that captures the timeout.
|
Still no joy. It might be time to concentrate on how to make the Reimann Sum sensor become available again. As a test I set one up through the Helper GUI (so has unique ID) and it let me switch it back from unavailable to available. Thanks for your patience. I think mine has run out. There is still something fundamentally wrong with those sensors. When solar production was zero, I created a GUI one to measure solar energy export and instead of returning 0kWh, it returned 'unknown'. From programming language logic this might make sense, but from real world logic we knew that solar power = 0.... therefore solar energy export = 0. It was not "unknown"! |
Understand @Steve2017 , though I feel I'm not done with it yet. There are indeed 2 separate issues, the Riemann and the communication issue. As for your attempt with changed error handling, did you happen to have some log around that can help me how to really implement a retry at timeout? As for the returned unknown, I'm not sure I fully comprehend what you did. |
No - it was late at night and I forgot to turn on debug again. I'll try tonight.
Just to make sure there wasn't something different in the way Reimann Sum sensors were processed in the Helper GUI, I created a new energy sensor that duplicated the ones I had made in YAML in config.yaml. (I think the only real difference is the GUI assigns a unique ID) At the time I created the new GUI sensor there was no solar production, so solar_export_power was 0.0kw. This meant the energy sensor should have been showing 0.0kWh exported. I gather a Reimann Sum sensor will normally show 'unknown' until the first data reading is received, then instead of "unknown", it will show 0.12kWh (or whatever the calculation returns). Instead, it showed 'unknown' up until the Envoy Integration sensors went off-line for a minute. When the Envoy source sensors came back on line, it changed to 'unavailable', just like the YAML versions. I can only assume the power sensor had been sending "no data" because there had been no new readings for several hours - and so the Reimann Sum sensor recorded that as unknown. BTW: No dropout early this morning - only the regular 11pm event last night. |
I fully understand you kinda had it with this, so no need to stay up late to just get it.
Thanks for explaining, I think I know what is happening:
Hope it makes sense. |
I had been hoping using the available function to return 1.0 or 2.0 when it drops out would help if it then reverted to zero on the next report. That was not the case. I'll turn debug on now. It'll be a large file, but at least it'll be on. |
You can set your watch by it. All is good, then GET attempt starts at 23:00:01 with 3 attempts. Times out 30 seconds later. 20230810 Debug Log Enphase Dev2.txt The Reimann Sum sensor for solar export and child sensors are now unavailable. |
Thanks for providing this @Steve2017, I realize it was late (or already early) over there.
The error catching worked as it looped 3 times. But as you can see from timing attempt 2 and 3 returned right away with time-out. Think we need to do some additional action there to start like a real new connection. Need to give that a bit of taught. Since it happens like clockwork I would think it's something in the Envoy going on. But it has an ugly effect downstream. Al tough I don't think the resulting numbers are wrong, it's that seeing all those gaps makes one think there some other problem. |
|
Thanks @Steve2017, well it confirms the change in timing will allow for actual retry. Too bad the retry failed as well. It was done with the 2 attempts in 31 seconds, but neither the first attempt nor the retry spit out the actual error. Both took the allotted 15 secs. In between I tried closing the connection, but that clearly did fix it. Next try will be @cddu33 recommended 30 sec wait in between 2 attempts. |
Just released 0.0.14+2 in another attempt. Maybe went a little overboard by making all timing configurable... By default this version will use a timeout of 12 seconds, then wait 30 seconds and try once more. To fine tune to what may work use the settings option for the envoy. Be aware that the bottom item, Overall timeout, is the time after which the whole data collection is cancelled. I,e, if you increase the retry count or wait time between tries and exceed the overall timeout all data will be considered timed-out by this timer. I simulated a timeout by unplugging my envoy from the network using an 8 sec timeout and 30 sec wait time, it's not an exact test for the problem but it will show what debug logs will show:
|
Installed - will leave the config at the default: Time between entity updates [s]. |
Still off-line for 60 seconds. Riemann Sum sensors now unavailable. Lost cause???
|
Yes, I'm out of ideas now. Whatever we try within a single collection cycle seems fruitless. Since it will get data on the next cycle the effect is limited but ugly with the Riemann until new values come in. I'm open to any good suggestion, but feel we exhausted our options. |
Close it as unresolved. |
Any new ideas to try @Steve2017 ? What you might try is increase the wait time between tries. Make sure to increase the overall time accordingly. |
I can try, but I keep wondering if we are trying to solve a problem that is programmed into the Envoy. (When I say we I really mean you and others because my programming skills = 0.0) Until we see whether it is possible for this integration to take data from "ivp/meters/readings" that thought will remain. I just wonder whether Enphase resets the processing unit compiling the "/production" data each night. It might have always done this, but this only became apparent when the HA developers changed something in the Riemann Sum code a few months ago? |
I think you are right with your analysis. I was just checking if you had an Ah-moment as you re-opened the issue. For me no need to retry but if you had courage again I wanted to suggest some options. I hope to add the ivp/meters/readings soon. Plan to release 0.0.15 today or tomorrow with a fix for battery issue in 0.0.14 and configurable timing (need it for slow Envoy as well, I probably change the defaults to more generic) and probably one other item. Then on with 0.0.16 with the ivp meters by @cddu33, hope to have test versions in a few days. |
No ah-ha. Just wanted to make sure people could see it is not a new problem, and that it is not critical. Their system will still work, but Riemann Sum sensors (& the child sensors) will go off-line until fresh data arrives. |
@Steve2017, meters/readings is out for test. In HACS install from https://github.com/catsmanac/home_assistant_custom_envoy, enable beta versions and select V0.0.14-ImportExportMeters. Readme there is updated with new entities. this version also has configurable timeouts set to more relaxed defaults, but if you specified these it will use those found in your config. |
Hi @catsmanac I'm assuming from one of the other posts I should switch over to: I might have a workaround on the 23:00 dropouts. Two days ago, I changed the scan period from 60 seconds to 150 seconds. I have just checked the history for the past two nights and the longer scan period seems to overcome the dropouts on all Envoy-supplied sensors. Instead of the dropout on production (which caused the Riemann Sum issue), it reverted to showing phantom production. What I don't know is whether this avoids the Riemann Sum sensors issue, because I have disabled them. I have used the REST sensors with a utility meter sensor. I have just re-enabled a Rieman Sum test sensor so will see tonight. Those two peaks last for about 2 minutes so a shorter scan period might also work. Of course, the downside of the 150 second interval is you don't get as up-to-date net energy data. |
I might have spoken too soon. While I checked all the Envoy sourced sensors, the child sensors of those entities are still showing a dropout. The source sensors look okay, but not the ones downstream. EDIT: Update at 23:25. The Riemann Sum sensor relying on solar power data did not become unavailable - as it would have in the past.. So a longer scan period overcomes the problem of dropouts and Riemann Sum. I’ll try with a 120 scan interval and see how that goes tomorrow. |
Hi @Steve2017, it remains a puzzle what really is going on. The blips make no sense either. The 150 sec could avoid that danger minute completely, but then why the blip. And the first blip seems at 23:04 why later than the usual danger minute. Are those values actually coming from the Envoy? That would require a debug log to inspect. IF the scan interval is 150, then why are the blips 2 min wide and 1 min in between? The v0.0.17 version has the import/export added from /ivp/meteres/readings. But as far as timimg is concerned it has the same options as v0.0.15 and v0.0.16. Be aware that HA Core will be releasing their updated version in September and it uses a rewritten communication that may or may not help with this too. The V0.0.17 import export is not part of HA core yet and when it gets added in a future version it may differ from this one and may cause history import/export to start fresh on new entities/variables. So depending how important that continuation is you may hold back on installing. On the other getting some data to see what it can bring can be considered as better then no data at all. Just that you are aware of this. |
I’m not sure if your question is rhetorical, but that is one of the native sensors in the integration. I’ll run debug tomorrow. I was surprised at the two. I got a single blip from the reading half an hour ago. The 1-watt equates with the normal power drain I see on the Envoy home page after sunset (hence phantom production) |
Back to the two minute scan and I can see a dropout and the Riemann Sum sensor is unavailable.
I'm giving up on this one. |
Fully understand, this remains mind-blowing... . Only thing currently I can think of to determine if it's the overall Envoy or just the production page is to swap the order of the collection in envoy_reader.py from:
to:
and see if it bumps on ensemble inventory or still on production. But I understand it if abandon all attempts for now, I have given up too before. |
@Steve2017 with installing v0.0.18 you now can opt to not use the production page anymore using the configure option on the envoy integration page and reload the envoy. It will make today's and last 7 days numbers unavailable, but might solve the daily 11 PM dropout by avoiding the production page. |
Tks. I have v 0.0.18 installed using default settings - and I still got the 23:00 dropout. I’ve ticked the check-box and reloaded so I’ll see how it goes tonight. If I recall V 0.0.17 had the daily export and import data? (Called “index”) I liked that. |
In v0.0.18 the import and export is renamed to 'Lifetime Net Energy Consumption' and 'Lifetime Net Energy Production'. You can use these in the energy dashboard for grid import and export. |
Tks They are not the most intuitive labels (at least for me). Lifetime Net Energy Production is not producing any data for me - it is flat-lined at zero. I am getting data for the Lifetime Net Energy Consumption so it will work with a utility meter. I’ve tried with the “do Not Use Production” checkbox ticked, but I found the loss of today’s consumption and production is inconvenient, so I’ll revert to using “Production”. That means the above sensors will no be available, so not an issue for me. |
Labels names are indeed not always intuitive the same way for everyone I've learned with this one. HA allows you to assign your own labels that will show. The lifetime net energy production and consumption come from the Consumption CT in Solar included mode. If you don't have it installed it will be flat I guess, or even bogus data. If you have a consumption CT installed then we may have an issue unless all solar production is consumed directly by the house. The net production and consumption number are available also when |
There's more reports of net production being zero, seems like newer firmware behaves different as one of the original author. |
Right, that's correct. The issue was reported for a metered Envoy with a production CT and a consumption CT with solar included. The ivp/meters/readings page reported as before, but the ivp/meters/reports page changed it behavior over firmware upgrade. |
I've updated the code to get the net-production and net-consumption data from ivp/meters/readings so they show both as positive numbers to use with energy dashboard and not as negative net consumption. #164. No need to use that |
I mentioned this in another issue and it has been suggested it should be separated.
The issue is that all sensors created by the integration become unavailable for a short period - usually a minute - at the same time each day. In my case, it is 23:00 local time (13:00 GMT). On some days I also experience a dropout at approximately 06:35, also for a very short period.
I am using v 0.2.10 of Enphase Envoy Envoy_2_Core_Custom_Test which I gather is being folded into this repo.
(Happened with earlier versions too)
This is on Envoy S Metered with the Software Version: D7.6.175 (f79c8d)
HA 2023.8.xx (was happening in July too)
Although the times might vary a little, I am not the only one reporting this behaviour:
#122 (comment)
#122 (comment)
#122 (comment)
My dropouts are similar to the one in the first link above. Magnified view:
All sensors, including micro-invertors demonstrate the same behaviour.
This is the log from an occurrence a few days ago, showing the system attempts a GET at 23:00:28 and times out about 30 seconds later. A new attempt then works:
From those production and consumption sensors in the graph, I use a template sensor to calculate excess production (ie power available for export). That too drops out (with a slight lag) for a short period.
The issue for me is not so much the original sensor values and the template sensor, because they come back after a minute. The issue is the template sensor is the source for a Reimann Sum sensor calculating export energy. It also goes off-line, but does not return until solar production resumes at about 06:40. (Bear in mind this is at 11pm - so zero solar production)
Downstream from there I have a number of child sensors to give me data on daily energy exports, export earnings, net energy costs etc, etc. All of those become unavailable after the Reimann Sum sensor is unavailable for one minute. They return when solar production resumes.
The dropouts at 06:35 are happening prior to solar production resuming, so I see nothing different there.
I have a similar set-up for measuring grid import power - just with sensors reversed in the formula. It experiences a similar Template sensor and Reimann Sum dropout, but the Reimann Sum sensor comes back on - I am assuming because the template sensor is returning new and changing values.
I have tried a number of suggested solutions for the downstream issue, such as removing state_class and device_class entries from the Template Sensor YAML or using the "available" function in the sensor YAML. None of those things worked.
To try to isolate something, I set up the REST/RESTful sensor integration for the Envoy. It works without dropout. It uses the token not the Enphase User-Name and Password method to gain access to the Envoy.
The text was updated successfully, but these errors were encountered: