Getting ERR_EMPTY_RESPONSE on Chrome when saving settings #5229

Open
fmuntean opened this issue Jan 20, 2025 · 38 comments

@fmuntean
Contributor

Describe the bug
I updated ESPEasy to 2024-12-29 and now I can no longer use Chrome to configure it.
Each time I save I get ERR_EMPTY_RESPONSE and the change is not saved.
The logs do not show any errors either.

However, configuring with Firefox does work.

To Reproduce
Steps to reproduce the behavior:

  1. Update firmware to mega-20241222 tag
  2. Use Chrome to configure the board
  3. Saving settings in most of the pages results in ERR_EMPTY_RESPONSE

Expected behavior

I would like to be able to use Chrome Browser to configure my boards.

Used platform (please complete the following information):

  • ESP type: ESP8266
  • Build version: self built, mega-20241222
  • Build set: custom

Platform Specifics (when applicable, please complete the following information):

  • Flash size: 4M
  • Brand/Model: ESP12F
  • Browser + OS: Windows 10/Chrome
@tonhuisman
Contributor

What (exact) version of Chrome do you have installed?

Can you also try with Chrome Develop and/or Chrome Canary?

@TD-er
Member

TD-er commented Jan 20, 2025

Also do you have JavaScript blocked?
Are the JS files loaded?

Do you fetch the JS files from an external CDN or from the ESPEasy node?

Also, what ESP are you using (e.g. ESP32-C3)?
Do you use a LittleFS or SPIFFS build?

@TD-er
Member

TD-er commented Jan 20, 2025

Another one to check... Do you have Chrome extensions active?
If those are loaded before JS from the ESP is loaded, then this can also cause issues if one of those is generating some error.
So please also have a look at the inspect console, to see if there is some error reported there.

@fmuntean
Contributor Author

Version 131.0.6778.266 (Official Build) (64-bit)
I have extensions installed in Chrome.
I am using the CDN for JS, and JS is not blocked.
I have a SPIFFS build.
I am using an ESP12F (ESP8266).

Note: I am also getting errors in Firefox, but less often.
I am currently unable to flash an older firmware through the OTA page in any browser.
I did a Factory Reset on the board and now I am not even able to change the board name.
On the same page, checking or unchecking "Append Unit Number to hostname:" saves correctly, but changing the device name fails in all browsers.
RSSI is -50, which is fine. (Opening web pages on the device gives me no issues, so I don't believe it is a WiFi issue.)

Another issue I found is that sending the WifiAPMode command from the browser does not seem to work either.

@TD-er
Member

TD-er commented Jan 20, 2025

Can you make a backup of the settings etc. of that node and please show the file system info from the sysinfo page.
It sounds like there might be some file system corruption going on here.

Did you by any chance use a different flasher for this node? Sounds like the file system partition may have changed and/or the SPIFFS file system got too fragmented to operate without errors.

@fmuntean
Contributor Author

fmuntean commented Jan 20, 2025

I used the browser flash update page, as the device is not easily accessible.
Because I jumped about two years of releases, I thought the file system format might have changed, so I did the Factory Reset.

Image

@fmuntean
Contributor Author

After many tries I was able to revert the firmware back to: Build: [ESP_Easy_mega_20221104_MFD_ESP8266_4M1M_80MHz_QIO_VCC Nov 4 2022]

I did not do "Factory Reset" after restore and had no problems setting the "Unit name" using Chrome.

@TD-er
Member

TD-er commented Jan 20, 2025

There should not be a change in file system layout.
However SPIFFS does have issues with fragmentation and not actually reporting write errors.
At a reboot, the garbage collector is called a few times to make sure as many blocks as possible can be erased.
This could be just enough to get it going again.

You don't seem to have a lot of files on the file system, so it is less likely fragmentation is the main cause here.

Honestly I have no idea what may have caused this.
Could still be that the old build is serving JavaScript from the node and not from a CDN.

@fmuntean
Contributor Author

jQuery is loaded from the internet, but everything else is served from the node.

@fmuntean
Contributor Author

It looks like custom-sample.h is missing instructions on how to disable the CDN.

@tonhuisman
Contributor

JQuery is always fetched from CDN, we don't currently have code to load that from the node. All other .js files can be uploaded to the file system.

@tonhuisman
Contributor

Checked my Chrome version, and it auto-updated to 132.0.6834.84; so far all is working as expected. I didn't test much, though the Tools/Log, Rules and Devices pages are working normally (it seems the browser cache was partially flushed).

@TD-er
Member

TD-er commented Jan 20, 2025

You can also upload the files as listed here to the ESP node.
Almost all files (except when noted otherwise) can be uploaded without renaming.

@fmuntean
Contributor Author

Kind of hard if the upload is not working:

Image
I tried with caching enabled and disabled in case you wonder ;)

@fmuntean
Contributor Author

Could it be a problem with dispatching HTTP POST requests?
I enabled debug level 4 and built a custom bin with debugging enabled, adding lots of debug logs to UploadPage.cpp. However, none of them were triggered, including the ones at the very beginning of the functions.

For example:
void handle_upload_post() {
  addLogMove(LOG_LEVEL_INFO, F("handle_upload_post: before login"));
  ...

void handleFileUpload() {
  addLogMove(LOG_LEVEL_INFO, F("handleFileUpload"));
  ...

None of these were showing in the logs.

@TD-er
Member

TD-er commented Jan 21, 2025

The upload isn't using any JavaScript, it is just an HTTP POST.
Can you try any of the official builds to see if it may be related to your specific build?
For example an incomplete PlatformIO install.

If the official build is working, you should remove the .platformio folder in the ESPEasy project dir and these 3 folders in C:\Users\<name>\.platformio\

  • .cache
  • packages
  • platforms

Make sure you do not have any VS code open while removing those folders and when you open VS code again, you should have the latest platformio.ini (and other .ini files) of the latest code base on your PC.
So better make sure those are present before cleaning the folders in .platformio

@TD-er
Member

TD-er commented Jan 21, 2025

Oh and how did you flash the build, as it seems to be running at 80 MHz flash speed?
I don't think this is the speed I set it to in the board definitions as this is often causing issues.
Also I only use DIO or DOUT as mode, just to be as compatible as possible.

@fmuntean
Contributor Author

fmuntean commented Jan 21, 2025

I have removed both the .platformio and the .pio folders for good measure.
I rebuilt the custom firmware and still have the same issues.

I have rebuilt the normal_ESP8266_4M1M_VCC build and still have the same issue.

I also added:
void handle_devices() {
  addLogMove(LOG_LEVEL_INFO, F("handle_devices"));
  ...

This gets called fine on the GET for both the device list and the edit page of a device, but for the POST from Chrome I do not get this log.

Regarding the 80 MHz and QIO settings: I have used them since 2018 on the same board and had no issues.
I did come across a few boards that did not work with these settings, and I either removed them from the setup or replaced the flash chip with a better one.

here is an example of the POST body from Chrome that is not working: (I personally see no issue with it)
TDNUM=1&TDN=BUTTON&TDE=on&remoteFeed=0&TDPI=on&taskdevicepin1=16&type=0&button=1&sw_debounce=250&sw_dc=0&sw_dcmaxinterval=1000&sw_lp=1&sw_lpmininterval=500&TDT=0&TDVN1=Switch&edit=1&page=1

I have also disabled all the Chrome extensions and still no luck.
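
Editor's sketch: to check that the POST body quoted above is well-formed, it can be split into its key/value pairs. This is a minimal C++ illustration, not the parser used by ESPEasy or the ESP8266 core. The function name `parseFormBody` is hypothetical, and percent-decoding is left out on the assumption that the body contains no escaped characters (true for the one quoted here).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: splits an application/x-www-form-urlencoded body
// into key/value pairs. No percent-decoding is performed.
std::map<std::string, std::string> parseFormBody(const std::string& body) {
  std::map<std::string, std::string> fields;
  std::size_t start = 0;
  while (start < body.size()) {
    std::size_t end = body.find('&', start);
    if (end == std::string::npos) end = body.size();
    const std::string pair = body.substr(start, end - start);
    if (!pair.empty()) {
      const std::size_t eq = pair.find('=');
      if (eq == std::string::npos) {
        fields[pair] = "";  // key without a value
      } else {
        fields[pair.substr(0, eq)] = pair.substr(eq + 1);
      }
    }
    start = end + 1;
  }
  return fields;
}
```

Fed the body from this comment, the sketch yields 17 distinct fields with no malformed pairs, which is consistent with the observation that the body itself looks fine.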

@tonhuisman
Contributor

I have rebuilt the normal_ESP8266_4M1M_VCC build and still have the same issue.

Can you also try installing a copy of that build downloaded from the GitHub Releases page, to exclude any possible influence from your local build environment?

@fmuntean
Contributor Author

OK, I uploaded ESP_Easy_mega_20241222_normal_ESP8266_4M1M_VCC.bin and I am getting the same issue in Chrome.

@tonhuisman
Contributor

Has your Chrome been updated to 132.x yet? Maybe this update improves things?

@TD-er
Member

TD-er commented Jan 21, 2025

... or breaks things as I am still on 131.x

@fmuntean
Contributor Author

I just reinstalled Chrome and still have the issue.
It is strange that it is only me. I do not understand.
Chrome version: Version 132.0.6834.84 (Official Build) (64-bit)

@fmuntean
Contributor Author

On the custom builds, each time it was not working the POST calls were not passed on to the code by the ESP library, as I could not see the logs I added. What was strange is that after a reboot it would usually work once or so and then stop.
I even found the same issue with Firefox: once the board gets into a bad state, it stops working in Firefox too.
I have been using these boards for years, so it is not a hardware issue. I have also checked with multiple boards, including one that I now have connected locally over a serial port.
The strange thing is that if I revert to the 2022 firmware (a bin that was built in 2022) the board works just fine.

I saw that the current ESP SDK is much newer than what ESPEasy is using. Is there a reason for such a large discrepancy?
I wonder if there have been updates to the closed-source WebServer class, because to me it looks like the issue could be there: that is the point that receives the requests and then calls the page hooks. I will try to see if the beta build of ESPEasy brings any changes.

@fmuntean
Contributor Author

fmuntean commented Jan 22, 2025

I uploaded ESP_Easy_mega_20241222_custom_beta_2ndheap_ESP8266_4M1M.bin and still have the same issue.
I have also built custom_312_ESP8266_4M2M, with the same issue.

However, an important finding: I installed Fiddler to capture the traffic and the pages started to work. I built my custom firmware again and the pages still work with Fiddler in the middle of the traffic. The moment I close Fiddler, the pages stop working.
When the POST works I can see handle_devices in the logs; when it is not working that log is not present either, so the method is never called.
I still believe that something is wrong with how POST is handled by the ESP8266 WebServer.

@TD-er
Member

TD-er commented Jan 22, 2025

The "312" and "beta" build each use the latest ESP8266 code.
The "274" builds or those without "312" or "beta" in the name use SDK 2.7.4, which is way less 'quirky'.

Could you try one of those "274" builds to see if this 'fixes' it? Then we know it is in the SDK.

@fmuntean
Contributor Author

I was using the 274 type of build.

@TD-er
Member

TD-er commented Jan 22, 2025

OK, then I have no idea what may have changed since.

@fmuntean
Contributor Author

No idea here either. I might need your help to identify the source code for ESP8266WebServer, as I can't find it anywhere.
I found another implementation on GitHub here: https://github.com/esp8266/ESPWebServer but it is very old.
I guess it is hidden in a .a file somewhere.

Based on the logs, I wonder if there is a character that throws off the parser so that the handler is never called (at least that is what I see, or rather don't see, in the logs).
And the only difference between me and others could be the headers sent automatically by the browser.

I see that https://github.com/esp8266/Arduino/tree/master/package is using the NONOS SDK, but it does not make sense that the library would be there, as I expect that SDK to be pure C (but I could be wrong).
I know there used to be a way to specify which SDK to use, but I can't find that anymore.
I would like to try one of the many SDKs available.

Any help identifying the libs would be appreciated.

@fmuntean
Contributor Author

fmuntean commented Jan 23, 2025

I have installed Wireshark and this is what I found:
Firefox => single packet of 730 bytes
Chrome => single packet of 841 bytes
Fiddler => splits the request into multiple packets, 691 + 212 (sends the headers in one packet and the body in the next)

In Chrome, if I override the User-Agent with a custom 3-character one, the POST works.

Based on what I see, there is a size limit on the POST request somewhere between 730 and 841 bytes. (800 bytes?)

@TD-er
Member

TD-er commented Jan 23, 2025

Hmm sounds like a free memory issue.
Can you disable all tasks, reboot and see if this makes a difference?

@fmuntean
Contributor Author

Disabling all tasks makes no difference.

Here are the differences between the two firmware builds:
Build: 20221104 - Mega |System Libraries: ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(c0eb301), LWIP: 2.1.2 PUYA support
Build: 20241229 - Mega |System Libraries: ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support

I can't find any of the checkins for the NONOS SDK and there is no branch, release or tag named 2.2.2 in this repo: https://github.com/espressif/ESP8266_NONOS_SDK

@fmuntean
Contributor Author

This is what I found so far:
The MSS for both the old firmware and the new one is 1460.
It seems that the problem with the new firmware is the WiFi AP "Fragmentation Threshold": if it is below the packet size, it does not work.
The moment I change it to 1500 it works with no problems, which explains why the issue is not seen that often.
I believe that something changed in the way WiFi network packets are handled: if a packet gets fragmented in any way, it is not handled correctly, so the log does not even show the method call.

Note: the file upload is influenced by the AP "RTS Threshold", which again needs to be set above 1460 for the upload to work.

If someone could change those settings and validate the results, that would be great.

As I am not able to track down what exactly changed in between, I will let the experts continue if you want; otherwise we can close this issue and I will deal with the AP changes required to make this work.

@TD-er
Member

TD-er commented Jan 24, 2025

What brand/model of access point do you have?
And did you change this setting before?

I find the MSS of 1460 a bit high to be honest, as the MTU is probably 1492.
Maybe it is best to set the MSS to 1360 or something similar as you may otherwise see similar issues when communicating over a VPN or via a NAT router.
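
Editor's sketch: the relation between MTU and MSS mentioned here is simple arithmetic, assuming IPv4 and TCP headers without options (20 bytes each). The helper name `mssForMtu` is hypothetical and only serves to illustrate the numbers in this thread.

```cpp
#include <cassert>

// Minimum header sizes, assuming no IP or TCP options are present.
constexpr int kIpv4HeaderBytes = 20;
constexpr int kTcpHeaderBytes  = 20;

// Largest TCP payload that fits in a single IP packet of the given MTU.
constexpr int mssForMtu(int mtuBytes) {
  return mtuBytes - kIpv4HeaderBytes - kTcpHeaderBytes;
}

// An MTU of 1500 gives the MSS of 1460 reported above.
static_assert(mssForMtu(1500) == 1460, "MTU 1500 -> MSS 1460");
// An MTU of 1492 leaves room for only 1452 bytes of payload.
static_assert(mssForMtu(1492) == 1452, "MTU 1492 -> MSS 1452");
```

Under these assumptions an MSS of 1460 only fits when the full path supports an MTU of 1500, which is why a lower value is safer across VPNs or links with a reduced MTU.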

@fmuntean
Contributor Author

I have DD-WRT as the AP. It was configured for maximum throughput and best stability, given the noisy environment I am in.
So the numbers were lower than 1460, which is fine as that AP is mainly used for MQTT.

The MSS is actually defined as a constant inside the NONOS SDK and I can't change it.
#define TCP_MSS 1460
in: https://github.com/espressif/ESP8266_NONOS_SDK/blob/master/third_party/include/lwipopts.h

The way it works is that the server announces this value when the TCP connection is set up, saying this is the largest segment it can accept.
The AP is free to split it up as needed, and a good WiFi client would put the pieces back together into the original size.
Fiddler was splitting the request neatly between the headers and the body, so the device was processing them as two separate pieces.
However, the AP will just blindly split by that setting, so the device fails to process the data in the first piece.
I would argue that this is a bug in the way the Espressif NONOS SDK handles the packets, as it could have just put both together before processing.

As I could not find the two checkins mentioned above I could not do a diff to see what changed between 2022 and 2024
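
Editor's sketch: the "put both together before processing" idea from this comment can be illustrated with a small buffer that keeps accumulating received chunks until the headers and the full body (per Content-Length) are present, no matter where the data was split. This is an assumption-laden illustration, not the ESP8266WebServer or NONOS SDK code. The class name `RequestAssembler` is hypothetical, and the header lookup is case-sensitive for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: reassembles an HTTP request from arbitrary chunks.
class RequestAssembler {
public:
  // Append one received chunk. Returns true once the header block and
  // Content-Length bytes of body have all been buffered.
  bool feed(const std::string& chunk) {
    buffer_ += chunk;
    if (headerEnd_ == std::string::npos) {
      headerEnd_ = buffer_.find("\r\n\r\n");
      if (headerEnd_ == std::string::npos) return false;  // headers incomplete
      bodyLength_ = parseContentLength(buffer_.substr(0, headerEnd_));
    }
    return buffer_.size() >= headerEnd_ + 4 + bodyLength_;
  }

  // Body of the request; only valid after feed() has returned true.
  std::string body() const {
    assert(headerEnd_ != std::string::npos);
    return buffer_.substr(headerEnd_ + 4, bodyLength_);
  }

private:
  // Simplification: exact-case match, no support for chunked encoding.
  static std::size_t parseContentLength(const std::string& headers) {
    const std::string key = "Content-Length:";
    const std::size_t pos = headers.find(key);
    if (pos == std::string::npos) return 0;
    return static_cast<std::size_t>(
        std::strtoul(headers.c_str() + pos + key.size(), nullptr, 10));
  }

  std::string buffer_;
  std::size_t headerEnd_ = std::string::npos;
  std::size_t bodyLength_ = 0;
};
```

With this approach the handler is only dispatched after `feed()` returns true, so a split in the middle of the headers behaves the same as a split between headers and body.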

@TD-er
Member

TD-er commented Jan 25, 2025

The webserver does use a buffer system I wrote. (Web_StreamingBuffer)
Maybe the buffer size was changed there not that long ago which may now trigger some issues with your specific setup?

#ifdef ESP8266
#define CHUNKED_BUFFER_SIZE         512
#else 
#define CHUNKED_BUFFER_SIZE         1200
#endif

This used to be 400 bytes for ESP8266
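
Editor's sketch: to make the role of such a buffer size concrete, here is a minimal fixed-capacity buffer that emits a chunk whenever it fills up. It is not the real Web_StreamingBuffer. The class name `ChunkedBuffer` is hypothetical, and it records chunks in a vector instead of writing them to a network client.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a streaming output buffer.
class ChunkedBuffer {
public:
  explicit ChunkedBuffer(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
    pending_.reserve(capacity);
  }

  // Append text, emitting a chunk each time the buffer becomes full.
  void write(const std::string& text) {
    for (char c : text) {
      pending_.push_back(c);
      if (pending_.size() == capacity_) flush();
    }
  }

  // Emit whatever is still buffered (call once at the end of the page).
  void flush() {
    if (pending_.empty()) return;
    sent_.push_back(pending_);
    pending_.clear();
  }

  // Chunks emitted so far, in order.
  const std::vector<std::string>& sentChunks() const { return sent_; }

private:
  std::size_t capacity_;
  std::string pending_;
  std::vector<std::string> sent_;
};
```

In this model a 1200-byte page written through a 512-byte buffer leaves as three chunks of 512, 512 and 176 bytes. Note that this only models outgoing data, whereas the failing POST is incoming.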

@fmuntean
Contributor Author

It seems it was already 512 in 2022, so that is not it.

@TD-er
Member

TD-er commented Jan 28, 2025

Can you test one more thing in the Web_StreamingBuffer.cpp ?

See line 224 here:
https://github.com/letscontrolit/ESPEasy/blame/mega/src/src/DataStructs/Web_StreamingBuffer.cpp#L224

Can you comment out the line with web_server.client().setNoDelay(true); to see if this fixes your issues? (or set it to false)

More info on this subject: https://www.extrahop.com/blog/tcp-nodelay-nagle-quickack-best-practices
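
Editor's sketch: on the ESP the call in question is `setNoDelay(true)` on the web server's client. The same option can be tried on a desktop with plain POSIX sockets, where it is called `TCP_NODELAY`. The example below assumes a POSIX system; the wrapper names `setNoDelay` and `getNoDelay` are hypothetical and not part of ESPEasy.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// TCP_NODELAY = 1 disables Nagle's algorithm (small writes are sent at once);
// TCP_NODELAY = 0 lets the stack coalesce small writes into fewer segments.
bool setNoDelay(int fd, bool noDelay) {
  const int flag = noDelay ? 1 : 0;
  return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) == 0;
}

// Reads the option back: returns 1 or 0, or -1 if the query failed.
int getNoDelay(int fd) {
  int flag = 0;
  socklen_t len = sizeof(flag);
  if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) return -1;
  return flag != 0 ? 1 : 0;
}
```

Setting the option to false corresponds to the suggested test of commenting out the line or passing `false`.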
