Fix for gzip.BadGzipFile: Not a gzipped file (b'<!'). A bug #14

legend-zl · 2022-05-09T06:52:21Z

If a crawled site returns a response whose headers include content-encoding: "gzip", the error above is raised, even though the author removes this header in the following snippet of downloadermiddlewares.py:

        # Necessary to bypass the compression middleware
        # This only removes content-encoding from the local headers variable; it is still present in response.headers, so the call below should pass headers=headers,
        headers = response.headers
        headers.pop('content-encoding', None)
        headers.pop('Content-Encoding', None)

        response = HtmlResponse(
            page.url,
            status=response.status,
            headers=response.headers,    # the fix is to change this to: headers=headers,
            body=content,
            encoding='utf-8',
            request=request
        )
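
For context, the message in the title can be reproduced with the standard library alone. This is only a sketch of the failure mode, not the code path Scrapy actually runs: the body below is a made-up HTML string standing in for the already-rendered page content, and gzip.decompress stands in for whatever decompression step trusts the content-encoding header.

```python
import gzip

# Hypothetical stand-in for the rendered page content: plain HTML, not gzip data.
body = b"<!DOCTYPE html><html><body>hello</body></html>"


def try_gunzip(data: bytes) -> str:
    """Try to gunzip data; return 'ok' on success or the error message on failure."""
    try:
        gzip.decompress(data)
    except gzip.BadGzipFile as exc:
        return str(exc)
    return "ok"


# Plain HTML is rejected because it does not start with the gzip magic bytes.
message = try_gunzip(body)
print(message)

# Data that really is gzip-compressed decompresses without error.
roundtrip = try_gunzip(gzip.compress(body))
print(roundtrip)
```

The b'<!' in the message is the first two bytes of the data, which is consistent with an HTML document being treated as gzip data.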

Unfortunately, that does not remove it: the header only goes away after changing headers=response.headers, to headers=headers.
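
One possible explanation, stated here as an assumption rather than something confirmed in this thread, is that the browser-side response object builds a fresh headers dict on every access to .headers, so popping from one access does not affect the next. The sketch below imitates that behaviour with a hypothetical FakeBrowserResponse class and a hypothetical strip_content_encoding helper; neither is part of the project.

```python
class FakeBrowserResponse:
    """Hypothetical stand-in whose .headers returns a new dict on each access."""

    def __init__(self, headers):
        self._headers = dict(headers)

    @property
    def headers(self):
        # Assumption: a fresh copy per access, so callers cannot mutate the stored headers.
        return dict(self._headers)


def strip_content_encoding(headers):
    """Return a copy of headers without any content-encoding key (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() != "content-encoding"}


response = FakeBrowserResponse({"content-encoding": "gzip", "content-type": "text/html"})

# Popping from a temporary copy has no lasting effect on later accesses.
response.headers.pop("content-encoding", None)
still_present = "content-encoding" in response.headers

# Taking one copy, cleaning it, and passing that same object on does work.
headers = strip_content_encoding(response.headers)
removed = "content-encoding" not in headers

print(still_present, removed)
```

With a real response, the cleaned headers object would be the one passed as headers= to HtmlResponse, which matches the change described above.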


legend-zl · 2022-05-09T06:53:06Z

I added the comments in the snippet myself.

tangyuanba · 2022-05-26T04:24:58Z

Thanks for the solution. I found that doing the removal after calling HtmlResponse returns the correct response:

        response = HtmlResponse(
            page.url,
            status=response.status,
            headers=response.headers,
            body=content,
            encoding='utf-8',
            request=request
        )

        headers.pop('content-encoding', None)
        headers.pop('Content-Encoding', None)

yswtrue · 2022-06-29T21:56:04Z

Removing the scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware middleware also works.
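
In a Scrapy project this is usually done in settings.py. A minimal sketch, assuming the built-in middleware path quoted above: mapping a middleware to None in DOWNLOADER_MIDDLEWARES disables it. Whether turning off compression handling is acceptable for the rest of the crawl is not something this thread establishes.

```python
# settings.py (sketch; merge this entry into any existing DOWNLOADER_MIDDLEWARES dict)
DOWNLOADER_MIDDLEWARES = {
    # Mapping a built-in middleware to None disables it.
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": None,
}
```

Scrapy also has a COMPRESSION_ENABLED setting that controls this middleware; setting it to False should have a similar effect, though that variant was not tried in this thread.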

yyyy777 mentioned this issue Mar 23, 2022

playwright._impl._api_types.Error: Browser closed. #11

Open
