If the response from the site being crawled has a `content-encoding: "gzip"` header, the error above is raised. This happens even though the author strips that header in this snippet in `downloadermiddlewares.py`:
```python
# This only removes content-encoding from `headers`; response.headers still
# contains it, so the HtmlResponse call below should use headers=headers.
headers = response.headers
# Necessary to bypass the compression middleware
headers.pop('content-encoding', None)
headers.pop('Content-Encoding', None)
response = HtmlResponse(
    page.url,
    status=response.status,
    headers=response.headers,  # Fix: change this to headers=headers,
    body=content,
    encoding='utf-8',
    request=request
)
```
Unfortunately, the header is not removed. It only works after changing `headers=response.headers` to `headers=headers`.
The comments in the snippet are my own additions.
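Below is a minimal sketch of one plausible reason the original snippet fails. It assumes the browser library's response object exposes `headers` as a property that returns a fresh dict on every access. `BrowserResponse` is a hypothetical stand-in, not any real library's class. Under that assumption, popping keys from the local `headers` copy leaves `response.headers` untouched. That is why `headers=headers` works and `headers=response.headers` does not.

```python
class BrowserResponse:
    """Hypothetical browser-side response (not a real library class).

    Assumption: ``headers`` returns a new dict on every access, so
    mutating the returned dict never changes the response itself.
    """

    def __init__(self, headers):
        self._headers = dict(headers)

    @property
    def headers(self):
        return dict(self._headers)  # fresh copy on each access


browser_response = BrowserResponse(
    {"content-encoding": "gzip", "content-type": "text/html"}
)

# Same steps as the middleware snippet: take the headers and pop the encoding.
headers = browser_response.headers
headers.pop("content-encoding", None)
headers.pop("Content-Encoding", None)

# Original call site: re-reading response.headers yields an unmodified copy.
passed_originally = browser_response.headers
# Fixed call site: pass the local dict that actually had the keys removed.
passed_with_fix = headers

print(passed_originally)  # {'content-encoding': 'gzip', 'content-type': 'text/html'}
print(passed_with_fix)    # {'content-type': 'text/html'}
```

In the middleware, this corresponds to building the `HtmlResponse` with `headers=headers`. If the library you use returns the same dict on every access, the pops would already affect `response.headers`. Check its actual behavior before relying on this explanation.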
Thanks for the solution. I found that deleting the headers *after* calling `HtmlResponse` also returns the correct response:
```python
response = HtmlResponse(
    page.url,
    status=response.status,
    headers=response.headers,
    body=content,
    encoding='utf-8',
    request=request
)
headers.pop('content-encoding', None)
headers.pop('Content-Encoding', None)
```
Removing the `scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware` middleware also works for me.
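Disabling the middleware probably helps for the following reason. `HttpCompressionMiddleware` decompresses responses whose `Content-Encoding` header names a supported encoding such as gzip. A body rendered by the browser is already plain HTML, so the decompression attempt presumably produces the error reported above. Here is a minimal `settings.py` sketch. It disables the middleware the standard Scrapy way, by mapping its path to `None`:

```python
# settings.py (sketch): disable Scrapy's built-in compression middleware.
# Mapping a middleware path to None in DOWNLOADER_MIDDLEWARES removes it
# from the downloader middleware chain for the whole project.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": None,
}
```

If the project already defines `DOWNLOADER_MIDDLEWARES`, merge this entry into the existing dict. Setting `COMPRESSION_ENABLED = False` has the same effect. Either way, the change applies project-wide, not only to browser-rendered responses.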