User-agent checking in robots.txt issue #130

jishnug007 · 2021-04-16T11:53:18Z

I need to validate https://facebook.com/robots.txt. (I have seen the issue below on most websites I checked, which I cannot disclose.)

I'm using reppy==0.4.14.

When I check the above with the user agent Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html), robots.allowed() returns False.

A small reproducible sample is given below:

from reppy.robots import Robots

useragent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# useragent = "Googlebot"
robot_url = "https://facebook.com/robots.txt"
url = "https://facebook.com/"

robots = Robots.fetch(robot_url)
res = robots.allowed(url, useragent)
print("RESPONSE | ", res)

According to https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers, Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) is a Googlebot user agent, so this should return True. Instead, I'm getting False.

When I set the user agent to Googlebot in the code above, it returns True.

I have also tried downloading the robots.txt content myself and passing it to Robots.parse(). That gives the same result as above.

Reproducible sample:

import requests
from reppy.robots import Robots

useragent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# useragent = "Googlebot"
robot_url = "https://facebook.com/robots.txt"
url = "https://facebook.com/"

# Download the robots.txt content manually instead of using Robots.fetch()
response = requests.get(robot_url)
robots_content = response.text

print("ROBOTS.TXT CONTENT | ", robots_content)

robots = Robots.parse(robot_url, robots_content)
res = robots.allowed(url, useragent)

print("RESPONSE | ", res)

Why does passing the full user-agent string instead of the token not work?

Passing Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) should be equivalent to passing Googlebot and should return True.
Ref: https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers
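
A possible workaround while this is open, sketched under the assumption (not confirmed from reppy's source) that the agent argument is compared against the robots.txt User-agent tokens as-is rather than searched for inside a full header string. The helper name product_token and the KNOWN_TOKENS list are hypothetical and not part of reppy:

```python
# Hypothetical list of crawler product tokens to look for; extend as needed.
KNOWN_TOKENS = ("Googlebot", "Bingbot")


def product_token(user_agent, known_tokens=KNOWN_TOKENS):
    """Return the first known crawler token found in a full User-Agent string.

    Matching is a case-insensitive substring check. If no known token is
    present, the original string is returned unchanged.
    """
    lowered = user_agent.lower()
    for token in known_tokens:
        if token.lower() in lowered:
            return token
    return user_agent


full_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(product_token(full_ua))
```

The returned token could then be passed as the second argument to robots.allowed(url, ...) in the samples above. Note that a plain substring check also maps more specific agents such as Googlebot-Image to Googlebot, which may or may not be what is wanted.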

Thank you!
