
find_robots_txt check does not work with redirects or reflect the test description #246

@erikb-stripe

Description

The find_robots_txt check does not work properly with redirects.

The check's description states:

        Iterates through all service_sites in each RwsSet provided, and makes
        a get request to site/robots.txt for each. This request should return
        an error 4xx, 5xx, or a timeout error. If it does not, and the page 
        does exist, then it is expected that the site contains a X-Robots-Tag
        in its header. If none of these conditions is met, an error is appended
        to the error list.

However, the code only makes a request to the site root; it does not first check for a 4xx/5xx response from the service domain itself.

r_service = requests.get(service_site, timeout=10)

If a redirect is in place (as the guidelines recommend), then `requests.get` follows it by default, so the headers being checked belong to the destination domain, not the service domain.
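For reference, a minimal sketch of how the documented conditions could be evaluated against the service domain's own response rather than the redirect destination's. This is an illustration, not the repo's actual code; the function names are hypothetical, and `allow_redirects=False` is a standard `requests` option that keeps the original 3xx response instead of following it:

```python
def satisfies_documented_check(status_code, headers):
    """True if a response meets the documented conditions:
    a 4xx/5xx status, or an X-Robots-Tag header when the page exists."""
    if 400 <= status_code < 600:
        return True
    return "X-Robots-Tag" in headers


def check_service_site(service_site):
    """Fetch service_site without following redirects, so the status
    and headers inspected belong to the service domain itself."""
    import requests  # third-party; the same library the check already uses

    try:
        # requests follows redirects by default, which is why the current
        # check ends up reading the destination domain's headers.
        # allow_redirects=False keeps the service domain's own response.
        r = requests.get(service_site, timeout=10, allow_redirects=False)
    except requests.exceptions.RequestException:
        # Timeouts and connection errors satisfy the documented conditions.
        return True
    # Note: a bare 3xx with no X-Robots-Tag fails here; whether a redirect
    # alone should pass is exactly the expectation this issue asks about.
    return satisfies_documented_check(r.status_code, r.headers)
```

With this shape, a service domain that redirects but sets `X-Robots-Tag` on the redirect response itself would pass, which appears to match the guideline setup described above.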

I'm working on setting up our service domains to meet the requirements, but I'm running into issues because of the structure of this test, and I'm waiting to finalize the setup until I know the expectations of the check.
