  • #29565

Closed (moved)
Created Feb 23, 2019 by David Fifield@dcf

Fix broker robots.txt to disallow crawling

From comment:11:ticket:28848 and https://github.com/ahf/snowflake-notes/blob/fb4304a7df08c6ddeeb103f38fc9103721a20cd9/Broker.markdown#the-robotstxt-handler:

  • Was the question about crawling ever answered? I can't think of a very good reason not to allow it. Even if censors were crawling the web for Snowflake brokers, they could get this information much more easily just from the source code.

I believe the robots.txt handler is meant to keep search engines from indexing any pages on the site because there's no permanent information there, not for any security or anti-enumeration reason.

ahf points out that the current robots.txt achieves the opposite: it allows crawling of all pages by anyone. Instead of

User-agent: *
Disallow:

it should be

User-agent: *
Disallow: /
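
For illustration, here is a minimal Go sketch of a handler that serves the corrected file. It is not the broker's actual code: the names `robotsTxt`, `robotsTxtHandler`, and `fetchRobots` are hypothetical, and the sketch assumes the handler would be registered at `/robots.txt` on a plain `net/http` server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// robotsTxt asks every crawler to stay away from every path.
// An empty "Disallow:" value would instead permit crawling of everything.
const robotsTxt = "User-agent: *\nDisallow: /\n"

// robotsTxtHandler is a hypothetical handler name used for this sketch.
func robotsTxtHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; charset=utf-8")
	fmt.Fprint(w, robotsTxt)
}

// fetchRobots calls the handler in-process and returns the status code and body.
func fetchRobots() (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/robots.txt", nil)
	robotsTxtHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := fetchRobots()
	fmt.Printf("%d\n%s", status, body)
}
```

On a real server the handler would be wired up with something like `http.HandleFunc("/robots.txt", robotsTxtHandler)`. The only difference from the current behavior is the `/` after `Disallow:`.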