Barkin Simsek / CAPTCHA-Monitor / Merge requests / !19

Rename URL table to Domain

Merged Barkin Simsek requested to merge domain into master Jun 08, 2021

I decided to rename the URL table to Domain and add a url field to the fetcher queue. The original intention of the URL table was to store only domain names and their related information (HTTP, HTTPS, IPv4, IPv6 support, etc.). At some point I lost track of this and started feeding the domain names from that table into the fetchers as if they were complete URLs, for example asking the Tor Browser fetcher to fetch torproject.org. Instead, we should ask the Tor Browser fetcher to fetch https://torproject.org, http://torproject.org, https://check.torproject.org, etc.

To achieve this, I added a new url field to the fetch queues. Now the job scheduler can decide exactly which URL to fetch, instead of passing along a generic domain name without any protocol prefix.
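A minimal sketch of the idea, not the code from this merge request: a scheduler expands one domain record into complete URLs and stores each of them in a queue entry's url field. The names `Domain`, `FetchQueueItem`, `candidate_urls`, `schedule`, the `supports_http`/`supports_https` flags, and the `"tor_browser"` fetcher label are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    # Hypothetical stand-in for a row of the renamed Domain table:
    # a bare domain name plus the protocols it is known to support.
    domain: str
    supports_http: bool = True
    supports_https: bool = True


@dataclass
class FetchQueueItem:
    # Hypothetical stand-in for a fetch queue row carrying the new url field.
    url: str
    fetcher: str


def candidate_urls(record: Domain) -> List[str]:
    """Build complete URLs (scheme included) for one domain record."""
    urls = []
    if record.supports_https:
        urls.append(f"https://{record.domain}")
    if record.supports_http:
        urls.append(f"http://{record.domain}")
    return urls


def schedule(record: Domain, fetcher: str) -> List[FetchQueueItem]:
    """Create one queue entry per complete URL, never per bare domain."""
    return [FetchQueueItem(url=u, fetcher=fetcher) for u in candidate_urls(record)]


jobs = schedule(Domain("torproject.org"), fetcher="tor_browser")
for job in jobs:
    print(job.fetcher, job.url)
```

Keeping the protocol flags on the domain record and the full URL on the queue entry means the fetcher never has to guess a scheme; it only fetches what the scheduler wrote into url.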

I also renamed update_websites to update_domains to keep the naming consistent.

Edited Jun 08, 2021 by Barkin Simsek
Source branch: domain