Rename URL table to Domain
I decided to rename the URL table to Domain and add a url field to the fetcher queue. The original intention of the URL table was to store only domain names and their related information (HTTP, HTTPS, IPv4, and IPv6 support, etc.). At some point, I forgot about this and started feeding the domain names from that table into the fetchers as if they were complete URLs, for example, asking the Tor Browser fetcher to fetch torproject.org. Instead, we should ask the Tor Browser fetcher to fetch https://torproject.org, http://torproject.org, https://check.torproject.org, etc.
To achieve this, I added a new url field to the fetch queues. Now the job scheduler can decide exactly which URL to fetch, instead of passing along a bare domain name without a protocol prefix.
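As a rough illustration of the idea (not the actual scheduler code), the sketch below shows how a scheduler might expand one domain record into concrete URLs before queuing them. The Domain and FetchJob classes, their fields (supports_http, supports_https), the build_fetch_jobs helper, and the "tor_browser" fetcher label are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    # Hypothetical stand-in for a row of the Domain table
    name: str
    supports_http: bool = True
    supports_https: bool = True


@dataclass
class FetchJob:
    # Hypothetical stand-in for an entry in a fetch queue
    fetcher: str
    url: str  # full URL including the protocol prefix


def build_fetch_jobs(domain: Domain, fetcher: str) -> List[FetchJob]:
    """Expand one domain into one fetch job per supported scheme."""
    jobs = []
    if domain.supports_https:
        jobs.append(FetchJob(fetcher=fetcher, url=f"https://{domain.name}"))
    if domain.supports_http:
        jobs.append(FetchJob(fetcher=fetcher, url=f"http://{domain.name}"))
    return jobs


if __name__ == "__main__":
    for job in build_fetch_jobs(Domain(name="torproject.org"), "tor_browser"):
        print(job.fetcher, job.url)
```

With this split, the domain record only describes what a domain supports, while each queue entry carries the exact URL the fetcher should load.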
I also renamed update_websites to update_domains to keep the naming consistent.