Sitemapping the current site layout
= Objective =
- generate a digraph representation of the current sitemap of www.torproject.org.
We are doing this to visualize how the website is currently laid out, analyze the pros and cons of the layout, and see the paths users can take through the website.
= Definitions =
- a sitemap is a list of pages of a web site accessible to crawlers or users, typically organized in hierarchical fashion. It can be a document in any form used as a planning tool for Web design.
- a directed graph (or digraph) is a graph with a set of vertices connected by edges, where the edges have a direction associated with them.
= Methodology =
To generate the sitemap digraph, one person (linda) visited the website by hand and hand-wrote the code that generates the visualization. The manual crawl began by visiting all the pages reachable with one click from the front page, then all the pages reachable with two clicks from the front page, and so on. This continued until there were no additional pages to visit.
External pages (anything not under www.torproject.org/, so donations.torproject.org counts as an external link) and duplicate paths (for instance, a page reachable from both the header and the footer) were noted along the way.
There was existing work done to sitemap the website (#10591 (moved)), and this was taken into consideration. The previous work was used to check that no pages were left unaccounted for. Since the digraphs were not generated in the same way (for instance, the old method did not add nodes for external links, whereas this digraph does), they do not look identical.
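The crawl order described above is a breadth-first traversal. The following is a minimal sketch of that idea, not the code actually used for the visualization: it assumes the hand-recorded links are stored in a dictionary, emits Graphviz DOT text, and borrows the black/grey/pink colour scheme from the results key. The page paths, `LINKS`, `crawl`, and `to_dot` are all hypothetical names, and treating any link to an already-visited page as a "duplicate" is an assumption about how duplicates were counted.

```python
from collections import deque

# Hypothetical hand-recorded link map: page -> pages it links to.
# The paths are placeholders, not the real www.torproject.org layout.
LINKS = {
    "/": ["/about", "/download", "https://donations.torproject.org/"],
    "/about": ["/about/people", "/download"],
    "/download": ["/about"],
    "/about/people": [],
}

def is_external(url):
    # Assumption: internal pages are recorded as paths starting with "/".
    return not url.startswith("/")

def crawl(links, start="/"):
    """Breadth-first walk: pages one click away first, then two, and so on."""
    depth = {start: 0}  # page -> number of clicks from the front page
    edges = []          # (source, target, kind)
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if is_external(target):
                edges.append((page, target, "external"))
            elif target in depth:
                # Page already reached by an earlier path: note it as a duplicate link.
                edges.append((page, target, "duplicate"))
            else:
                depth[target] = depth[page] + 1
                edges.append((page, target, "page"))
                queue.append(target)
    return depth, edges

COLORS = {"page": "black", "external": "grey", "duplicate": "pink"}

def to_dot(edges):
    """Render the recorded edges as Graphviz DOT text."""
    lines = ["digraph sitemap {"]
    for src, dst, kind in edges:
        lines.append(f'  "{src}" -> "{dst}" [color={COLORS[kind]}];')
    lines.append("}")
    return "\n".join(lines)

depth, edges = crawl(LINKS)
print(to_dot(edges))
```

The printed text can be saved to a file and rendered with Graphviz's `dot` tool. The `depth` mapping records how many clicks each page is from the front page.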
= Results and observations =
- a digraph sitemap of www.torproject.org (key: black = webpage, grey = external webpage, pink = duplicate link to a webpage).
The three main observations about the structure were that it does not follow a standard pattern, is too flat, and is messily interlinked. Details follow:
- **The current structure of the website does not follow any of the standard design patterns:**
An example of a hierarchy pattern; additional ones are here.
Currently, the website structure is asymmetrical, with branches of varying depth. This can irritate users: some pages just "end", whereas others require 2-3 clicks to reach the information they need.
- **The website structure is very flat.**
Content is more discoverable when it's not buried under multiple intervening layers, but users can become overwhelmed by cluttered menus. Hierarchies can be helpful if categories are specific and do not overlap, which I think is the case for much of the content on torproject.org.
- **There is a lot of interlinking, and there are duplicate links to various pages.**
You can get to one page's subpage from another page's subpage. There are links with different text ("learn more" and "about tor") that lead to the same place. On a single page, there are multiple ways to get to the same page (you can get to the donate page from the header, subheader, and footer, and occasionally a sidebar tip). All of these things are confusing; we should find the best placement for each link and keep it there.
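The observations above could be checked mechanically once links are recorded with their position and text. This is a rough sketch under stated assumptions: the `OBSERVED` records, the paths, and the `DEPTH` mapping are made-up illustrations that echo the examples in the text, not measurements of the real site, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical observations: (source page, page region, link text, target page).
# Paths are placeholders; regions and link texts echo the examples in the text.
OBSERVED = [
    ("/", "header", "donate", "/donate"),
    ("/", "subheader", "donate", "/donate"),
    ("/", "footer", "donate", "/donate"),
    ("/", "body", "learn more", "/about"),
    ("/", "header", "about tor", "/about"),
    ("/about", "body", "download", "/download"),
]

# Hypothetical click depths, e.g. as produced by a breadth-first crawl.
DEPTH = {"/": 0, "/about": 1, "/donate": 1, "/download": 2}

def repeated_targets(observed):
    """(source, target) -> regions, for targets linked more than once from one page."""
    regions = defaultdict(list)
    for source, region, _text, target in observed:
        regions[(source, target)].append(region)
    return {pair: where for pair, where in regions.items() if len(where) > 1}

def inconsistent_labels(observed):
    """target -> sorted link texts, for targets reached under more than one text."""
    texts = defaultdict(set)
    for _source, _region, text, target in observed:
        texts[target].add(text)
    return {target: sorted(seen) for target, seen in texts.items() if len(seen) > 1}

def depth_profile(depth):
    """Number of pages at each click depth; a crowded shallow level suggests a flat site."""
    return dict(sorted(Counter(depth.values()).items()))

print(repeated_targets(OBSERVED))
print(inconsistent_labels(OBSERVED))
print(depth_profile(DEPTH))
```

Each report corresponds to one observation: repeated targets show where a single placement could be chosen, inconsistent labels show where link text could be unified, and the depth profile gives a quick read on how flat or uneven the structure is.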