Tutorial: How to visualize connections between web domains with RAWGraphs

Karl Blumenthal

Updated June 23, 2023 20:03

Introduction

Unlike titles on a shelf, web resources are organized and interdependent in ways that can be hidden from the reader. This tutorial gives a sense of a web archive collection's scale and contents by plotting its domains as the sources and targets of links.

Used in this tutorial:

Dataset: Domain graph from the Art Galleries web archive collection
Tools: RAWGraphs
Time: ~10 minutes to complete

Instructions

In this section:

Get to know your data
Create an alluvial diagram
Interpret the results

Get to know your data

Locate the domain-graph.csv file in the ARCH workshop archive and open it with your preferred spreadsheet program (Excel, Calc, Numbers, Sheets, etc.).
Take note of the four attributes included in each Domain graph extraction from ARCH. Each row in the spreadsheet represents the number of times that a selected site or page links to another web domain when it was collected for the archive:
1. crawl_date: a timestamp representing when each link between domains was collected.
2. source: each domain that hosts web content selected for the collection.
3. target: each host domain to which a source domain in the collection above links.
4. count: the sum of links collected from each source to each target host domain.
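If you would rather check the file's shape outside a spreadsheet program, the following is a minimal sketch using only Python's standard library. It assumes the CSV has a header row with the four column names listed above. The sample rows, the domain names, and the crawl_date format are invented for illustration and are not taken from the Art Galleries dataset.

```python
import csv
import io

# Hypothetical sample in the four-column layout described above.
# Domains, counts, and the date format are made up for illustration.
SAMPLE_CSV = """crawl_date,source,target,count
20200101,gallery-a.example,museum.example,12
20200101,gallery-a.example,social.example,3
20200101,gallery-b.example,museum.example,7
"""


def read_domain_graph(handle):
    """Parse a domain graph CSV into a list of dicts, with count as an int."""
    rows = []
    for row in csv.DictReader(handle):
        row["count"] = int(row["count"])
        rows.append(row)
    return rows


def summarize(rows):
    """Return basic size figures: rows, distinct domains, and total links."""
    return {
        "rows": len(rows),
        "sources": len({r["source"] for r in rows}),
        "targets": len({r["target"] for r in rows}),
        "total_links": sum(r["count"] for r in rows),
    }


rows = read_domain_graph(io.StringIO(SAMPLE_CSV))
print(summarize(rows))
```

To run this against your own download, replace `io.StringIO(SAMPLE_CSV)` with a file opened via `open(path, newline="")`, where `path` points to wherever you saved the CSV.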

Create an alluvial diagram

Open RAWGraphs in your preferred web browser here: https://app.rawgraphs.io/
Select the “Upload your data” option from the menu on the left-hand side of the screen, click the “Browse” button in the editor pane at the center, and open the downloaded CSV from your local storage. You should now see your data represented as a spreadsheet in the center pane.
Scroll down to the “2. Choose a chart” heading. RAWGraphs enables you to represent your data in dozens of visual modes. Let’s employ the first one, Alluvial Diagram, so we can plot each link from its source to its target and then weight it by the number of times that the link is made in the archive. Select this chart option and scroll down to the next section.
Under “3. Mapping,” drag and drop the relevant dimensions from your spreadsheet into the buckets for this chart type’s Steps (source and target) and Size:
1. Begin by mapping the Source attribute to be your first Step dimension.
2. Map the Target attribute to the second open slot for Steps.
3. Map Count to Size and, if it is not selected automatically, choose the Sum option from the drop-down menu that appears for this dimension.
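To see what the Sum option amounts to, here is a rough sketch of the same aggregation in Python: every row that shares a source and target pair is collapsed into one flow whose size is the total of its counts. This illustrates the idea only and is not how RAWGraphs is implemented. The function name `flow_sizes` and the sample rows are hypothetical.

```python
from collections import defaultdict


def flow_sizes(rows):
    """Sum count over every (source, target) pair, ignoring crawl_date."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["source"], row["target"])] += int(row["count"])
    return dict(totals)


# Hypothetical rows: the first two share a source and target but were
# collected on different dates, so they merge into a single flow.
sample_rows = [
    {"crawl_date": "20200101", "source": "gallery-a.example",
     "target": "museum.example", "count": "12"},
    {"crawl_date": "20200201", "source": "gallery-a.example",
     "target": "museum.example", "count": "5"},
    {"crawl_date": "20200101", "source": "gallery-b.example",
     "target": "museum.example", "count": "7"},
]

for (source, target), size in sorted(flow_sizes(sample_rows).items()):
    print(f"{source} -> {target}: {size}")
```

Each resulting pair corresponds to one stream in the alluvial diagram, and its summed value determines how thick that stream is drawn.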
Now we’re ready to visualize the data. Scroll down to the “4. Customize” heading on the page and prepare your data to plot legibly on the provided artboard pane.
Use the “Artboard” menu at the left-hand side of your screen to make space for your data visualization:
1. Set your view’s Width to 800.
2. Set the Height to 1000.
Use the “Chart” menu to plot your data on the artboard legibly:
1. Set the Sort nodes by value to Minimize overlaps in order to make the connections between sources and targets render most clearly.
2. Set the Flow alignment to Top to orient your diagram to the list of seed sources in this archive.

Interpret the results

What can we see? Depending upon your screen size and resolution, your alluvial diagram might still be a little too large to browse easily, so let’s download it too:
1. Scroll down to the “5. Download” header, give your visualization a filename, select an image format (we recommend .png), and click the “Download” button to create a local copy on your machine.
Use a web browser or file viewer to open, browse, and zoom around the image file on your machine.
Follow the streams from source to target. What can you find?
1. Which sites reference the most secondary sources? And which look like they might be rich archives themselves?
2. How does this change or affect your perception of the collection’s strengths? Is there anything that you would contribute? Do shared secondary sources suggest anything specific?
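If the diagram is too dense to answer these questions by eye, a short script can offer a rough cross-check. The sketch below ranks sources by how many distinct targets they link to and lists targets that several sources share. The function names, the `min_sources` threshold, and the sample rows are all hypothetical, and counting distinct targets is only one possible reading of "most secondary sources."

```python
from collections import defaultdict


def distinct_targets_per_source(rows):
    """Rank sources by how many different target domains they link to."""
    targets = defaultdict(set)
    for row in rows:
        targets[row["source"]].add(row["target"])
    ranked = ((source, len(found)) for source, found in targets.items())
    # Most targets first; ties are broken alphabetically by source.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


def shared_targets(rows, min_sources=2):
    """List targets linked from at least min_sources different sources."""
    sources = defaultdict(set)
    for row in rows:
        sources[row["target"]].add(row["source"])
    return sorted(t for t, found in sources.items() if len(found) >= min_sources)


# Hypothetical rows in the source/target layout of the domain graph.
sample_rows = [
    {"source": "gallery-a.example", "target": "museum.example"},
    {"source": "gallery-a.example", "target": "social.example"},
    {"source": "gallery-a.example", "target": "press.example"},
    {"source": "gallery-b.example", "target": "museum.example"},
    {"source": "gallery-c.example", "target": "museum.example"},
    {"source": "gallery-c.example", "target": "social.example"},
]

print(distinct_targets_per_source(sample_rows))
print(shared_targets(sample_rows))
```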