Introduction
Follow these instructions to map the web domains in a web archive collection and trace where they intersect. This tutorial shows how domains distribute, connect, and overlap by plotting source domains as hubs and target domains as spokes on a network graph.
Used in this tutorial:
- Dataset: Domain graph from the Art Galleries web archive collection
- Tools: Palladio
- Time: ~10-15 minutes to complete
Watch
Follow along with a video demonstration of the instructions below:
Instructions
Get to know your data
- Locate the domain-graph.csv file in the ARCH workshop archive and open it with your preferred spreadsheet program (Excel, Calc, Numbers, Sheets, etc.).
- Note the four attributes included in every Domain graph dataset extracted from ARCH. Each row in the spreadsheet records how many times a site or page selected for the collection linked to another web domain when it was crawled for the archive:
- crawl_date: a timestamp recording when the links between the two domains were collected.
- source: the domain hosting web content selected for the collection.
- target: the host domain that the source domain links to.
- count: the total number of links collected from the source domain to the target domain.
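You can also check these four attributes outside a spreadsheet. The minimal Python sketch below uses only the standard library. It reads the CSV and totals count for each source and target pair across all crawl dates. It assumes the file has a header row with the column names crawl_date, source, target, and count. The sample rows and the example domains in them are hypothetical placeholders, not data from the Art Galleries collection.

```python
import csv
import io
from collections import defaultdict


def load_domain_graph(fileobj):
    """Read domain-graph rows as dicts, converting count to an integer.

    Assumes a header row named crawl_date,source,target,count.
    """
    rows = []
    for row in csv.DictReader(fileobj):
        row["count"] = int(row["count"])
        rows.append(row)
    return rows


def total_links(rows):
    """Sum count for each (source, target) pair across all crawl dates."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["source"], row["target"])] += row["count"]
    return dict(totals)


# Hypothetical sample rows for illustration only; these domains and counts
# are placeholders, not values from the Art Galleries collection.
SAMPLE = """crawl_date,source,target,count
20200101,gallery-a.example,museum.example,3
20200102,gallery-a.example,museum.example,2
20200101,gallery-b.example,museum.example,1
"""

# For the real file (adjust the name to match your download), something like:
#   with open("domain-graph.csv", newline="") as f:
#       rows = load_domain_graph(f)
rows = load_domain_graph(io.StringIO(SAMPLE))
print(total_links(rows))
```

Summing across crawl dates gives one total per domain pair. This is useful when the same link was captured on more than one day.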
Create a network graph
- Open Palladio in your preferred web browser: https://hdlab.stanford.edu/palladio-app
- Drag the CSV file from your local storage into the editor pane at the center of the page.
- Confirm that the data from your spreadsheet appears in the editor as comma-separated values (through line 146 for this sample data file), then click the “Load” button.
- Palladio enables you to represent your data as a map, graph, table, or gallery. For this tutorial, let’s select the “Graph” option in the top navigation bar.
- Click the hamburger menu icon (≡) at the top-right corner of the screen to expand your graph’s settings. Set the graph’s dimensions to the following attributes from your data:
- Click the “Source” box and select source from your data in the dialog that appears.
- Check the box next to the “Highlight” option for this dimension so you can distinguish the sources from the targets.
- Set the “Target” dimension to target.
- If it is not already checked by default, make sure the box next to “Show links” is checked so you can see the connections between source and target domains.
- Nodes and links should now begin to appear and arrange themselves on your graph.
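The graph view links each source domain to the targets it points to. This Python sketch gives a rough, hypothetical illustration of that hub-and-spoke structure. It is not a description of how Palladio works internally. The sketch builds a source-to-targets map from the same CSV, treats each distinct source and target pair as one link, and counts the resulting hubs, nodes and links. It assumes columns named source and target, and the sample domains are made up.

```python
import csv
import io
from collections import defaultdict


def build_adjacency(fileobj):
    """Map each source domain to the set of distinct target domains it links to.

    Rows repeated across crawl dates collapse into a single source -> target link.
    Assumes columns named source and target.
    """
    adjacency = defaultdict(set)
    for row in csv.DictReader(fileobj):
        adjacency[row["source"]].add(row["target"])
    return dict(adjacency)


def graph_summary(adjacency):
    """Count hubs (sources), all nodes (sources plus targets), and distinct links."""
    sources = set(adjacency)
    targets = set().union(*adjacency.values())
    return {
        "sources": len(sources),
        "nodes": len(sources | targets),  # a domain can be both a source and a target
        "links": sum(len(t) for t in adjacency.values()),
    }


# Hypothetical sample rows; the domains are placeholders.
SAMPLE = """crawl_date,source,target,count
20200101,gallery-a.example,museum.example,3
20200102,gallery-a.example,museum.example,2
20200101,gallery-a.example,artist.example,1
20200101,gallery-b.example,museum.example,1
"""

adj = build_adjacency(io.StringIO(SAMPLE))
print(graph_summary(adj))
```

Nodes are computed as a set union so a seed domain that also appears as another seed's target is counted only once.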
Interpret the results
- Use the “+” and “−” buttons at the top-left corner of the graph to zoom, and drag the artboard with your mouse to move around the graph. You can also click and drag individual nodes to rearrange them. What can you see?
- Are the biggest hubs of connections to secondary sources the same ones that you would expect from your alluvial diagram above?
- Now let’s take a closer look at the secondary sources that our seeds reference and the links between them:
- Click the “Facet” button at the bottom-left corner of the screen to filter your view by one of the dimensions you set from the dataset. In the “Description” field, name this view “Sites.”
- Set the “Dimensions” value on the right side of the screen to source to filter your current view by the seed sites’ domains.
- Find the source values at the bottom-left corner of the screen, listed in descending order of frequency. Which site links out to the most secondary sources?
- Reorganize the values alphabetically by clicking the “↓AZ” button at the top-right corner of the table. Scroll through the results until you find the currentspace.com and springsteengallery.com domains, and click both. What can you infer about these galleries, and what might they have in common based on their links?
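You can use this Python sketch to cross-check the facet questions. It ranks source domains by how many distinct target domains each one links to, and it lists the targets that two sources share. It assumes columns named source and target, and the sample rows are hypothetical. Palladio's facet may count rows rather than distinct targets, so its ordering can differ from this ranking.

```python
import csv
import io
from collections import defaultdict


def targets_by_source(fileobj):
    """Map each source domain to the set of distinct target domains it links to."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(fileobj):
        adjacency[row["source"]].add(row["target"])
    return dict(adjacency)


def rank_sources(adjacency):
    """(source, distinct-target count) pairs, most targets first, ties alphabetical."""
    return sorted(
        ((source, len(targets)) for source, targets in adjacency.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


def shared_targets(adjacency, a, b):
    """Target domains that both sources a and b link to (empty if either is absent)."""
    return adjacency.get(a, set()) & adjacency.get(b, set())


# Hypothetical sample rows; the domains are placeholders.
SAMPLE = """crawl_date,source,target,count
20200101,gallery-a.example,museum.example,3
20200101,gallery-a.example,artist.example,1
20200101,gallery-a.example,press.example,2
20200101,gallery-b.example,museum.example,1
20200101,gallery-b.example,artist.example,4
"""

adj = targets_by_source(io.StringIO(SAMPLE))
print(rank_sources(adj))
print(sorted(shared_targets(adj, "gallery-a.example", "gallery-b.example")))

# On the real file you might try the two galleries from the step above,
# assuming they appear exactly like this in the source column:
#   shared_targets(adj, "currentspace.com", "springsteengallery.com")
```

The set intersection returns the targets that both galleries link to. It is one way to look for what the two galleries might have in common.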