Introduction
This tutorial introduces graphing web domains as a network with Gephi Lite, a limited version of the popular desktop visualization program Gephi that runs in a web browser. It requires no prior knowledge of either to get started.
Use this tutorial to familiarize yourself with tools to organize, visualize, and interpret domains in a web archive collection as a community with nodes and edges. Read the tutorial here when you are ready to get started with the full desktop version of Gephi.
Used in this tutorial:
- Dataset: Domain graph
- Tools: Gephi Lite
- Time: ~10-15 minutes to complete
Dependencies
NB: At the time of writing, the browser application requires a dataset in GEXF format to follow the instructions below. Use the GEXF file provided in the same ARCH workshop archive, or load your own CSV file from ARCH into Gephi and immediately export the project as GEXF.
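If you would rather script the conversion than round-trip through desktop Gephi, the sketch below shows one possible approach using only the Python standard library. It is not part of the workshop workflow: it assumes your CSV has the source, target, and count columns described later in this tutorial, the sample rows are made up, and the function name csv_to_gexf is hypothetical. Check the resulting file in Gephi Lite before relying on it.

```python
import csv
import io
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"

def csv_to_gexf(csv_file):
    """Build a GEXF 1.2 tree from rows with source, target, and count columns."""
    # Add up link counts for each (source, target) pair across all rows.
    weights = {}
    for row in csv.DictReader(csv_file):
        key = (row["source"], row["target"])
        weights[key] = weights.get(key, 0) + int(row["count"])

    root = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(root, "graph", {"mode": "static", "defaultedgetype": "directed"})
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    # Every domain that appears as a source or a target becomes a node.
    for domain in sorted({d for pair in weights for d in pair}):
        ET.SubElement(nodes, "node", {"id": domain, "label": domain})
    for i, ((source, target), weight) in enumerate(sorted(weights.items())):
        ET.SubElement(edges, "edge", {
            "id": str(i), "source": source, "target": target, "weight": str(weight),
        })
    return ET.ElementTree(root)

# Made-up sample rows; replace with open("your-file.csv", newline="") for real data.
sample = io.StringIO(
    "crawl_date,source,target,count\n"
    "20200101,example.org,archive.org,12\n"
    "20200215,example.org,archive.org,5\n"
    "20200215,news.example.com,example.org,7\n"
)
tree = csv_to_gexf(sample)
print(ET.tostring(tree.getroot(), encoding="unicode"))
# To save a file instead of printing:
# tree.write("domain-graph.gexf", encoding="utf-8", xml_declaration=True)
```

Summing the counts across crawl dates gives a single weighted edge per pair of domains, which is a simplification: any information about when each link was collected is dropped.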
Instructions
Get to know your data
- Locate the domain-graph.csv file in the ARCH workshop archive and open it with your preferred spreadsheet program (Excel, Calc, Numbers, Sheets, etc.).
- Take note of the four attributes included in each Domain graph extraction from ARCH. Each row in the spreadsheet represents the number of times that a selected site or page links to another web domain when it was collected for the archive:
- crawl_date: a timestamp representing when each link between domains was collected.
- source: each domain that hosts web content selected for the collection.
- target: each host domain to which a source domain in the collection above links.
- count: the sum of links collected from each source to each target host domain.
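To get a feel for these four attributes outside a spreadsheet, the following minimal sketch reads rows in the same shape and totals the links between each pair of domains. The sample rows, including the date format, are made up for illustration, and the function name total_links is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up rows using the four columns described above.
SAMPLE_CSV = """crawl_date,source,target,count
20200101,example.org,archive.org,12
20200101,example.org,wikipedia.org,3
20200215,example.org,archive.org,5
20200215,news.example.com,example.org,7
"""

def total_links(csv_file):
    """Sum the count column for each (source, target) pair across crawl dates."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[(row["source"], row["target"])] += int(row["count"])
    return totals

totals = total_links(io.StringIO(SAMPLE_CSV))
# List the most heavily linked pairs first.
for (source, target), count in totals.most_common():
    print(f"{source} -> {target}: {count}")
```

To run it on your own data, pass an open file handle for the CSV from the ARCH workshop archive in place of the in-memory sample.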
Create a network graph
- Open Gephi Lite in your preferred web browser here: https://gephi.org/gephi-lite/
- Click the "Open a local file" button on the "Open graph file" options and select or drag-and-drop the domain-graph.gexf file from your local storage.
- Click "Open" to load the domain graph dataset with Gephi Lite.
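If the file does not load as expected, it can help to confirm that it is well-formed GEXF and contains the nodes and edges you expect. The sketch below is one way to do that with the Python standard library. The embedded sample document is made up, and summarize_gexf is a hypothetical helper name, not part of Gephi Lite.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal GEXF document for illustration.
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static" defaultedgetype="directed">
    <nodes>
      <node id="example.org" label="example.org"/>
      <node id="archive.org" label="archive.org"/>
      <node id="news.example.com" label="news.example.com"/>
    </nodes>
    <edges>
      <edge id="0" source="example.org" target="archive.org" weight="17"/>
      <edge id="1" source="news.example.com" target="example.org" weight="7"/>
    </edges>
  </graph>
</gexf>
"""

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def summarize_gexf(xml_text):
    """Count the node and edge elements in a GEXF document."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    counts = {"node": 0, "edge": 0}
    for element in root.iter():
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1
    return counts

print(summarize_gexf(SAMPLE_GEXF))
```

For a file on disk, read its contents as bytes and pass them to the same function, for example summarize_gexf(open("domain-graph.gexf", "rb").read()).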
Interpret the results
- Let’s calculate some statistics in order to interpret the graph attributes more visually. Click on the "Statistics" clipboard icon on the left-hand sidebar.
- Select the "Louvain community detection" option from the drop-down menu, leave the configurations as they are for now, and click the "Compute metric" button.
- Select the "PageRank" option from the drop-down menu, leave the configurations as they are for now, and click the "Compute metric" button.
- Now let’s color and resize the "nodes" (points) on our graph to reflect their statistical values. Click the "Appearance" color palette on the left-hand sidebar.
- Under the "nodes" heading, select the "modularityClass" option from the "Set color from…" drop-down menu to color the nodes according to the community of interlinking nodes to which they most closely relate.
- Select the "degree (dynamic)" option from the "Set size from…" drop-down menu to resize the nodes by their incoming and outgoing links. Set the "Min" value to 5 and the "Max" value to 50.
- Now let’s arrange the nodes and "edges" (lines) on our graph to better understand how these domains relate to one another. Click on the "Layout" nodes icon on the left-hand sidebar.
- Select the "ForceAtlas2" option from the "Select a layout…" drop-down menu and check the box next to the "Adjust sizes?" option.
- Click "Start" to begin arranging communities by the gravity of their nodes, and click "Stop" when the nodes have settled into place.
- Use the + and - buttons at the top-left corner of the graph to zoom, and drag the artboard pane with your mouse to move around the graph. You can also click and drag individual nodes to move them around the artboard. What can you see?
- Which domains are the most connected and which the most self-contained?
- What distinguishes the kinds of "communities" that form among these domains? Can you find a regional and a thematic community?
- Are the biggest hubs of connections to secondary sources the same ones that you would expect?
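To build some intuition for the two node measures used above, the sketch below computes degree (incoming plus outgoing links) and a simple PageRank on a tiny made-up graph. It is only an illustration of the general ideas: the domain names are invented, the damping factor and iteration count are assumptions, and Gephi Lite's own implementations and default settings may differ.

```python
def degrees(edges):
    """Count incoming plus outgoing links for every node."""
    result = {}
    for source, target in edges:
        result[source] = result.get(source, 0) + 1
        result[target] = result.get(target, 0) + 1
    return result

def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank by repeatedly passing rank along outgoing links."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared among all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank

# Made-up directed links: three sites point to a hub, which links back to one.
EDGES = [
    ("blog.example", "hub.example"),
    ("news.example", "hub.example"),
    ("shop.example", "hub.example"),
    ("hub.example", "blog.example"),
]

node_degrees = degrees(EDGES)
ranks = pagerank(EDGES)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{node}: degree={node_degrees[node]}, pagerank={ranks[node]:.3f}")
```

In this toy graph the hub has both the highest degree and the highest PageRank, while the site it links back to ranks second despite a low degree. That difference is one reason to compare the two measures when looking for the most connected domains in your own graph.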