Tutorial: Explore web archive data from the command line with Jupyter Notebooks


Introduction

Browser-based tools like those covered in the other tutorials in the guide "Sample ARCH datasets and how to explore them" can help you examine and visualize relatively small samples of data. Analyzing full ARCH datasets from web archive collections at scale can require more computing power, command line tools, and custom code.

Jupyter Notebooks let you demonstrate, and even modify, these otherwise manual processes quickly. Follow the instructions below to practice using a notebook to run popular command line tools written in the Python programming language for text analysis and visualization, named entity recognition, and sentiment analysis.
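As a rough illustration of the kind of text analysis step a notebook cell might run, here is a minimal sketch that counts the most frequent terms in a few text records. It is not taken from the notebook itself: the sample records, the stopword list, and the function name top_terms are all hypothetical, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical sample records. In practice you would load text from a
# downloaded ARCH dataset file instead of hard-coding it.
sample_pages = [
    "Web archives preserve the web for future research.",
    "Researchers analyze web archives at scale.",
]

# A tiny, illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "for", "at", "a", "an", "of", "and", "to"}


def top_terms(texts, n=2):
    """Return the n most common non-stopword terms across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


print(top_terms(sample_pages))
```

Running a cell like this in a notebook prints the term counts directly beneath it, so you can adjust the stopword list or the value of n and rerun the cell to see the effect.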

Used in this tutorial:

Thanks to Nick Ruest and the Archives Unleashed Project for authoring this notebook and the others that are automatically linked from each ARCH user's dataset detail view for guided exploration.

Instructions

Click the Open in Colab button to view and run the sample code step by step, or preview the results in the completed version of the notebook below:
