Overview
The Internet Archive is excited to release Archives Research Compute Hub (ARCH), version 2.0. This version includes significant interface and infrastructure improvements driven by ARCH user input. Browse the summaries below to learn more about each update and plans for future releases.
In this release:
ARCH version 2.0 introduces Keystone, a web UI built with Python and Django and backed by a PostgreSQL database, to support account administration and manage research dataset generation. This release also includes improvements to dataset generation and several bug fixes.
Account administration
- Add new users to ARCH accounts.
- Create and populate teams to share working datasets internally.
Dataset generation
- Use a dynamic menu to learn about, select, and configure datasets prior to generation.
- View dataset provenance details on dataset detail and custom collection pages.
- Version datasets as source collections grow over time.
- Share working datasets with internal account team members.
- Create named entities datasets in English or Chinese.
Screenshot of the ARCH 2.0 dataset menu
Bug fixes
- Fixed datasets missing from downloads in the Safari web browser.
- Fixed false error reports on custom collection MIME type filters.
- Fixed failures to publish datasets with special characters in their title fields.
More information
Open sourcing
ARCH is available as a standalone application in the ARCH GitHub repository. The ARCH team plans to open source the Keystone database API in a future release.
Development roadmap
For the latest information about features that are planned, in research, or in development, see the ARCH development roadmap.