Bringing Digital Preservation Home: Using Permafrost Remotely

by Grant Hurley, Digital Preservation Librarian, Scholars Portal

As a hosted digital preservation service available to members of the Ontario Council of University Libraries, Scholars Portal’s Permafrost service has always been guided by the idea that digital preservation work can be more efficiently distributed beyond the boundaries of a single institution. Permafrost hosts a suite of tools and infrastructure on behalf of subscribing members, who access them remotely from their home institutions to perform their work: processing valuable and unique digital collections for long-term preservation. The service is designed to complement member capacity in digital preservation by taking responsibility for some of the labour involved in this work, such as installing and maintaining complex tools and infrastructure, and providing training, documentation, advice, and other resources.

Data in the Ontario Library Research Cloud (OLRC), which is the destination for the preservation and access copies of the collections processed using the service, is distributed among 5 nodes located at university libraries across the province. Scholars Portal also hosts individual instances of the preservation processing tool Archivematica for each subscriber. But other aspects of the service are distributed as well. As the main contact for all aspects of the service, I offer training and consultations through Zoom and email. I develop guides and documentation for our users to follow. I triage issues with our fantastic systems team. And I try to encourage a sense of collaboration through community webinars and other opportunities for information-sharing. 

The COVID-19 pandemic has rapidly reshaped the work of archivists. No longer able to access the physical collections held by their institutions, they are taking the opportunity to delve more fully into digital archives work from home. It’s been exciting to see many Permafrost users interested in connecting from home so they can continue to steward their collections during this uncertain time. In addition, some users are actively documenting the experience of the pandemic in their communities and using Permafrost to preserve these materials.  

Case example: Documenting the response to COVID-19 in Thunder Bay

Screenshot of interview update on the pandemic between Tracie Smith, the Senior Director of Communications, Indigenous Affairs and Engagement and Dr. Stewart Kennedy, the Executive Vice President of Medicine and Academics, Thunder Bay Regional Hospital at the Thunder Bay Regional Health Sciences Centre on March 21, 2020. This video forms part of the collection of materials being collected and preserved about the response to the COVID-19 pandemic in Thunder Bay by the Lakehead University Archives.

One fantastic — and highly topical — case example of doing distributed digital preservation is being driven by Sara Janes, the University Archivist at Lakehead University in Thunder Bay. Sara has been actively working with the communications unit of the Thunder Bay Regional Health Sciences Centre, the major hospital for the area, to capture and preserve their response to the pandemic. Sara’s response is part of a global effort to document the pandemic as it is unfolding, particularly regarding experiences at the regional level.

Sara is using Permafrost to process and store the hospital’s pandemic response communications. Each package consists of a week’s worth of daily updates in text and video regarding the COVID-19 situation in the region that are being produced for healthcare providers and the general public. The updates include interviews between communications and senior hospital staff involved in pandemic planning, recordings of longer press events including the City of Thunder Bay’s Mayor and City Manager, and other documentation.

Sara worked with the hospital to ensure adequate permissions were received to capture and preserve this information, including downloading materials from web-based sources such as YouTube. Lakehead is not the official repository for the hospital, but is undertaking these efforts to create an important record of the local experience of the pandemic. Sara is also using the Archive-It web archives capture tool to document additional local healthcare and government responses on the web. The Permafrost service ensures that these important records of responses to the pandemic will be safely preserved and accessible in the future. But how do Permafrost users connect to the service from home?

Connecting from home

The first requirement is ensuring remote access to our services. As a key security provision, the main components of the Permafrost service are restricted based on a member’s known institutional Internet Protocol (IP) address range. The restricted components are:

  • Access to the OLRC as a place to upload transfers for processing, as well as to store final archival packages and access copies.
  • Access to individual hosted instances of Archivematica for preservation processing.
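To illustrate the idea of restricting access by IP address range, here is a minimal sketch in Python. It is not how Permafrost itself enforces the restriction, which is not described here. The range and the client addresses are hypothetical and come from blocks reserved for documentation.

```python
import ipaddress

# Hypothetical institutional range (a reserved documentation block,
# not a real member network).
INSTITUTION_RANGE = ipaddress.ip_network("192.0.2.0/24")


def is_allowed(client_ip: str) -> bool:
    """Return True if the client address falls inside the institutional range."""
    return ipaddress.ip_address(client_ip) in INSTITUTION_RANGE


# A connection from a campus workstation versus one from a home ISP.
print(is_allowed("192.0.2.45"))
print(is_allowed("203.0.113.10"))
```

Under this kind of check, a user connecting directly from a home internet connection is refused, which is why their traffic needs to appear to come from the institutional network.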

A second requirement is ensuring that the user has access to the materials they want to process for preservation: this would normally be on a network drive (such as the location where the master copies of a digitization project or a born-digital donation are temporarily stored), but could also be on their work computer or some external source, like the case at Lakehead discussed above. Processing digital media like floppy disks or digitizing analogue materials is unlikely to be possible right now, so collections destined for preservation processing need to be accessible already.

Both of these considerations necessitate some way of connecting to the service such that a user’s IP address is represented as coming from their home institution’s network, as opposed to the address belonging to their home internet service provider. This can be achieved in several ways: 

Logging in through a remote desktop application: remote desktop applications enable a user to directly access their work computer from afar. Just like when they are at work, they would be accessing Archivematica and the OLRC via their institutional IP address, so no other configuration is needed. This assumes that they have a computer at their work they can log into, and another computer at home to log in from. If the employee has a single laptop they work from in both places, then a VPN must be used.

Using a virtual private network (VPN): most VPNs used by IT services will be set up with something called “split tunneling,” which decides how to direct network traffic based on the address of the site a user is trying to access. If a user is accessing an address associated with their internal institutional network, like an intranet site or shared drive location, their connection is run through the VPN and they can access the resource as if they were at work. If the user is accessing something external, like a public website, the connection bypasses the VPN and goes out directly from their home IP address. The idea behind this is to limit the use of the VPN and internal network traffic to only what is required for internal purposes. You can read more about this subject on Wikipedia. The trick is that the VPN needs to recognize Scholars Portal’s resources as internal and direct the user’s traffic accordingly.

There are two ways to use a VPN with the Permafrost service. Both of these would require the assistance of the member institution’s IT services unit:

  • Request that the IP addresses associated with Archivematica and the OLRC be added to the list of addresses the VPN routes internally. This way, the VPN knows that these addresses are to be contacted via the institutional network. Each instance’s IP address is available to users via their Permafrost project page.
  • Request a full-tunnel VPN that directs all of a user’s connections through their institutional IP address. 
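
The difference between the two options can be sketched as a routing decision. This is a simplified illustration in Python, not an actual VPN configuration: the function name, the route lists, and every address are hypothetical, and real VPN clients are configured through their own tooling by IT services.

```python
import ipaddress

# All addresses are hypothetical examples, not real service addresses.
INTERNAL_ROUTES = [ipaddress.ip_network("10.0.0.0/8")]  # campus intranet
PERMAFROST_ROUTES = [
    ipaddress.ip_network("198.51.100.20/32"),  # stand-in for an Archivematica instance
    ipaddress.ip_network("198.51.100.30/32"),  # stand-in for the OLRC
]


def route_for(destination: str, vpn_routes, full_tunnel: bool = False) -> str:
    """Return "vpn" if traffic to the destination goes through the VPN, else "home"."""
    dest = ipaddress.ip_address(destination)
    # A full tunnel sends everything through the VPN; a split tunnel sends
    # only destinations that match one of its listed networks.
    if full_tunnel or any(dest in net for net in vpn_routes):
        return "vpn"
    return "home"


archivematica = "198.51.100.20"

# Default split tunnel: the hosted service looks external, so the request
# leaves from the home IP address and would be refused.
print(route_for(archivematica, INTERNAL_ROUTES))

# Option 1: the service addresses are added to the VPN list.
print(route_for(archivematica, INTERNAL_ROUTES + PERMAFROST_ROUTES))

# Option 2: a full-tunnel VPN.
print(route_for(archivematica, INTERNAL_ROUTES, full_tunnel=True))
```

In either option the request reaches the service from the institutional IP address. The first option keeps all other external traffic off the VPN, while the second sends everything through it.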

Finally, working from home with a VPN also requires adequate connectivity. Some users have not had to work with large data from home before, and transfers consisting of hundreds of megabytes or gigabytes, which is common when dealing with digital collections, may be difficult to load. Users working with remote desktop should not have this issue, as their work computer would have access to their institution’s full network connectivity. 

As a workaround for one user whose VPN was causing much slower connection speeds, we enabled a shared folder on NextCloud (a Dropbox-like service for Scholars Portal staff hosted on the OLRC) where they can upload a transfer for processing without using the VPN. I then copy the transfer to the proper place on the OLRC for staging and eventual processing. It’s a manual process for now, but it gets the job done!


Visit the Lakehead University Archives to explore their collections online, including digital collections and web archives.

Scholars Portal is the information technology service provider for the Ontario Council of University Libraries. Founded in 2002, Scholars Portal provides a shared technology infrastructure and shared collections for all 21 university libraries in the province, including platforms for journal articles, books, geospatial data, and numerical data; a shared platform for research data management; cloud storage services; and more. In 2013, the Scholars Portal journals platform was certified as a Trustworthy Digital Repository by the Center for Research Libraries.

To learn more about Permafrost, visit our website or contact permafrost@scholarsportal.info. You can also read our guide about processing digital archives – Handling Digital Archives Before Ingest – on this site.