A Guide to User-Submitted Metadata in Archivematica

Introduction

Welcome! This guide gives an overview of how user-submitted metadata works in Archivematica. “User-submitted” refers to three main types of metadata that users can optionally include in a transfer package for processing in Archivematica: descriptive metadata, rights metadata, and bag-based metadata. All three end up in the final Archivematica METS file. For each type, the goal is to ensure that the metadata is correctly included and structured in the METS file that Archivematica produces to describe the contents of the AIP.

The guide also explains how descriptive metadata can be transferred to AtoM with a DIP. It aims to give instructions that are as accurate as possible for the various ways these kinds of metadata can be ingested alongside files being processed for preservation, but there may be use cases not anticipated here. If you have questions that this document does not address, please contact us at permafrost@scholarsportal.info!

This guide does not replace Archivematica’s official documentation, but is supplementary to it.

Glossary

AIP: An Archival Information Package (AIP) is a package of data intended for long-term preservation. It is commonly considered the canonical copy of that data. It contains archival copies of files (including originals, and if chosen, normalized copies in preservation-friendly formats) alongside metadata about those files and any additional documentation supporting their preservation or contextualization. A key component of the AIP is a METS XML file describing the package’s files, the relationships between these files, and the preservation-supporting actions performed on the files as part of the Archivematica workflow. All user-submitted metadata is added to the METS file during processing in Archivematica. See also “METS” for more details. 

API: Stands for “application programming interface.” APIs consist of a set of protocols, commands, functions and tools that enable communications between different applications, typically via the internet, and can initiate actions or retrieve information. For example, querying the Archivematica APIs can retrieve information about stored packages, or initiate an action in Archivematica itself, like starting a transfer or reingesting an AIP.
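
To make the idea of "querying an API" concrete, here is a minimal Python sketch that prepares (but does not send) a request asking a storage server for its list of stored packages. The server address, user name, and key are placeholders, and the endpoint path and "ApiKey" header format are assumptions that should be checked against the official Archivematica API documentation before use.

```python
import urllib.request


def build_package_list_request(base_url, username, api_key):
    """Prepare a GET request for a list of stored packages.

    Assumption: the path "/api/v2/file/" and the "ApiKey user:key"
    Authorization header are illustrative; verify both in the official docs.
    """
    url = base_url.rstrip("/") + "/api/v2/file/"
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"ApiKey {username}:{api_key}")
    return request


# Placeholder values, not real credentials or a real server.
request = build_package_list_request(
    "http://storage.example.org:8000", "demo_user", "demo_key"
)
print(request.get_method(), request.full_url)
```

The request is only built here so that the sketch runs without a server. Against a real installation, one would pass it to urllib.request.urlopen and read the response that comes back.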

CSV file: Comma-separated values file. This is a plain text file for writing and transferring structured spreadsheet data. The comma is the typical separator between data elements within a row, and each row sits on its own line. (Hint: try opening a CSV in Notepad to see its true structure.) This file type allows for ease of use between different applications, especially those using different operating systems. The easiest way to create a CSV file for metadata in Archivematica is to create a spreadsheet using your preferred software application, such as Excel or LibreOffice, and save it in CSV format with UTF-8 encoding. See also “UTF-8”.
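
A CSV can also be written by a short script instead of a spreadsheet application. The following is a minimal Python sketch, not an official Archivematica tool. The file name metadata.csv, the column headers, and the sample values are assumptions for illustration, so check the required layout against Archivematica's documentation.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; headers and values are illustrative assumptions.
rows = [
    {
        "filename": "objects/photo_001.tif",
        "dc.title": "Vue du port de Montréal",
        "dc.creator": "Zoë Nuñez",
    },
    {
        "filename": "objects/report.pdf",
        "dc.title": "Fonds Tremblay, série 2",  # contains a comma on purpose
        "dc.creator": "Example Archives",
    },
]


def write_metadata_csv(path, rows):
    """Write the rows to a comma-separated file encoded as UTF-8."""
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_metadata_csv(path):
    """Read the file back as a list of dictionaries, one per row."""
    with open(path, "r", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


csv_path = os.path.join(tempfile.mkdtemp(), "metadata.csv")
write_metadata_csv(csv_path, rows)
round_trip = read_metadata_csv(csv_path)
print(round_trip[1]["dc.title"])
```

The second row shows why a CSV library is safer than joining values by hand: the title contains a comma, so the csv module wraps that value in quotation marks and it still reads back as a single field.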

DFXML file: Digital Forensics XML file. This is a metadata file created as part of a disk imaging process. Information created as part of this process can include the names and locations of files on the disk, and provenance information about the imaging process. This information can be automatically parsed into the METS file using the XML import workflow.

DIP: A Dissemination Information Package (DIP) is intended for access by the archives’ user community. It may include some or all of the contents of an AIP, and in formats more appropriate for access. For example, a DIP could contain copies of files normalized according to rules set in Archivematica (such as a medium-sized JPG file for access created from a large preservation master in TIFF format). The DIP also includes a copy of the METS file.

EAD XML file: Encoded Archival Description XML file. This is an XML file (and related standard) for storing structured metadata describing archival records. EAD is the basis for many digital systems for archival descriptions, and can be exported from systems like AtoM. This information is not parsed into the METS file, but can be added by a user as part of the transfer package.

Metadata: any information describing a set of data. For example, metadata about a digital file or collection of files can include descriptive metadata like the title, creator, a written description, subject tags, etc. Technical metadata might include file format information, file sizes, and checksums. 

METS XML file: Metadata Encoding and Transmission Standard XML file. The METS file records hierarchical metadata associated with the files in the AIP, including structural, administrative, and descriptive metadata. In digital preservation, METS files are used to structure and store metadata about digital files to help inform their preservation and record the results of any preservation actions conducted. They typically include PREMIS metadata as one component. Archivematica creates the METS file automatically as part of a transfer process, and any user-submitted metadata is parsed into it. The data it contains is considered the key record of information about the contents and context of the AIP. See the METS documentation to learn about METS files in more detail.
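
To illustrate how descriptive metadata sits inside a METS file, the following Python sketch reads a small, hand-written METS fragment and lists the fields found in each descriptive metadata section (dmdSec). The fragment is a simplified assumption made for this example; a METS file produced by Archivematica is much larger and may nest its elements differently.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookups below.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified, hand-written sample; not actual Archivematica output.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <mets:dmdSec ID="dmdSec_1">
    <mets:mdWrap MDTYPE="DC">
      <mets:xmlData>
        <dc:title>Harbour survey photographs</dc:title>
        <dc:creator>Example Archives</dc:creator>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:amdSec ID="amdSec_1"/>
</mets:mets>"""


def list_descriptive_metadata(mets_xml):
    """Return {dmdSec ID: {field name: value}} for each dmdSec."""
    root = ET.fromstring(mets_xml)
    found = {}
    for dmd in root.findall("mets:dmdSec", NS):
        fields = {}
        for element in dmd.findall(".//mets:xmlData/*", NS):
            # Tags look like "{namespace}title"; keep only the local name.
            local_name = element.tag.split("}", 1)[1]
            fields[local_name] = element.text
        found[dmd.get("ID")] = fields
    return found


metadata = list_descriptive_metadata(SAMPLE_METS)
print(metadata)
```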

OAIS: Open Archival Information System standard, “a comprehensive and consistent framework for describing and analysing digital preservation issues, provide a sound footing for future standards-building activity, and serve as a point of reference for vendors interested in building digital preservation products and services.” The current OAIS reference model is ISO Standard 14721:2012. Please see Brian Lavoie’s excellent introductory guide to OAIS for a detailed explanation of what OAIS is and its role.

PREMIS: stands for Preservation Metadata: Implementation Strategies, which was the name of the original working group that defined a suite of core terms for preservation metadata. These terms are commonly adopted for structuring information about the preservation of digital files in a package. PREMIS is organized around intellectual entities (the content being described, such as a book or dataset), objects (the individual digital objects that make up the intellectual entity, such as individual files), events (actions that involve objects), agents (people, organizations, or software that perform events), environments (hardware or software required to process or interpret objects), and rights (assertions about what actions can be taken by whom). In an Archivematica-created METS file, PREMIS metadata is mostly contained within the administrative metadata section (amdSec) of the METS.

UTF-8: 8-bit Unicode Transformation Format (UTF-8) is a character encoding standard that is used by Archivematica and many other applications to create and read CSV files and other text files. Saving files with UTF-8 encoding ensures that any metadata, including accents and special characters, is properly parsed and transferred into the Archivematica-created METS file. Operating systems like Windows use their own internal character encoding systems and therefore UTF-8 usually must be specifically selected as the intended export format when using applications like Excel.
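
The effect of the encoding choice can be seen in a few lines of Python. This sketch encodes the same word twice, once as UTF-8 and once as Windows-1252 (a common legacy Windows encoding, chosen here as an assumption for illustration), and then checks whether each result can be read back as UTF-8.

```python
text = "Montréal"

utf8_bytes = text.encode("utf-8")      # the accented letter takes two bytes
legacy_bytes = text.encode("cp1252")   # the accented letter takes one byte


def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(text), len(utf8_bytes), len(legacy_bytes))
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(legacy_bytes))
```

The legacy bytes fail the UTF-8 check because the single byte used for the accented letter is not a valid UTF-8 sequence on its own. This is the kind of mismatch that garbles accents when a file is saved in one encoding and read in another.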

XML file: eXtensible Markup Language file. This is a plain text file for writing and transferring structured data. Tags are used to structure the data, define its purpose, and explain how data should be stored and transferred. This file type allows for ease of use between different applications, especially web-based tools. The easiest way to create an XML file for metadata in Archivematica is to create a text file using your preferred XML editor, such as Oxygen, or a text editor, such as Notepad++ or Sublime, and save it as XML with UTF-8 encoding. Populated XML files can often also be exported from other systems such as AtoM or Dataverse. See also “UTF-8”.
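
For readers who prefer a script to an editor, this minimal Python sketch builds a tiny XML record and serializes it as UTF-8 with an XML declaration. The element names (record, title, creator) and the sample values are hypothetical and do not follow any particular metadata schema.

```python
import io
import xml.etree.ElementTree as ET


def build_record(title, creator):
    """Build a small XML tree; element names are illustrative only."""
    root = ET.Element("record")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "creator").text = creator
    return root


def to_utf8_bytes(root):
    """Serialize the tree as UTF-8 bytes, including the XML declaration."""
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


xml_bytes = to_utf8_bytes(
    build_record("Fonds Émilie Gagné", "Archives & Special Collections")
)
print(xml_bytes.decode("utf-8"))
```

Building the file through an XML library rather than by pasting text together means reserved characters, such as the ampersand in the creator value, are escaped automatically.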