Working with born-digital materials in an archival environment begins with transferring the materials onto the archives’ systems, usually under a donor agreement or some other transfer arrangement, such as a records transfer. The goal of transfer is to gain physical control of the digital materials you intend to bring into the archives, while verifying that the materials are not altered during this process. This step is the most technologically intensive, as there are many different storage media types you may encounter. This section lays out the basic differences between methods of transfer, the steps applicable to any kind of media transfer, and tips for specific classes of media.
There are two broad categories of transfer: direct file transfer over a network, and transfer from storage media. Network transfers can be conducted over the internet or within an internal network environment. They are often conducted directly between donors or records creators/managers and the archives when the originating body has direct access to files on local computers or networks.
Conducting a network transfer
The first step for any kind of transfer, once all digital materials are ready to be sent by the donor, is to establish fixity. Transferring files always carries a risk that they may be damaged in the process, or that the technical metadata may be changed. Therefore, it is important to establish fixity both before and after the transfer where feasible.
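As a rough illustration of what establishing fixity involves, the sketch below calculates a checksum for every file in a directory using only Python's standard library. The function names and the choice of SHA-256 are assumptions made for this example and are not taken from any particular archival tool.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of one file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def directory_checksums(root, algorithm="sha256"):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_checksum(path, algorithm)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Running `directory_checksums` on the donor's copy before the transfer and on the received copy afterwards, then comparing the two dictionaries, shows whether any file content changed in transit. Note that a checksum covers file content only, so changes to file dates would not be detected this way.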
An essential format for file transfer is the BagIt specification. Much as a zip utility creates an archive, BagIt-based applications create a package called a “bag”, which packages file directories together and automatically includes a manifest with both filenames and checksums. A bag can also carry other relevant metadata entered manually. This package can then be safely transferred and validated by the receiver. There are a number of programs that can be used to create and open bags, such as Bagger and Exactly. Exactly also has the ability to conduct direct File Transfer Protocol (FTP) transfers.
Transfers may be contained within a single bag or multiple bags, depending on the total size of the transfer. The upload limits and network capacity of the systems in use will determine the appropriate size of each bag. Bags may be sent zipped, but note that the unzipping process can sometimes change technical metadata like file dates, depending on the systems and tools used.
Some commonly used means of transferring files are e-mail attachments, cloud-based hosting services like Dropbox or OwnCloud, and File Transfer Protocol (FTP), which transfers files directly between two computers. There is also the option to transfer files using a local network or shared drive environment, rather than over the internet.
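To make the structure of a bag more concrete, here is a minimal sketch that builds one by hand with Python's standard library: a `data` directory holding the files, a `bagit.txt` declaration, a `manifest-sha256.txt` listing a checksum for each file, and an optional `bag-info.txt` for manually entered metadata. The function name `make_minimal_bag` and the sample metadata are hypothetical, and the sketch deliberately omits parts of the specification (such as tag manifests and the encoding of unusual filenames), so an established tool such as Bagger or Exactly remains the better choice for real transfers.

```python
import hashlib
import shutil
from pathlib import Path


def make_minimal_bag(source_dir, bag_dir, metadata=None):
    """Copy source_dir into bag_dir/data and write minimal BagIt tag files."""
    source_dir, bag_dir = Path(source_dir), Path(bag_dir)
    payload_dir = bag_dir / "data"
    # copytree creates bag_dir as needed and fails if the payload already exists.
    shutil.copytree(source_dir, payload_dir)

    # Bag declaration: marks the directory as a bag.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # Payload manifest: one "checksum  path" line per file under data/.
    lines = []
    for path in sorted(payload_dir.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch, not for huge files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")

    # Optional descriptive metadata entered by the person creating the bag.
    if metadata:
        info = "".join(f"{key}: {value}\n" for key, value in metadata.items())
        (bag_dir / "bag-info.txt").write_text(info, encoding="utf-8")
    return bag_dir
```

A call such as `make_minimal_bag("donor_files", "transfer_bag", {"Source-Organization": "Example Archives"})` would produce a directory that can be zipped or sent as-is, with the manifest travelling alongside the files it describes.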
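One simple way to decide how to divide a large transfer is to group files in order until a size limit is reached. The sketch below assumes a hypothetical `max_bytes` limit, which you would set from the upload limit of whatever service is in use; the function name `plan_bags` is likewise invented for this example.

```python
from pathlib import Path


def plan_bags(root, max_bytes):
    """Group files under root into lists whose combined size stays within max_bytes.

    A single file larger than max_bytes is placed in a group of its own.
    """
    root = Path(root)
    groups, current, current_size = [], [], 0
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        size = path.stat().st_size
        # Start a new group when adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(path.relative_to(root).as_posix())
        current_size += size
    if current:
        groups.append(current)
    return groups
```

Each returned list can then be packaged as a separate bag. This greedy grouping keeps files in path order, which tends to keep the contents of a folder together, but it does not try to find the smallest possible number of bags.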
The final step consists of post-transfer activities. It is a good idea to immediately verify the integrity of the transfer. Many BagIt-based programs have the capability to verify a transferred bag by re-calculating the files’ checksums and comparing them to the checksums calculated before the transfer. This can also be done by manually calculating and comparing checksums using a program like Fixity. If this verification process shows that the files are unchanged, the files can be unpacked and arranged according to your institution’s particular practices. If you find that the files have somehow changed, then you can ask that they be sent again. It’s a good idea to keep a fixity-verified copy of the original bag as a backup during processing.
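The verification step can be sketched as follows: read each line of the bag's manifest, recalculate the checksum of the named file, and report anything missing or changed. This is a simplified illustration that assumes a `manifest-sha256.txt` file at the top level of the bag; the function name `verify_manifest` is hypothetical, and the sketch does not check for extra files that are absent from the manifest, which full BagIt validators do.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_dir, algorithm="sha256"):
    """Return {relative_path: "missing" or "changed"}; empty when everything matches."""
    bag_dir = Path(bag_dir)
    problems = {}
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line holds a checksum, whitespace, then a file path.
        expected, relative_path = line.split(None, 1)
        target = bag_dir / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
            continue
        digest = hashlib.new(algorithm)
        with open(target, "rb") as handle:
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            problems[relative_path] = "changed"
    return problems
```

An empty result suggests the listed files arrived unchanged; any entry in the result is a reason to ask the donor to send the bag again.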
Transferring born-digital materials from physical media is a more involved process than a direct network transfer. Media transfer occurs after physical media have been donated to an archives. Different kinds of physical media have unique issues to keep in mind when transferring their contents. You will sometimes have to work with older media, such as floppy disks or older hard drives, which you may not be used to handling or which require specialized hardware or software to access. This section contains tips about working with specific kinds of media, but it is also important to make the same initial preparations no matter what kind of media you’re working with.
Procedures applicable to all transfer media
The first step is to set up a clean workstation that has been thoroughly virus-scanned. An extra precaution you may want to take is to leave your workstation disconnected from the internet or your institution’s network. This prevents external viruses from damaging the media you will be accessing on this workstation, and keeps any viruses that might be on the transfer media from spreading to the rest of the network.
Some institutions use a specifically designed forensic workstation called a “FRED”, or “Forensic Recovery of Evidence Device.” A FRED is a high-powered computer equipped with hardware and software for imaging and transferring digital media in a wide variety of formats. Institutions that need to process large volumes of digital storage media tend to invest in a FRED, but for smaller archives a regular workstation with the right kinds of peripherals and software is enough.
The next step is connecting the transfer media. If a piece of transfer media can be written to directly, for example a USB drive or a hard drive, you should never connect it directly to the workstation. Simply connecting it to a computer risks changing files and losing important metadata. Instead, you should use a write blocker, which is hardware or software that prevents any accidental changes being made to the connected digital media, while still allowing the files to be viewed and transferred. Not all media require a write blocker. Audio CDs and Video-format DVDs, for example, have their own built-in write protection.
After the media is connected, you can start transferring the files to the workstation. The most complete method of transfer involves making a disk image, which is a complete bitstream copy of the disk. There are a number of software programs which create disk images, such as FTK Imager or Guymager, which is part of the suite of tools provided by BitCurator. Sometimes, special hardware is needed for imaging certain kinds of media. Creating a disk image is useful if the media you are working with is both valuable and at risk. Because a disk image completely replicates a disk, it is often used to transfer files from legacy or degraded media and hardware. Disk images are also useful for completely replicating the original organization of the disk’s contents and associated file metadata. The older and more fragile the hardware you are transferring from, the greater the likelihood of the files becoming damaged or changed as you continue to work with that hardware. However, depending on the needs of your institution and what you choose to do with the digital storage media after transfer, imaging might not be necessary. Donors should also be aware that disk images can capture files they may not have intended to donate, such as deleted files.
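The idea of a bitstream copy can be illustrated with a short sketch that reads a source block by block, writes each block to an image file, and calculates a checksum along the way. This is only a conceptual example: the function name `image_source` and the block size are assumptions, the source could in principle be a raw device path exposed through a write blocker, and the sketch has none of the bad-sector handling or logging that dedicated tools such as FTK Imager or Guymager provide.

```python
import hashlib


def image_source(source_path, image_path, block_size=512 * 1024):
    """Copy source to image byte for byte; return (bytes_copied, sha256_hex)."""
    digest = hashlib.sha256()
    copied = 0
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            block = source.read(block_size)
            if not block:
                break  # end of the source reached
            image.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()
```

Recording the returned checksum at the time of imaging gives a reference value for later fixity checks on the image file.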
A less intensive method of transfer is to package files using BagIt-compatible software, similar to the Network Transfer example. Once a write blocker is connected, files can be bagged from the source media, along with relevant metadata and checksums, and then copied to a secure place on your workstation. This kind of workflow might, for example, be used for recent acquisitions received from donors on a USB or external hard drive, whereas a full disk image workflow might be used for an older collection of floppy disks. The kind of transfer workflow you use should depend on the types of digital media you are working with and the priorities and policies of the archives.
Tips for specific media types
The first thing to do when preparing an optical disc for transfer is to identify what format it is. Data CDs and DVDs are used to store files, and can be interacted with much like a hard drive. Audio CDs and Video-format DVDs, on the other hand, are more difficult to transfer as the data they contain cannot be as easily accessed. Like a hard drive, they can be imaged, or you can use dedicated audio or video extraction software like Exact Audio Copy or Handbrake.
Generally, write blockers are not needed for optical discs because the disc has its own write protection. You should keep in mind, however, that some Data CDs, such as CD-RWs, can be written to. To access valuable files on a Data CD, it is possible to connect an external CD drive through a USB-compatible write blocker.
Read errors during the transfer process may be the result of damage to the disc, the quality of the disc drive you are using, the speed at which the disc is spinning, or the format of the disc. For example, CD-DA format discs carry less error correction than data discs in exchange for added storage capacity, which makes them more prone to read errors. More information about read errors and troubleshooting suggestions can be found in the Resources section.
For modern computers to access floppy disks, they usually need a USB-compatible floppy disk drive. If you are planning on working with a large number of floppy disks, you may want to invest in a floppy disk controller card. These allow floppy disk drives to be connected via USB, are compatible with many different formats of floppy disk, and come with software that reduces the number of read errors during transfer. The most widely used of these controller cards is the Kryoflux, which was developed to aid in software preservation. For more information on the issues with transferring data from floppy disks, see “A Dogged Pursuit: Capturing Forensic Images of 3.5” Floppy Disks” in the Case Studies section.
Floppy disks come with a physical form of write protection. A 3.5” disk has a write-protect tab that can be slid to protect its files, while a 5.25” disk has a write-protect notch on its edge that is covered with tape or a sticker to protect the disk. [Image: a floppy disk with its write protection turned on.]
Regardless of the type of transfer being performed, you should create and retain backups of the files that have been transferred in case of data loss or other unforeseen circumstances during processing. As noted in the network transfer section, BagIt-based programs are useful for creating backups as they keep the files in one place and automatically create checksums and metadata for the files being bagged. As an extra precaution for valuable materials, it is recommended to have multiple backups stored in different places to mitigate the risk of data loss. These backups, however, may be deleted at a later date once the materials have been successfully processed and stored, according to your institution’s policies.
This is an example of how Emory University’s Archives developed a workflow for imaging floppy disks using a number of different methods, starting from the simplest and ending with the use of a Kryoflux.
Case study discussing the creation of Yale’s born-digital archiving service and workflows from 2014 to 2018, as well as challenges and next steps for the service.
A write-up by Jess Whyte at the Fisher Rare Book Library about the development of a workflow for rapidly imaging and bagging preservation copies of floppy disks, as well as automatically generating a CSV metadata manifest for each image.
A detailed blog post by the Princeton University Library Rare Books & Special Collections Technical Services on setting up a digital archives transfer and processing station with FRED, Kryoflux and BitCurator working together.
This case study from Yale’s digital archiving service demonstrates how Exact Audio Copy was used to overcome the difficulties in transferring CD-DA format discs.
A popular package of many specific tools for digital forensics and processing/analysis tasks, including Guymager below. Most institutions install BitCurator in a dedicated environment (i.e. as the main operating system) on a PC used primarily for digital archives transfer, or in conjunction with a FRED.
A tool for enabling and managing remote file transfers while monitoring fixity and allowing for the pre-transfer input of metadata.
An audio file extraction, file format migration and metadata extraction tool. An additional benefit of EAC is that it is designed to extract the high-quality audio tracks on CD-DA discs, as detailed here. There is also an EAC Wiki.
This program allows you to calculate checksums and monitor fixity outside the structure of Archivematica. Regular fixity checks can be scheduled, or it can be used on particular files.
Automatically calculates MD5 and SHA-1 checksums. Creates a CSV file with file names and data paths on the original disk.
Linux-based open-source forensic imaging software contained within the suite of tools in the BitCurator virtual machine.
Free cross-platform software for extracting video files from DVD and Blu-ray discs.
This guide was written specifically for archivists who wish to use the Kryoflux floppy disk controller card as a tool for preservation but have limited knowledge of digital forensics. The guide provides easy-to-use instructions as well as a deeper introduction to the use of the Kryoflux and floppy disk preservation.
This document provides a short overview of five of the most common types of optical media and recommendations for how to approach transferring their contained data.
Instructions on how to source and put together an affordable “data rescue kit” which includes explanations on what kind of peripherals are needed to transfer data from various media formats as well as tips on what kind of workstation is needed.
This is a wiki containing information on tools used for digital preservation. The tools can be grouped by functional category, or by which of the DCC Curation Lifecycle stages they fall under.
A DPC report examining techniques used for the transfer and preservation planning of digital sound and moving image content.
A short set of instructions by the Online Computer Library Center of first steps to take when preparing to transfer digital content from its original transfer media.
A more detailed companion piece to the above document which provides instructions for the entire process of media transfer.