Chapter 33 Data Management for Recordings

When making recordings, you have to be very diligent in your data management, because recorded data are often considered personal data (within the meaning of the GDPR). This chapter describes a default template that you can use and adjust to your case.

33.1 Software

This list of software contains a selection of open source applications; the type of application is also described in case you want to use an alternative.

  • KeePass2: Password Manager, where you store the passwords.
  • 7-zip: Archiving, used to encrypt and archive personal data.
  • Any cloud service; this does not have to be secure, as it will only hold encrypted files.

33.2 Preparations

  1. Make sure everybody who needs access to the data installs the required software (i.e. KeePass2 and 7-zip).
  2. Create a directory for the project on the cloud sync service.
  3. Share that directory with everybody in the project.
  4. In that directory in the cloud sync service, create a KeePass2 database for this project. Use a password with at least 128 bits of entropy to secure it.
  5. Send that password to all project members using a safe channel. Safe channels are Signal and OneTimeSecret.
  6. Make sure they store that password in their own password managers. If any of them don’t yet use a password manager, support them in installing KeePass2. Ensure that they use a sufficiently strong password for that database.
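
The 128-bit requirement in step 4 can be reasoned about as follows: a password of n characters, each drawn uniformly at random from an alphabet of k symbols, has n × log2(k) bits of entropy. The sketch below is one way to generate such a password in Python. It assumes an alphabet of ASCII letters and digits; the names `min_length` and `generate_password` are illustrative and not part of KeePass2, whose built-in password generator can be used instead.

```python
import math
import secrets
import string

# Assumption: a 62-symbol alphabet (ASCII letters and digits).
ALPHABET = string.ascii_letters + string.digits


def min_length(bits: float, alphabet_size: int) -> int:
    """Smallest number of characters giving at least `bits` bits of entropy."""
    return math.ceil(bits / math.log2(alphabet_size))


def generate_password(bits: float = 128) -> str:
    """Return a random password with at least `bits` bits of entropy."""
    length = min_length(bits, len(ALPHABET))
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(min_length(128, len(ALPHABET)))  # characters needed for 128 bits
    print(generate_password())
```

With this 62-symbol alphabet, 22 characters are needed, since 21 characters give only about 125 bits. The estimate holds only for randomly generated passwords; a human-chosen password of the same length has far less entropy.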

33.3 Steps after a recording was made

After a recording has been made, take the following steps:

  1. Create the unique identifier for this recording. This unique identifier consists of the date (in the ISO 8601 standard), three dashes (-), and a digit. The digit is the number of the interview on that day (e.g. always ‘1’ if there was only one interview; ‘1’ and ‘2’ for the first and second interviews on that day, etc). An example identifier is 2024-01-12---1. You will use this identifier in the filenames of the files you will create in subsequent steps.
  2. Open your personal password manager (e.g. KeePass2) and get the password for the project KeePass2 database.
  3. Navigate to the project’s cloud sync service directory and open the project’s KeePass2 database using that password.
  4. Create a password for this recording (e.g. by pressing CTRL and I simultaneously).
  5. As a username for the password, specify the unique identifier.
  6. Make sure the password has at least 128 bits of entropy.
  7. Save the database so it can sync to the cloud service.
  8. Copy the recording from the recording device to the PC.
  9. Open 7-zip and navigate to the directory where you stored the file with the recording.
  10. Select the file and click the “Add” button to add it to a new archive.
  11. In the dialog that appears, make the following selections:
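    1. As a filename, type in the unique identifier for this recording followed by the extension of the archive format you select in the next step, for example 2024-01-12---1.zip.
    2. As Archive Format, select .zip or .7z.
    3. As Encryption Method, select AES-256.
    4. As password, enter the password you created for this recording in the project’s KeePass2 database.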
  12. Save the archive and store it in the project’s cloud service directory.
  13. Contact another researcher to have them verify that they can access the data:
    1. They have to retrieve the password from the project’s KeePass2 database;
    2. Open the archive;
    3. Specify that password;
    4. Extract the data file somewhere;
    5. Open the recording and play it to ensure it works.
  14. Once you have confirmation that the recording can be opened by somebody else, delete the unarchived version from the recording device as well as from your PC, so that only the version in the project’s encrypted file that you created in the previous steps remains.
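
For those who prefer to script the naming and archiving steps, the following Python sketch builds the unique identifier and the arguments for an equivalent 7-zip command-line call. It is an illustration under assumptions, not part of the procedure above: the function names are hypothetical, the executable is assumed to be available as `7z` on the search path, and the command is only constructed, not executed. The `-p` switch is given without a password so that 7-zip prompts for it; the password then does not end up in the shell history.

```python
import re
from datetime import date

# Identifiers look like 2024-01-12---1: ISO 8601 date, three dashes, interview number.
IDENTIFIER_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}---[1-9]\d*$")


def recording_identifier(day: date, interview_number: int) -> str:
    """Build the unique identifier for the n-th interview on a given day."""
    if interview_number < 1:
        raise ValueError("interview_number starts at 1")
    return f"{day.isoformat()}---{interview_number}"


def archive_command(identifier: str, recording_path: str) -> list[str]:
    """Build (but do not run) a 7z call that creates an AES-256 encrypted zip."""
    if not IDENTIFIER_PATTERN.match(identifier):
        raise ValueError(f"not a valid recording identifier: {identifier!r}")
    return [
        "7z", "a",        # assumption: 7-zip is callable as "7z"
        "-tzip",          # archive format: zip
        "-mem=AES256",    # encryption method for zip archives
        "-p",             # no password given here, so 7-zip asks for it
        f"{identifier}.zip",
        recording_path,
    ]


if __name__ == "__main__":
    ident = recording_identifier(date(2024, 1, 12), 1)
    print(ident)
    print(" ".join(archive_command(ident, "recording.wav")))
```

The file name `recording.wav` is a placeholder. The resulting list can be passed to `subprocess.run` once you have checked it against your own 7-zip installation.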

33.4 Steps to transcribe

  1. Extract the file with the recording from the encrypted archive using the password stored in the project’s KeePass2 database.
  2. Upload it to the transcription service (e.g. AmberScript).
  3. When the transcription is done, save the transcript (e.g. 2024-01-12---1---transcript--raw.txt).
  4. Add it to the recording’s encrypted archive.
  5. Delete the unencrypted version.

33.5 Steps to anonymize

  1. Extract the file with the raw transcript (e.g. 2024-01-12---1---transcript--raw.txt) from the encrypted archive using the password stored in the project’s KeePass2 database.
  2. Anonymize the transcript, and save the anonymized version (e.g. as 2024-01-12---1---transcript--anonymous.txt).

33.6 Steps to prepare for coding

  1. Open the anonymized transcript (e.g. 2024-01-12---1---transcript--anonymous.txt).
  2. Clean it using the {rock} package, and save the cleaned version (e.g. 2024-01-12---1---transcript--clean.rock).
  3. Add Utterance Identifiers (UIDs) using the {rock} package, and save the version with UIDs (e.g. 2024-01-12---1---transcript--UIDs.rock).

33.7 Steps to code

  1. Open the version of the transcript with UIDs (e.g. 2024-01-12---1---transcript--UIDs.rock).
  2. Code it, and save it with the suffix “coded” (e.g. 2024-01-12---1---transcript--coded.rock).
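
The file names used across the transcription, anonymization, cleaning, and coding steps all follow one pattern: the recording’s identifier, three dashes, and a stage-specific suffix. The Python sketch below summarizes that convention as inferred from the examples in this chapter; the dictionary and the function `transcript_filename` are hypothetical helpers, not part of the {rock} package.

```python
# Stage suffixes as they appear in the example file names in this chapter.
STAGE_SUFFIXES = {
    "raw": "transcript--raw.txt",
    "anonymous": "transcript--anonymous.txt",
    "clean": "transcript--clean.rock",
    "UIDs": "transcript--UIDs.rock",
    "coded": "transcript--coded.rock",
}


def transcript_filename(identifier: str, stage: str) -> str:
    """Return the file name for one processing stage of a recording."""
    if stage not in STAGE_SUFFIXES:
        raise ValueError(f"unknown stage: {stage!r}")
    return f"{identifier}---{STAGE_SUFFIXES[stage]}"


if __name__ == "__main__":
    for stage in STAGE_SUFFIXES:
        print(transcript_filename("2024-01-12---1", stage))
```

Only the raw transcript contains personal data and belongs in the encrypted archive; the later stages build on the anonymized version.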

33.8 Steps to publish

  1. Place the anonymized transcript in the project’s git repository under the Open Data Commons Open Database License (ODbL; https://opendatacommons.org/licenses/odbl/).