Transcription of Handwritten Manuscripts

This document outlines the standards that Africa Media Online metadata capturers follow when transcribing handwritten manuscripts. When a typed manuscript is digitised, the pixels representing the letters on the page can be turned into machine-readable text through Optical Character Recognition (OCR). This enables the text of the manuscript to be searched by search engines. OCR, however, has not yet advanced to the stage where it can reliably turn handwritten text into machine-readable text. For this reason, making a handwritten document searchable requires creating a transcription. This can be done by “keying-in” the text (typing it out) or by using voice-to-text software such as Dragon or an open source voice-to-text system such as that from Mozilla.

Until such time as we have the technology to capture transcriptions per page, this tutorial will simply make use of the Caption / Description field in the MEMAT Metadata App. How to access and use that App is adequately described elsewhere. Below are the steps to take in transcribing into that field.

Keying-In

For keying-in (typing the text), our recommendation is to do the transcription in Grammarly.com or in a word processor programme with Grammarly enabled. Make sure your Grammarly is set to British English spelling and not American spelling. Once you have transcribed the whole manuscript, copy and paste it into the Caption / Description field in the General Panel of the Metadata App and click Save. Below are the steps for creating a transcription.

Step 1: Title Field

In the Title field write a brief (if possible one word) description of the type of document e.g. Cheque, Letter, Note, Secret Code.

Step 2: Description Field

The Caption / Description field is the primary field for capturing transcriptions. It holds not just the transcription of the text itself, but also structural metadata such as a table of contents.

Table of Contents

If the handwritten document has sections with headers, then capture a list of contents with the associated page numbers. Do this in a way that reflects the headers and numbering of the original document.

Transcription

There are a number of rules to keep to in undertaking the actual transcription of documents:

Indicate Page Breaks

Handwritten documents should be captured in full. We also need to indicate the page number. Do this by writing the page number on its own line in square brackets, like this: [Page 1]. There should be a blank line above it and a blank line below it. It should look like this:

[Page 1]

The text of page 1…

[Page 2]

The text of page 2…
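
For capturers who assemble a transcription from separately typed pages, the page-break rule above can be sketched in a few lines of Python. This is only an illustration, not part of the MEMAT Metadata App: the function name join_pages and the idea of holding each page as a separate string are assumptions made for the example.

```python
def join_pages(pages):
    """Join page texts into one transcription with [Page N] markers.

    Assumption for this sketch: `pages` is a list of strings, one per
    manuscript page, in reading order.
    """
    blocks = []
    for number, text in enumerate(pages, start=1):
        # The marker sits on its own line, followed by a blank line.
        blocks.append(f"[Page {number}]\n\n{text.strip()}")
    # A blank line separates the end of one page from the next marker.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(join_pages(["The text of page 1…", "The text of page 2…"]))
```

The result can then be pasted into the Caption / Description field as usual.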

Crossed Out Text

Deleted text: record what you see, then cross it out using strikethrough formatting.

Illegible Text

Indistinct / illegible text: record what can be seen of each word in separate square brackets, with an underscore for each missing letter e.g. [S_l] [Dav_ _ s] [ _ _ _].

Monetary Numbers

Transcribe monetary figures in a standard way: put the currency symbol first, use a comma as the thousands separator (between hundreds and thousands, between hundreds of thousands and millions, and so on), and put a full stop between the whole currency amount and the cents, e.g. R2,354.22 or $7,984,832.00.
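
As a way of checking a figure against this convention, the formatting can be sketched in Python. The helper name format_money and its symbol parameter are hypothetical and chosen only for this example. The sketch assumes the amount is supplied as a plain number string and does not handle negative amounts.

```python
from decimal import Decimal


def format_money(amount, symbol="R"):
    """Format an amount with a currency symbol, comma thousands
    separators and a full stop before the cents.

    Assumption for this sketch: `amount` is a string such as "2354.22".
    """
    # Round to exactly two decimal places (the cents).
    value = Decimal(amount).quantize(Decimal("0.01"))
    # "," inserts thousands separators; ".2f" keeps two decimals.
    return f"{symbol}{value:,.2f}"


if __name__ == "__main__":
    print(format_money("2354.22"))        # R2,354.22
    print(format_money("7984832", "$"))   # $7,984,832.00
```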

Accents

Accents and umlauts: please insert them, e.g. née, Thöle, followed by the same word in square brackets without the accent or umlaut, e.g. Thöle [Thole].
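
The unaccented form in square brackets can be produced mechanically. The following Python sketch uses the standard unicodedata module. The function names strip_accents and transcribe_accented are hypothetical, and the approach only covers letters that Unicode can split into a base letter plus a combining mark (such as é or ö). Letters like ø or ß would need handling by hand.

```python
import unicodedata


def strip_accents(word):
    """Return the word with accents and umlauts removed."""
    # NFD splits e.g. "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transcribe_accented(word):
    """Return the word followed by its unaccented form in square
    brackets, or the word unchanged if it has no accents."""
    plain = strip_accents(word)
    return word if plain == word else f"{word} [{plain}]"


if __name__ == "__main__":
    print(transcribe_accented("Thöle"))   # Thöle [Thole]
    print(transcribe_accented("née"))     # née [nee]
    print(transcribe_accented("Letter"))  # Letter
```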

Spelling Errors

Spelling errors: type the error as you see it e.g. Nudesburg, followed by the correct spelling of the word, in square brackets e.g. [Noodsberg].

Text on Pages Behind

Text on a page behind the one you are transcribing may be visible, whether because the page is transparent or because its edges are ragged. Do not record text visible on a page behind the one on which you are working.

Derogatory Words

Words change in meaning over time. Words that were acceptable in the past may no longer be acceptable in the present day. In order to avoid offence, we do not reproduce these words as they appear in the original. This is how they should appear: k*fir(s) [one f], k*ffir(s) [two fs], n*gger(s), c*olie(s), n*gro(es). This also applies to swear words such as sh*t and f*ck.

Authority List of Names

We need a consistent way of writing people’s names. The reference list of names is found here. Use the name written in red.

Step 3: Keywords Field

Keywording standards are found here. Although they refer to images, use them for manuscripts as well.

Step 4: Date Field

Guidance on date formats can be found here if you search for “Date”.