This document outlines the standards that Africa Media Online metadata capturers follow when transcribing handwritten manuscripts. When a typed manuscript is digitised, the pixels representing the letters on the page can be turned into machine-readable text through Optical Character Recognition (OCR). This enables the text of the manuscript to be found by search engines. OCR, however, has not yet advanced to the stage where it can reliably turn handwritten text into machine-readable text. For this reason, to make a handwritten document searchable, one needs to create a transcription. This can be done by “keying-in” the text (typing it out) or by using voice-to-text software such as Dragon or an open-source voice-to-text system such as that from Mozilla.
Until such time as we have the technology to capture transcriptions per page, this tutorial will simply make use of the Caption / Description field in the MEMAT Metadata App. How to access and use that App is adequately described elsewhere. Below are the steps to take in transcribing into that field.
For keying-in (typing the text), our recommendation is to do the transcription in Grammarly.com or in a word processing programme with Grammarly enabled. Make sure Grammarly is set to British English spelling and not American spelling. Once you have transcribed the whole manuscript, copy and paste it into the Caption / Description field in the General Panel of the Metadata App and click Save. Below are the steps for creating a transcription.
Step 1: Title Field
In the Title field, write a brief (if possible, one-word) description of the type of document, e.g. Cheque, Letter, Note, Secret Code.
Step 2: Description Field
The Caption / Description field is the primary field for capturing transcriptions. It holds not just the transcription of the text itself, but also structural metadata such as a table of contents. This must be written in the present tense, e.g. “The committee sends this letter for immediate review.” Note that “sends” is used and not “sent”, although the document was written some time ago. Another example is “Mandela makes a speech”. Note that “makes” is used instead of “made”, although he is deceased.
Initially, we will not transcribe multipage handwritten manuscripts. We want to leave these for a time when we have voice-to-text software in place. As we go through the manuscripts, however, we do want to flag any document that requires multi-page transcription. If you come across a multipage manuscript that is handwritten, or both handwritten and printed, please place the following statement in the Caption / Description field: “Needs multi-page transcription”. Later on, we will be able to search for that phrase to return all manuscripts that still require transcription.
Table of Contents
If the handwritten document has sections with headers, capture a list of contents with the associated page numbers. Do this in a way that reflects the headers and numbering of the original document.
There are a number of rules to keep to in undertaking the actual transcription of documents:
Indicate Page Breaks
Handwritten documents should be captured in full. We also need to indicate the page number. Do this by writing the page number on its own line in square brackets, like this: [Page 1]. There should be a blank line above it and a blank line below it. It should look something like this:
[Page 1]

The text of page 1…

[Page 2]

The text of page 2…
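Where the pages have been transcribed separately, the page-break rule above can be applied mechanically. The following is only a minimal sketch in Python; the function name join_pages is hypothetical and is not part of the Metadata App.

```python
def join_pages(pages):
    """Join per-page transcriptions, placing a [Page N] marker before each page.

    `pages` is a list of strings, one per manuscript page, in page order.
    """
    parts = []
    for number, text in enumerate(pages, start=1):
        parts.append(f"[Page {number}]")
        parts.append(text.strip())
    # Joining with an empty line between parts puts one blank line
    # above and below every marker (the first marker starts the text).
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(join_pages(["The text of page 1…", "The text of page 2…"]))
```

The result can then be pasted into the Caption / Description field in the usual way.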
Layout of Manuscript
To assist the reader of the transcript, it may be necessary to guide the reader across the document. Where columns can be identified, as in the example above, insert [column 1:] and transcribe the text within that column. If there is a second column, number it in the same way, i.e. [column 2:]. Using the above as an example, you would first transcribe the column on the left, then the column on the right, and then insert [body of text:] and transcribe the subject matter above.
Where there are annotations on the page, indicate this by inserting, for example, [annotation on left of page:]. For consistency, work in a clockwise manner.
Where the document includes an image within the body of the text, indicate this with the following: [image]. This is used when the author has drawn a picture or diagram that needs to be referenced in the document. It is particularly useful where the author has annotated the image with labels or text.
Crossed Out Text
Deleted text: record what you see within square brackets, with “strikethrough:” in front of the text to indicate that it has been struck through in the original, e.g. [strikethrough: Sarah].
Indistinct / illegible text: record what can be seen of each word in separate square brackets, with an underscore for each missing letter e.g. [S_l] [Dav_ _ s] [ _ _ _].
Abbreviated words should be written as found within text, but written out in full in the keyword field.
Use a standard way of transcribing figures relating to money: write the currency symbol, then a comma between hundreds and thousands, between hundreds of thousands and millions, and so on. Also include a full stop between the whole currency amount and the cents, e.g. R2,354.22 or $7,984,832.00.
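For capturers who want to double-check a figure, the money rule above can be sketched in a few lines of Python. This is an illustration only; the function name format_money and the default symbol “R” are assumptions made for the example.

```python
from decimal import Decimal


def format_money(amount, symbol="R"):
    """Format an amount with a currency symbol, comma thousands
    separators and a full stop before two digits of cents."""
    # Decimal avoids the rounding surprises of binary floats.
    value = Decimal(str(amount))
    return f"{symbol}{value:,.2f}"


if __name__ == "__main__":
    print(format_money("2354.22"))
    print(format_money(7984832, "$"))
```

The two calls reproduce the examples given in the rule: R2,354.22 and $7,984,832.00.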
Accents and umlauts: please insert these, e.g. née, Thöle, followed by the same word without accents or umlauts in square brackets, e.g. Thöle [Thole].
Where you find a signature, write out the word “signature” in square brackets to indicate where the person has signed in the text, e.g. [signature].
Spelling errors: type the error as you see it, e.g. Nudesburg, followed by the word “sic” in square brackets, i.e. [sic]. Then place the correct spelling in the keyword field.
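The unaccented form of a word can be derived automatically. The sketch below uses Python's standard unicodedata module; the function names strip_accents and with_plain_form are hypothetical, and the sketch assumes that simply dropping the accent mark gives the form wanted here (as in Thöle to Thole).

```python
import unicodedata


def strip_accents(word):
    """Return the word with accents and umlauts removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def with_plain_form(word):
    """Return the word followed by its unaccented form in square brackets.

    Words without accents are returned unchanged.
    """
    plain = strip_accents(word)
    return word if plain == word else f"{word} [{plain}]"


if __name__ == "__main__":
    print(with_plain_form("Thöle"))
    print(with_plain_form("née"))
```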
Text on Pages Behind
Text visible from a page behind the one you are transcribing, whether due to the transparency of the page or to ragged edges: do not record it.
Words change in meaning over time. Some words that were acceptable in the past are no longer acceptable in the present day. In order to avoid offence, we record these words in a way that does not reproduce them as they appear in the original. This is how they should appear: k*fir(s) [one f], k*ffir(s) [two fs], n*gger(s), c*olie(s), n*gro(es). This also applies to swear words such as sh*t and f*ck.
Authority List of Names
We need a consistent way of writing people’s names. This reference list is found here. Use the name written in red.
For forms, write a brief summary of what the form is about (not more than two sentences), and then transcribe only the handwritten text, separating fields with a semicolon “;”.
Separating Sentences or Paragraphs
At the end of each sentence or paragraph use two forward slashes to indicate the spacing within the text, e.g.
1) Office. staff: Personnel.//
2) Interviews – more staff //
3) Guarding //
The reason for this is that the metadata field does not recognise spacing.
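If a transcription has been typed with one sentence or paragraph per line, the slashes can be added in one pass before pasting. This is a minimal sketch in Python under that assumption; the function name mark_line_ends is hypothetical, and placing the slashes directly after the text with no space is a choice made for the example, since the rule does not specify the spacing.

```python
def mark_line_ends(text):
    """Collapse a multi-line transcription into one line,
    ending each original line with two forward slashes."""
    marked = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines; the slashes carry the spacing
        if line.endswith("//"):
            # Avoid doubling slashes that were already typed.
            line = line[:-2].rstrip()
        marked.append(line + "//")
    return " ".join(marked)


if __name__ == "__main__":
    sample = "1) Office. staff: Personnel.\n2) Interviews – more staff\n3) Guarding"
    print(mark_line_ends(sample))
```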
Step 3: Keywords Field
The keywording standards are found here. Although they were written for images, they apply to manuscripts as well.
Step 4: Date Field
How to format dates can be found here if you search for “Date”. The date that should be placed in the date field is the date on which the manuscript, for example a letter, was initially written. Where an event such as a conference stretches over a number of days, record the first day of the event, e.g. for 27-29 November 1989, write 1989/11/27.
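The date rule can be illustrated with a short Python sketch. The function name date_field is hypothetical; the year/month/day layout is taken from the 1989/11/27 example above.

```python
from datetime import date


def date_field(first_day, last_day=None):
    """Return the Date field value (YYYY/MM/DD).

    For an event spanning several days, pass both dates;
    the earlier of the two is recorded.
    """
    if last_day is not None and last_day < first_day:
        first_day = last_day
    return f"{first_day.year:04d}/{first_day.month:02d}/{first_day.day:02d}"


if __name__ == "__main__":
    # Conference held 27-29 November 1989
    print(date_field(date(1989, 11, 27), date(1989, 11, 29)))
```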
Step 5: Location Field
Write the location from which the letter was sent. For manuscripts that are memos, minutes, or invitations, fill in the location where the event took place.
Step 6: Copyright Classification Field
Ensure that this field is completed and corresponds to the date of the document. For transcripts, select the literary field, where you will find a list of dates. This is very important because of copyright embargoes.