How are texts made available?
1. Stages in text processing
Every text in the Corpus of Electronic Texts goes through the following stages:
- Capture: by scanning or keyboarding. Some texts are donated
to CELT already captured and proofed.
- First proofing.
- Creation of TEI header.
- Structural markup.
- Content markup (optional, depending on
expertise available to the project).
- Second proofing (files with very dense markup might require
a third proofing).
- Parsing: software used is NSGMLS.
- Finalising of header with full bibliographic and editorial
details, resulting in the finished SGML/XML TEI-conformant file.
- Conversion to HTML: software used is OmniMark.
- Publication in SGML and HTML.
- Concatenation of files for searching: software used is PAT.
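The parsing stage in the list above could be scripted roughly as follows. This is a minimal sketch, not CELT's actual setup: it assumes the `nsgmls` executable from the SP toolkit is on the PATH, and the helper names `nsgmls_command` and `parse_errors` are hypothetical. The `-s` option suppresses the parser's normal output so that only error messages remain, and `-c` names a catalog file.

```python
import shutil
import subprocess
from typing import List, Optional


def nsgmls_command(sgml_file: str, catalog: Optional[str] = None) -> List[str]:
    """Build an nsgmls command line that only reports errors."""
    cmd = ["nsgmls", "-s"]  # -s: suppress normal output, keep error messages
    if catalog is not None:
        cmd += ["-c", catalog]  # -c: catalog mapping public identifiers to files
    cmd.append(sgml_file)
    return cmd


def parse_errors(sgml_file: str, catalog: Optional[str] = None) -> List[str]:
    """Run nsgmls on one file and return its error messages, one per line."""
    if shutil.which("nsgmls") is None:
        # Assumption: the parser is installed separately; fail clearly if not.
        raise RuntimeError("nsgmls is not installed or not on PATH")
    result = subprocess.run(
        nsgmls_command(sgml_file, catalog),
        capture_output=True,
        text=True,
    )
    # nsgmls writes its diagnostics to standard error.
    return result.stderr.splitlines()
```

An empty list from `parse_errors("text.sgml")` would suggest the file parsed cleanly against its DTD; the file name here is only a placeholder.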
2. File formats
Texts are available in the following formats:
- SGML/TEI: the basis for all CELT texts, usually one file
per text with extension .sgml. Files can be read using either any plain
text editor or an SGML browser, e.g. Panorama or MultiDoc Pro.
- HTML: all CELT texts are converted to HTML for the
convenience of users. See here how SGML
markup is represented in HTML.
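Because the SGML files are plain text, their markup can also be inspected with simple scripts. The sketch below tallies start tags by element name. The fragment in `SAMPLE` is invented for illustration and is not taken from a CELT file, and the function name `count_elements` is hypothetical. It is a rough count only: it does not parse SGML, so tags omitted through SGML minimisation are not seen.

```python
import re
from collections import Counter

# Invented fragment for illustration only; not taken from a CELT file.
SAMPLE = """<lg type="stanza">
<l>A line naming <name>Brian</name></l>
<l>A line with <foreign lang="fr">un mot</foreign></l>
</lg>"""

# A start tag begins with "<" followed by a letter; this skips end tags
# ("</l>") and declarations ("<!DOCTYPE ...>").
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)")


def count_elements(sgml_text: str) -> Counter:
    """Tally start tags by element name, lower-cased for consistency."""
    return Counter(name.lower() for name in START_TAG.findall(sgml_text))


if __name__ == "__main__":
    for element, total in sorted(count_elements(SAMPLE).items()):
        print(element, total)
```

A tally like this gives a quick impression of how densely a file is marked up, which is the property the proofing stages are sensitive to.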
3. SGML mark-up examples
These links are to SGML files (file extension .sgml):
- Embedded texts
- Languages: Irish, French, Classical Greek
- Names and epithets
- Manuscript: damage, marginalia, glosses
- Poetry: bardic, stanzas, blank verse
- Prose: novels, short stories
- Drama: plays, parliamentary debates