How texts are made available at CELT

FAQ

How are texts made available?

1. Stages in text processing

Every text in the Corpus of Electronic Texts goes through the following stages:

Capture: by scanning or keyboarding. Some texts are donated to CELT already captured and proofed.
First proofing.
Creation of TEI header.
Structural markup.
Content markup (optional, depending on expertise available to the project).
Second proofing (files with very dense markup might require a third proofing).
Parsing: software used is NSGMLS.
Finalising of header with full bibliographic and editorial details, resulting in the finished SGML/XML TEI-conformant file.
Conversion to HTML: software used is OmniMark.
Publication in SGML and HTML.
Concatenation of files for searching: software used is PAT.
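
As a rough illustration, the stages above can be modelled as an ordered list in which only content markup is optional. This is a minimal sketch, not CELT's own tooling: the Stage class and the plan function are hypothetical names introduced here, and only the stage names and the software names (NSGMLS, OmniMark, PAT) come from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    """One processing stage (hypothetical helper, not part of CELT's software)."""
    name: str
    tool: Optional[str] = None  # software named in the list above, if any
    optional: bool = False      # True only for content markup


# Stage names and tools are taken from the list above, in the same order.
STAGES = [
    Stage("Capture"),
    Stage("First proofing"),
    Stage("Creation of TEI header"),
    Stage("Structural markup"),
    Stage("Content markup", optional=True),
    Stage("Second proofing"),
    Stage("Parsing", tool="NSGMLS"),
    Stage("Finalising of header"),
    Stage("Conversion to HTML", tool="OmniMark"),
    Stage("Publication in SGML and HTML"),
    Stage("Concatenation of files for searching", tool="PAT"),
]


def plan(include_content_markup: bool) -> List[str]:
    """Return the stage names a text passes through, in order."""
    return [s.name for s in STAGES if include_content_markup or not s.optional]


if __name__ == "__main__":
    for number, name in enumerate(plan(include_content_markup=False), start=1):
        print(number, name)
```

Calling plan(False) gives the sequence for a text that receives structural markup only. A possible third proofing for densely marked-up files is not modelled.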

2. File formats

Texts are available in the following formats:

SGML/TEI: the basis for all CELT texts, usually one file per text with the extension .sgml. Files can be read with any plain text editor or with an SGML browser, e.g. Panorama or MultiDoc Pro.
HTML: all CELT texts are converted to HTML for the convenience of users. See here how SGML markup is represented in HTML.
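
To give a feel for what representing SGML markup in HTML involves, here is a minimal sketch that renames a few element tags. It is not the OmniMark conversion CELT uses: the TEI_TO_HTML table is a hypothetical mapping chosen for illustration, the tei_to_html function is an invented name, and the sketch assumes every element has explicit start and end tags, which SGML does not require.

```python
import re

# Hypothetical mapping from TEI element names to HTML element names.
# CELT's actual conversion rules are not reproduced here.
TEI_TO_HTML = {"head": "h2", "p": "p"}

# Matches a start or end tag; any attributes are consumed and discarded.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.]*)[^>]*>")


def tei_to_html(sgml: str) -> str:
    """Rename mapped tags, drop unmapped tags, and keep all text content."""
    def swap(match):
        slash, name = match.group(1), match.group(2).lower()
        html_name = TEI_TO_HTML.get(name)
        return f"<{slash}{html_name}>" if html_name else ""
    return TAG.sub(swap, sgml)


if __name__ == "__main__":
    print(tei_to_html("<DIV1><HEAD>Title</HEAD><P>Text</P></DIV1>"))
```

A real conversion would also have to handle attributes, entities, and omitted end tags, which is why a dedicated tool is used for that stage.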

3. SGML mark-up examples
These links are to SGML files (file extension .sgml).