CELT Project (MS image source: CPG 359 copyright Uni-Bibl. Heidelberg)
CELT - Corpus of Electronic Texts.
Documents of Ireland

How are texts made available?

1. Stages in text processing

Every text in the Corpus of Electronic Texts goes through the following stages:

  1. Capture: by scanning or keyboarding. Some texts are donated to CELT already captured and proofed.
  2. First proofing.
  3. Creation of TEI header.
  4. Structural markup.
  5. Content markup (optional, depending on expertise available to the project).
  6. Second proofing (files with very dense markup might require a third proofing).
  7. Parsing: software used is NSGMLS.
  8. Finalising of header with full bibliographic and editorial details, resulting in the finished SGML/XML TEI-conformant file.
  9. Conversion to HTML: software used is OmniMark.
  10. Publication in SGML and HTML.
  11. Concatenation of files for searching: software used is PAT.
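
The stage order above can be modelled as a simple ordered list, which is one way to track how far a given text has progressed. The following Python sketch is illustrative only: the `Stage` class, the `next_stage` helper and the data layout are assumptions made for this example and are not part of CELT's actual tooling. Only the stage names and the software names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    number: int
    name: str
    software: Optional[str] = None  # tool named in the list above, if any
    optional: bool = False


# Stage names and tools follow the list above; the structure itself is hypothetical.
STAGES = [
    Stage(1, "Capture"),
    Stage(2, "First proofing"),
    Stage(3, "Creation of TEI header"),
    Stage(4, "Structural markup"),
    Stage(5, "Content markup", optional=True),
    Stage(6, "Second proofing"),
    Stage(7, "Parsing", software="NSGMLS"),
    Stage(8, "Finalising of header"),
    Stage(9, "Conversion to HTML", software="OmniMark"),
    Stage(10, "Publication in SGML and HTML"),
    Stage(11, "Concatenation of files for searching", software="PAT"),
]


def next_stage(completed: int, skip_optional: bool = False) -> Optional[Stage]:
    """Return the stage that follows stage number `completed`, or None when done."""
    for stage in STAGES:
        if stage.number <= completed:
            continue  # already finished
        if skip_optional and stage.optional:
            continue  # e.g. content markup when no expertise is available
        return stage
    return None
```

For example, `next_stage(4, skip_optional=True)` skips the optional content markup and returns the second proofing stage.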

2. File formats

Texts are available in the following formats:

  1. SGML/TEI: the basis for all CELT texts, usually one file per text with the extension .sgml. Files can be read with any plain text editor or with an SGML browser, e.g. Panorama or MultiDoc Pro.
  2. HTML: all CELT texts are converted to HTML for the convenience of users. See here how SGML markup is represented in HTML.
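
Because the SGML files are plain text, they can also be inspected with ordinary scripting tools. The Python sketch below counts start tags in a small fragment. The fragment is hypothetical and not taken from a CELT file, and the `count_elements` helper is an assumption for illustration. A regular expression is not an SGML parser: it ignores features such as omitted end tags and case-insensitive names, so this is a rough inspection aid rather than a substitute for validation with a parser such as NSGMLS.

```python
import re
from collections import Counter

# Hypothetical TEI-style fragment for illustration only; not a CELT text.
SAMPLE = """<TEI.2>
<teiHeader>...</teiHeader>
<text><body>
<div1><p>First paragraph.</p><p>Second paragraph.</p></div1>
</body></text>
</TEI.2>"""

# Matches the element name of a start tag; end tags begin with "</" and are skipped.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.\-]*)")


def count_elements(sgml_text: str) -> Counter:
    """Count start tags by element name in a string of SGML text."""
    return Counter(START_TAG.findall(sgml_text))


counts = count_elements(SAMPLE)
```

To apply the same helper to a downloaded file, read it first with `open(path, encoding=...)`, using whatever encoding the file declares.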

3. SGML mark-up examples
These links are to SGML files (file extension .sgml)

University College Cork

© 1997–2021 Corpus of Electronic Texts
Email CELT: b.faerber(at)ucc.ie