The ENCRRC Project currently enriches its texts using SGML (Standard Generalized Markup Language), following the TEI-Lite version of the guidelines prepared by the TEI (Text Encoding Initiative). As noted on our project home page, we also attempt to follow the Level 4 (Basic Content Analysis) recommendations endorsed by the Digital Library Federation. For ease of encoding, however, we subdivide our Basic Content Analysis into (1) Structural and (2) Basic Content encoding. We also perform (3) extensive analytical encoding:
NB: See below for a summary of our Attribute Values.
STRUCTURE
When considered appropriate, ENCRRC makes sparing use of the following structural elements (besides <text> and <body>):
<front>: used for prefaces and tables of contents;
<back>: used for afterwords, appendices, endnotes, apparatus (when included);
<titlepage>: including verso if present, divided by <pb N="verso">;
<list>: used with <item> to reflect tables of contents, errata, subscription lists, "other titles by the same author," cast lists, etc.;
<div1, etc.>: used with the N= attribute to record sequence;
<head>;
<argument>;
<epigraph>;
<opener>;
<dateline>;
<salute>;
<signed>;
<closer>;
<trailer>;
<q>: used only for quotations that are set off typographically (i.e., not used for inline quotations, or for direct speech in prose fiction);
<q>: used for letters quoted in text as follows: q/text/body/div1 type=letter, including "opener," "dateline," "salute," "signed," "closer" as appropriate;
<p>;
<lg>: used within "div" for all verse of more than one line (even without stanzas), to assist retrieval;
<l>: includes use of the REND attribute to record indentation;
<milestone>: used with UNIT="typography" N="****" to represent divisions within poems so marked;
<pb>: the page break is placed at the beginning of the page;
<figure>: also used to encode frontispieces, within a separate div/p.
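As an illustration only, the structural elements above might combine as in the following fragment, which shows a letter quoted within a prose chapter (the q/text/body/div1 type=letter pattern) and a short poem whose stanzas are separated by a typographic milestone. The sample text, the page and division numbers, the TYPE values other than "letter," and the REND value "indent1" are placeholders assumed for this sketch, not ENCRRC's actual attribute values.

```sgml
<body>
<div1 type="chapter" n="3">
<!-- page break placed at the beginning of the page it opens -->
<pb n="45">
<head>Chapter III</head>
<p>She unfolded the letter and read:</p>
<!-- a letter quoted in the text: q/text/body/div1 type=letter -->
<q>
<text>
<body>
<div1 type="letter">
<opener>
<dateline>Bath, 4 May</dateline>
<salute>My dear Sister,</salute>
</opener>
<p>We arrived safely on Tuesday.</p>
<closer>
<salute>Ever yours,</salute>
<signed>A. B.</signed>
</closer>
</div1>
</body>
</text>
</q>
</div1>
<div1 type="poem" n="4">
<head>Song</head>
<!-- lg is used even for verse without stanzas, to assist retrieval -->
<lg>
<l>First line of the first division,</l>
<!-- "indent1" is a hypothetical REND value recording indentation -->
<l rend="indent1">second line, indented in the source.</l>
</lg>
<!-- division marked in the source by a row of asterisks -->
<milestone unit="typography" n="****">
<lg>
<l>First line of the second division,</l>
<l rend="indent1">second line, indented in the source.</l>
</lg>
</div1>
</body>
```

In this sketch the letter gets its own embedded text inside <q>, so its opener and closer stay separate from the surrounding chapter. <pb> and <milestone> are empty elements in SGML and take no end tag.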
NB:
*Regarding <note>: the ENCRRC project does not currently reproduce notes (although this policy is being re-examined).
BASIC CONTENT
When considered appropriate, ENCRRC makes sparing use of the following basic content elements:
<foreign lang=xx>: uses 3-character language abbreviations; where appropriate, this tag also carries the attribute rend=ital;
<title>;
<emph>:
(a) used for words that are emphasized linguistically or rhetorically, rather than only typographically;
(b) easiest to spot in dialog;
<hi>:
(a) used for ambiguous and/or typographically emphasized text that is not "foreign," "title," or "emph";
(b) often used in texts with multiple instances of italics;
(c) used instead of <q> for inline quotations, but only when italicized;
<sic>: used to indicate typographic errors, with the CORR attribute to note corrections;
<reg>: used in preference to <orig>, <corr>, etc., to regularize unusual forms of names in text, together with the ORIG attribute to indicate the form in the source text;
<add>;
<delete>;
<unclear>;
<sp>: used to encode speeches, with speakers identified within <speaker> elements;
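By way of illustration, a few of the basic content elements described above might appear together as in the following fragment. The sample sentences, the language code "lat," and the speaker name are invented for this sketch and are not taken from any ENCRRC text. The REND value "ital" follows the usage noted above for <foreign>.

```sgml
<!-- a foreign phrase italicized in the source, a title,
     a regularized name, and a corrected typographic error -->
<p>He closed <title>The Monthly Review</title>, murmured
<foreign lang="lat" rend="ital">tempus fugit</foreign>, and took up his
<reg orig="Shakspeare">Shakespeare</reg>, which he had
<sic corr="received">recieved</sic> that morning.</p>

<!-- emph for rhetorical stress in dialog; hi for italics that are
     not foreign, title, or emph -->
<p>"I did <emph>not</emph> send it," she said, pointing to the word
<hi rend="ital">Private</hi> stamped on the cover.</p>

<!-- a speech, with its speaker identified -->
<sp>
<speaker>Lady A.</speaker>
<p>Who is there?</p>
</sp>
```

The element content of <sic> and <reg> holds, respectively, the erroneous reading from the source and the regularized form, while the CORR and ORIG attributes carry the correction and the original spelling.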