As described in our Note on Encoding, WTW enriches its texts using SGML (Standard Generalized Markup Language) according to the TEI (Text Encoding Initiative) guidelines. This markup can be divided into three types: Structure (Paragraphs, etc.), Basic Content (Dates, etc.), and Advanced Content (Analytical Categories). Owing to time constraints, many e-text projects limit their markup to the first type and sometimes the second. WTW has decided to add a certain amount of the third type, which is our key feature. Thus, besides some structural and basic markup, we are encoding our texts with content-based tags grouped within four broad categories: Ethnicity, Gender Marking, Transportation, and Women's Occupations (see Note on Encoding). We support searches on these non-standard categories (and subcategories) by attaching our tags to the TEI elements "InterpGrp" and "Interp." These tags may then be used for metadata searches that go beyond full-text searching.
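To make the role of "InterpGrp" and "Interp" more concrete, here is a minimal sketch in Python of how a category search might work. It rests on several assumptions that are not taken from WTW's actual files: the fragment is an XML-well-formed rendition of TEI markup (so that Python's standard parser can read it), passages point to their categories through the TEI "ana" attribute, and every id, value, and sentence is invented for illustration. The function names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: ids, values, and sentences are invented for this
# sketch and are not drawn from the WTW corpus.
SAMPLE = """
<text>
  <interpGrp type="Transportation">
    <interp id="tr.rail" value="railway"/>
    <interp id="tr.ship" value="steamship"/>
  </interpGrp>
  <interpGrp type="Women's Occupations">
    <interp id="oc.gov" value="governess"/>
  </interpGrp>
  <body>
    <p><seg ana="tr.rail">We took the train to the coast.</seg></p>
    <p><seg ana="tr.ship oc.gov">A governess shared our cabin on the steamer.</seg></p>
    <p><seg>The weather was fine.</seg></p>
  </body>
</text>
"""


def build_index(root):
    """Map each interp id to a (category, value) pair."""
    index = {}
    for grp in root.iter("interpGrp"):
        for interp in grp.iter("interp"):
            index[interp.get("id")] = (grp.get("type"), interp.get("value"))
    return index


def find_by_category(root, category):
    """Return the text of each element whose ana attribute points into `category`."""
    index = build_index(root)
    hits = []
    for el in root.iter():
        ids = (el.get("ana") or "").split()  # ana may hold several ids
        if any(index.get(i, (None, None))[0] == category for i in ids):
            hits.append("".join(el.itertext()).strip())
    return hits


root = ET.fromstring(SAMPLE)
transport_hits = find_by_category(root, "Transportation")
occupation_hits = find_by_category(root, "Women's Occupations")
```

Note that the third sentence is never returned for any category, and that the second is found under both categories even though the word "Transportation" appears nowhere in it. That is the sense in which such a search goes beyond full-text matching.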
Sample Analysis rather than Comprehensive Critical Editions
WTW does not encode its texts with every conceivable category of interest to women's travel scholars, since we do not aim to create definitive critical editions. We seek only to provide sample categories that are likely to interest most users of the collection. Our goal is two-fold:
--To provide students and researchers with an additional avenue into parts of our texts, one that allows for more extended comparison and interpretation;
--To demonstrate how to use SGML for purposes of analysis. We hope that users will build on our sample categories. We can show them how to take our standardized SGML-based analytical structure, fill in their own categories, and use them to analyze copies of these texts or others.
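As a rough illustration of the second goal, the following Python sketch shows one way a user might add a category of their own to a copy of a text and then count how often each category occurs. Everything specific in it is an assumption made for the example: the "Lodging" category, the ids, the sample sentences, the use of the TEI "ana" attribute to link passages to categories, and the XML-well-formed rendition of the markup. The crude keyword test stands in for the editorial judgment that real tagging requires.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical working copy of a text; all content is invented for this sketch.
SAMPLE = """
<text>
  <interpGrp type="Transportation">
    <interp id="tr.rail" value="railway"/>
  </interpGrp>
  <body>
    <p><seg ana="tr.rail">We took the train to the coast.</seg></p>
    <p><seg>We slept at a small inn.</seg></p>
    <p><seg>The next night we stayed at another inn.</seg></p>
  </body>
</text>
"""


def add_category(root, category, entries):
    """Insert a new interpGrp before <body>, with one interp per (id, value) pair."""
    grp = ET.Element("interpGrp", {"type": category})
    for interp_id, value in entries:
        ET.SubElement(grp, "interp", {"id": interp_id, "value": value})
    body_pos = list(root).index(root.find("body"))
    root.insert(body_pos, grp)
    return grp


def tag(element, interp_id):
    """Add interp_id to the element's ana attribute, keeping any existing ids."""
    current = element.get("ana", "").split()
    if interp_id not in current:
        current.append(interp_id)
    element.set("ana", " ".join(current))


def tally(root):
    """Count tagged elements per (category, value)."""
    index = {
        interp.get("id"): (grp.get("type"), interp.get("value"))
        for grp in root.iter("interpGrp")
        for interp in grp.iter("interp")
    }
    counts = Counter()
    for el in root.iter():
        for interp_id in (el.get("ana") or "").split():
            if interp_id in index:
                counts[index[interp_id]] += 1
    return counts


root = ET.fromstring(SAMPLE)
add_category(root, "Lodging", [("lo.inn", "inn")])  # user-defined category
for seg in root.iter("seg"):
    # Keyword matching is only a placeholder for a human encoder's decision.
    if "inn" in (seg.text or ""):
        tag(seg, "lo.inn")
counts = tally(root)
```

The existing "Transportation" entries are left untouched; the new category simply sits beside them, so the same tally covers both the original and the user-supplied tags.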
The Pitfalls--and Strengths--of Analytical Tagging
Our categories have been developed by the WTW steering committee in consultation with our Advisory Board. We attempt to apply them consistently (see Note on Encoding), and we believe that this kind of markup significantly expands the value of our texts. But like all scholarly interpretations, our categories are open to question. While two of them (Transportation and Women's Occupations) are relatively easy to tag in an objective fashion, the other two (Ethnicity and Gender Marking) are often reflected in subtle ways that are not easy to encode, and some might argue that such categories ought to be avoided altogether. We believe, however, that the research potential of SGML-based analytical markup far outweighs these possible objections. The markup underlies the text invisibly; users need not consider our subjective categories if they do not wish to do so. Unlike the printed page, electronic texts are easy to view in different ways and easy to change. Thus, although we welcome comments and are prepared to make adjustments, we think that our limited analytical structure enhances our corpus considerably. Its limitations arise less from questions of accuracy than from constraints of time (the time required for encoding).