Data Formats
DiML and XDiML
Marking up documents with DiML-DTD
DiML (Dissertation Markup Language) is a markup language into which dissertations written as word processing files are converted. Designed specifically for archiving dissertations, DiML offers a range of markup elements tailored to the structures typical of this type of document. The more precisely an archived text and its main components are classified and individually defined, the more precisely a machine-based search can target these discrete elements and return results within seconds.
All files marked up using DiML are divided into three basic sections:
- Title page (front)
- Main part (body)
- Appendix (back)
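The three-part division can be illustrated with a minimal Python sketch. Only the section names front, body and back are taken from the description above; the root element name `dissertation` and the sample content are assumptions made for illustration, not part of the actual DiML DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton: only the section names front, body and back come
# from the text; the root element name "dissertation" is an assumption.
SAMPLE = """
<dissertation>
  <front>title page and metadata</front>
  <body>text of the thesis</body>
  <back>appendices</back>
</dissertation>
"""

EXPECTED_SECTIONS = ["front", "body", "back"]


def section_names(xml_text):
    """Return the tag names of the top-level sections in document order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


def has_basic_sections(xml_text):
    """True if the document consists of front, body and back, in that order."""
    return section_names(xml_text) == EXPECTED_SECTIONS


if __name__ == "__main__":
    print(section_names(SAMPLE))
    print(has_basic_sections(SAMPLE))
```

A real DiML file would be validated against the DTD itself; this check only mirrors the order of the three sections.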
The elements which structure the title page (front) also constitute the metadata of the document. All important bibliographical information is recorded and preserved here, including title, name of author, examiner and submission date.
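Because the front section doubles as the metadata record, a search or cataloguing tool could read the bibliographic fields directly from it. The following Python sketch shows the idea under stated assumptions: the child element names `title`, `author`, `examiner` and `date` are hypothetical placeholders chosen to mirror the fields listed above, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical front section: the child element names are assumptions that
# mirror the bibliographic fields named in the text, not actual DiML tags.
FRONT = """
<front>
  <title>An Example Thesis</title>
  <author>A. Author</author>
  <examiner>E. Examiner</examiner>
  <date>2004-01-31</date>
</front>
"""

FIELDS = ("title", "author", "examiner", "date")


def front_metadata(front_xml):
    """Collect the metadata fields of a front section into a dictionary.

    Fields that are missing from the markup are returned as empty strings.
    """
    front = ET.fromstring(front_xml)
    return {name: (front.findtext(name) or "").strip() for name in FIELDS}


if __name__ == "__main__":
    for field, value in front_metadata(FRONT).items():
        print(f"{field}: {value}")
```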
The text of the doctoral thesis itself is enclosed by the tag (body). In the main part, structural features such as chapters, paragraphs, headings, tables and illustrations can be marked up, as can key semantic elements and indexed items such as quotations, definitions, names of people and places, and key terms; these can then serve as the basis for a targeted search. Multimedia elements (marked by the tag (mm)) form an independent element class and can be individually marked up with supplementary information. The following abbreviations are used to distinguish between the different types of multimedia element:
- MATH for mathematical formulae
- CHEMISTRY for chemical structural formulae
- IMAGE for image data
- DIAGRAM for diagrams in external formats
- TABLE for tables in the form of a graphic file
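The five abbreviations can be treated as a closed vocabulary when processing multimedia elements. The Python sketch below counts the (mm) elements in a body by type and collects any type that is not in the list. The attribute name `type` is an assumption; the description above does not say how the abbreviation is attached to the (mm) tag, and the sample body is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The five abbreviations listed in the text, with their meanings.
MM_TYPES = {
    "MATH": "mathematical formulae",
    "CHEMISTRY": "chemical structural formulae",
    "IMAGE": "image data",
    "DIAGRAM": "diagrams in external formats",
    "TABLE": "tables in the form of a graphic file",
}

# Hypothetical body: carrying the abbreviation in a "type" attribute is an
# assumption made for this sketch. VIDEO is deliberately not a listed type.
BODY = """
<body>
  <mm type="MATH"/>
  <mm type="IMAGE"/>
  <mm type="IMAGE"/>
  <mm type="VIDEO"/>
</body>
"""


def count_mm_types(body_xml):
    """Count mm elements per known type and list the unknown type values."""
    body = ET.fromstring(body_xml)
    known = Counter()
    unknown = []
    for mm in body.iter("mm"):
        kind = mm.get("type", "")
        if kind in MM_TYPES:
            known[kind] += 1
        else:
            unknown.append(kind)
    return known, unknown


if __name__ == "__main__":
    known, unknown = count_mm_types(BODY)
    for kind, n in known.items():
        print(f"{kind} ({MM_TYPES[kind]}): {n}")
    print("unknown types:", unknown)
```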
A third and final section (back) holds the appendices typically added to a dissertation: the bibliography, the list of publications, the mandatory CV, the declaration of independent work and, if appropriate, any acknowledgements. This section also covers all types of supplementary or sample material, such as tables, lists of illustrations, experiment records or program source code.
For users of word processing programs, the computer centre of the Humboldt University in Berlin has compiled a catalogue of the main format templates used for marking up digitally published doctoral theses. At the Humboldt University, all dissertations which are to be published electronically must be marked up using this format template catalogue. The markup catalogue for the particular software, together with the related instructions, can be downloaded from the Document and Publication Server of HU Berlin.
DiML has now been further developed into XDiML. The new format conforms to current XML standards and is itself defined by a DTD/schema. XDiML is structured in individual modules, so authors can add further modules or remove any that are not required. XDiML also incorporates standard DTDs such as MathML or, for tables, CALS-Table.DTD.
XDiML supports word processing in StarOffice 7 format. This file format is fully XML compatible and is used by free word processing programs such as KOffice and OpenOffice. Self-defined filters can be integrated to extend these programs so that documents can be saved as XDiML.
- DiML - DTD of HU Berlin in XML
- XDiML - Documentation standard for dissertations in XML (available only in German)
- Document template of HU Berlin for Microsoft Word (available only in German)
- Document template of HU Berlin for StarOffice or OpenOffice (available only in German)