Glossary Term
Standard Generalized Markup Language (SGML)
- SGML is an ISO standard: ISO 8879:1986, Information processing – Text and office systems – Standard Generalized Markup Language (SGML).
- There are three versions of SGML: Original SGML, SGML (ENR), and SGML (ENR+WWW or WebSGML).
- SGML is part of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC 1/SC 34.
- SGML was reworked in 1998 into XML, a successful profile of SGML.
- DSSSL (ISO/IEC 10179) and HyTime are the other two ISO standards related to electronic documents.
History and Terminology
- SGML descended from IBM's Generalized Markup Language (GML).
- Charles Goldfarb, Edward Mosher, and Raymond Lorie developed GML in the 1960s.
- Goldfarb coined the term GML from the initials of their surnames.
- Goldfarb also wrote the definitive work on SGML syntax in The SGML Handbook.
- SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry.
- Tag-validity was introduced in SGML (ENR+WWW) to support XML.
- "Fully tagged" describes documents that have no DOCTYPE declaration but can be parsed without a grammar, or that have a DOCTYPE declaration that makes no XML Infoset contributions.
- Integrally stored reflects the XML requirement that elements end in the same entity in which they started.
- Reference-free reflects the HTML requirement that entity references are for special characters and do not contain markup.
- SGML validity commentary before 1997 mainly covers type-validity.
Document Validity and Syntax
- SGML (ENR+WWW) defines two kinds of validity: type-valid SGML document and tag-valid SGML document.
- A type-valid SGML document has, for each document instance, an associated document type declaration to whose document type definition (DTD) that instance conforms.
- A tag-valid SGML document is fully tagged, and it may or may not have a document type declaration.
- Users may enforce additional constraints on a document, such as integrally-stored or reference-free requirements.
- SGML validity supports the requirement for rigorous markup.
- An SGML document may have three parts: the SGML Declaration, the Prologue (containing a DOCTYPE declaration whose markup declarations make up the DTD), and the document instance itself.
- SGML documents can be composed of many entities.
- The DTD may specify the entities and element types used in the document, while the SGML Declaration specifies the character sets, features, delimiter sets, and keywords that make up the document's concrete syntax.
- Full SGML allows implicit markup, whereas every XML document has both a logical and a physical structure, indicated by explicit markup.
- SGML syntax has optional features that can be enabled in the SGML Declaration.
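The "fully tagged" idea behind tag-validity can be illustrated with a small Python sketch. It is not a conformance check against ISO 8879: the names `TAG` and `is_fully_tagged` are hypothetical, the tokenizer is a simplified regular expression that ignores declarations, comments, and attribute edge cases, and names are compared case-insensitively as an assumption of the sketch.

```python
import re

# Simplified tag pattern (an assumption of this sketch): matches <name ...>
# and </name>. Declarations such as <!DOCTYPE ...> and comments do not match
# because the character after "<" or "</" must be a letter.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*>")


def is_fully_tagged(text: str) -> bool:
    """Return True if every start tag has an explicit, properly nested end tag."""
    stack = []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            stack.append(name)          # start tag: remember the open element
        elif not stack or stack.pop() != name:
            return False                # end tag with no matching start tag
    return not stack                    # leftover open elements mean omitted end tags
```

Under these assumptions, `is_fully_tagged("<doc><p>Hello</p></doc>")` holds without any DOCTYPE declaration, while `is_fully_tagged("<doc><p>Hello<p>World</doc>")` does not, because the `p` end tags are omitted. Checking type-validity would additionally require a DTD and a validating parser, which this sketch does not attempt.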
Markup Minimization and Formal Characterization
- SGML has features for reducing the number of characters required to mark up a document.
- SGML processors need not support every available feature.
- XML is intolerant of syntax omissions and does not require a DTD for checking well-formedness.
- Omitting start and end tags is allowed in SGML if certain conditions are met.
- The OMITTAG feature in the SGML Declaration enables the omission of tags.
- SGML has features that are difficult to describe using formal automata theory.
- There is no definitive classification of full SGML against a known class of formal grammar.
- Non-validated XML is described as generally parsable like a two-level grammar.
- The SGML productions in the ISO standard are reported to be LL(3) or LL(4).
- According to one paper, the class of documents conforming to a given SGML document grammar forms an LL(1) language, although SGML document grammars are not themselves LL(1) grammars.
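The contrast between SGML tag omission and XML's strictness can be seen with Python's standard `xml.etree.ElementTree` parser, which checks well-formedness without a DTD. This is a minimal sketch: the helper `is_well_formed` and the sample strings are hypothetical. The minimized sample is the kind of markup SGML could accept when OMITTAG is enabled and the DTD marks the end tag as omissible (for example `<!ELEMENT item - O (#PCDATA)>`), but no SGML parser is involved here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML (no DTD consulted)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Every element has explicit start and end tags.
complete = "<list><item>one</item><item>two</item></list>"

# The </item> end tags are omitted, SGML-style.
minimized = "<list><item>one<item>two</list>"
```

Here `is_well_formed(complete)` succeeds and `is_well_formed(minimized)` fails: the XML parser has no grammar from which to infer the missing end tags, so it rejects the document instead of repairing it.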
Derivatives and Applications
- XML is a profile (subset) of SGML designed to ease implementation.
- XML does not use the grammar (DTD) to change delimiter maps or inform parse modes.
- XML validation of elements is not active in the same sense as SGML validation.
- SGML without a DTD (for example, simple XML) is a grammar or a language.
- SGML with a DTD is a metalanguage.
- There are other derivatives of SGML, such as HTML and XHTML.
- HTML was defined as an application of SGML (through HTML 4.01) and has its own set of rules and syntax.
- XHTML is an XML-based version of HTML.
- XML-based derivatives provide stricter syntax rules and well-formedness requirements.
- Derivatives like HTML and XHTML have simplified and specific use cases compared to SGML.
- Document markup languages defined using SGML are called applications.
- The Text Encoding Initiative (TEI), DocBook, CALS, and HyTime are examples of SGML-based markup languages.
- Significant open-source implementations of SGML include ASP-SGML, ARC-SGML, SGMLS, and Project YAO.
- SP and Jade, maintained by the OpenJade project, are common parts of Linux distributions.
- The second edition of the Oxford English Dictionary is marked up with an SGML-based markup language.
- The third edition of the Oxford English Dictionary is marked up as XML.
- Some document markup languages are related to SGML and XML but cannot be processed using standard SGML or XML tools; examples include the Z Format markup language and programming languages such as Scala.
- The Organization for the Advancement of Structured Information Standards (OASIS), S-expression, DSSSL, LaTeX, and other related concepts are also associated with SGML.