Glossary Term

Standard Generalized Markup Language

Standard Generalized Markup Language (SGML)
- SGML is an ISO standard: ISO 8879:1986, Information processing – Text and office systems – Standard Generalized Markup Language (SGML).
- There are three versions of SGML: Original SGML, SGML (ENR), and SGML (ENR+WWW, also called WebSGML).
- SGML is one of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC 1/SC 34.
- DSSSL (ISO/IEC 10179) and HyTime are the other two standards in that trio.
- SGML was reworked in 1998 into XML, a successful profile of SGML.

History and Terminology
- SGML descended from IBM's Generalized Markup Language (GML).
- Charles Goldfarb, Edward Mosher, and Raymond Lorie developed GML in the 1960s.
- Goldfarb coined the term GML from the initials of their surnames.
- Goldfarb also wrote the definitive work on SGML syntax, The SGML Handbook.
- SGML was originally designed to enable the sharing of machine-readable documents for large projects in government, law, and industry.
- Tag-validity was introduced in SGML (ENR+WWW) to support XML.
- Fully tagged describes documents with no DOCTYPE declaration, or with a DOCTYPE declaration that makes no XML Infoset contributions.
- Integrally stored reflects the XML requirement that elements end in the same entity in which they started.
- Reference-free reflects the HTML requirement that entity references are for special characters and do not contain markup.
- Commentary on SGML validity written before 1997 mainly covers type-validity.

Document Validity and Syntax
- SGML (ENR+WWW) defines two kinds of validity: type-valid and tag-valid SGML documents.
- A type-valid SGML document has an associated document type declaration and conforms to the document type definition (DTD) that it declares.
- A tag-valid SGML document is fully tagged; it may or may not have a document type declaration.
- Users may enforce additional constraints on a document, such as requiring it to be integrally stored or reference-free.
- SGML's emphasis on validity supports the requirement that markup be rigorous.
- An SGML document consists of the SGML Declaration, the Prologue (containing a DOCTYPE declaration), and the instance itself.
- SGML documents can be composed of many entities.
- The SGML Declaration specifies the character sets, features, delimiter sets, and keywords used in the document; the entities and element types are declared in the DTD.
- XML documents have both a logical and a physical structure, indicated by explicit markup.
- SGML syntax has optional features that can be enabled in the SGML Declaration.

Markup Minimization and Formal Characterization
- SGML has features for reducing the number of characters required to mark up a document.
- SGML processors need not support every available feature.
- XML is intolerant of syntax omissions and does not require a DTD for checking well-formedness.
- SGML allows start and end tags to be omitted when certain conditions are met.
- The OMITTAG feature in the SGML Declaration enables the omission of tags.
- SGML has features that are difficult to describe using formal automata theory.
- There is no definitive classification of full SGML against a known class of formal grammar.
- Non-validated XML is generally parsable like a two-level grammar.
- The SGML productions in the ISO standard are reported to be LL(3) or LL(4).
- The class of documents conforming to a given SGML document grammar forms an LL(1) language.

Derivatives and Applications
- XML is a profile (subset) of SGML designed to ease implementation.
- Unlike SGML, XML does not use the grammar (DTD) to change delimiter maps or to inform parse modes.
- XML validation of elements is not active in the same sense as SGML validation.
- XML without a DTD is a grammar or a language.
- XML with a DTD is a metalanguage.
- Other derivatives of SGML include HTML and XHTML.
- HTML was defined as an application of SGML (through HTML 4.01) and has its own set of rules and syntax.
- XHTML is an XML-based version of HTML.
- XML-based derivatives impose stricter syntax rules, including well-formedness requirements.
- Derivatives such as HTML and XHTML are simpler than SGML and target specific use cases.
- Document markup languages defined using SGML are called applications.
- The Text Encoding Initiative (TEI), DocBook, CALS, and HyTime are examples of SGML-based markup languages.
- Significant open-source implementations of SGML include ASP-SGML, ARC-SGML, SGMLS, and Project YAO.
- SP and Jade, maintained by the OpenJade project, are common parts of Linux distributions.
- The second edition of the Oxford English Dictionary is marked up with an SGML-based markup language.
- The third edition of the Oxford English Dictionary is marked up as XML.
- Some document markup languages related to SGML and XML cannot be processed using standard SGML and XML tools.
- Examples include the Z Format markup language and the XML-like syntax embedded in programming languages such as Scala.
- Related topics include the Organization for the Advancement of Structured Information Standards (OASIS), S-expressions, DSSSL, and LaTeX.
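
The contrast drawn earlier between SGML-style tag omission and XML's intolerance of syntax omissions can be sketched in Python. This is only an illustration under stated assumptions: the standard library has no SGML parser, so the lenient tokenizer in `html.parser` stands in for HTML's SGML-derived syntax, and the names `is_well_formed`, `TagCollector`, `full`, and `minimized` are hypothetical helpers invented for this example. The XML side uses `xml.etree.ElementTree`, which checks well-formedness without consulting any DTD.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (no DTD is consulted)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records start and end tags exactly as the HTML tokenizer reports them.

    Assumption: this is a stand-in for SGML-style processing, not an SGML
    parser. It does not infer omitted tags; it simply does not reject them.
    """

    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)


# Same list, once fully tagged and once with the </li> end tags omitted.
full = "<ul><li>one</li><li>two</li></ul>"
minimized = "<ul><li>one<li>two</ul>"

# XML: the fully tagged form is well-formed, the minimized form is not.
print(is_well_formed(full))       # True
print(is_well_formed(minimized))  # False

# HTML tokenizer: the minimized form is accepted without error.
collector = TagCollector()
collector.feed(minimized)
collector.close()
print(collector.starts)  # ['ul', 'li', 'li']
print(collector.ends)    # ['ul']
```

A real SGML system would go further than this sketch: with OMITTAG enabled, the parser uses the DTD to infer where the omitted end tags belong, which is exactly the grammar-driven behaviour that XML drops.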