Standard Generalized Markup Language (SGML)
– SGML is an ISO standard: ISO 8879:1986, Information processing – Text and office systems – Standard Generalized Markup Language (SGML).
– There are three versions of SGML: Original SGML, SGML (ENR), and SGML (ENR+WWW or WebSGML).
– SGML is part of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC 1/SC 34.
– SGML was reworked in 1998 into XML, a successful profile of SGML.
– DSSSL (ISO/IEC 10179) and HyTime are the other two ISO standards related to electronic documents.
History and Terminology
– SGML descended from IBM’s Generalized Markup Language (GML).
– Charles Goldfarb, Edward Mosher, and Raymond Lorie developed GML in the 1960s.
– Goldfarb coined the term GML from the initials of the three developers' surnames.
– Goldfarb also wrote the definitive work on SGML syntax in The SGML Handbook.
– SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry.
– Tag-validity was introduced in SGML (ENR+WWW) to support XML.
– Fully tagged describes documents that can be parsed without a grammar, whether they have no DOCTYPE declaration or have one that makes no XML Infoset contributions.
– Integrally stored reflects the XML requirement that elements end in the same entity in which they started.
– Reference-free reflects the HTML requirement that entity references are for special characters and do not contain markup.
– SGML validity commentary before 1997 mainly covers type-validity.
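To make the integrally-stored and reference-free constraints concrete, here is a toy Python checker. It assumes a hypothetical entity table that maps entity names to replacement text, and it only recognizes simple tags without attributes or minimization; a real SGML parser tracks entity boundaries itself while parsing. The function names are illustrative, not from any SGML library.

```python
import re

# Simple start/end tags only (assumption: no attributes, no minimized tags).
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)>")

def is_integrally_stored(replacement: str) -> bool:
    """True if every element started in this entity also ends in it, and vice versa."""
    stack = []
    for m in TAG.finditer(replacement):
        is_end, name = m.group(1) == "/", m.group(2)
        if not is_end:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # end tag for an element opened outside this entity
    return not stack  # leftover start tags would have to end in another entity

def is_reference_free(replacement: str) -> bool:
    """True if the entity holds character data only (assumption: any '<' is markup)."""
    return "<" not in replacement

# Hypothetical entity table for illustration.
ENTITIES = {
    "copy": "\u00a9",                    # a special character
    "warn": "<note>Check first</note>",  # balanced markup
    "open": "<note>Unclosed",            # element would end in another entity
}

for name, text in ENTITIES.items():
    print(name, is_integrally_stored(text), is_reference_free(text))
```

This prints `copy True True`, `warn True False`, and `open False False`: the `warn` entity is integrally stored but not reference-free, while `open` fails both checks.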
Document Validity and Syntax
– SGML (ENR+WWW) defines two kinds of validity: type-valid SGML document and tag-valid SGML document.
– A type-valid SGML document has an associated document type declaration (DTD) to which it conforms.
– A tag-valid SGML document is fully tagged, and it may or may not have a document type declaration.
– Users may enforce additional constraints on a document, such as integrally-stored or reference-free requirements.
– SGML validity supports the requirement for rigorous markup.
– An SGML document may have three parts: the SGML Declaration, the Prologue (containing a DOCTYPE declaration and the declarations that make up the DTD), and the document instance itself.
– SGML documents can be composed of many entities.
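– The SGML Declaration specifies the character sets, capacities, features, delimiter sets, and reserved names (keywords) used in the document; element types and entities are declared in the DTD within the Prologue.
– Like SGML documents, XML documents have both a logical structure (elements) and a physical structure (entities), indicated by explicit markup.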
– SGML syntax has optional features that can be enabled in the SGML Declaration.
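Type-validity means the instance conforms to its DTD. An SGML element declaration such as `<!ELEMENT memo - - (title, para+)>` gives each element a content model built from `,` (sequence), `|` (choice), and the occurrence indicators `?`, `*`, and `+`, which behaves like a regular expression over child element names. The sketch below assumes models that use only those operators (no `&` connector, `#PCDATA`, or exceptions) and translates a model into a Python regular expression; `model_to_regex` and `conforms` are hypothetical names.

```python
import re

# Names and connector/occurrence tokens of a simplified SGML content model.
TOKEN = re.compile(r"[A-Za-z][\w.-]*|[(),|?*+]")

def model_to_regex(model: str) -> re.Pattern:
    """Translate e.g. "(title, para+)" into a regex over space-terminated names."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == ",":
            continue  # sequence connector: plain concatenation in a regex
        elif tok == "(":
            parts.append("(?:")  # non-capturing group
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)  # same meaning in SGML models and regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # one child element name
    return re.compile("".join(parts))

def conforms(model: str, children: list[str]) -> bool:
    """True if the sequence of child element names matches the content model."""
    names = "".join(f"{c} " for c in children)
    return model_to_regex(model).fullmatch(names) is not None

print(conforms("(title, para+)", ["title", "para", "para"]))  # True
print(conforms("(title, para+)", ["para"]))                   # False: title missing
```

A tag-valid check, by contrast, needs no content model at all: it only asks whether every element is explicitly tagged and properly nested.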
Markup Minimization and Formal Characterization
– SGML has features for reducing the number of characters required to mark up a document.
– SGML processors need not support every available feature.
– XML is intolerant of syntax omissions and does not require a DTD for checking well-formedness.
– SGML allows start and end tags to be omitted when the element's declaration permits it and the missing tag can be inferred unambiguously from the DTD.
– The OMITTAG feature in the SGML Declaration enables the omission of tags.
– SGML has features that are difficult to describe using formal automata theory.
– There is no definitive classification of full SGML against a known class of formal grammar.
– Non-validated XML is generally parsable like a two-level grammar.
– The SGML productions in the ISO standard are reported to be LL(3) or LL(4).
– The class of documents conforming to a given SGML document grammar forms an LL(1) language.
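Tag omission is easiest to see by reversing it. A validating SGML parser with OMITTAG enabled uses the DTD's content models to infer where omitted tags belong; the toy Python sketch below substitutes a much simpler, hypothetical rule table (`closed_by`) listing which start tags implicitly end an open element, plus the rule that an end tag closes any children still open inside it. It rewrites minimized markup into fully tagged form and is not how real SGML parsers work internally.

```python
import re

TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)>")  # simple tags only (assumption)

def add_omitted_end_tags(text: str, closed_by: dict[str, set[str]]) -> str:
    """Insert end tags that minimized markup left out, using a toy rule table."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])  # character data before this tag
        is_end, name = m.group(1) == "/", m.group(2)
        if is_end:
            # An end tag implicitly closes any elements still open inside it.
            while stack and stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            if stack:
                stack.pop()
        else:
            # Some start tags implicitly close the currently open element.
            while stack and name in closed_by.get(stack[-1], set()):
                out.append(f"</{stack.pop()}>")
            stack.append(name)
        out.append(m.group(0))
        pos = m.end()
    out.append(text[pos:])
    while stack:  # close whatever is still open at the end of the text
        out.append(f"</{stack.pop()}>")
    return "".join(out)

print(add_omitted_end_tags("<ul><li>one<li>two</ul>", {"li": {"li"}}))
# <ul><li>one</li><li>two</li></ul>
```

The output is fully tagged, which is the form XML requires and the form a tag-valid SGML document must already have.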
Derivatives and Applications
– XML is a profile (subset) of SGML designed to ease implementation.
– XML does not use the grammar (DTD) to change delimiter maps or inform parse modes.
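– XML validation of elements is not active in the same sense as SGML validation: an XML DTD does not guide parsing (for example, by inferring omitted tags) but is checked against the already-parsed document.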
– XML without a DTD is a grammar or a language.
– XML with a DTD is a metalanguage.
– There are other derivatives of SGML, such as HTML and XHTML.
– HTML, through version 4.01, was defined as an application of SGML, with its own DTD and rules.
– XHTML is an XML-based version of HTML.
– XML-based derivatives provide stricter syntax rules and well-formedness requirements.
– Derivatives like HTML and XHTML are simpler than SGML and serve more specific use cases.
– Document markup languages defined using SGML are called applications.
– The Text Encoding Initiative (TEI), DocBook, CALS, and HyTime are examples of SGML-based markup languages.
– Significant open-source implementations of SGML include ASP-SGML, ARC-SGML, SGMLS, and Project YAO.
– SP and Jade, maintained by the OpenJade project, are common parts of Linux distributions.
– The second edition of the Oxford English Dictionary is marked up with an SGML-based markup language.
– The third edition of the Oxford English Dictionary is marked up as XML.
– Some document markup languages related to SGML and XML cannot be processed using standard SGML and XML tools.
– The Z Format markup language and programming languages like Scala are examples.
– The Organization for the Advancement of Structured Information Standards (OASIS), S-expression, DSSSL, LaTeX, and other related concepts are also associated with SGML.
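XML's intolerance of omitted tags, and the fact that well-formedness can be checked without any DTD, can be seen with Python's standard `xml.etree.ElementTree` parser, which checks well-formedness but does not validate against a DTD. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Parse without any DTD; a syntax error means the text is not well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# SGML-style minimized markup is rejected: the omitted </li> tags are not inferred.
print(is_well_formed("<ul><li>one<li>two</ul>"))            # False
# The fully tagged equivalent parses.
print(is_well_formed("<ul><li>one</li><li>two</li></ul>"))  # True
```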
The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 states that generalized markup is "based on two postulates":
- Declarative: Markup should describe a document's structure and other attributes rather than specify the processing to be performed on it, because such markup is less likely to conflict with future developments.
- Rigorous: Markup should be rigorous, so that the techniques available for processing objects like programs and databases can also be applied to documents.
| Property | Value |
|---|---|
| Filename extension | .sgml |
| Internet media type | application/sgml, text/sgml |
| Uniform Type Identifier (UTI) | public.xml[citation needed] |
| Developed by | ISO |
| Type of format | Markup language |
| Extended from | GML |
| Extended to | HTML, XML |
| Standard | ISO 8879 |
DocBook SGML and LinuxDoc are examples of document formats that were processed with SGML tools.