
Overview and Applications of XML
– XML is a markup language and file format for storing, transmitting, and reconstructing arbitrary data.
– It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
– XML emphasizes simplicity, generality, and usability across the Internet.
– XML is widely used for representing arbitrary data structures, such as those used in web services.
– XML is commonly used for data interchange over the Internet.
– Many document formats, including RSS, Atom, Office Open XML, and XHTML, use XML syntax.
– XML is used as the base language for communication protocols like SOAP and XMPP.
– Industry data standards like Health Level 7 and OpenTravel Alliance are based on XML.
– XML is used extensively to underpin various publishing formats.
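
As a rough illustration of "storing, transmitting, and reconstructing arbitrary data", the sketch below serializes a small record to XML and reads it back using Python's standard library. The record and the element and attribute names (`order`, `customer`, `items`, `item`) are made up for this example and do not come from any particular standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative only.
order = {"id": "1001", "customer": "Ada", "items": ["pen", "notebook"]}

# Store: build an element tree that mirrors the record.
root = ET.Element("order", attrib={"id": order["id"]})
ET.SubElement(root, "customer").text = order["customer"]
items = ET.SubElement(root, "items")
for name in order["items"]:
    ET.SubElement(items, "item").text = name

# Transmit: serialize the tree to text that could be sent over a network.
xml_text = ET.tostring(root, encoding="unicode")

# Reconstruct: parse the text and rebuild the original record.
parsed = ET.fromstring(xml_text)
restored = {
    "id": parsed.get("id"),
    "customer": parsed.findtext("customer"),
    "items": [item.text for item in parsed.iter("item")],
}

print(xml_text)
print(restored == order)
```

Note that XML itself carries only text: the mapping between elements and typed fields is a convention chosen by the application or described by a schema.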

Key Terminology and Characters in XML
– An XML document is a string of characters; every legal Unicode character except Null (U+0000) may appear in an XML 1.1 document, while XML 1.0 also excludes most other control characters.
– XML tags categorize and structurally organize information.
– XML schema (XSD) provides necessary metadata for interpreting and validating XML.
– XML attributes have a single value and can appear at most once on each element.
– XML declaration describes information about the XML document.
– XML documents consist of characters from the Unicode repertoire.
– XML includes facilities for identifying the encoding of Unicode characters and for expressing characters that cannot be used directly.
– Unicode code points within specific ranges are valid in XML 1.0 documents.
– XML 1.1 extends the set of allowed characters and restricts the use of certain control characters.
– The code point U+0000 (Null) is not permitted in any XML 1.0 or XML 1.1 document.
– The Unicode character set can be encoded into bytes using different encodings.
– XML allows the use of any of the Unicode-defined encodings and any other encoding whose characters also appear in Unicode.
– Well-known encodings include UTF-8 and UTF-16.
– XML recommends using UTF-8 without a BOM (Byte Order Mark).
– Various ISO/IEC 8859 encodings are subsets of the Unicode character set.
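
The points above about encodings and character references can be tried out with a short sketch. It assumes Python 3.8 or later and the standard `xml.etree.ElementTree` module; the element name `word` and the helper `is_well_formed` are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("word")
root.text = "Жук"  # Cyrillic text, three Unicode characters

# The same document in two encodings. The XML declaration names the encoding.
utf8_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)
ascii_bytes = ET.tostring(root, encoding="us-ascii", xml_declaration=True)

# US-ASCII cannot hold Cyrillic letters directly, so the serializer
# writes them as numeric character references such as &#1046;.
print(ascii_bytes)

# Both byte strings decode to the same characters.
from_utf8 = ET.fromstring(utf8_bytes).text
from_ascii = ET.fromstring(ascii_bytes).text

# Character references can also be written by hand (hexadecimal form here).
from_ref = ET.fromstring("<word>&#x416;&#x443;&#x43A;</word>").text


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Null is not allowed, not even as a character reference.
null_allowed = is_well_formed("<word>&#0;</word>")
print(from_utf8 == from_ascii == from_ref, null_allowed)
```

The choice of encoding affects only the bytes on disk or on the wire; the characters seen by the application after parsing are the same.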

Escaping, Comments, and International Use in XML
– XML provides escape facilities for including problematic characters.
– Characters such as < and & are syntax markers and may never appear in content as themselves outside a CDATA section; they must be escaped, for example as &lt; and &amp;.
– Some character encodings support only a subset of Unicode, which limits the characters that can be represented directly.
– It may not be possible to type certain characters on the author's keyboard.
– Some characters have glyphs that are visually indistinguishable from those of other characters, which can cause confusion.
– Comments can appear anywhere in a document outside other markup.
– Comments cannot be nested and cannot contain the string "--".
– Entity and character references are not recognized within comments.
– Characters outside the document encoding's character set cannot be represented in comments.
– XML supports the direct use of almost any Unicode character.
– Chinese, Armenian, and Cyrillic characters, for example, can be included directly in XML documents.
– Displaying such characters correctly depends on rendering support in the software that presents the document.

Syntactical Correctness, Schemas, and Validation in XML
– An XML document must be well-formed, meaning that it satisfies the syntax rules of the specification.
– The document may contain only legal Unicode characters.
– Special syntax characters such as < and & may appear only in their markup roles.
– Tags must be correctly nested, and tag names are case-sensitive.
– Tag names are restricted and cannot contain certain characters.
– An XML document is valid if it references a Document Type Definition (DTD) and conforms to the constraints that the DTD declares.
– XML processors are either validating or non-validating.
– A validating processor must report validity errors but may continue processing.
– Schema languages such as DTDs and XML Schema constrain the elements and attributes that may appear in a document.
– XML Schema (XSD) is more powerful than DTDs and allows more detailed constraints.
– RELAX NG is a standard schema language for validating XML documents; it has a simpler definition and validation framework than XML Schema.
– RELAX NG schemas can be written in XML or in a more compact non-XML syntax.
– Schematron is a standard language for rule-based validation that makes assertions about patterns in XML documents.
– Schematron typically uses XPath expressions.
– DSDL (Document Schema Definition Languages) is a multi-part ISO/IEC standard that includes RELAX NG, Schematron, and languages for defining datatypes and character repertoire constraints.

Related Specifications, Programming Interfaces, and XML History
– XML namespaces enable the use of different vocabularies in a single document without naming collisions.
– XML Base defines the xml:base attribute for resolving relative URI references.
– XML Information Set (Infoset) is an abstract data model for describing XML documents.
– XSL is a family of languages for transforming and rendering XML documents.
– XPath is a non-XML language for addressing components of an XML document.
– APIs for XML processing fall into several categories, including stream-oriented, tree-traversal, data binding, and declarative transformation.
– Stream-oriented APIs such as SAX and StAX are fast and memory-efficient.
– Tree-traversal APIs such as DOM are convenient for programmers but require more memory.
– XML data binding automates the translation between XML and programming-language objects.
– Declarative transformation languages such as XSLT and XQuery are used for transforming and querying XML data.
– XML has appeared as a first-class data type in other languages.
– XML is an application profile of SGML.
– XML 1.0 was first defined in 1998 and is currently in its fifth edition.
– XML 1.1 was published on February 4, 2004, and contains features intended to make XML easier to use in certain cases.
– Both XML 1.0 and XML 1.1 have undergone minor revisions.
– XML 1.0 is widely implemented and is still recommended for general use.
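
The escaping and well-formedness rules described in this section can be checked with a non-validating parser. The following is a minimal sketch using Python's standard library; the helper `is_well_formed` and the sample element names are hypothetical, and the check covers well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


raw = "if a < b && b > c"
escaped = escape(raw)  # 'if a &lt; b &amp;&amp; b &gt; c'

checks = {
    # < and & used as plain text are rejected...
    "unescaped": is_well_formed(f"<code>{raw}</code>"),
    # ...unless they are escaped or wrapped in a CDATA section.
    "escaped": is_well_formed(f"<code>{escaped}</code>"),
    "cdata": is_well_formed(f"<code><![CDATA[{raw}]]></code>"),
    # Tags must nest correctly, and names are case-sensitive.
    "bad_nesting": is_well_formed("<a><b></a></b>"),
    "case_mismatch": is_well_formed("<Note></note>"),
    # A comment may not contain the string "--".
    "double_hyphen_comment": is_well_formed("<a><!-- x -- y --></a>"),
}

for name, ok in checks.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")

# Escaping is reversed by the parser, so the original text is recovered.
round_trip = ET.fromstring(f"<code>{escaped}</code>").text
```

Checking validity against a DTD, XML Schema, RELAX NG, or Schematron requires a validating processor, which Python's standard library does not provide; third-party tools would be needed for that step.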

XML (Wikipedia)

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

XML (standard)
Extensible Markup Language
Status: Published, W3C recommendation
Year started: 1996
First published: February 10, 1998
Latest version: 1.1 (2nd ed.), September 29, 2006
Organization: World Wide Web Consortium (W3C)
Editors: Tim Bray, Jean Paoli, Michael Sperberg-McQueen, Eve Maler, François Yergeau, John W. Cowan
Base standards: SGML
Related standards: W3C XML Schema

XML (file format)
Filename extension: .xml
Internet media type: application/xml, text/xml
Uniform Type Identifier (UTI): public.xml
UTI conformation: public.text
Magic number: <?xml
Developed by: World Wide Web Consortium
Type of format: Markup language
Extended from: SGML
Extended to: Numerous languages, including XHTML, RSS, Atom, and KML
Open format? Yes
Free format? Yes

The design goals of XML emphasize simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.

Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid the processing of XML data.
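
To make the difference between two common API styles concrete, here is a small sketch that extracts the same data with a tree-traversal API (DOM) and a stream-oriented API (SAX), both from Python's standard library. The sample document and the `TitleHandler` class are invented for this example.

```python
import xml.sax
from xml.dom import minidom

DOC = "<library><book>Dune</book><book>Emma</book></library>"

# Tree traversal (DOM): the whole document is loaded into memory,
# then navigated as a tree of nodes.
dom = minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("book")]


# Stream-oriented (SAX): the parser calls back as it reads, and the
# handler keeps only the state it needs.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # collects text while inside a <book> element

    def startElement(self, name, attrs):
        if name == "book":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._buffer = None


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles, sax_titles)
```

The DOM version is shorter and allows random access to the tree, while the SAX version never holds more than one title's text at a time, which matters for very large documents.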
