Misc

Case sensitivity

  • XML is case sensitive.
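
A minimal sketch of what case sensitivity means in practice, using Python's standard xml.etree.ElementTree module. The helper name is_well_formed is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start and end tags agree exactly, including case.
print(is_well_formed("<note>hi</note>"))   # True

# <Note> and </note> are two different names, so the end tag does not match.
print(is_well_formed("<Note>hi</note>"))   # False
```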

Comments

<!-- comment -->

CDATA

  • Character data
  • Used to escape a block of text containing markup characters
  • Optional: text containing < or & must either be wrapped in a CDATA section or have those characters escaped individually
  • The end result is the same as if the individual characters were escaped (&lt; &amp;)

<![CDATA[...]]>
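
The equivalence between a CDATA section and individually escaped characters can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree parser; the element name expr is only an illustration.

```python
import xml.etree.ElementTree as ET

# The same text, once with each markup character escaped individually...
escaped = ET.fromstring("<expr>a &lt; b &amp;&amp; c</expr>")
# ...and once wrapped in a CDATA section, with no escaping inside it.
cdata = ET.fromstring("<expr><![CDATA[a < b && c]]></expr>")

print(escaped.text)                 # a < b && c
print(cdata.text)                   # a < b && c
print(escaped.text == cdata.text)   # True
```

After parsing, the application cannot tell which of the two forms the document used.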

Unicode

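  • The XML specification requires every parser to support UTF-8 and UTF-16.
  • UTF-8 is assumed when a document has no encoding declaration and no byte order mark.

ISO-8859-1

  • Latin-1
  • Widely supported
  • Its 256 characters are the first 256 Unicode code points, but it is not a byte-level subset of UTF-8: only the ASCII range (0-127) is encoded identically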
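
A short sketch of how the two encodings relate, assuming Python's standard xml.etree.ElementTree parser. The element name w and the sample word are illustrative only. ASCII characters have the same bytes in both encodings; a character such as U+00E9 does not, so a Latin-1 document has to declare its encoding.

```python
import xml.etree.ElementTree as ET

e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = e_acute.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = e_acute.encode("utf-8")         # two bytes: 0xC3 0xA9

# ASCII characters are byte-identical in both encodings.
ascii_same = "A".encode("iso-8859-1") == "A".encode("utf-8")

# A Latin-1 document must declare its encoding.
doc_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><w>caf\xe9</w>'
# With no declaration and no byte order mark, UTF-8 is assumed.
doc_utf8 = b"<w>caf\xc3\xa9</w>"

text_latin1 = ET.fromstring(doc_latin1).text
text_utf8 = ET.fromstring(doc_utf8).text

print(latin1_bytes, utf8_bytes, ascii_same)
print(text_latin1 == text_utf8)  # True: same characters once decoded
```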

XML utilities

Linux

XML parsers
XML editors
XSLT engines
General S-converters
General N-converters
  • Tidy (HTML to XHTML)

XML converters

  • General N-converters (Non-XML converters): Non-XML to XML
  • Specific N-converters: Non-XML to specific document type of XML
  • General S-converters: Automated processing of XML documents
  • Publishing converters: XML to publishing format (for distribution)

RELAX

  • RELAX (Regular Language description for XML)
  • A schema language: defines validation criteria for XML documents
  • Uses XML syntax to describe structural relationships, and the datatypes of XML Schema Part 2 for datatypes

TREX

  • TREX (Tree Regular Expressions for XML)
  • A schema language: defines validation criteria for XML documents
  • Uses patterns to describe the structure and content of an XML document

Misc

  • XML parsers fall into two classes: validating (can check the document against a given DTD, in addition to checking well-formedness) and non-validating (checks well-formedness only)
  • The namespace URIs used for XML Schemas need not point to anything; they are just unique identifiers (though sometimes the location is also used to provide related documentation about the schema)
  • XDR (XML-Data Reduced): a Microsoft schema language for XML that predates the W3C XML Schema recommendation
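
Two of the points above can be sketched with Python's standard xml.etree.ElementTree module, which behaves as a non-validating parser. The namespace URI, the element names, and the sample DTD are hypothetical; nothing needs to exist at the URI for the code to work.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: it is never fetched, only compared as a string.
NS = "http://example.com/ns/inventory"

doc = f'<inv:item xmlns:inv="{NS}"><inv:name>bolt</inv:name></inv:item>'
root = ET.fromstring(doc)

# ElementTree folds the URI into each tag as "{uri}localname".
print(root.tag)
name = root.find(f"{{{NS}}}name")
print(name.text)  # bolt

# The prefix is only a local abbreviation: a different prefix bound to
# the same URI yields the same element name.
other = ET.fromstring(f'<x:item xmlns:x="{NS}"/>')
print(other.tag == root.tag)  # True

# Well-formed, but invalid against its own DTD (note must contain <to>).
# A non-validating parser accepts it without complaint.
invalid_doc = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Ann</from></note>"""
parsed = ET.fromstring(invalid_doc)
print(parsed[0].tag)  # from
```

Checking the second document against its DTD would need a validating parser, which the Python standard library does not provide.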