XML

  • XML (Extensible Markup Language)
  • Specifies how to mark up (identify and structure) data using tags in a text file
  • Does not specify how to display the data
  • For web content, allows the data for a page to be stored independently of the HTML (presentation details)
  • Allows data to be stored and shared using plain text files (very portable)
  • XML is a subset of SGML
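
As a minimal sketch of keeping the data separate from its presentation, the Python snippet below stores a note as plain XML and produces HTML from it in a separate step. The `note` element names and the `render_note_html` helper are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data: plain-text XML with no presentation details.
NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""

def render_note_html(xml_text: str) -> str:
    """Hypothetical helper: turn note data into an HTML fragment."""
    note = ET.fromstring(xml_text)
    to = escape(note.findtext("to", default=""))
    sender = escape(note.findtext("from", default=""))
    body = escape(note.findtext("body", default=""))
    # Presentation decisions (tags, layout) live here, not in the XML.
    return f"<p><b>To:</b> {to}<br><b>From:</b> {sender}</p><p>{body}</p>"

print(render_note_html(NOTE_XML))
```

The same XML could be rendered differently (or not at all) by another program without changing the data file.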

XML Syntax

  • White space in the data is preserved as-is (not collapsed as in HTML)
  • Line endings are normalized: CR LF (and a lone CR) is converted to LF (like Unix)
  • XML is extensible; new tags can be added to a file without affecting the existing usage of the file
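
A short sketch of these syntax rules using Python's built-in parser: white space in character data survives parsing, a CR LF pair comes back as a single LF, and code that reads one element keeps working after an unrelated new element is added. The `recipient` helper and the `<priority>` tag are hypothetical examples.

```python
import xml.etree.ElementTree as ET

# White space inside character data is preserved exactly.
padded = ET.fromstring("<msg>  hello   world  </msg>")
print(repr(padded.text))  # '  hello   world  '

# A CR LF pair in the input is normalized to a single LF by the parser.
crlf = ET.fromstring(b"<msg>line1\r\nline2</msg>")
print(repr(crlf.text))  # 'line1\nline2'

# Extensibility: code that reads <to> still works after a new tag is added.
def recipient(xml_text: str) -> str:
    return ET.fromstring(xml_text).findtext("to")

old = "<note><to>Tove</to></note>"
new = "<note><priority>high</priority><to>Tove</to></note>"
print(recipient(old), recipient(new))  # Tove Tove
```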

Elements

  • All XML files must have a single root element (<note>...</note>)
  • All container elements must have a closing tag (<b>...</b>)
  • All container elements must be properly nested (<b><i>...</i></b>)
  • Empty element: <tag-name/>
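
One way to check the element rules above is to let a parser try the document: Python's `ElementTree` raises `ParseError` for text that is not well-formed. The `is_well_formed` helper below is a hypothetical name for this sketch.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

examples = {
    "<note><b><i>text</i></b></note>": True,   # single root, properly nested
    "<note><br/></note>": True,                # empty element
    "<note><b>text</note>": False,             # <b> is never closed
    "<note><b><i>text</b></i></note>": False,  # improperly nested
    "<note/><note/>": False,                   # two root elements
}
for text, expected in examples.items():
    print(f"{is_well_formed(text)!s:5}  {text}")
```
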
Names
  • Start with a letter or underscore
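  • Can contain letters, digits, underscores, hyphens, and periods (no spaces)
  • Avoid hyphens and periods in element names; some software may misread them (e.g. as subtraction or property access)
  • Names beginning with "xml" (in any combination of case) are reserved by the XML specification and should not be used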
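
The naming rules can be approximated with a regular expression. This is a simplified, ASCII-only sketch: the full XML specification also allows many non-ASCII letters and reserves the colon for namespaces. The `check_element_name` helper and its category labels are hypothetical.

```python
import re

# Simplified ASCII-only approximation of the XML name rules.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def check_element_name(name: str) -> str:
    """Classify a candidate element name (hypothetical helper)."""
    if not NAME_PATTERN.fullmatch(name):
        return "invalid"
    if name.lower().startswith("xml"):
        return "reserved"
    if "-" in name or "." in name:
        return "legal but discouraged"
    return "ok"

for n in ["note", "_id", "first_name", "first-name", "2nd", "my note", "XmlData"]:
    print(f"{n!r}: {check_element_name(n)}")
```
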
Attributes
  • Attribute values must always be quoted (<note id="12">...</note>)
  • Single or double quotes can be used for attribute values
  • Data can be stored in child elements or in attributes; prefer child elements for the data itself, and use attributes for metadata that is not part of the data (like an id)
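
A minimal sketch of the attribute rules with `ElementTree`: single- and double-quoted values both parse, the id is read as metadata with `get`, the message is read from a child element, and an unquoted value is rejected. The `<notes>` document is an illustrative example.

```python
import xml.etree.ElementTree as ET

# Both quote styles are accepted; the id is metadata, the body is data.
doc = ET.fromstring("""<notes>
  <note id="12"><body>Call Tove</body></note>
  <note id='13'><body>Buy milk</body></note>
</notes>""")

for note in doc.findall("note"):
    print(note.get("id"), note.findtext("body"))

# An unquoted attribute value is not well-formed.
try:
    ET.fromstring("<note id=12/>")
    unquoted_rejected = False
except ET.ParseError as err:
    print("ParseError:", err)
    unquoted_rejected = True
```
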

Validity

  • A "well-formed" XML file is one that follows the syntax rules above
  • A "valid" XML file is one that is well-formed and also conforms to the rules of a DTD or XML Schema (validation criteria)
  • By design, a program must stop normal processing of an XML document if it finds a well-formedness error (a "fatal error"); validity errors are reported by validating processors but do not have to stop processing
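
Python's standard library parses XML but does not validate against a DTD or XML Schema (third-party libraries such as lxml do). To show the difference between "well-formed" and "valid", this sketch substitutes a hand-written rule for a schema; the rule itself (a `<note>` needs exactly one `<to>`, `<from>`, and `<body>`) is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for a DTD/XML Schema rule (assumption for illustration).
REQUIRED_CHILDREN = ("to", "from", "body")

def classify(xml_text: str) -> str:
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"  # fatal: parsing stops here
    if root.tag != "note":
        return "well-formed, not valid"
    if any(len(root.findall(tag)) != 1 for tag in REQUIRED_CHILDREN):
        return "well-formed, not valid"
    return "valid"

print(classify("<note><to>A</to><from>B</from><body>Hi</body></note>"))  # valid
print(classify("<note><to>A</to><body>Hi</body></note>"))  # well-formed, not valid
print(classify("<note><to>A</note>"))  # not well-formed
```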

XML declaration

  • <?xml version="1.0" encoding="UTF-8"?>
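  • Part of the XML prolog (the section before the root element)
  • Optional in XML 1.0 but recommended; if present, it must come first in the file (nothing, not even white space, may precede it)
  • The XML declaration is not an element, so it does not have a closing tag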
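
A small sketch of how a parser treats the declaration: it is not an element (the parsed root is `<note>`), and placing anything before it, even a single space, makes the document not well-formed. Bytes are used as input so the encoding in the declaration applies.

```python
import xml.etree.ElementTree as ET

# The declaration is not an element: the parsed root is <note>.
with_decl = ET.fromstring(b'<?xml version="1.0" encoding="UTF-8"?>\n<note/>')
print(with_decl.tag)  # note

# Nothing, not even white space, may precede the declaration.
try:
    ET.fromstring(b' <?xml version="1.0" encoding="UTF-8"?>\n<note/>')
    misplaced_rejected = False
except ET.ParseError as err:
    print("ParseError:", err)
    misplaced_rejected = True
```
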

Comments

  • <!-- This is a comment -->
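
A brief sketch of how comments behave when parsed with `ElementTree`: the default tree builder discards them, so they do not appear as children. The XML specification also forbids the string "--" inside a comment, and the parser rejects it.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<note><!-- reminder for Tove --><to>Tove</to></note>")
# The default tree builder discards comments, so only <to> remains.
print([child.tag for child in root])  # ['to']

# "--" is not allowed inside a comment.
try:
    ET.fromstring("<note><!-- a -- b --></note>")
    double_dash_rejected = False
except ET.ParseError:
    double_dash_rejected = True
print(double_dash_rejected)  # True
```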