XML = e'''X'''tensible '''M'''arkup '''L'''anguage [http://www.xml.org/].
Very broadly speaking, it is a simplified form of [SGML], but stricter (more regular) in some respects:
* Every element must be closed; singleton (empty) elements are written with a trailing />
* Attribute values must be quoted
For example, where HTML accepts <br> and <td align=left>, XML requires <br/> and <td align="left">.

The article "Programming XML in Tcl" [http://www-106.ibm.com/developerworks/webservices/library/ws-xtcl.html]
surveys the state of the art as of spring 2001, mainly from a [Zveno]-biased perspective.
One deficiency of that article is that it neglects [Jochen Loewer]'s [tDOM] work.
----
**Parsing XML**
* The two main standard APIs for XML [parser]s are [SAX] and [DOM].
* [tDOM] and [TclXML]/[TclDOM] are the two main Tcl extensions for parsing XML, providing both SAX and DOM implementations.
* See also [Parsing XML], [A little XML parser], [XML shallow parsing with regular expressions] and [Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses].
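To make the SAX (event-callback) model concrete, here is a toy tokenizer in pure Tcl that calls a start, end and text callback as it walks the input. It is only a sketch under stated assumptions: ''toySaxParse'' and the ''onStart''/''onEnd''/''onText'' callbacks are made-up names, and because it is regexp-based it has exactly the weaknesses the pages above warn about. Use [tDOM] or [TclXML] for real documents.

```tcl
# A toy SAX-style tokenizer, for illustration only: NOT a conforming XML parser.
# It skips comments, PIs and DOCTYPE naively, ignores CDATA sections, decodes
# only the five predefined entities in text (not in attribute values), and
# breaks if a '>' appears inside an attribute value or comment.
proc toySaxParse {xml startCmd endCmd textCmd} {
    foreach token [regexp -all -inline {<[^>]*>|[^<]+} $xml] {
        if {[string match {<[?!]*} $token]} {
            # <?xml ...?>, <!-- ... -->, <!DOCTYPE ...>: skipped
            continue
        } elseif {[regexp {^</([^\s>]+)\s*>$} $token -> name]} {
            {*}$endCmd $name
        } elseif {[regexp {^<([^\s/>]+)([^>]*)>$} $token -> name rest]} {
            # A trailing "/" marks an empty element such as <br/>
            set empty [expr {[string index $rest end] eq "/"}]
            if {$empty} {
                set rest [string range $rest 0 end-1]
            }
            set attrs {}
            foreach {_ key value} [regexp -all -inline \
                    {([^\s=]+)\s*=\s*["']([^"']*)["']} $rest] {
                lappend attrs $key $value
            }
            {*}$startCmd $name $attrs
            if {$empty} {
                {*}$endCmd $name
            }
        } elseif {[string trim $token] ne ""} {
            # Character data: decode the predefined entities in one pass
            {*}$textCmd [string map {&lt; < &gt; > &quot; \" &apos; ' &amp; &} $token]
        }
    }
}

# Usage: record the events in a list
set events {}
proc onStart {name attrs} { lappend ::events [list start $name $attrs] }
proc onEnd {name} { lappend ::events [list end $name] }
proc onText {text} { lappend ::events [list text $text] }
toySaxParse {<?xml version="1.0"?><a x="1"><b/>hi &amp; bye</a>} onStart onEnd onText
puts $events
```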
**Related Technologies**
There is a whole host of technologies related to XML, such as [XPath] for selecting nodes from
a document, [XSL]/[XSLT] for transforming XML documents, and various tools for checking XML
documents for well-formedness and validating them against a schema definition. [tDOM] and [TclXML] both
provide good support for at least XPath and XSLT.
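A minimal sketch of XPath with [tDOM], assuming tDOM is installed; the document and its ''kind'' attribute are made-up example data. ''selectNodes'' evaluates an XPath expression relative to a node and returns the matching nodes:

```tcl
package require tdom

# Made-up example document
set doc [dom parse {<inventory>
    <item kind="fruit">apple</item>
    <item kind="vegetable">leek</item>
    <item kind="fruit">pear</item>
</inventory>}]
set root [$doc documentElement]

# XPath: every <item> element whose kind attribute is "fruit"
set fruits {}
foreach node [$root selectNodes {//item[@kind='fruit']}] {
    lappend fruits [$node text]
}
puts $fruits   ;# prints: apple pear

# Free the DOM tree explicitly when done with it
$doc delete
```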
**Applications**
XML by itself is just a partially-standardised syntax for data. It's used as the basis for a
variety of different applications, such as:
* (X)[HTML] for web pages
* [RDF] and [OWL] for general relational/logical data models
* [DocBook] for technical documentation, as well as office document formats (e.g. Microsoft's office XML format, [excel xml], OpenDocument, etc.)
* Various configuration file formats (especially in the [Java] world)
* [SOAP] and [XML-RPC] for remote procedure calls/web-services.
**Alternatives**
Alternatives to using XML for data files include:
* [Tcl] itself
* [JSON]
* ...
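A sketch of the "Tcl itself" approach: each record in the data file is a Tcl command, and the file is evaluated in a safe interpreter where only that command is wired back to the application. The ''cfg'' namespace and the ''server'' record command are hypothetical, and ''try'' assumes Tcl 8.6. A safe interpreter hides file and exec access, but it does not stop a hostile data file from looping forever.

```tcl
# Hypothetical example: records stored as Tcl commands, one per line
namespace eval cfg {
    variable servers {}
}

# Target of the "server" alias: each call appends one record (a dict)
proc cfg::server {args} {
    variable servers
    lappend servers [dict create {*}$args]
}

# Evaluate the data in a safe interpreter; only "server" reaches the master
proc cfg::load {script} {
    variable servers {}
    set slave [interp create -safe]
    interp alias $slave server {} ::cfg::server
    try {
        interp eval $slave $script
    } finally {
        interp delete $slave
    }
    return $servers
}

set servers [cfg::load {
    # Comments work, as in any Tcl script
    server host example.org        port 8080
    server host backup.example.org port 8081
}]
puts [dict get [lindex $servers 1] port]   ;# prints 8081
```

Compared with XML, the "parser" here is the Tcl interpreter itself; the price is that the data format is only as portable as Tcl is.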
----
One way of specifying the valid tag structure of a class of documents is to use a Document Type Definition, [DTD] for short. This approach was inherited from SGML. Alternatives include XML Schema (XSD), RELAX NG and others.
----
Perhaps the single most important introductory point to make to Tcl
developers about XML is that it's built-in! Almost: while the core
Tcl distribution doesn't know about XML, it does have excellent
[Unicode] support. The [Kitten] [starkit] includes an XML package, and
[ActiveTcl] installations of Tcl can easily add one via [teacup].
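Much of the Unicode work is getting the channel encoding right before the text reaches a parser. A rough sketch, assuming a non-UTF-8 file announces its encoding in the XML declaration; ''xmlFileEncoding'' is a hypothetical helper, and byte-order marks, UTF-16, and names that Tcl spells differently (such as windows-1252, which Tcl calls cp1252) are not handled.

```tcl
# Hypothetical helper: pick the Tcl encoding for an XML file from its
# declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
proc xmlFileEncoding {path} {
    set f [open $path r]
    fconfigure $f -translation binary
    set head [read $f 256]
    close $f
    # No declared encoding: XML defaults to UTF-8 (BOM/UTF-16 not handled)
    if {![regexp {^<\?xml[^>]*encoding\s*=\s*["']([A-Za-z0-9._-]+)["']} $head -> name]} {
        return utf-8
    }
    set name [string tolower $name]
    # IANA names such as "iso-8859-1" are spelled "iso8859-1" in Tcl
    regsub {^iso-8859} $name iso8859 name
    if {$name ni [encoding names]} {
        error "no Tcl encoding for \"$name\""
    }
    return $name
}

# Usage (doc.xml is a placeholder file name):
#   set f [open doc.xml r]
#   fconfigure $f -encoding [xmlFileEncoding doc.xml]
#   set xml [read $f]; close $f
```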
----
tDOM has a built-in pretty-printing serialization option. Those interested in a comparable function
for TclDOM are welcome to try/use/improve/... dom_pretty_print [http://phaseit.net/claird/comp.lang.tcl/dom_pretty_print.html].
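In tDOM that option is the ''-indent'' flag of the ''asXML'' method. A minimal sketch, assuming tDOM is installed (the document is made up):

```tcl
package require tdom

set doc [dom parse {<config><server host="example.org" port="80"/><debug>off</debug></config>}]
# asXML serializes the tree; -indent sets the number of spaces per nesting level
set pretty [$doc asXML -indent 2]
puts $pretty
$doc delete
```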
"[XML pretty-printing]" will eventually have more on this topic.
----
How can you start to generate your own XML documents with Tcl? In
answering just that question in a mailing list [[reference?], [Steve Ball] succinctly
advised, "When creating XML, I generally use [TclDOM]. Create a [DOM] tree in memory,
and then use 'dom::DOMImplementation serialize $doc' to generate the
XML. The TclDOM package will make sure that the generated XML is
well-formed.
Alternatively, XML is just text so there's no reason why you can't
just create the string directly. Eg:
    puts "<document>$content</document>"
The problem with this is that (a) you have to worry about the
XML syntax nitty-gritty and (b) the content variable may contain
special characters which you have to deal with.
There are also some generation packages available, like the '[html]'
package in [tcllib] (this will be added to TclXML RSN, when my
workload permits)."
[DKF] - If you're going for the cheap-hack method of XML generation mentioned above, you'll want this:
    proc asXML {content {tag document}} {
        # Map the five XML special characters to their predefined entities
        set XML_MAP {
            <  &lt;
            >  &gt;
            &  &amp;
            \" &quot;
            '  &apos;
        }
        return <$tag>[string map $XML_MAP $content]</$tag>
    }
Naturally, the ''XML_MAP'' variable is factorisable...
[MHo]: Why not use '''html::quoteFormValue''' for this purpose?
For generating XML (or HTML) the pure-Tcl way, have a look at the xmlgen module
of TclXML on SourceForge: http://sourceforge.net/projects/tclxml/.
[DKF]: That's when you're moving away from cheap hacks. And HTML has a lot more entities than XML, though most are optional.
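Somewhere between the cheap hack and a full generation package, a small recursive generator can take care of both of Steve Ball's points, the syntax and the special characters. A sketch only: ''xmlEscape'' and ''xmlElement'' are hypothetical helpers, not part of TclXML, tcllib or any other package, and the nested-list input format is an assumption of this example.

```tcl
# Hypothetical helper: escape text for use in XML content or attribute values.
# string map works in a single pass, so "&" in the input is not double-escaped.
proc xmlEscape {s} {
    string map {& &amp; < &lt; > &gt; \" &quot; ' &apos;} $s
}

# Hypothetical generator. tag: element name; attrs: dict of attribute values;
# children: list whose items are either {#text string} or {tag attrs children}
proc xmlElement {tag attrs children} {
    set out "<$tag"
    dict for {name value} $attrs {
        append out " $name=\"[xmlEscape $value]\""
    }
    if {[llength $children] == 0} {
        return "$out/>"
    }
    append out ">"
    foreach child $children {
        if {[lindex $child 0] eq "#text"} {
            append out [xmlEscape [lindex $child 1]]
        } else {
            append out [xmlElement {*}$child]
        }
    }
    return "$out</$tag>"
}

set xml [xmlElement note {lang en} {
    {to   {} {{#text "Tcl & Tk users"}}}
    {body {} {{#text "Use <xml> with care"}}}
    {br   {} {}}
}]
puts $xml
# prints: <note lang="en"><to>Tcl &amp; Tk users</to><body>Use &lt;xml&gt; with care</body><br/></note>
```

Element and attribute names are not checked here; the caller is trusted to pass valid XML names.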
----
If you want to be particular about entity-encoding '''arbitrary text''', this is working for me:
    variable entityMap [list & &amp\; < &lt\; > &gt\; \" &quot\;\
        \u0000
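A self-contained sketch in the same spirit, not a reconstruction of the truncated snippet above. It assumes XML 1.0 output, escapes all five predefined entities (including the apostrophe), and simply drops the C0 control characters other than tab, newline and carriage return, which XML 1.0 does not allow at all, not even as character references. The ''xmlenc'' namespace is a made-up name.

```tcl
namespace eval xmlenc {
    # The five characters that have predefined XML entities
    variable map [list & &amp\; < &lt\; > &gt\; \" &quot\; ' &apos\;]
}

# Map forbidden C0 control characters to "" (i.e. drop them).
# U+FFFE and U+FFFF, also forbidden in XML 1.0, are not handled here.
for {set i 0} {$i < 0x20} {incr i} {
    if {$i ni {9 10 13}} {
        lappend xmlenc::map [format %c $i] ""
    }
}
unset i

proc xmlenc::encode {text} {
    variable map
    # string map works in a single pass, so "&" is never double-escaped
    return [string map $map $text]
}

puts [xmlenc::encode "a<b & 'c'\x01z"]   ;# prints: a&lt;b &amp; &apos;c&apos;z
```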
----
A [Cameron Laird] article on XSD and XML schema can be found at http://ldn.linuxfoundation.org/column/untaught-xml-schema .
----
An August 2009 article on Microsoft being awarded a software patent for XML files and processing [http://news.zdnet.com/2100-9595_22-329645.html?tag=nl.e539].
----
!!!!!!
%| [Category Data Serialization Format] | [Category XML] |%
!!!!!!