Saturday, April 11, 2009
Filed under: xslt
by Douglas Robar
XSLT (eXtensible Stylesheet Language Transformation) is used to transform XML into other formats, such as (x)HTML. In this series of articles you'll learn the basics of using XSLT for creating umbraco macros. This is the first post in that series.
When you create a clean XSLT macro in umbraco 4 you see the
following template code (some indenting added for increased
clarity):
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"
    exclude-result-prefixes="msxml umbraco.library">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:param name="currentPage"/>

<xsl:template match="/">
    <!-- start writing XSLT -->
</xsl:template>

</xsl:stylesheet>
Before diving in and starting to write an XSLT macro, let's take
a moment to understand the anatomy of an umbraco XSLT file. Though
you won't often need to change any of the default values,
understanding the various parts will be invaluable as we proceed,
and we'll refer back to this article often.
Line 1: XML version and encoding
<?xml version="1.0" encoding="UTF-8"?>
Umbraco provides full Unicode support for websites, which means you aren't limited to ASCII or HTML-encoded characters. For instance, you might use characters such as these: ä ¿ § ¡ ò مرحبًا καλημέρα müjde. To ensure your XSLT macro handles Unicode characters correctly you must always specify the XML encoding.
There is no reason to change this value when using umbraco.
Line 2: !DOCTYPE and !ENTITY
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#x00A0;"> ]>
XSLT follows all the formatting rules of XML, such as proper
nesting and closing of all tags. Though (x)HTML and XML share many
similarities they are not exactly the same. For instance,
<br> is valid HTML but <br /> is required in
XML/XHTML.
There are five character sequences (or "entities") built into XML, and therefore available in XSLT. They are: &lt; &gt; &apos; &quot; and &amp;. The !DOCTYPE is only required if you wish to specify additional entities.
Besides the five built-in entities, HTML makes very frequent use of another character sequence, the non-breaking space (&nbsp;). Line 2 adds this entity so that you may include non-breaking spaces in your XSLT macros.
If the entity had not been added you would have to use the numeric character reference &#160; (or &#x00A0;) rather than the more familiar &nbsp; in your macros.
TIP: Remember to use &amp; rather than just the & symbol in your XSLT code. Similarly, when comparing values in XSLT you cannot simply write "5 < 10" but must write "5 &lt; 10". The built-in entities exist precisely so that you can use these reserved characters in your XSLT macros. We'll see many examples of this in future articles.
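To make the TIP concrete, here is a minimal sketch of a stylesheet that uses both escapes. The $count variable and the search.aspx link are made up purely for illustration; the point is only how the reserved characters are written inside the XSLT.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template match="/">
    <!-- Hypothetical value, just something to compare against -->
    <xsl:variable name="count" select="5"/>

    <!-- A bare "<" is not allowed inside an attribute value, so it is written as &lt; -->
    <xsl:if test="$count &lt; 10">
        <p>Fewer than ten items.</p>
    </xsl:if>

    <!-- A bare "&" would start an entity reference, so a literal ampersand is &amp; -->
    <!-- search.aspx and its query string are hypothetical -->
    <a href="search.aspx?q=xslt&amp;page=2">Next page</a>
</xsl:template>

</xsl:stylesheet>
```

When the output is written, the ampersand in the href is escaped again as &amp;, which is what (x)HTML expects inside an attribute anyway.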
As with line 1, line 2 rarely needs any modification, though in extreme cases you might wish to add additional entities.
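As a sketch of what adding an entity might look like, the stylesheet below keeps the nbsp declaration and adds one for the copyright sign. The choice of copy as the extra entity is only an example; any named character you use often could be declared the same way.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
    <!ENTITY nbsp "&#x00A0;">
    <!ENTITY copy "&#x00A9;">
]>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template match="/">
    <!-- nbsp resolves through the first ENTITY declaration -->
    <p>Page&nbsp;1</p>

    <!-- copy resolves through the second, added declaration (an example choice) -->
    <p>&copy;&nbsp;2009</p>

    <!-- The same characters written as numeric references, which need no declaration -->
    <p>&#169;&#160;2009</p>
</xsl:template>

</xsl:stylesheet>
```

Note that the entity names only exist inside the stylesheet: the parser replaces them with the characters they stand for, so the transformation output contains those characters rather than the text &nbsp; or &copy;.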
Lines 3 and 18: <xsl:stylesheet> and </xsl:stylesheet>
<xsl:stylesheet
...
...
</xsl:stylesheet>
These tags enclose the stylesheet instructions themselves. Apart
from the header information in the two lines above, all of your
XSLT will appear inside these tags.
Lines 4 and 5: XSLT version and namespace
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
Within the opening <xsl:stylesheet … > tag there are a number of attributes specifying details about this stylesheet.
Umbraco uses the Microsoft XML Parser (MSXML 6.0), which follows
the XSLT 1.0
specification. The XSLT 1.0 specification defines a standard
namespace, which is shown on line 5 and needs no modification.
Notice that all XSLT elements (instructions such as xsl:template and xsl:output) are prefixed with the xsl: label to indicate that they are found in the core XSL library built into the MSXML parser.
Note: The XSLT 1.0 specification does not support all of the functions listed at http://www.w3schools.com/Xpath/xpath_functions.asp (a great reference, by the way!), which covers the full set of functions available in the newer XSLT 2.0 specification. In another article I'll discuss which functions are available in XSLT 1.0 and explain how you can work around its limitations to get the same effect as the XSLT 2.0 functions.
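To give a flavor of such a workaround, here is a minimal sketch that lower-cases a string. XSLT 2.0 (through XPath 2.0) offers lower-case(), which XSLT 1.0 lacks, but the core XPath 1.0 function translate() can map each capital letter to its lower-case form. The 'Hello World' title is just sample data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<!-- Character lists for translate(): position N in $upper maps to position N in $lower -->
<!-- Only unaccented ASCII letters are covered -->
<xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>

<xsl:template match="/">
    <!-- Sample input string -->
    <xsl:variable name="title" select="'Hello World'"/>

    <!-- XSLT 2.0 would allow lower-case($title); in XSLT 1.0 we translate character by character -->
    <xsl:value-of select="translate($title, $upper, $lower)"/>
</xsl:template>

</xsl:stylesheet>
```

This outputs "hello world". Keep the Unicode discussion above in mind: accented capitals such as Ä are left unchanged unless you add them, in matching positions, to both character lists.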
Lines 6 and 7: Additional namespaces
xmlns:msxml="urn:schemas-microsoft-com:xslt"
xmlns:umbraco.library="urn:umbraco.library"
You can gain access to functions beyond those built into the core XSL namespace by adding namespaces to the xsl:stylesheet declaration. This is similar to the "include" or "using" feature of other languages.
Umbraco references two additional namespaces by default, as shown in these lines: one for the MSXML functions (such as node-set()), and the other for the umbraco.library API functions (such as NiceUrl()). We'll discuss the umbraco.library functions in detail in another post.
There is no need to change these lines, though you may wish to
include more namespaces to allow you to reference the
functions contained in other XSLT extensions, including those you
create yourself (a topic we'll talk more about in another
article).
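Here is a minimal sketch calling one function from each of the two default namespaces. It assumes the umbraco 4 XML schema, in which each page is a node element carrying id and nodeName attributes; the colorsFragment variable is made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"
    exclude-result-prefixes="msxml umbraco.library">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:param name="currentPage"/>

<xsl:template match="/">
    <!-- Assumes the umbraco 4 schema: pages are <node> elements with id and nodeName attributes -->
    <!-- umbraco.library: NiceUrl() turns a page id into that page's URL -->
    <a href="{umbraco.library:NiceUrl($currentPage/@id)}">
        <xsl:value-of select="$currentPage/@nodeName"/>
    </a>

    <!-- Hypothetical data built inside the stylesheet; this is a result tree fragment -->
    <xsl:variable name="colorsFragment">
        <color>red</color>
        <color>green</color>
    </xsl:variable>

    <!-- msxml: XSLT 1.0 cannot walk into a fragment directly, so node-set() converts it first -->
    <xsl:for-each select="msxml:node-set($colorsFragment)/color">
        <span><xsl:value-of select="."/></span>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Each prefix used in a select or test expression (msxml:, umbraco.library:) must match a prefix declared with xmlns: on the xsl:stylesheet element.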
Note: Beginning with umbraco 4.0.1, the EXSLT library namespace
and functions are also referenced by default, making it easier to
use those extended functions within your umbraco macros. The full xsl:stylesheet declaration
is shown below (some indenting and line breaks added for
clarity):
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxml="urn:schemas-microsoft-com:xslt"
    xmlns:umbraco.library="urn:umbraco.library"
    xmlns:Exslt.ExsltCommon="urn:Exslt.ExsltCommon"
    xmlns:Exslt.ExsltDatesAndTimes="urn:Exslt.ExsltDatesAndTimes"
    xmlns:Exslt.ExsltMath="urn:Exslt.ExsltMath"
    xmlns:Exslt.ExsltRegularExpressions="urn:Exslt.ExsltRegularExpressions"
    xmlns:Exslt.ExsltStrings="urn:Exslt.ExsltStrings"
    xmlns:Exslt.ExsltSets="urn:Exslt.ExsltSets"
    exclude-result-prefixes="msxml
        umbraco.library
        Exslt.ExsltCommon
        Exslt.ExsltDatesAndTimes
        Exslt.ExsltMath
        Exslt.ExsltRegularExpressions
        Exslt.ExsltStrings
        Exslt.ExsltSets"
>
Line 8: Excluding prefixes in the output
exclude-result-prefixes="msxml umbraco.library">
When you include additional namespaces you should also list them in the exclude-result-prefixes attribute; otherwise their namespace declarations can be copied onto the elements your macro outputs. Simply list the prefix of each namespace you've included (the part after xmlns:), with a space between each prefix.
You only need to modify this line if you include additional
namespaces.
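To see why this matters, consider this deliberately stripped-down sketch that declares the umbraco.library namespace but leaves out exclude-result-prefixes. Under the XSLT 1.0 rules for literal result elements, the namespace declarations in scope in the stylesheet are copied onto the output element unless they are excluded.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:umbraco.library="urn:umbraco.library">
    <!-- Deliberately missing: exclude-result-prefixes="umbraco.library" -->

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template match="/">
    <p>Hello</p>
</xsl:template>

</xsl:stylesheet>

<!--
    Output as written:
        <p xmlns:umbraco.library="urn:umbraco.library">Hello</p>
    Output after adding exclude-result-prefixes="umbraco.library":
        <p>Hello</p>
-->
```

The stray xmlns: attribute is harmless to most browsers, but it clutters your markup and will not pass (x)HTML validation.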
Line 10: XSL output method
<xsl:output method="xml" omit-xml-declaration="yes"/>
This is the line you are most likely to want to change.
By default, umbraco specifies the output method as "xml". This is fine in most cases, but there is one situation you may encounter in which you might prefer different behavior: an empty tag in the output. When the output method is set to "xml", an empty tag will be simplified, or collapsed.
For instance, if your XSLT contained the following:
<div></div>
The output sent to the browser with method="xml" would be:
<div />
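Most browsers will not handle this gracefully. When the page is parsed as HTML, <div /> is treated as an opening <div> tag with no matching end tag, so the content that follows ends up nested inside it. Your CSS styles will not be applied as expected, and your site may not display correctly.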
There are two solutions. One is to include a test in the XSLT to ensure you
never output an empty tag that will collapse. The other is to
change the XSL output method to "html", which will preserve the
expanded tags and not collapse them.
The output sent to the browser with method="html" would be:
<div></div>
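Here is a minimal sketch of the first solution, which keeps method="xml" and simply never writes the div when it would be empty. It assumes the umbraco 4 XML schema, where page properties are data elements with an alias attribute; the sidebarText alias is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Default method kept; method="html" is the other fix discussed above -->
<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:param name="currentPage"/>

<xsl:template match="/">
    <!-- Assumes the umbraco 4 schema; 'sidebarText' is a hypothetical property alias -->
    <xsl:variable name="sidebar" select="$currentPage/data[@alias = 'sidebarText']"/>

    <!-- Only write the div when it has text in it, so it can never collapse to <div /> -->
    <xsl:if test="normalize-space($sidebar) != ''">
        <div class="sidebar">
            <xsl:value-of select="$sidebar"/>
        </div>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

normalize-space() is used so that a property containing only whitespace also counts as empty.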
Do be aware that the "html" output method relaxes the requirements on the output somewhat. Empty elements are written without a closing slash, which is appropriate for HTML 4.0 but not for XHTML, so your web pages may fail XHTML validation.
The XSLT specification (http://www.w3.org/TR/xslt#section-HTML-Output-Method) states:
The html output method should not output an end-tag for empty elements. For HTML 4.0, the empty elements are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in the stylesheet should be output as <br>.
In particular, the img tag will not have a closing slash with method="html".
You may wish to set all your XSLT macros to use method="html" and reserve method="xml" for special cases, such as creating RSS feeds that must be valid XML with collapsed tags.
However, because some tags do not get closed when using method="html", I recommend using method="xml" unless you have a specific reason to use method="html". Just be sure to avoid empty tags that would collapse and potentially break your site's CSS rendering.
The omit-xml-declaration="yes" attribute keeps the XML declaration (<?xml...?>) out of the macro's output. Only in very rare and special cases would you want to include it, and never for output used in (x)HTML pages, no matter what setting you've used for the output method.
Line 12: The currentPage parameter
<xsl:param name="currentPage"/>
Unlike traditional XSLT in which the entire content of a single
file is processed and transformed, a website must include some
sense of context.
What page is the website visitor currently viewing? That is what
the currentPage parameter is used for. Of all the pages in your
site, currentPage will tell you which one is being viewed.
The currentPage parameter is automatically populated by umbraco.
It is referenced as $currentPage in your XSLT and you will use it a
lot, as we'll see in future articles.
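As a first taste of $currentPage, here is a minimal sketch that prints the current page's name and links to its child pages. It assumes the umbraco 4 XML schema, in which pages are nested node elements with id and nodeName attributes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:umbraco.library="urn:umbraco.library"
    exclude-result-prefixes="umbraco.library">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<!-- Filled in automatically by umbraco with the page being viewed -->
<xsl:param name="currentPage"/>

<xsl:template match="/">
    <!-- Assumes the umbraco 4 schema: pages are <node> elements with id and nodeName attributes -->
    <h1><xsl:value-of select="$currentPage/@nodeName"/></h1>

    <!-- Child pages are child <node> elements; the test avoids writing an empty <ul /> -->
    <xsl:if test="$currentPage/node">
        <ul>
            <xsl:for-each select="$currentPage/node">
                <li>
                    <a href="{umbraco.library:NiceUrl(@id)}">
                        <xsl:value-of select="@nodeName"/>
                    </a>
                </li>
            </xsl:for-each>
        </ul>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

The same macro produces a different list on every page because $currentPage changes with each request.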
Lines 14 to 16: Template match="/"
<xsl:template match="/">
<!-- start writing XSLT -->
</xsl:template>
Templates are a lot like functions in other languages. They can be called by name (named templates), or they can run automatically when the XML being processed matches a certain pattern (match templates).
Every XML document has a root node, which is selected by the "/" XPath expression. Because processing always begins at the root node, a template with match="/" will always be executed, and executed first.
Or, putting it another way, template match="/" is a lot like the
main() or Page_Load() functions in other languages.
We'll discuss xpath as well as match and named templates in
detail in other articles. For now, the important thing to understand
is that any XSLT code you put in the match="/" template will always
be executed when the macro runs.
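Here is a minimal sketch showing both kinds of template side by side, with match="/" acting as the entry point. The pageTitle name is made up for illustration, and the node elements again assume the umbraco 4 XML schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:param name="currentPage"/>

<!-- Entry point: processing starts at the root node, so this runs first -->
<xsl:template match="/">
    <!-- Named template: called explicitly, much like a function call -->
    <xsl:call-template name="pageTitle">
        <xsl:with-param name="title" select="$currentPage/@nodeName"/>
    </xsl:call-template>

    <!-- Match template: the processor applies it to each <node> element selected here -->
    <xsl:if test="$currentPage/node">
        <ul>
            <xsl:apply-templates select="$currentPage/node"/>
        </ul>
    </xsl:if>
</xsl:template>

<!-- "pageTitle" is a hypothetical name chosen for this example -->
<xsl:template name="pageTitle">
    <xsl:param name="title"/>
    <h1><xsl:value-of select="$title"/></h1>
</xsl:template>

<!-- Assumes umbraco 4 pages are <node> elements with a nodeName attribute -->
<xsl:template match="node">
    <li><xsl:value-of select="@nodeName"/></li>
</xsl:template>

</xsl:stylesheet>
```

Note that match="node" matches elements named node; it is not the same as the XPath node() test, which matches any kind of node.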
Conclusion
The pre-defined XSLT templates provided by umbraco can be used
"as-is" and few people bother to worry about what all those lines
are doing. You, too, might often do that, focusing your efforts on
line 15 and writing your own XSLT code.
But understanding each of the lines that make up the
structure or "anatomy" of an XSLT file will give you confidence and
guidance as you develop your XSLT skills through this series of
articles. We will refer back to these settings in many of the
articles to come.