Because the parser always returns byte strings for text nodes and attribute values, it is up to the programmer to interpret them correctly. For example, the value of an href attribute is more comfortable to work with once it has been turned into a URI reference (see Chapter 11 for details on URI references).
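To show what interpreting such a value by hand involves, here is a minimal sketch in modern Python that uses only the standard library (urllib.parse), not itools. The href value is made up, and UTF-8 is assumed as the document encoding; in practice the encoding comes from the document being parsed.

```python
from urllib.parse import urlsplit

# A raw attribute value as a parser that never decodes values would
# hand it to us (hypothetical input).
raw_href = b"http://www.example.com/docs/index.html?lang=en#intro"

# Step 1: turn the bytes into text (UTF-8 is an assumption).
href = raw_href.decode("utf-8")

# Step 2: split the text into its URI components.
ref = urlsplit(href)

print(ref.scheme)    # http
print(ref.netloc)    # www.example.com
print(ref.path)      # /docs/index.html
print(ref.query)     # lang=en
print(ref.fragment)  # intro
```

Every attribute needs this kind of attribute-specific treatment, which is the work the namespace support described next takes off the programmer's hands.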
To make this interpretation easier, itools offers out-of-the-box support for several common XML namespaces, XHTML among them:
>>> from itools.xml import XMLParser, START_ELEMENT
>>> from itools.xml import get_namespace
>>> import itools.html
>>>
>>> data = ('<a xmlns="http://www.w3.org/1999/xhtml"'
... ' href="http://www.example.com"'
... ' title="Example" />')
>>> for event, value, line in XMLParser(data):
...     if event == START_ELEMENT:
...         tag_uri, tag_name, attributes = value
...         for attr_uri, attr_name in attributes:
...             attr_value = attributes[(attr_uri, attr_name)]
...             # Look up the handler for the attribute's namespace,
...             # then the datatype it assigns to this attribute.
...             namespace = get_namespace(attr_uri)
...             datatype = namespace.get_datatype(attr_name)
...             attr_value = datatype.decode(attr_value)
...             print attr_name, datatype
...             print repr(attr_value)
...             print
...
title <class 'itools.datatypes.primitive.Unicode'>
u'Example'
None <class 'itools.datatypes.primitive.String'>
'http://www.w3.org/1999/xhtml'
href <class 'itools.datatypes.primitive.URI'>
<itools.uri.generic.Reference object at 0xa3f368>
The function get_namespace returns the namespace handler for the given URI. The handler's get_datatype method then gives us the datatype (see Chapter 4) for the attribute, and the datatype's decode method deserializes the attribute value.
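The two-step lookup (handler, then datatype, then decode) can be sketched in a self-contained way as follows. This is modern Python using only the standard library, not itools: the classes String, Unicode and URI borrow the names printed in the session above but are simplified stand-ins, while XHTMLNamespace, its schema dictionary and decode_attribute are hypothetical. UTF-8 is assumed throughout.

```python
from urllib.parse import urlsplit, SplitResult


class String:
    """Stand-in datatype: keep the value as raw bytes."""
    @staticmethod
    def decode(value: bytes) -> bytes:
        return value


class Unicode:
    """Stand-in datatype: decode the bytes to text (UTF-8 assumed)."""
    @staticmethod
    def decode(value: bytes) -> str:
        return value.decode("utf-8")


class URI:
    """Stand-in datatype: decode the bytes and parse them as a URI."""
    @staticmethod
    def decode(value: bytes) -> SplitResult:
        return urlsplit(value.decode("utf-8"))


class XHTMLNamespace:
    # Hypothetical schema: attribute name -> datatype.
    schema = {"href": URI, "src": URI, "title": Unicode}

    @classmethod
    def get_datatype(cls, name):
        # Attributes missing from the schema fall back to String.
        return cls.schema.get(name, String)


def decode_attribute(namespace, name, value):
    """Ask the handler for the datatype, then let the datatype decode."""
    datatype = namespace.get_datatype(name)
    return datatype.decode(value)


print(decode_attribute(XHTMLNamespace, "title", b"Example"))  # Example
print(decode_attribute(XHTMLNamespace, "href", b"http://www.example.com").netloc)  # www.example.com
print(decode_attribute(XHTMLNamespace, "class", b"nav"))  # b'nav'
```

Keeping the attribute-to-datatype mapping in the handler and the conversion logic in the datatype means the parsing loop never has to know which attribute needs which treatment.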
The namespace handler for XHTML is implemented by the itools.html package, which is why the example imports it even though the code never refers to it by name.