.. _unicode_toplevel:

===================
The Unicode Chapter
===================

In normal Mako operation, all parsed template constructs and
output streams are handled internally as Python 3 ``str`` (Unicode)
objects. Only at the point of :meth:`~.Template.render` may this stream
of Unicode objects be encoded into whatever output encoding is desired.
The implication is that the template developer must ensure that
:ref:`the encoding of all non-ASCII templates is explicit
<set_template_file_encoding>` (still required in Python 3 for templates
that are not ``utf-8``, Mako's default), that :ref:`all non-ASCII-encoded
expressions are in one way or another converted to Unicode
<handling_non_ascii_expressions>` (not much of a burden in Python 3), and
that :ref:`the output stream of the template is handled as a Unicode
stream being encoded to some encoding <defining_output_encoding>` (still
required in Python 3).

.. _set_template_file_encoding:

Specifying the Encoding of a Template File
==========================================

.. versionchanged:: 1.1.3

    As of Mako 1.1.3, the default template encoding is "utf-8". Previously, a
    Python "magic encoding comment" was required for templates that were not
    using ASCII.

Mako templates support Python's "magic encoding comment" syntax
described in `pep-0263 <http://www.python.org/dev/peps/pep-0263/>`_:

.. sourcecode:: mako

    ## -*- coding: utf-8 -*-

    Alors vous imaginez ma surprise, au lever du jour, quand
    une drôle de petite voix m’a réveillé. Elle disait:
    « S’il vous plaît… dessine-moi un mouton! »

As an alternative, the template encoding can be specified
programmatically to either :class:`.Template` or :class:`.TemplateLookup` via
the ``input_encoding`` parameter:

.. sourcecode:: python

    from mako.lookup import TemplateLookup

    t = TemplateLookup(directories=['./'], input_encoding='utf-8')

The above will assume all located templates specify ``utf-8``
encoding, unless the template itself contains its own magic
encoding comment, which takes precedence.

.. _handling_non_ascii_expressions:

Handling Expressions
====================

The next area where encoding comes into play is expression
constructs. By default, Mako's treatment of an expression like
this:

.. sourcecode:: mako

    ${"hello world"}

looks something like this:

.. sourcecode:: python

    context.write(str("hello world"))

That is, **the output of all expressions is run through the
``str`` built-in**. This is the default setting, and it can be
modified to expect various encodings. The ``str`` step serves
two purposes: it renders non-string expressions (such as integers,
or objects that define a ``__str__()`` method) into strings, and it
ensures that the final output stream is constructed as a Unicode
object. The main implication is that **any raw byte strings must
first be decoded to Python Unicode strings**: calling ``str()`` on a
``bytes`` value returns its representation (such as ``b'...'``)
rather than the decoded text.

Similarly, if you are reading data from a file that streams
bytes, or receiving data from an object that returns a
Python byte string containing non-ASCII data, you have to
explicitly decode it to Unicode first, for example:

.. sourcecode:: mako

    ${call_my_object().decode('utf-8')}

Note that file handles acquired by ``open()`` in Python 3 default
to text mode: that is, the decoding is done for you. See
Python 3's documentation for the ``open()`` built-in for details on
this.

If you want a certain encoding applied to *all* expressions,
replace the default ``str`` filter with Mako's ``decode`` filter at the
:class:`.Template` or :class:`.TemplateLookup` level:

.. sourcecode:: python

    from mako.template import Template

    # templatetext holds the template source
    t = Template(templatetext, default_filters=['decode.utf8'])

Note that the ``decode`` filter is slower than the
``str`` function, since unlike ``str`` it's not a Python
built-in, and it also checks the type of the incoming data to
determine whether string conversion is needed first.

The ``default_filters`` argument can be used to entirely customize
the filtering process of expressions. This argument is described
in :ref:`filtering_default_filters`.

.. _defining_output_encoding:

Defining Output Encoding
========================

Now that we have a template which produces a pure Unicode output
stream, all the hard work is done. We can take the output and do
anything with it.

As stated in the :doc:`"Usage" chapter <usage>`, both :class:`.Template` and
:class:`.TemplateLookup` accept ``output_encoding`` and ``encoding_errors``
parameters which can be used to encode the output in any codec
supported by Python:

.. sourcecode:: python

    from mako.template import Template
    from mako.lookup import TemplateLookup

    mylookup = TemplateLookup(directories=['/docs'], output_encoding='utf-8',
                              encoding_errors='replace')

    mytemplate = mylookup.get_template("foo.txt")
    # render() returns bytes here, since output_encoding is set
    print(mytemplate.render())

:meth:`~.Template.render` will return a ``bytes`` object in Python 3 if an output
encoding is specified. By default it performs no encoding and
returns a ``str``.

:meth:`~.Template.render_unicode` will return the template output as a Python
``str`` object:

.. sourcecode:: python

    print(mytemplate.render_unicode())

The above method disregards the ``output_encoding`` argument;
you can encode the result yourself by saying:

.. sourcecode:: python

    print(mytemplate.render_unicode().encode('utf-8', 'replace'))
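
As a minimal sketch, assuming Mako 1.x on Python 3, the following puts
these pieces together with an in-memory :class:`.Template` rather than a
:class:`.TemplateLookup`. The template text, the ``name`` variable, and the
choice of the ``ascii`` codec are made up for illustration, so that
``encoding_errors`` has some unencodable characters to act on:

.. sourcecode:: python

    from mako.template import Template

    # Illustrative template source; it is already a Python str,
    # so no input_encoding is needed to read it.
    source = "Bonjour, ${name}! Ça va?"

    # output_encoding makes render() return bytes; encoding_errors='replace'
    # substitutes '?' for characters the ascii codec cannot represent.
    t = Template(source, output_encoding="ascii", encoding_errors="replace")

    encoded = t.render(name="Zoë")
    print(type(encoded))  # <class 'bytes'>
    print(encoded)        # b'Bonjour, Zo?! ?a va?'

    # render_unicode() ignores output_encoding and returns a str.
    text = t.render_unicode(name="Zoë")
    print(type(text))     # <class 'str'>

    # Encoding the str yourself yields the same bytes as render() above.
    assert text.encode("ascii", "replace") == encoded

    # With the default encoding_errors='strict', the same output cannot
    # be encoded, and render() raises UnicodeEncodeError.
    strict = Template(source, output_encoding="ascii")
    try:
        strict.render(name="Zoë")
        raised = False
    except UnicodeEncodeError:
        raised = True
    print(raised)         # True

With a codec such as ``utf-8``, which can encode ordinary text in full,
the ``encoding_errors`` setting rarely comes into play; it matters mostly
when targeting narrower codecs like ``ascii`` or ``latin-1``.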