.. _unicode_toplevel:

===================
The Unicode Chapter
===================

In normal Mako operation, all parsed template constructs and
output streams are handled internally as Python 3 ``str`` (Unicode)
objects. It's only at the point of :meth:`~.Template.render` that this
stream of Unicode objects may be rendered into whatever the desired output
encoding is. The implication here is that the template developer must
ensure that :ref:`the encoding of all non-ASCII templates is explicit
<set_template_file_encoding>` (needed for any encoding other than ``utf-8``,
which Mako assumes by default), that :ref:`all non-ASCII-encoded expressions
are in one way or another converted to Unicode <handling_non_ascii_expressions>`
(not much of a burden in Python 3), and that :ref:`the output stream of the
template is handled as a Unicode stream that is encoded to a specific
encoding when bytes are needed <defining_output_encoding>`.

.. _set_template_file_encoding:

Specifying the Encoding of a Template File
==========================================

.. versionchanged:: 1.1.3

    As of Mako 1.1.3, the default template encoding is "utf-8". Previously, a
    Python "magic encoding comment" was required for templates that were not
    using ASCII.

Mako templates support Python's "magic encoding comment" syntax
described in `pep-0263 <http://www.python.org/dev/peps/pep-0263/>`_:

.. sourcecode:: mako

    ## -*- coding: utf-8 -*-

    Alors vous imaginez ma surprise, au lever du jour, quand
    une drôle de petite voix m’a réveillé. Elle disait:
     « S’il vous plaît… dessine-moi un mouton! »

As an alternative, the template encoding can be specified
programmatically to either :class:`.Template` or :class:`.TemplateLookup` via
the ``input_encoding`` parameter:

.. sourcecode:: python

    from mako.lookup import TemplateLookup
    lookup = TemplateLookup(directories=['./'], input_encoding='utf-8')

The above will assume that all located templates are encoded in ``utf-8``,
unless a template contains its own magic encoding comment, which takes
precedence.
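
The precedence rule can be illustrated with a small, self-contained sketch.
The ``detect_template_encoding()`` function below is hypothetical and is not
Mako's internal implementation; it assumes the magic comment sits on the first
line and uses a simplified PEP 263-style pattern, falling back to the configured
``input_encoding`` when no comment is found:

.. sourcecode:: python

    import re

    # Simplified PEP 263-style pattern accepting "#" or Mako's "##" prefix.
    # Hypothetical: this is not the pattern Mako itself uses.
    MAGIC_COMMENT = re.compile(rb"^\s*##?\s*-\*-\s*coding:\s*([-\w.]+)")


    def detect_template_encoding(source: bytes, input_encoding: str = "utf-8") -> str:
        """Return the codec named by a first-line magic comment,
        falling back to the configured input_encoding."""
        first_line = source.split(b"\n", 1)[0]
        match = MAGIC_COMMENT.match(first_line)
        if match:
            return match.group(1).decode("ascii")
        return input_encoding


    latin1_source = "## -*- coding: latin-1 -*-\nune drôle de petite voix\n".encode("latin-1")
    plain_source = "une drôle de petite voix\n".encode("utf-8")

    # The magic comment wins over the fallback; without one, the fallback is used.
    text = latin1_source.decode(detect_template_encoding(latin1_source))
    print(detect_template_encoding(latin1_source))  # latin-1
    print(detect_template_encoding(plain_source))   # utf-8
    print(text.splitlines()[1])                     # une drôle de petite voix

Decoding ``latin1_source`` as UTF-8 instead would raise ``UnicodeDecodeError``,
since the Latin-1 byte for "ô" is not valid UTF-8 in that position.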

.. _handling_non_ascii_expressions:

Handling Expressions
====================

The next area where encoding comes into play is expression
constructs. By default, Mako's treatment of an expression like
this:

.. sourcecode:: mako

    ${"hello world"}

looks something like this:

.. sourcecode:: python

    context.write(str("hello world"))

That is, **the output of all expressions is run through the
``str`` built-in**. This is the default setting, and it can be
modified to expect various encodings. The ``str`` step serves
both to render non-string expressions into strings (such as
integers, or objects that define a ``__str__()`` method) and to
ensure that the final output stream is constructed as a Unicode
object. The main implication of this is that **any raw byte-strings
must first be decoded to a Python Unicode string**: in Python 3, calling
``str()`` on a ``bytes`` value without an encoding returns its ``repr``
(such as ``b'caf\xc3\xa9'``) rather than the decoded text, which matters
most for byte-strings containing non-ASCII data.
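
The following plain-Python sketch shows the effect. ``write_expression()`` is
a hypothetical stand-in for the default ``str`` filtering step described above,
not the code Mako actually generates:

.. sourcecode:: python

    import io


    def write_expression(buffer: io.StringIO, value: object) -> None:
        # Hypothetical stand-in for the default filtering step:
        # every expression result is passed through str() before being written.
        buffer.write(str(value))


    raw = "café".encode("utf-8")  # bytes, e.g. read from a socket or a binary file

    out = io.StringIO()
    write_expression(out, raw)                  # writes the repr b'caf\xc3\xa9', not "café"
    write_expression(out, " | ")
    write_expression(out, raw.decode("utf-8"))  # writes the decoded text: café
    write_expression(out, " | ")
    write_expression(out, 42)                   # non-string values are converted too

    print(out.getvalue())  # b'caf\xc3\xa9' | café | 42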

In practice, if you are reading data from a file that is streaming
bytes, or returning data from some object that returns a
Python byte-string containing a non-ASCII encoding, you have to
explicitly decode it to Unicode first, for example:

.. sourcecode:: mako

    ${call_my_object().decode('utf-8')}

Note that file objects returned by ``open()`` in Python 3 default
to text mode: that is, the decoding is done for you, using the platform's
preferred encoding unless an ``encoding`` argument is passed. See
Python 3's documentation for the ``open()`` built-in for details on
this.
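
As a quick illustration, the sketch below writes UTF-8 bytes to a temporary
file and reads them back both ways. Passing ``encoding="utf-8"`` explicitly is
a choice made here for clarity, since the platform default is not guaranteed
to be UTF-8:

.. sourcecode:: python

    import os
    import tempfile

    phrase = "S’il vous plaît… dessine-moi un mouton!"

    # Write the phrase as raw UTF-8 bytes to a temporary file.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(phrase.encode("utf-8"))

    try:
        # Binary mode returns bytes, which must be decoded before use in a template.
        with open(path, "rb") as f:
            as_bytes = f.read()

        # Text mode decodes for you and returns str.
        with open(path, encoding="utf-8") as f:
            as_text = f.read()
    finally:
        os.remove(path)

    print(type(as_bytes).__name__, type(as_text).__name__)  # bytes str
    print(as_bytes.decode("utf-8") == as_text)               # True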

If you want a certain encoding applied to *all* expressions,
replace the default ``str`` filter with Mako's built-in ``decode`` filter at the
:class:`.Template` or :class:`.TemplateLookup` level:

.. sourcecode:: python

    from mako.template import Template
    t = Template(templatetext, default_filters=['decode.utf8'])

Note that the built-in ``decode`` filter is slower than the ``str``
function: unlike ``str``, it is not a Python built-in, and it also checks
the type of the incoming data to determine whether string conversion is
needed first.
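
To make that type check concrete, here is a rough sketch of a
``decode``-style filter. It only approximates the behavior described above
and is not Mako's actual source; the attribute-access trick (a hypothetical
design choice here) lets ``decode.utf8`` name the codec, echoing the
``'decode.utf8'`` spelling used in ``default_filters``:

.. sourcecode:: python

    class Decode:
        """Hypothetical sketch of a decode filter; attribute names select the codec."""

        def __getattr__(self, encoding: str):
            if encoding.startswith("__"):
                raise AttributeError(encoding)  # don't treat dunder lookups as codecs

            def _filter(value: object) -> str:
                if isinstance(value, str):
                    return value                   # already text: no work needed
                if isinstance(value, bytes):
                    return value.decode(encoding)  # decode raw bytes with the named codec
                return str(value)                  # other objects: plain str() conversion

            return _filter


    decode = Decode()

    print(decode.utf8("déjà"))                  # déjà
    print(decode.utf8("déjà".encode("utf-8")))  # déjà
    print(decode.utf8(3.5))                     # 3.5

The extra ``isinstance`` checks on every call are the kind of overhead that
makes such a filter slower than calling ``str`` directly.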

The ``default_filters`` argument can be used to entirely customize
the filtering process of expressions. This argument is described
in :ref:`filtering_default_filters`.

.. _defining_output_encoding:

Defining Output Encoding
========================

Now that we have a template which produces a pure Unicode output
stream, all the hard work is done. We can take the output and do
anything with it.

As stated in the :doc:`"Usage" chapter <usage>`, both :class:`.Template` and
:class:`.TemplateLookup` accept ``output_encoding`` and ``encoding_errors``
parameters which can be used to encode the output in any
Python-supported codec:

.. sourcecode:: python

    from mako.template import Template
    from mako.lookup import TemplateLookup

    mylookup = TemplateLookup(
        directories=['/docs'], output_encoding='utf-8', encoding_errors='replace'
    )

    mytemplate = mylookup.get_template("foo.txt")
    print(mytemplate.render())

:meth:`~.Template.render` will return a ``bytes`` object in Python 3 if an
output encoding is specified. By default it performs no encoding and
returns a ``str``.
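
The return-type rule can be sketched in plain Python. ``finish_render()``
below is a hypothetical helper, not part of Mako's API; it assumes the
template has already produced a ``str`` and that ``output_encoding`` and
``encoding_errors`` behave like the arguments to ``str.encode()``:

.. sourcecode:: python

    from typing import Optional, Union


    def finish_render(output: str,
                      output_encoding: Optional[str] = None,
                      encoding_errors: str = "strict") -> Union[str, bytes]:
        # Hypothetical helper: with no output encoding, hand back the str unchanged.
        if output_encoding is None:
            return output
        # Otherwise encode; encoding_errors decides what happens to unencodable characters.
        return output.encode(output_encoding, encoding_errors)


    text = "café ☃"

    print(finish_render(text))                      # café ☃
    print(finish_render(text, "utf-8"))             # b'caf\xc3\xa9 \xe2\x98\x83'
    print(finish_render(text, "ascii", "replace"))  # b'caf? ?'

With this helper's default ``"strict"`` handler, the ASCII case would raise
``UnicodeEncodeError`` instead of substituting ``?``.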

:meth:`~.Template.render_unicode` will return the template output as a Python
``str`` object:

.. sourcecode:: python

    print(mytemplate.render_unicode())

The above method ignores the ``output_encoding`` keyword argument;
you can encode the output yourself by saying:

.. sourcecode:: python

    print(mytemplate.render_unicode().encode('utf-8', 'replace'))
