1.. _sortinghowto:
2
3Sorting HOW TO
4**************
5
6:Author: Andrew Dalke and Raymond Hettinger
7:Release: 0.1
8
9
10Python lists have a built-in :meth:`list.sort` method that modifies the list
11in-place.  There is also a :func:`sorted` built-in function that builds a new
12sorted list from an iterable.
13
14In this document, we explore the various techniques for sorting data using Python.
15
16
17Sorting Basics
18==============
19
20A simple ascending sort is very easy: just call the :func:`sorted` function. It
21returns a new sorted list:
22
23.. doctest::
24
25    >>> sorted([5, 2, 3, 1, 4])
26    [1, 2, 3, 4, 5]
27
28You can also use the :meth:`list.sort` method. It modifies the list
29in-place (and returns ``None`` to avoid confusion). Usually it's less convenient
30than :func:`sorted` - but if you don't need the original list, it's slightly
31more efficient.
32
33.. doctest::
34
35    >>> a = [5, 2, 3, 1, 4]
36    >>> a.sort()
37    >>> a
38    [1, 2, 3, 4, 5]
39
40Another difference is that the :meth:`list.sort` method is only defined for
41lists. In contrast, the :func:`sorted` function accepts any iterable.
42
43.. doctest::
44
45    >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
46    [1, 2, 3, 4, 5]
47
48Key Functions
49=============
50
51Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
52function (or other callable) to be called on each list element prior to making
53comparisons.
54
55For example, here's a case-insensitive string comparison:
56
57.. doctest::
58
59    >>> sorted("This is a test string from Andrew".split(), key=str.lower)
60    ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
61
62The value of the *key* parameter should be a function (or other callable) that
63takes a single argument and returns a key to use for sorting purposes. This
64technique is fast because the key function is called exactly once for each
65input record.
66
67A common pattern is to sort complex objects using some of the object's indices
68as keys. For example:
69
70.. doctest::
71
72    >>> student_tuples = [
73    ...     ('john', 'A', 15),
74    ...     ('jane', 'B', 12),
75    ...     ('dave', 'B', 10),
76    ... ]
77    >>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
78    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
79
80The same technique works for objects with named attributes. For example:
81
82.. doctest::
83
84    >>> class Student:
85    ...     def __init__(self, name, grade, age):
86    ...         self.name = name
87    ...         self.grade = grade
88    ...         self.age = age
89    ...     def __repr__(self):
90    ...         return repr((self.name, self.grade, self.age))
91
92    >>> student_objects = [
93    ...     Student('john', 'A', 15),
94    ...     Student('jane', 'B', 12),
95    ...     Student('dave', 'B', 10),
96    ... ]
97    >>> sorted(student_objects, key=lambda student: student.age)   # sort by age
98    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
99
100Operator Module Functions
101=========================
102
103The key-function patterns shown above are very common, so Python provides
104convenience functions to make accessor functions easier and faster. The
105:mod:`operator` module has :func:`~operator.itemgetter`,
106:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function.
107
108Using those functions, the above examples become simpler and faster:
109
110.. doctest::
111
112    >>> from operator import itemgetter, attrgetter
113
114    >>> sorted(student_tuples, key=itemgetter(2))
115    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
116
117    >>> sorted(student_objects, key=attrgetter('age'))
118    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
119
120The operator module functions allow multiple levels of sorting. For example, to
121sort by *grade* then by *age*:
122
123.. doctest::
124
125    >>> sorted(student_tuples, key=itemgetter(1,2))
126    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
127
128    >>> sorted(student_objects, key=attrgetter('grade', 'age'))
129    [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
130
131Ascending and Descending
132========================
133
134Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
135boolean value. This is used to flag descending sorts. For example, to get the
136student data in reverse *age* order:
137
138.. doctest::
139
140    >>> sorted(student_tuples, key=itemgetter(2), reverse=True)
141    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
142
143    >>> sorted(student_objects, key=attrgetter('age'), reverse=True)
144    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
145
146Sort Stability and Complex Sorts
147================================
148
149Sorts are guaranteed to be `stable
150<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
151when multiple records have the same key, their original order is preserved.
152
153.. doctest::
154
155    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
156    >>> sorted(data, key=itemgetter(0))
157    [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
158
159Notice how the two records for *blue* retain their original order so that
160``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
161
162This wonderful property lets you build complex sorts in a series of sorting
163steps. For example, to sort the student data by descending *grade* and then
164ascending *age*, do the *age* sort first and then sort again using *grade*:
165
166.. doctest::
167
168    >>> s = sorted(student_objects, key=attrgetter('age'))     # sort on secondary key
169    >>> sorted(s, key=attrgetter('grade'), reverse=True)       # now sort on primary key, descending
170    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
171
172This can be abstracted out into a wrapper function that can take a list and
173tuples of field and order to sort them on multiple passes.
174
175.. doctest::
176
177    >>> def multisort(xs, specs):
178    ...     for key, reverse in reversed(specs):
179    ...         xs.sort(key=attrgetter(key), reverse=reverse)
180    ...     return xs
181
182    >>> multisort(list(student_objects), (('grade', True), ('age', False)))
183    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
184
185The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
186does multiple sorts efficiently because it can take advantage of any ordering
187already present in a dataset.
188
189Decorate-Sort-Undecorate
190========================
191
192This idiom is called Decorate-Sort-Undecorate after its three steps:
193
194* First, the initial list is decorated with new values that control the sort order.
195
196* Second, the decorated list is sorted.
197
198* Finally, the decorations are removed, creating a list that contains only the
199  initial values in the new order.
200
201For example, to sort the student data by *grade* using the DSU approach:
202
203    >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
204    >>> decorated.sort()
205    >>> [student for grade, i, student in decorated]               # undecorate
206    [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
207
208This idiom works because tuples are compared lexicographically; the first items
209are compared; if they are the same then the second items are compared, and so
210on.
211
212It is not strictly necessary in all cases to include the index *i* in the
213decorated list, but including it gives two benefits:
214
215* The sort is stable -- if two items have the same key, their order will be
216  preserved in the sorted list.
217
218* The original items do not have to be comparable because the ordering of the
219  decorated tuples will be determined by at most the first two items. So for
220  example the original list could contain complex numbers which cannot be sorted
221  directly.
222
223Another name for this idiom is
224`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
225after Randal L. Schwartz, who popularized it among Perl programmers.
226
227Now that Python sorting provides key-functions, this technique is not often needed.
228
229Comparison Functions
230====================
231
232Unlike key functions that return an absolute value for sorting, a comparison
233function computes the relative ordering for two inputs.
234
235For example, a `balance scale
236<https://upload.wikimedia.org/wikipedia/commons/1/17/Balance_à_tabac_1850.JPG>`_
237compares two samples giving a relative ordering: lighter, equal, or heavier.
238Likewise, a comparison function such as ``cmp(a, b)`` will return a negative
239value for less-than, zero if the inputs are equal, or a positive value for
240greater-than.
241
242It is common to encounter comparison functions when translating algorithms from
243other languages.  Also, some libraries provide comparison functions as part of
244their API.  For example, :func:`locale.strcoll` is a comparison function.
245
246To accommodate those situations, Python provides
247:class:`functools.cmp_to_key` to wrap the comparison function
248to make it usable as a key function::
249
250    sorted(words, key=cmp_to_key(strcoll))  # locale-aware sort order
251
252Odds and Ends
253=============
254
255* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
256  :func:`locale.strcoll` for a comparison function.  This is necessary
257  because "alphabetical" sort orderings can vary across cultures even
258  if the underlying alphabet is the same.
259
260* The *reverse* parameter still maintains sort stability (so that records with
261  equal keys retain the original order). Interestingly, that effect can be
262  simulated without the parameter by using the builtin :func:`reversed` function
263  twice:
264
265  .. doctest::
266
267    >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
268    >>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
269    >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
270    >>> assert standard_way == double_reversed
271    >>> standard_way
272    [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
273
274* The sort routines use ``<`` when making comparisons
275  between two objects. So, it is easy to add a standard sort order to a class by
276  defining an :meth:`__lt__` method:
277
278  .. doctest::
279
280    >>> Student.__lt__ = lambda self, other: self.age < other.age
281    >>> sorted(student_objects)
282    [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
283
284  However, note that ``<`` can fall back to using :meth:`__gt__` if
285  :meth:`__lt__` is not implemented (see :func:`object.__lt__`).
286
287* Key functions need not depend directly on the objects being sorted. A key
288  function can also access external resources. For instance, if the student grades
289  are stored in a dictionary, they can be used to sort a separate list of student
290  names:
291
292  .. doctest::
293
294    >>> students = ['dave', 'john', 'jane']
295    >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
296    >>> sorted(students, key=newgrades.__getitem__)
297    ['jane', 'dave', 'john']
298