1.. _sortinghowto: 2 3Sorting HOW TO 4************** 5 6:Author: Andrew Dalke and Raymond Hettinger 7:Release: 0.1 8 9 10Python lists have a built-in :meth:`list.sort` method that modifies the list 11in-place. There is also a :func:`sorted` built-in function that builds a new 12sorted list from an iterable. 13 14In this document, we explore the various techniques for sorting data using Python. 15 16 17Sorting Basics 18============== 19 20A simple ascending sort is very easy: just call the :func:`sorted` function. It 21returns a new sorted list: 22 23.. doctest:: 24 25 >>> sorted([5, 2, 3, 1, 4]) 26 [1, 2, 3, 4, 5] 27 28You can also use the :meth:`list.sort` method. It modifies the list 29in-place (and returns ``None`` to avoid confusion). Usually it's less convenient 30than :func:`sorted` - but if you don't need the original list, it's slightly 31more efficient. 32 33.. doctest:: 34 35 >>> a = [5, 2, 3, 1, 4] 36 >>> a.sort() 37 >>> a 38 [1, 2, 3, 4, 5] 39 40Another difference is that the :meth:`list.sort` method is only defined for 41lists. In contrast, the :func:`sorted` function accepts any iterable. 42 43.. doctest:: 44 45 >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) 46 [1, 2, 3, 4, 5] 47 48Key Functions 49============= 50 51Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a 52function (or other callable) to be called on each list element prior to making 53comparisons. 54 55For example, here's a case-insensitive string comparison: 56 57.. doctest:: 58 59 >>> sorted("This is a test string from Andrew".split(), key=str.lower) 60 ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] 61 62The value of the *key* parameter should be a function (or other callable) that 63takes a single argument and returns a key to use for sorting purposes. This 64technique is fast because the key function is called exactly once for each 65input record. 66 67A common pattern is to sort complex objects using some of the object's indices 68as keys. For example: 69 70.. doctest:: 71 72 >>> student_tuples = [ 73 ... ('john', 'A', 15), 74 ... ('jane', 'B', 12), 75 ... ('dave', 'B', 10), 76 ... ] 77 >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age 78 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 79 80The same technique works for objects with named attributes. For example: 81 82.. doctest:: 83 84 >>> class Student: 85 ... def __init__(self, name, grade, age): 86 ... self.name = name 87 ... self.grade = grade 88 ... self.age = age 89 ... def __repr__(self): 90 ... return repr((self.name, self.grade, self.age)) 91 92 >>> student_objects = [ 93 ... Student('john', 'A', 15), 94 ... Student('jane', 'B', 12), 95 ... Student('dave', 'B', 10), 96 ... ] 97 >>> sorted(student_objects, key=lambda student: student.age) # sort by age 98 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 99 100Operator Module Functions 101========================= 102 103The key-function patterns shown above are very common, so Python provides 104convenience functions to make accessor functions easier and faster. The 105:mod:`operator` module has :func:`~operator.itemgetter`, 106:func:`~operator.attrgetter`, and a :func:`~operator.methodcaller` function. 107 108Using those functions, the above examples become simpler and faster: 109 110.. doctest:: 111 112 >>> from operator import itemgetter, attrgetter 113 114 >>> sorted(student_tuples, key=itemgetter(2)) 115 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 116 117 >>> sorted(student_objects, key=attrgetter('age')) 118 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 119 120The operator module functions allow multiple levels of sorting. For example, to 121sort by *grade* then by *age*: 122 123.. doctest:: 124 125 >>> sorted(student_tuples, key=itemgetter(1,2)) 126 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 127 128 >>> sorted(student_objects, key=attrgetter('grade', 'age')) 129 [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] 130 131Ascending and Descending 132======================== 133 134Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a 135boolean value. This is used to flag descending sorts. For example, to get the 136student data in reverse *age* order: 137 138.. doctest:: 139 140 >>> sorted(student_tuples, key=itemgetter(2), reverse=True) 141 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 142 143 >>> sorted(student_objects, key=attrgetter('age'), reverse=True) 144 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 145 146Sort Stability and Complex Sorts 147================================ 148 149Sorts are guaranteed to be `stable 150<https://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that 151when multiple records have the same key, their original order is preserved. 152 153.. doctest:: 154 155 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 156 >>> sorted(data, key=itemgetter(0)) 157 [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] 158 159Notice how the two records for *blue* retain their original order so that 160``('blue', 1)`` is guaranteed to precede ``('blue', 2)``. 161 162This wonderful property lets you build complex sorts in a series of sorting 163steps. For example, to sort the student data by descending *grade* and then 164ascending *age*, do the *age* sort first and then sort again using *grade*: 165 166.. doctest:: 167 168 >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key 169 >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending 170 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 171 172This can be abstracted out into a wrapper function that can take a list and 173tuples of field and order to sort them on multiple passes. 174 175.. doctest:: 176 177 >>> def multisort(xs, specs): 178 ... for key, reverse in reversed(specs): 179 ... xs.sort(key=attrgetter(key), reverse=reverse) 180 ... return xs 181 182 >>> multisort(list(student_objects), (('grade', True), ('age', False))) 183 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 184 185The `Timsort <https://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python 186does multiple sorts efficiently because it can take advantage of any ordering 187already present in a dataset. 188 189Decorate-Sort-Undecorate 190======================== 191 192This idiom is called Decorate-Sort-Undecorate after its three steps: 193 194* First, the initial list is decorated with new values that control the sort order. 195 196* Second, the decorated list is sorted. 197 198* Finally, the decorations are removed, creating a list that contains only the 199 initial values in the new order. 200 201For example, to sort the student data by *grade* using the DSU approach: 202 203 >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] 204 >>> decorated.sort() 205 >>> [student for grade, i, student in decorated] # undecorate 206 [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] 207 208This idiom works because tuples are compared lexicographically; the first items 209are compared; if they are the same then the second items are compared, and so 210on. 211 212It is not strictly necessary in all cases to include the index *i* in the 213decorated list, but including it gives two benefits: 214 215* The sort is stable -- if two items have the same key, their order will be 216 preserved in the sorted list. 217 218* The original items do not have to be comparable because the ordering of the 219 decorated tuples will be determined by at most the first two items. So for 220 example the original list could contain complex numbers which cannot be sorted 221 directly. 222 223Another name for this idiom is 224`Schwartzian transform <https://en.wikipedia.org/wiki/Schwartzian_transform>`_\, 225after Randal L. Schwartz, who popularized it among Perl programmers. 226 227Now that Python sorting provides key-functions, this technique is not often needed. 228 229Comparison Functions 230==================== 231 232Unlike key functions that return an absolute value for sorting, a comparison 233function computes the relative ordering for two inputs. 234 235For example, a `balance scale 236<https://upload.wikimedia.org/wikipedia/commons/1/17/Balance_à_tabac_1850.JPG>`_ 237compares two samples giving a relative ordering: lighter, equal, or heavier. 238Likewise, a comparison function such as ``cmp(a, b)`` will return a negative 239value for less-than, zero if the inputs are equal, or a positive value for 240greater-than. 241 242It is common to encounter comparison functions when translating algorithms from 243other languages. Also, some libraries provide comparison functions as part of 244their API. For example, :func:`locale.strcoll` is a comparison function. 245 246To accommodate those situations, Python provides 247:class:`functools.cmp_to_key` to wrap the comparison function 248to make it usable as a key function:: 249 250 sorted(words, key=cmp_to_key(strcoll)) # locale-aware sort order 251 252Odds and Ends 253============= 254 255* For locale aware sorting, use :func:`locale.strxfrm` for a key function or 256 :func:`locale.strcoll` for a comparison function. This is necessary 257 because "alphabetical" sort orderings can vary across cultures even 258 if the underlying alphabet is the same. 259 260* The *reverse* parameter still maintains sort stability (so that records with 261 equal keys retain the original order). Interestingly, that effect can be 262 simulated without the parameter by using the builtin :func:`reversed` function 263 twice: 264 265 .. doctest:: 266 267 >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] 268 >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) 269 >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) 270 >>> assert standard_way == double_reversed 271 >>> standard_way 272 [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] 273 274* The sort routines use ``<`` when making comparisons 275 between two objects. So, it is easy to add a standard sort order to a class by 276 defining an :meth:`__lt__` method: 277 278 .. doctest:: 279 280 >>> Student.__lt__ = lambda self, other: self.age < other.age 281 >>> sorted(student_objects) 282 [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] 283 284 However, note that ``<`` can fall back to using :meth:`__gt__` if 285 :meth:`__lt__` is not implemented (see :func:`object.__lt__`). 286 287* Key functions need not depend directly on the objects being sorted. A key 288 function can also access external resources. For instance, if the student grades 289 are stored in a dictionary, they can be used to sort a separate list of student 290 names: 291 292 .. doctest:: 293 294 >>> students = ['dave', 'john', 'jane'] 295 >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} 296 >>> sorted(students, key=newgrades.__getitem__) 297 ['jane', 'dave', 'john'] 298