.. _lexical:

****************
Lexical analysis
****************

.. index:: lexical analysis, parser, token

A Python program is read by a *parser*. Input to the parser is a stream of
*tokens*, generated by the *lexical analyzer*. This chapter describes how the
lexical analyzer breaks a file into tokens.

Python reads program text as Unicode code points; the encoding of a source file
can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120`
for details. If the source file cannot be decoded, a :exc:`SyntaxError` is
raised.


.. _line-structure:

Line structure
==============

.. index:: line structure

A Python program is divided into a number of *logical lines*.


.. _logical-lines:

Logical lines
-------------

.. index:: logical line, physical line, line joining, NEWLINE token

The end of a logical line is represented by the token NEWLINE. Statements
cannot cross logical line boundaries except where NEWLINE is allowed by the
syntax (e.g., between statements in compound statements). A logical line is
constructed from one or more *physical lines* by following the explicit or
implicit *line joining* rules.


.. _physical-lines:

Physical lines
--------------

A physical line is a sequence of characters terminated by an end-of-line
sequence. In source files and strings, any of the standard platform line
termination sequences can be used - the Unix form using ASCII LF (linefeed),
the Windows form using the ASCII sequence CR LF (return followed by linefeed),
or the old Macintosh form using the ASCII CR (return) character. All of these
forms can be used equally, regardless of platform. The end of input also serves
as an implicit terminator for the final physical line.
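
The rule above can be checked with a short sketch. It relies on the built-in
:func:`compile` and :func:`exec` accepting all three terminator conventions in
source strings; the helper name ``run_source`` is hypothetical and exists only
for this illustration. None of the sources ends with a terminator, so the end
of input closes the final physical line.

```python
def run_source(source):
    """Execute *source* in a fresh namespace and return the names it defines."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    namespace.pop("__builtins__", None)  # exec() inserts this entry itself
    return namespace


statements = ["x = 1", "y = x + 1", "z = y * 2"]

unix_result = run_source("\n".join(statements))       # LF
windows_result = run_source("\r\n".join(statements))  # CR LF
old_mac_result = run_source("\r".join(statements))    # CR only

# All three conventions yield the same three logical lines.
assert unix_result == windows_result == old_mac_result
assert unix_result == {"x": 1, "y": 2, "z": 4}
```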

When embedding Python, source code strings should be passed to Python APIs using
the standard C conventions for newline characters (the ``\n`` character,
representing ASCII LF, is the line terminator).


.. _comments:

Comments
--------

.. index:: comment, hash character
   single: # (hash); comment

A comment starts with a hash character (``#``) that is not part of a string
literal, and ends at the end of the physical line. A comment signifies the end
of the logical line unless the implicit line joining rules are invoked. Comments
are ignored by the syntax.


.. _encodings:

Encoding declarations
---------------------

.. index:: source character set, encoding declarations (source file)
   single: # (hash); source encoding declaration

If a comment in the first or second line of the Python script matches the
regular expression ``coding[=:]\s*([-\w.]+)``, this comment is processed as an
encoding declaration; the first group of this expression names the encoding of
the source code file. The encoding declaration must appear on a line of its
own. If it is the second line, the first line must also be a comment-only line.
The recommended forms of an encoding expression are ::

   # -*- coding: <encoding-name> -*-

which is recognized also by GNU Emacs, and ::

   # vim:fileencoding=<encoding-name>

which is recognized by Bram Moolenaar's VIM.

If no encoding declaration is found, the default encoding is UTF-8. In
addition, if the first bytes of the file are the UTF-8 byte-order mark
(``b'\xef\xbb\xbf'``), the declared file encoding is UTF-8 (this is supported,
among others, by Microsoft's :program:`notepad`).

If an encoding is declared, the encoding name must be recognized by Python
(see :ref:`standard-encodings`). The
encoding is used for all lexical analysis, including string literals, comments
and identifiers.


.. _explicit-joining:

Explicit line joining
---------------------

.. index:: physical line, line joining, line continuation, backslash character

Two or more physical lines may be joined into logical lines using backslash
characters (``\``), as follows: when a physical line ends in a backslash that is
not part of a string literal or comment, it is joined with the following forming
a single logical line, deleting the backslash and the following end-of-line
character. For example::

   if 1900 < year < 2100 and 1 <= month <= 12 \
      and 1 <= day <= 31 and 0 <= hour < 24 \
      and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
           return 1

A line ending in a backslash cannot carry a comment. A backslash does not
continue a comment. A backslash does not continue a token except for string
literals (i.e., tokens other than string literals cannot be split across
physical lines using a backslash). A backslash is illegal elsewhere on a line
outside a string literal.


.. _implicit-joining:

Implicit line joining
---------------------

Expressions in parentheses, square brackets or curly braces can be split over
more than one physical line without using backslashes. For example::

   month_names = ['Januari', 'Februari', 'Maart',      # These are the
                  'April',   'Mei',      'Juni',       # Dutch names
                  'Juli',    'Augustus', 'September',  # for the months
                  'Oktober', 'November', 'December']   # of the year

Implicitly continued lines can carry comments. The indentation of the
continuation lines is not important. Blank continuation lines are allowed.
There is no NEWLINE token between implicit continuation lines. Implicitly
continued lines can also occur within triple-quoted strings (see below); in that
case they cannot carry comments.


.. _blank-lines:

Blank lines
-----------

.. index:: single: blank line

A logical line that contains only spaces, tabs, formfeeds and possibly a
comment, is ignored (i.e., no NEWLINE token is generated). During interactive
input of statements, handling of a blank line may differ depending on the
implementation of the read-eval-print loop. In the standard interactive
interpreter, an entirely blank logical line (i.e. one containing not even
whitespace or a comment) terminates a multi-line statement.


.. _indentation:

Indentation
-----------

.. index:: indentation, leading whitespace, space, tab, grouping, statement grouping

Leading whitespace (spaces and tabs) at the beginning of a logical line is used
to compute the indentation level of the line, which in turn is used to determine
the grouping of statements.

Tabs are replaced (from left to right) by one to eight spaces such that the
total number of characters up to and including the replacement is a multiple of
eight (this is intended to be the same rule as used by Unix). The total number
of spaces preceding the first non-blank character then determines the line's
indentation. Indentation cannot be split over multiple physical lines using
backslashes; the whitespace up to the first backslash determines the
indentation.

Indentation is rejected as inconsistent if a source file mixes tabs and spaces
in a way that makes the meaning dependent on the worth of a tab in spaces; a
:exc:`TabError` is raised in that case.

**Cross-platform compatibility note:** because of the nature of text editors on
non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the
indentation in a single source file. It should also be noted that different
platforms may explicitly limit the maximum indentation level.

A formfeed character may be present at the start of the line; it will be ignored
for the indentation calculations above.
Formfeed characters occurring elsewhere
in the leading whitespace have an undefined effect (for instance, they may reset
the space count to zero).

.. index:: INDENT token, DEDENT token

The indentation levels of consecutive lines are used to generate INDENT and
DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack;
this will never be popped off again. The numbers pushed on the stack will
always be strictly increasing from bottom to top. At the beginning of each
logical line, the line's indentation level is compared to the top of the stack.
If it is equal, nothing happens. If it is larger, it is pushed on the stack, and
one INDENT token is generated. If it is smaller, it *must* be one of the
numbers occurring on the stack; all numbers on the stack that are larger are
popped off, and for each number popped off a DEDENT token is generated. At the
end of the file, a DEDENT token is generated for each number remaining on the
stack that is larger than zero.

Here is an example of a correctly (though confusingly) indented piece of Python
code::

   def perm(l):
           # Compute the list of all permutations of l
       if len(l) <= 1:
                     return [l]
       r = []
       for i in range(len(l)):
                s = l[:i] + l[i+1:]
                p = perm(s)
                for x in p:
                 r.append(l[i:i+1] + x)
       return r

The following example shows various indentation errors::

    def perm(l):                       # error: first line indented
   for i in range(len(l)):             # error: not indented
       s = l[:i] + l[i+1:]
           p = perm(l[:i] + l[i+1:])   # error: unexpected indent
           for x in p:
                   r.append(l[i:i+1] + x)
               return r                # error: inconsistent dedent

(Actually, the first three errors are detected by the parser; only the last
error is found by the lexical analyzer --- the indentation of ``return r`` does
not match a level popped off the stack.)


.. _whitespace:

Whitespace between tokens
-------------------------

Except at the beginning of a logical line or in string literals, the whitespace
characters space, tab and formfeed can be used interchangeably to separate
tokens. Whitespace is needed between two tokens only if their concatenation
could otherwise be interpreted as a different token (e.g., ab is one token, but
a b is two tokens).


.. _other-tokens:

Other tokens
============

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist:
*identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace
characters (other than line terminators, discussed earlier) are not tokens, but
serve to delimit tokens. Where ambiguity exists, a token comprises the longest
possible string that forms a legal token, when read from left to right.


.. _identifiers:

Identifiers and keywords
========================

.. index:: identifier, name

Identifiers (also referred to as *names*) are described by the following lexical
definitions.

The syntax of identifiers in Python is based on the Unicode standard annex
UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
further details.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
are the same as in Python 2.x: the uppercase and lowercase letters ``A`` through
``Z``, the underscore ``_`` and, except for the first character, the digits
``0`` through ``9``.

Python 3.0 introduces additional characters from outside the ASCII range (see
:pep:`3131`). For these characters, the classification uses the version of the
Unicode Character Database as included in the :mod:`unicodedata` module.

Identifiers are unlimited in length. Case is significant.

.. productionlist:: python-grammar
   identifier: `xid_start` `xid_continue`*
   id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
   id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
   xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
   xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">

The Unicode category codes mentioned above stand for:

* *Lu* - uppercase letters
* *Ll* - lowercase letters
* *Lt* - titlecase letters
* *Lm* - modifier letters
* *Lo* - other letters
* *Nl* - letter numbers
* *Mn* - nonspacing marks
* *Mc* - spacing combining marks
* *Nd* - decimal numbers
* *Pc* - connector punctuations
* *Other_ID_Start* - explicit list of characters in `PropList.txt
  <https://www.unicode.org/Public/14.0.0/ucd/PropList.txt>`_ to support backwards
  compatibility
* *Other_ID_Continue* - likewise

All identifiers are converted into the normal form NFKC while parsing; comparison
of identifiers is based on NFKC.

A non-normative file listing all valid identifier characters for Unicode
14.0.0 can be found at
https://www.unicode.org/Public/14.0.0/ucd/DerivedCoreProperties.txt


.. _keywords:

Keywords
--------

.. index::
   single: keyword
   single: reserved word

The following identifiers are used as reserved words, or *keywords* of the
language, and cannot be used as ordinary identifiers. They must be spelled
exactly as written here:

.. sourcecode:: text

   False      await      else       import     pass
   None       break      except     in         raise
   True       class      finally    is         return
   and        continue   for        lambda     try
   as         def        from       nonlocal   while
   assert     del        global     not        with
   async      elif       if         or         yield


.. _soft-keywords:

Soft Keywords
-------------

.. index:: soft keyword, keyword

.. versionadded:: 3.10

Some identifiers are only reserved under specific contexts. These are known as
*soft keywords*. The identifiers ``match``, ``case`` and ``_`` can
syntactically act as keywords in contexts related to the pattern matching
statement, but this distinction is done at the parser level, not when
tokenizing.

As soft keywords, their use with pattern matching is possible while still
preserving compatibility with existing code that uses ``match``, ``case`` and ``_`` as
identifier names.


.. index::
   single: _, identifiers
   single: __, identifiers
.. _id-classes:

Reserved classes of identifiers
-------------------------------

Certain classes of identifiers (besides keywords) have special meanings. These
classes are identified by the patterns of leading and trailing underscore
characters:

``_*``
   Not imported by ``from module import *``.

``_``
   In a ``case`` pattern within a :keyword:`match` statement, ``_`` is a
   :ref:`soft keyword <soft-keywords>` that denotes a
   :ref:`wildcard <wildcard-patterns>`.

   Separately, the interactive interpreter makes the result of the last evaluation
   available in the variable ``_``.
   (It is stored in the :mod:`builtins` module, alongside built-in
   functions like ``print``.)

   Elsewhere, ``_`` is a regular identifier. It is often used to name
   "special" items, but it is not special to Python itself.

   .. note::

      The name ``_`` is often used in conjunction with internationalization;
      refer to the documentation for the :mod:`gettext` module for more
      information on this convention.

      It is also commonly used for unused variables.

``__*__``
   System-defined names, informally known as "dunder" names. These names are
   defined by the interpreter and its implementation (including the standard library).
   Current system names are discussed in the :ref:`specialnames` section and elsewhere.
   More will likely be defined in future versions of Python. *Any* use of ``__*__`` names,
   in any context, that does not follow explicitly documented use, is subject to
   breakage without warning.

``__*``
   Class-private names. Names in this category, when used within the context of a
   class definition, are re-written to use a mangled form to help avoid name
   clashes between "private" attributes of base and derived classes. See section
   :ref:`atom-identifiers`.


.. _literals:

Literals
========

.. index:: literal, constant

Literals are notations for constant values of some built-in types.


.. index:: string literal, bytes literal, ASCII
   single: ' (single quote); string literal
   single: " (double quote); string literal
   single: u'; string literal
   single: u"; string literal
.. _strings:

String and Bytes literals
-------------------------

String literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   stringliteral: [`stringprefix`](`shortstring` | `longstring`)
   stringprefix: "r" | "u" | "R" | "U" | "f" | "F"
               : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
   shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
   shortstringitem: `shortstringchar` | `stringescapeseq`
   longstringitem: `longstringchar` | `stringescapeseq`
   shortstringchar: <any source character except "\" or newline or the quote>
   longstringchar: <any source character except "\">
   stringescapeseq: "\" <any source character>

.. productionlist:: python-grammar
   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
   bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
   longbytesitem: `longbyteschar` | `bytesescapeseq`
   shortbyteschar: <any ASCII character except "\" or newline or the quote>
   longbyteschar: <any ASCII character except "\">
   bytesescapeseq: "\" <any ASCII character>

One syntactic restriction not indicated by these productions is that whitespace
is not allowed between the :token:`~python-grammar:stringprefix` or
:token:`~python-grammar:bytesprefix` and the rest of the literal. The source
character set is defined by the encoding declaration; it is UTF-8 if no encoding
declaration is given in the source file; see section :ref:`encodings`.

.. index:: triple-quoted string, Unicode Consortium, raw string
   single: """; string literal
   single: '''; string literal

In plain English: Both types of literals can be enclosed in matching single quotes
(``'``) or double quotes (``"``).
They can also be enclosed in matching groups
of three single or double quotes (these are generally referred to as
*triple-quoted strings*). The backslash (``\``) character is used to escape
characters that otherwise have a special meaning, such as newline, backslash
itself, or the quote character.

.. index::
   single: b'; bytes literal
   single: b"; bytes literal

Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
instance of the :class:`bytes` type instead of the :class:`str` type. They
may only contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.

.. index::
   single: r'; raw string literal
   single: r"; raw string literal

Both string and bytes literals may optionally be prefixed with a letter ``'r'``
or ``'R'``; such strings are called :dfn:`raw strings` and treat backslashes as
literal characters. As a result, in string literals, ``'\U'`` and ``'\u'``
escapes in raw strings are not treated specially. Given that Python 2.x's raw
unicode literals behave differently than Python 3.x's, the ``'ur'`` syntax
is not supported.

.. versionadded:: 3.3
   The ``'rb'`` prefix of raw bytes literals has been added as a synonym
   of ``'br'``.

.. versionadded:: 3.3
   Support for the unicode legacy literal (``u'value'``) was reintroduced
   to simplify the maintenance of dual Python 2.x and 3.x codebases.
   See :pep:`414` for more information.

.. index::
   single: f'; formatted string literal
   single: f"; formatted string literal

A string literal with ``'f'`` or ``'F'`` in its prefix is a
:dfn:`formatted string literal`; see :ref:`f-strings`. The ``'f'`` may be
combined with ``'r'``, but not with ``'b'`` or ``'u'``, therefore raw
formatted strings are possible, but formatted bytes literals are not.
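
The effect of each prefix can be seen in a few assertions. This is only an
illustrative sketch; the variable names (``data``, ``name``, ``raw_formatted``)
are made up for the example.

```python
# Raw strings keep backslashes as literal characters.
assert len("\n") == 1    # one character: a linefeed
assert len(r"\n") == 2   # two characters: a backslash and the letter n

# Bytes literals produce bytes objects; values of 128 or more need escapes.
data = b"caf\xc3\xa9"
assert isinstance(data, bytes)
assert data.decode("utf-8") == "caf\u00e9"

# 'rb' and 'br' are synonyms for raw bytes literals.
assert rb"\n" == br"\n" == b"\\n"

# The legacy 'u' prefix is accepted and does not change the value.
assert u"text" == "text"

# 'f' combined with 'r': the replacement field is still evaluated,
# while the backslash outside it stays literal.
name = "tmp"
raw_formatted = rf"C:\{name}"
assert raw_formatted == "C:\\tmp"
```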

In triple-quoted literals, unescaped newlines and quotes are allowed (and are
retained), except that three unescaped quotes in a row terminate the literal. (A
"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.)

.. index:: physical line, escape sequence, Standard C, C
   single: \ (backslash); escape sequence
   single: \\; escape sequence
   single: \a; escape sequence
   single: \b; escape sequence
   single: \f; escape sequence
   single: \n; escape sequence
   single: \r; escape sequence
   single: \t; escape sequence
   single: \v; escape sequence
   single: \x; escape sequence
   single: \N; escape sequence
   single: \u; escape sequence
   single: \U; escape sequence

Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and
bytes literals are interpreted according to rules similar to those used by
Standard C. The recognized escape sequences are:

+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
| ``\``\ <newline>| Backslash and newline ignored   | \(1)  |
+-----------------+---------------------------------+-------+
| ``\\``          | Backslash (``\``)               |       |
+-----------------+---------------------------------+-------+
| ``\'``          | Single quote (``'``)            |       |
+-----------------+---------------------------------+-------+
| ``\"``          | Double quote (``"``)            |       |
+-----------------+---------------------------------+-------+
| ``\a``          | ASCII Bell (BEL)                |       |
+-----------------+---------------------------------+-------+
| ``\b``          | ASCII Backspace (BS)            |       |
+-----------------+---------------------------------+-------+
| ``\f``          | ASCII Formfeed (FF)             |       |
+-----------------+---------------------------------+-------+
| ``\n``          | ASCII Linefeed (LF)             |       |
+-----------------+---------------------------------+-------+
| ``\r``          | ASCII Carriage Return (CR)      |       |
+-----------------+---------------------------------+-------+
| ``\t``          | ASCII Horizontal Tab (TAB)      |       |
+-----------------+---------------------------------+-------+
| ``\v``          | ASCII Vertical Tab (VT)         |       |
+-----------------+---------------------------------+-------+
| ``\ooo``        | Character with octal value      | (2,4) |
|                 | *ooo*                           |       |
+-----------------+---------------------------------+-------+
| ``\xhh``        | Character with hex value *hh*   | (3,4) |
+-----------------+---------------------------------+-------+

Escape sequences only recognized in string literals are:

+-----------------+---------------------------------+-------+
| Escape Sequence | Meaning                         | Notes |
+=================+=================================+=======+
| ``\N{name}``    | Character named *name* in the   | \(5)  |
|                 | Unicode database                |       |
+-----------------+---------------------------------+-------+
| ``\uxxxx``      | Character with 16-bit hex value | \(6)  |
|                 | *xxxx*                          |       |
+-----------------+---------------------------------+-------+
| ``\Uxxxxxxxx``  | Character with 32-bit hex value | \(7)  |
|                 | *xxxxxxxx*                      |       |
+-----------------+---------------------------------+-------+

Notes:

(1)
   A backslash can be added at the end of a line to ignore the newline::

      >>> 'This string will not include \
      ... backslashes or newline characters.'
      'This string will not include backslashes or newline characters.'

   The same result can be achieved using :ref:`triple-quoted strings <strings>`,
   or parentheses and :ref:`string literal concatenation <string-concatenation>`.


(2)
   As in Standard C, up to three octal digits are accepted.

   .. versionchanged:: 3.11
      Octal escapes with value larger than ``0o377`` produce a :exc:`DeprecationWarning`.
      In a future Python version they will be a :exc:`SyntaxWarning` and
      eventually a :exc:`SyntaxError`.

(3)
   Unlike in Standard C, exactly two hex digits are required.

(4)
   In a bytes literal, hexadecimal and octal escapes denote the byte with the
   given value. In a string literal, these escapes denote a Unicode character
   with the given value.

(5)
   .. versionchanged:: 3.3
      Support for name aliases [#]_ has been added.

(6)
   Exactly four hex digits are required.

(7)
   Any Unicode character can be encoded this way. Exactly eight hex digits
   are required.


.. index:: unrecognized escape sequence

Unlike Standard C, all unrecognized escape sequences are left in the string
unchanged, i.e., *the backslash is left in the result*. (This behavior is
useful when debugging: if an escape sequence is mistyped, the resulting output
is more easily recognized as broken.) It is also important to note that the
escape sequences only recognized in string literals fall into the category of
unrecognized escapes for bytes literals.

   .. versionchanged:: 3.6
      Unrecognized escape sequences produce a :exc:`DeprecationWarning`. In
      a future Python version they will be a :exc:`SyntaxWarning` and
      eventually a :exc:`SyntaxError`.

Even in a raw literal, quotes can be escaped with a backslash, but the
backslash remains in the result; for example, ``r"\""`` is a valid string
literal consisting of two characters: a backslash and a double quote; ``r"\"``
is not a valid string literal (even a raw string cannot end in an odd number of
backslashes). Specifically, *a raw literal cannot end in a single backslash*
(since the backslash would escape the following quote character). Note also
that a single backslash followed by a newline is interpreted as those two
characters as part of the literal, *not* as a line continuation.


.. _string-concatenation:

String literal concatenation
----------------------------

Multiple adjacent string or bytes literals (delimited by whitespace), possibly
using different quoting conventions, are allowed, and their meaning is the same
as their concatenation. Thus, ``"hello" 'world'`` is equivalent to
``"helloworld"``. This feature can be used to reduce the number of backslashes
needed, to split long strings conveniently across long lines, or even to add
comments to parts of strings, for example::

   re.compile("[A-Za-z_]"       # letter or underscore
              "[A-Za-z0-9_]*"   # letter, digit or underscore
             )

Note that this feature is defined at the syntactical level, but implemented at
compile time. The '+' operator must be used to concatenate string expressions
at run time. Also note that literal concatenation can use different quoting
styles for each component (even mixing raw strings and triple quoted strings),
and formatted string literals may be concatenated with plain string literals.


.. index::
   single: formatted string literal
   single: interpolated string literal
   single: string; formatted literal
   single: string; interpolated literal
   single: f-string
   single: fstring
   single: {} (curly brackets); in formatted string literal
   single: ! (exclamation); in formatted string literal
   single: : (colon); in formatted string literal
   single: = (equals); for help in debugging using string literals
.. _f-strings:

Formatted string literals
-------------------------

.. versionadded:: 3.6

A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal
that is prefixed with ``'f'`` or ``'F'``. These strings may contain
replacement fields, which are expressions delimited by curly braces ``{}``.
While other string literals always have a constant value, formatted strings
are really expressions evaluated at run time.
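
As a minimal sketch of what "evaluated at run time" means, the function below
(the name ``describe`` is hypothetical) returns a different string on each call
because its replacement fields are recomputed from the current value of ``x``.

```python
def describe(x):
    # The replacement fields are evaluated each time this line runs,
    # using whatever value x has at that moment.
    return f"{x} squared is {x * x}"


first = describe(3)
second = describe(4)
assert first == "3 squared is 9"
assert second == "4 squared is 16"

# An ordinary string literal, by contrast, always denotes the same value.
label = "x squared"
assert label == "x squared"
```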

Escape sequences are decoded like in ordinary string literals (except when
a literal is also marked as a raw string). After decoding, the grammar
for the contents of the string is:

.. productionlist:: python-grammar
   f_string: (`literal_char` | "{{" | "}}" | `replacement_field`)*
   replacement_field: "{" `f_expression` ["="] ["!" `conversion`] [":" `format_spec`] "}"
   f_expression: (`conditional_expression` | "*" `or_expr`)
               :   ("," `conditional_expression` | "," "*" `or_expr`)* [","]
               : | `yield_expression`
   conversion: "s" | "r" | "a"
   format_spec: (`literal_char` | NULL | `replacement_field`)*
   literal_char: <any code point except "{", "}" or NULL>

The parts of the string outside curly braces are treated literally,
except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced
with the corresponding single curly brace. A single opening curly
bracket ``'{'`` marks a replacement field, which starts with a
Python expression. To display both the expression text and its value after
evaluation (useful in debugging), an equal sign ``'='`` may be added after the
expression. A conversion field, introduced by an exclamation point ``'!'``, may
follow. A format specifier may also be appended, introduced by a colon ``':'``.
A replacement field ends with a closing curly bracket ``'}'``.

Expressions in formatted string literals are treated like regular
Python expressions surrounded by parentheses, with a few exceptions.
An empty expression is not allowed, and both :keyword:`lambda` and
assignment expressions ``:=`` must be surrounded by explicit parentheses.
Replacement expressions can contain line breaks (e.g. in triple-quoted
strings), but they cannot contain comments. Each expression is evaluated
in the context where the formatted string literal appears, in order from
left to right.

.. versionchanged:: 3.7
   Prior to Python 3.7, an :keyword:`await` expression and comprehensions
   containing an :keyword:`async for` clause were illegal in the expressions
   in formatted string literals due to a problem with the implementation.

When the equal sign ``'='`` is provided, the output will have the expression
text, the ``'='`` and the evaluated value. Spaces after the opening brace
``'{'``, within the expression and after the ``'='`` are all retained in the
output. By default, the ``'='`` causes the :func:`repr` of the expression to be
provided, unless there is a format specified. When a format is specified it
defaults to the :func:`str` of the expression unless a conversion ``'!r'`` is
declared.

.. versionadded:: 3.8
   The equal sign ``'='``.

If a conversion is specified, the result of evaluating the expression
is converted before formatting. Conversion ``'!s'`` calls :func:`str` on
the result, ``'!r'`` calls :func:`repr`, and ``'!a'`` calls :func:`ascii`.

The result is then formatted using the :func:`format` protocol. The
format specifier is passed to the :meth:`__format__` method of the
expression or conversion result. An empty string is passed when the
format specifier is omitted. The formatted result is then included in
the final value of the whole string.

Top-level format specifiers may include nested replacement fields. These nested
fields may include their own conversion fields and :ref:`format specifiers
<formatspec>`, but may not include more deeply nested replacement fields. The
:ref:`format specifier mini-language <formatspec>` is the same as that used by
the :meth:`str.format` method.

Formatted string literals may be concatenated, but replacement fields
cannot be split across literals.

Some examples of formatted string literals::

   >>> name = "Fred"
   >>> f"He said his name is {name!r}."
   "He said his name is 'Fred'."
   >>> f"He said his name is {repr(name)}."  # repr() is equivalent to !r
   "He said his name is 'Fred'."
   >>> width = 10
   >>> precision = 4
   >>> import decimal
   >>> value = decimal.Decimal("12.34567")
   >>> f"result: {value:{width}.{precision}}"  # nested fields
   'result:      12.35'
   >>> from datetime import datetime
   >>> today = datetime(year=2017, month=1, day=27)
   >>> f"{today:%B %d, %Y}"  # using date format specifier
   'January 27, 2017'
   >>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
   'today=January 27, 2017'
   >>> number = 1024
   >>> f"{number:#0x}"  # using integer format specifier
   '0x400'
   >>> foo = "bar"
   >>> f"{ foo = }" # preserves whitespace
   " foo = 'bar'"
   >>> line = "The mill's closed"
   >>> f"{line = }"
   'line = "The mill\'s closed"'
   >>> f"{line = :20}"
   "line = The mill's closed   "
   >>> f"{line = !r:20}"
   'line = "The mill\'s closed" '


A consequence of sharing the same syntax as regular string literals is
that characters in the replacement fields must not conflict with the
quoting used in the outer formatted string literal::

   f"abc {a["x"]} def"    # error: outer string literal ended prematurely
   f"abc {a['x']} def"    # workaround: use different quoting

Backslashes are not allowed in format expressions and will raise
an error::

   f"newline: {ord('\n')}"  # raises SyntaxError

To include a value in which a backslash escape is required, create
a temporary variable.

   >>> newline = ord('\n')
   >>> f"newline: {newline}"
   'newline: 10'

Formatted string literals cannot be used as docstrings, even if they do not
include expressions.

::

   >>> def foo():
   ...     f"Not a docstring"
   ...
   >>> foo.__doc__ is None
   True

See also :pep:`498` for the proposal that added formatted string literals,
and :meth:`str.format`, which uses a related format string mechanism.


.. _numbers:

Numeric literals
----------------

.. index:: number, numeric literal, integer literal
   floating point literal, hexadecimal literal
   octal literal, binary literal, decimal literal, imaginary literal, complex literal

There are three types of numeric literals: integers, floating point numbers, and
imaginary numbers.  There are no complex literals (complex numbers can be formed
by adding a real number and an imaginary number).

Note that numeric literals do not include a sign; a phrase like ``-1`` is
actually an expression composed of the unary operator '``-``' and the literal
``1``.


.. index::
   single: 0b; integer literal
   single: 0o; integer literal
   single: 0x; integer literal
   single: _ (underscore); in numeric literal

.. _integers:

Integer literals
----------------

Integer literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   integer: `decinteger` | `bininteger` | `octinteger` | `hexinteger`
   decinteger: `nonzerodigit` (["_"] `digit`)* | "0"+ (["_"] "0")*
   bininteger: "0" ("b" | "B") (["_"] `bindigit`)+
   octinteger: "0" ("o" | "O") (["_"] `octdigit`)+
   hexinteger: "0" ("x" | "X") (["_"] `hexdigit`)+
   nonzerodigit: "1"..."9"
   digit: "0"..."9"
   bindigit: "0" | "1"
   octdigit: "0"..."7"
   hexdigit: `digit` | "a"..."f" | "A"..."F"

There is no limit for the length of integer literals apart from what can be
stored in available memory.

Underscores are ignored for determining the numeric value of the literal.  They
can be used to group digits for enhanced readability.  One underscore can occur
between digits, and after base specifiers like ``0x``.

Note that leading zeros in a non-zero decimal number are not allowed.  This is
for disambiguation with C-style octal literals, which Python used before version
3.0.
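
The rules above can be checked directly in an interpreter.  The following is a
minimal sketch rather than part of the language definition: the helper name
``compiles`` and the sample strings are chosen only for illustration, and
:func:`compile` is used merely as a convenient way to ask whether a piece of
text is accepted as an expression.

```python
# Underscores only group digits; they never change the value.
assert 100_000_000_000 == 100000000000
assert 0b_1110_0101 == 0b11100101 == 229
assert 0o177 == 127
assert 0xdead_beef == 0xdeadbeef


def compiles(text):
    """Return True if *text* is accepted as a Python expression.

    Hypothetical helper for illustration only.
    """
    try:
        compile(text, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


samples = ("0", "000", "1_000", "0x_ff", "012", "1__000", "1000_")
results = {text: compiles(text) for text in samples}

for text, ok in results.items():
    print(f"{text!r:>10} -> {'accepted' if ok else 'rejected'}")
```

Under these assumptions, ``012`` is rejected because of its leading zero, while
``1__000`` and ``1000_`` are rejected because an underscore may only appear
singly between digits or directly after a base specifier.
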

Some examples of integer literals::

   7     2147483647                        0o177    0b100110111
   3     79228162514264337593543950336     0o377    0xdeadbeef
         100_000_000_000                   0b_1110_0101

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: . (dot); in numeric literal
   single: e; in numeric literal
   single: _ (underscore); in numeric literal
.. _floating:

Floating point literals
-----------------------

Floating point literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   floatnumber: `pointfloat` | `exponentfloat`
   pointfloat: [`digitpart`] `fraction` | `digitpart` "."
   exponentfloat: (`digitpart` | `pointfloat`) `exponent`
   digitpart: `digit` (["_"] `digit`)*
   fraction: "." `digitpart`
   exponent: ("e" | "E") ["+" | "-"] `digitpart`

Note that the integer and exponent parts are always interpreted using radix 10.
For example, ``077e010`` is legal, and denotes the same number as ``77e10``. The
allowed range of floating point literals is implementation-dependent.  As in
integer literals, underscores are supported for digit grouping.

Some examples of floating point literals::

   3.14    10.    .001    1e100    3.14e-10    0e0    3.14_15_93

.. versionchanged:: 3.6
   Underscores are now allowed for grouping purposes in literals.


.. index::
   single: j; in numeric literal
.. _imaginary:

Imaginary literals
------------------

Imaginary literals are described by the following lexical definitions:

.. productionlist:: python-grammar
   imagnumber: (`floatnumber` | `digitpart`) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0.  Complex
numbers are represented as a pair of floating point numbers and have the same
restrictions on their range.  To create a complex number with a nonzero real
part, add a floating point number to it, e.g., ``(3+4j)``.  Some examples of
imaginary literals::

   3.14j   10.j    10j     .001j   1e100j   3.14e-10j   3.14_15_93j


.. _operators:

Operators
=========

.. index:: single: operators

The following tokens are operators:

.. code-block:: none


   +       -       *       **      /       //      %      @
   <<      >>      &       |       ^       ~       :=
   <       >       <=      >=      ==      !=


.. _delimiters:

Delimiters
==========

.. index:: single: delimiters

The following tokens serve as delimiters in the grammar:

.. code-block:: none

   (       )       [       ]       {       }
   ,       :       .       ;       @       =       ->
   +=      -=      *=      /=      //=     %=      @=
   &=      |=      ^=      >>=     <<=     **=

The period can also occur in floating-point and imaginary literals.  A sequence
of three periods has a special meaning as an ellipsis literal. The second half
of the list, the augmented assignment operators, serve lexically as delimiters,
but also perform an operation.

The following printing ASCII characters have special meaning as part of other
tokens or are otherwise significant to the lexical analyzer:

.. code-block:: none

   '       "       #       \

The following printing ASCII characters are not used in Python.  Their
occurrence outside string literals and comments is an unconditional error:

.. code-block:: none

   $       ?       `


.. rubric:: Footnotes

.. [#] https://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt