Brown Water Python: Better Docs for the Python `tokenize` Module
=================================================================

The `tokenize` module in the Python standard library is very powerful, but its [documentation](https://docs.python.org/3/library/tokenize.html) is somewhat limited. In the spirit of Thomas Kluyver's [Green Tree Snakes](https://greentreesnakes.readthedocs.io/) project, which provides similar extended documentation for the `ast` module, I am providing here extended documentation for working effectively with the `tokenize` module.

## Python Versions Supported

The contents of this guide apply to Python 3.5 and up. Several minor changes were made to the `tokenize` module in the versions between 3.5 and 3.8, and they have been noted where appropriate.

The `tokenize` module tokenizes code according to the version of Python it is run under. For example, some syntax features added in Python 3.6 affect tokenization (in particular, [f-strings](https://docs.python.org/3.6/whatsnew/3.6.html#pep-498-formatted-string-literals) and [underscores in numeric literals](https://docs.python.org/3.6/whatsnew/3.6.html#pep-515-underscores-in-numeric-literals)). Take `123_456`: in Python 3.6+ it tokenizes as a single `NUMBER` token (`123_456`), but in Python 3.5 it tokenizes as two tokens, `NUMBER` (`123`) and `NAME` (`_456`) (see the reference for the [`NUMBER`](number) token type for more info; a concrete example appears at the end of this section).

Most of what is written here will also apply to earlier Python 3 versions, with obvious exceptions (such as tokens that were added for new syntax), though none of it has been tested.

I don't have any interest in supporting Python 2 in this guide. [Its lifetime](https://devguide.python.org/#status-of-python-branches) has officially come to an end, so you should strongly consider making new code Python 3-only. With that said, I will point out one important difference in Python 2: the `tokenize()` function in Python 2 *prints* the tokens instead of returning them. You should instead use the `generate_tokens()` function, which works like `tokenize()` in Python 3 (see the [docs](https://docs.python.org/2.7/library/tokenize.html)).

```py
>>> # Python 2.7 tokenize example
>>> import tokenize
>>> import io
>>> for tok in tokenize.generate_tokens(io.BytesIO('1 + 2').readline):
...     print tok # doctest: +SKIP
...
(2, '1', (1, 0), (1, 1), '1 + 2')
(51, '+', (1, 2), (1, 3), '1 + 2')
(2, '2', (1, 4), (1, 5), '1 + 2')
(0, '', (2, 0), (2, 0), '')
```

Another difference is that the result of this function in Python 2 is a regular tuple, not a `namedtuple`, so you cannot use attributes to access the members. Instead, use something like `for toknum, tokval, start, end, line in tokenize.generate_tokens(...):` (this pattern can be used in Python 3 as well; see the [Usage](calling-syntax) section).

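In Python 3, the tokens are `TokenInfo` namedtuples, so both the unpacking pattern and attribute access work. As a minimal sketch, here is a Python 3 version of the example above (the output shown is from Python 3.8; the trailing `NEWLINE` and `ENDMARKER` tokens vary slightly between 3.x versions):

```py
>>> # Python 3 version of the Python 2.7 example above
>>> import tokenize
>>> import io
>>> for toknum, tokval, start, end, line in tokenize.generate_tokens(io.StringIO('1 + 2').readline):
...     print(tokenize.tok_name[toknum], repr(tokval))  # doctest: +SKIP
...
NUMBER '1'
OP '+'
NUMBER '2'
NEWLINE ''
ENDMARKER ''
```

Note that in Python 3, `generate_tokens()` expects a `readline` method that returns `str`, which is why this uses `io.StringIO` rather than `io.BytesIO`. Attribute access such as `tok.type`, `tok.string`, and `tok.start` works equally well.
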
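To make the version difference described earlier concrete, here is how `123_456` tokenizes on Python 3.6+ (again, the output is from Python 3.8; on Python 3.5 you would instead see `NUMBER` (`123`) followed by `NAME` (`_456`)):

```py
>>> # Python 3.6+: underscores are allowed in numeric literals
>>> import tokenize
>>> import io
>>> for tok in tokenize.generate_tokens(io.StringIO('123_456').readline):
...     print(tok)  # doctest: +SKIP
...
TokenInfo(type=2 (NUMBER), string='123_456', start=(1, 0), end=(1, 7), line='123_456')
TokenInfo(type=4 (NEWLINE), string='', start=(1, 7), end=(1, 8), line='')
TokenInfo(type=0 (ENDMARKER), string='', start=(2, 0), end=(2, 0), line='')
```
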
## Contributing

[Contributions](https://github.com/asmeurer/brown-water-python) are welcome, as are [questions](https://github.com/asmeurer/brown-water-python/issues). My goal here is to help people understand the `tokenize` module, so if something is not clear, [please let me know](https://github.com/asmeurer/brown-water-python/issues). If you see something written here that is wrong, please make a [pull request](https://github.com/asmeurer/brown-water-python/pulls) correcting it. I'm not an expert at `tokenize`; I mainly know what is written here from trial and error and from reading the [source code](https://github.com/python/cpython/blob/master/Lib/tokenize.py).

## Table of Contents

```{toctree}
---
maxdepth: 3
---

intro
alternatives
usage
tokens
helper-functions
examples
```

---

```{eval-rst}
.. meta::
   :description: Better documentation for the Python standard library tokenize module.
   :keywords: Python, tokenize module, tokenization
```