"Doing Math with Python" by Amit Saha: Book Review

Aaron Meurer

2015-12-19

Note: No Starch Press has sent me a copy of this book for review purposes.

SHORT VERSION: Doing Math with Python is well written and introduces topics in a nice, mathematical way. I would recommend it for new users of SymPy.

Doing Math with Python by Amit Saha is a new book published by No Starch Press. The book shows how to use Python to do high school-level mathematics. It makes heavy use of SymPy in many chapters, and this review will focus mainly on those parts, as that is the area I have expertise in.

The book assumes a basic understanding of programming in Python 3, as well as the mathematics used (although advanced topics are explained). No prior background in the libraries used, SymPy and matplotlib, is assumed. For this reason, this book can serve as an introduction them. Each chapter ends with some programming exercises, which range from easy exercises to more advanced ones.

The book has seven chapters. In the first chapter, "Working with numbers", basic mathematics using pure Python is introduced (no SymPy yet). It should be noted that Python 3 (not Python 2) is required for this book. One of the earliest examples in the book (3/2 == 1.5) will not work correctly without it. I applaud this choice, although I might have added a more prominent warning to wary users. (As a side note, in the appendix, it is recommended to install Python via Anaconda, which I also applaud). This chapter also introduces the fractions module, which seems odd since sympy.Rational will be implicitly used for rational numbers later in the text (to little harm, however, since SymPy automatically converts fractions.Fraction instances to sympy.Rational).

In all, this chapter is a good introduction to the basics of the mathematics of Python. There is also an introduction to variables and strings. However, as I noted above, one should really have some background with basic Python before reading this book, as concepts like flow control and function definition are assumed (note: there is an appendix that goes over this).

Chapters 2 and 3 cover plotting with matplotlib and basic statistics, respectively. I will not say much about the matplotlib chapter, since I know only basic matplotlib myself. I will note that the chapter covers matplotlib from a (high school) mathematics point of view, starting with a definition of the Cartesian plane, which seems a fitting choice for the book.

Chapter 3 shows how to do basic statistics (mean, median, standard deviation, etc.) using pure Python. This chapter is clearly meant for pedagogical purposes for basic statistics, since the basic functions mean, median, etc. are implemented from scratch (as opposed to using numpy.mean or the standard library statistics.mean). This serves as a good introduction to more Python concepts (like collections.Counter) and statistics.

Note that the functions in this chapter assume that the data is the entire population, not a sample. This is mentioned at the beginning of the chapter, but not elaborated on. For example, this leads to a different definition of variance than what might be seen elsewhere (the calculate_variance used in this chapter is statistics.pvariance, not statistics.variance).

It is good to see that a numerically stable definition of variance is used here (see PEP 450 for more discussion on this). These numerical issues show why it is important to use a real statistics library rather than a home grown one. In other words, use this chapter to learn more about statistics and Python, but if you ever need to do statistics on real data, use a statistics library like statistics or numpy. Finally, I should note that this book appears to be written against Python 3.3, whereas statistics was added to the Python standard library in Python 3.4. Perhaps it will get a mention in future editions.

Chapter 4, "Algebra and Symbolic Math with SymPy" starts the introduction to SymPy. The chapter starts similar to the official SymPy tutorial in describing what symbolics is, and guiding the reader away from common misconceptions and gotchas. The chapter does a good job of explaining common gotchas and avoiding antipatterns.

This chapter may serve as an alternative to the official tutorial. Unlike the official tutorial, which jumps into higher-level mathematics and broader use-cases, this chapter may be better suited to those wishing to use SymPy from the standpoint of high school mathematics.

My only gripes with this chapter, which, in total, are minor, relate to printing.

The typesetting of the pretty printing is inconsistent and, in some cases, incorrect. Powers are printed in the book using superscript numbers, like
```
x²
```
However, SymPy prints powers like
```
 2
x
```
even when Unicode pretty printing is enabled. This is a minor point, but it may confuse users. Also, the output appears to use ASCII pretty printing (mixed with superscript powers), for example
```
    x²   x³   x⁴   x⁵
x + -- + -- + -- + --
    2    3    4    5
```
Most users will either get MathJax printing (if they are using the Jupyter notebook), or Unicode printing, like
```
     2    3    4    5
    x    x    x    x
x + ── + ── + ── + ──
    2    3    4    5
```
Again, this is a minor point, but at the very least the correct printing looks better than the fake printing used here.
In line with the previous point, I would recommend telling the user to start with init_printing(). The function is used once to change the order of printing to rev-lex (for series printing). There is a link to the tutorial page on printing. That page goes into more depth than is necessary for the book, but I would recommend at least mentioning to always call init_printing(), as 2-D printing can make a huge difference over the default str printing, and it obviates the need to call pprint.

Chapter 5, "Playing with Sets and Probability" covers SymPy's set objects (particularly FiniteSet) to do some basic set theory and probability. I'm excited to see this in the book. The sets module in SymPy is relatively new, but quite powerful. We do not yet have an introduction to the sets module in the SymPy tutorial. This chapter serves as a good introduction to it (albeit only with finite sets, but the SymPy functions that operate on infinite sets are exactly the same as the ones that operate on finite sets). In all, I don't have much to say about this chapter other than that I was pleasantly surprised to see it included.

Chapter 6 shows how to draw geometric shapes and fractals with matplotlib. I again won't say much on this, as I am no matplotlib expert. The ability to draw leaf fractals and Sierpiński triangles with Python does look entertaining, and should keep readers enthralled.

Chapter 7, "Solving Calculus Problems" goes into more depth with SymPy. In particular, assumptions, limits, derivatives, and integrals. The chapter alternates between symbolic formulations using SymPy and numeric calculations (using evalf). The numeric calculations are done both for simple examples and more advanced things (like implementing gradient descent).

One small gripe here. The book shows that

from sympy import Symbol
x = Symbol('x')
if (x + 5) > 0:
    print('Do Something')
else:
    print('Do Something else')

raises TypeError at the evaluation of (x + 5) > 0 because its truth value cannot be determined. The solution to this issue is given as

x = Symbol('x', positive=True)
if (x + 5) > 0:
    print('Do Something')
else:
    print('Do Something else')

Setting x to be positive via Symbol('x', positive=True) is correct, but even in this case, evaluating an inequality may still raise a TypeError (for example, if (x - 5) > 0). The better way to do this is to use (x + 5).is_positive. This would require a bit more discussion, especially since SymPy uses a three-valued logic for assumptions, but I do consider "if <symbolic inequality>" to be a SymPy antipattern.

I like Saha's approach in this chapter of first showing unevaluated forms (Limit, Derivative, Integral), and then evaluating them with doit(). This puts users in the mindset of a mathematical expression being a formula which may or may not later be "calculated". The opposite approach, using the function forms, limit, diff, and integrate, which evaluate if they can and return an unevaluated object if they can't, can be confusing to new users in my experience. A common new SymPy user question is (some form of) "how do I evaluate an expression?" (the answer is doit()). Saha's approach avoids this question by showing doit() from the outset.

I also like that this chapter explains the gotcha of math.sin(Symbol('x')), although I personally would have included this earlier in the text.

(Side note: now that I look, these are both areas in which the official tutorial could be improved).

Summary

This book is a good introduction to doing math with Python, and, for the chapters that use it, a good basic introduction to SymPy. I would recommend it to anyone wishing to learn SymPy, but especially to anyone whose knowledge of mathematics may preclude them from getting the most out of the official SymPy tutorial.

I imagine this book would work well as a pedagogical tool, either for math teachers or for self-learners. The exercises in this book should push the motivated to learn more.

I have a few minor gripes, but no major issues.

You can purchase this book from the No Starch Press website, both as a print book or an ebook. The website also includes a sample chapter (chapter 1), code samples from the book, and exercise solutions.

Lessons learned from working at Continuum

Aaron Meurer

2015-10-05

Comments

Last Friday was my last day working at Continuum Analytics. I enjoyed my time at the company, and wish success to it, but the time has come for me to move on. Starting later this year, I will start working with Anthony Scopatz at his new lab ERGS at the University of South Carolina.

During my time at Continuum (over two years if you count a summer internship), I primarily worked on the Anaconda distribution and its open source package manager, conda. I learned a lot of lessons in that time, and I'd like to share some of them here.

In no particular order:

Left to their own devices, people will make the minimal possible solution to packaging. They won't try to architect something. The result will be over-engineered, specific to their use-case, and lack reproducibility.
The best way to ensure that some software has no bugs is for it to have many users.
Be wary of the "software would be great if it weren't for all the users" mentality (cf. the previous point).
Most people don't code defensively. If you are working on a project that requires extreme stability, be cautious of contributions from those outside the development team.
Hostility towards Windows and Windows users doesn't help anyone.
https://twitter.com/asmeurer/status/593170976981913600
For a software updater, stability is the number one priority. If the updater breaks, how can a fix be deployed?
Even if you configure your program to update itself every time it runs you will still get bug reports with arbitrarily old versions.
Separating components into separate git repositories leads to a philosophical separation of concerns among the components.
Everyone who isn't an active developer on the project will ignore this separation and open issues in the wrong repo.
Avoid object oriented programming when procedural programming will do just fine.¹
Open source is more about the open than the source. Develop things in the open, and you will create a community that respects you.¹
Academics (often) don't know good software practices, nor good licensing practices.
Neither do some large corporations.
Avoid over-engineering things.
Far fewer people than I would have thought understand the difference between hard links and soft links.²
Changelogs are useful.
Semantic versioning is over-hyped.
If you make something and release it, the first version should be 1.0 (not 0.1 or 0.0.1).
Getting a difficult package to compile is like hacking a computer. All it takes is time.
It doesn't matter how open source friendly your business is, there will always be people who will be skeptical and point their fingers at the smallest proprietary components, fear monger, and overgeneralize unrelated issues into FUD. These people should generally be ignored.
Don't feed the trolls.¹
People constantly misspell the name of Apple's desktop operating system.
People always assume you have way more automation than you really do.
The Python standard library is not a Zen garden. Some parts of it are completely broken, and if you need to rely on them, you'll have to rewrite them. shutil.rmtree on Windows is one example of this.
Linux is strictly backwards compatible. Windows is strictly forwards compatible. ³
On Linux, things tend to be very simple. On Windows, things tend to be very complicated.
I can't decide about OS X. It lies somewhere in between.
Nobody uses 32-bit Linux. Why do we even support that?
People oversimplify the problem of solving for package dependencies in their heads. No one realizes that it's meaningless to say something like "the dependencies of NumPy" (every build of every version of NumPy has its own set of dependencies, which may or may not be the same).
Writing a set of rules and a solver to solve against those rules is relatively easy. Writing heuristics to tell users why those rules are unsolvable when they are is hard.
SAT solvers solve NP-complete problems in general, but they can be very fast to solve common case problems. ¹
Some of the smartest people I know, who otherwise make very rational and intelligent decisions, refuse to update to Python 3.
As an introvert, the option of working from home is great for maintaining sanity.
Aeron chairs are awesome.
If living in Austin doesn't turn you into a foodie you will at least gain a respect for them.
Twitter, if used correctly, is a great way to interact with your users.
Twitter is also a great place to learn new things. Follow John Cook and Bret Victor.
One of the best ways to make heavily shared content is to make it about git (at least if you're an expert).
A good optimization algorithm avoids getting caught in local maxima by trying different parts of the search space that initially appear to be worse. The same approach should be taken in life.

Footnotes

These are things that I already knew, but were reiterated. ↩↩↩↩
If you are one of those people, I have a small presentation that explains the difference here ↩
These terms can be confusing, and I admit I got this backwards the first time I wrote this. According to Wikipedia, forwards compatible means a system can accept input intended for a later version of itself and backwards compatible means a system can accept input intended for an earlier version of itself.

What I specifically mean here is that in terms of building packages for Linux or Windows, for Linux, you should build a package on the oldest version that you wish to support. That package will work on newer versions of Linux, but not anything older (generally due to the version of libc you are linked against).

On the other hand, on Windows, you can can compile things on the newest version (I used Windows 8 on my main Windows build VM), and it will work on older versions of Windows like XP (as long as you ship the right runtime DLLs). This is also somewhat confusing because Windows tends to be both forwards compatible and backwards compatible. ↩

Python Trickery

Aaron Meurer

2015-05-04

Comments

Here are some bits of Python trickery that I have come across. Each of the following is invalid syntax. See if you can figure out why

1 + not(2)

f(*(
i for i in range(10)
if i % 2),
x=3,)