Skip to main content

Automatically deploying this blog to GitHub Pages with Travis CI

This blog is now deployed to GitHub pages automatically from Travis CI.

As I've outlined in the past, is built with the Nikola static blogging engine. I really like Nikola because it uses Python, has lots of nice extensions, and is sanely licensed.

Most importantly, it is a static site generator, meaning I write my posts in Markdown, and Nikola generates the site as static web content ("static" means no web server is required to run the site). This means that the site can be hosted for free on GitHub pages. This is how this site has been hosted since I started it. I have a GitHub repo for the site, and the content itself is deployed to the gh-pages branch of the repo. But until now, the deployment has happened only manually with the nikola github_deploy command.

A much better way is to deploy automatically using Travis CI. That way, I do not need to run any software on my computer to deploy the blog.

The steps outlined here will work for any static site generator. They assume you already have one set up and hosted on GitHub.

Step 1: Create a .travis.yml file

Create a .travis.yml file like the one below

sudo: false
language: python

python:
  - 3.6

install:
  - pip install "Nikola[extras]" doctr

script:
  - set -e
  - nikola build
  - doctr deploy . --built-docs output/
  • If you use a different static site generator, replace nikola with that site generator's command.
  • If you have Nikola configured to output to a different directory, or use a different static site generator, replace --built-docs output/ with the directory where the site is built.
  • Add any extra packages you need to build your site to the pip install command. For instance, I use the commonmark extension for Nikola, so I need to install commonmark.
  • The set -e line is important. It will prevent the blog from being deployed if the build fails.

Then go to https://travis-ci.org/profile/ and enable Travis for your blog repo.

Step 2: Run doctr

The key here is doctr, a tool I wrote with Gil Forsyth that makes deploying anything from Travis CI to GitHub Pages a breeze. It automatically handles creating and encrypting a deploy SSH key for GitHub, and the syncing of files to the gh-pages branch.

First install doctr. doctr requires Python 3.5+, so you'll need that. You can install it with conda:

conda install -c conda-forge doctr

or if you don't use conda, with pip

pip install doctr

Then run this command in your blog repo:

doctr configure

This will ask you for your GitHub username and password, and for the name of the repo you are deploying from and to (for instance, for my blog, I entered asmeurer/blog). The output will look something like this:

$ doctr configure
What is your GitHub username? asmeurer
Enter the GitHub password for asmeurer:
A two-factor authentication code is required: app
Authentication code: 911451
What repo do you want to build the docs for (org/reponame, like 'drdoctr/doctr')? asmeurer/blog
What repo do you want to deploy the docs to? [asmeurer/blog] asmeurer/blog
Generating public/private rsa key pair.
Your identification has been saved in github_deploy_key.
Your public key has been saved in github_deploy_key.pub.
The key fingerprint is:
SHA256:4cscEfJCy9DTUb3DnPNfvbBHod2bqH7LEqz4BvBEkqc doctr deploy key for asmeurer/blog
The key's randomart image is:
+---[RSA 4096]----+
|    ..+.oo..     |
|     *o*..  .    |
|      O.+  o o   |
|     E + o  B  . |
|      + S .  +o +|
|       = o o o.o+|
|        * . . =.=|
|       . o ..+ =.|
|        o..o+oo  |
+----[SHA256]-----+

The deploy key has been added for asmeurer/blog.

You can go to https://github.com/asmeurer/blog/settings/keys to revoke the deploy key.

================== You should now do the following ==================

1. Commit the file github_deploy_key.enc.

2. Add

    script:
      - set -e
      - # Command to build your docs
      - pip install doctr
      - doctr deploy <deploy_directory>

to the docs build of your .travis.yml.  The 'set -e' prevents doctr from
running when the docs build fails. Use the 'script' section so that if
doctr fails it causes the build to fail.

3. Put

    env:
      global:
        - secure: "Kf8DlqFuQz9ekJXpd3Q9sW5cs+CvaHpsXPSz0QmSZ01HlA4iOtdWVvUttDNb6VGyR6DcAkXlADRf/KzvAJvaqUVotETJ1LD2SegnPzgdz4t8zK21DhKt29PtqndeUocTBA6B3x6KnACdBx4enmZMTafTNRX82RMppwqxSMqO8mA="

in your .travis.yml.

Follow the steps at the end of the command:

  1. Commit the file github_deploy_key.enc.
  2. You already have doctr deploy in your .travis.yml from step 1 above.
  3. Add the env block to your .travis.yml. This will let Travis CI decrypt the SSH key used to deploy to gh-pages.

That's it

Doctr will now deploy your blog automatically. You may want to look at the Travis build to make sure everything works. Note that doctr only deploys from master by default (see below). You may also want to look at the other command line flags for doctr deploy, which let you do things such as to deploy to gh-pages for a different repo than the one your blog is hosted on.

I recommend these steps over the ones in the Nikola manual because doctr handles the SSH key generation for you, making things more secure. I also found that the nikola github_deploy command was doing too much, and doctr handles syncing the built pages already anyway. Using doctr is much simpler.

Extra stuff

Reverting a build

If a build goes wrong and you need to revert it, you'll need to use git to revert the commit on your gh-pages branch. Unfortunately, GitHub doesn't seem to have a way to revert commits in their web interface, so it has to be done from the command line.

Revoking the deploy key

To revoke the deploy key generated by doctr, go your repo in GitHub, click on "settings" and then "deploy keys". Do this if you decide to stop using this, or if you feel the key may have been compromised. If you do this, the deployment will stop until you run step 2 again to create a new key.

Building the blog from branches

You can also build your blog from branches, e.g., if you want to test things out without deploying to the final repo.

We will use the steps outlined here.

Replace the line

  - doctr deploy . --built-docs output/

in your .travis.yml with something like

  - if [[ "${TRAVIS_BRANCH}" == "master" ]]; then
      doctr deploy . --built-docs output/;
    else
      doctr deploy "branch-$TRAVIS_BRANCH" --built-docs output/ --no-require-master;
    fi

This will deploy your blog as normal from master, but from a branch it will deploy to the branch-<branchname> subdir. For instance, my blog is at http://www.asmeurer.com/blog/, and if I had a branch called test, it would deploy it to http://www.asmeurer.com/blog/branch-test/.

Note that it will not delete old branches for you from gh-pages. You'll need to do that manually once they are merged.

This only works for branches in the same repo. It is not possible to deploy from a branch from pull request from a fork, for security purposes.

Enable build cancellation in Travis

If you go the Travis page for your blog and choose "settings" from the hamburger menu, you can enable auto cancellation for branch builds. This will make it so that if you push many changes in succession, only the most recent one will get built. This makes the changes get built faster, and lets you revert mistakes or typos without them ever actually being deployed.

GitHub Reviews Gripes

Update: I want to keep the original post intact, but I'll post any feature updates that GitHub makes here as they are released.


GitHub recently rolled out a new feature on their pull requests called "Reviews". The feature is (in theory), something that I've been asking for. However, it really falls short in a big way, and I wanted to write down my gripes with it, in the hopes that it spurs someone at GitHub to fix it.

If you don't know, GitHub has had, for some time, a feature called "pull requests" ("PRs"), which lets you quite nicely show the diff and commit differences between two branches before merging them. People can comment on pull requests, or on individual lines in the diff. Once an administrator feels that the pull request is "ready", they can click a button and have the branch automatically merged.

The concept seems super simple in retrospect, but this feature completely revolutionized open source software development. It really is the bread and butter of GitHub. I would argue that this one single feature has made GitHub the (the) primary hosting site for open source software.

Aside from being an awesome idea, GitHub's trademark with pull requests, along with their other features, has been absolute simplicity in implementation. GitHub Reviews marks, by my estimation, the first major feature released by GitHub that completely and utterly lacks in this execution of simplicity.

Let's look at what Reviews is. When the feature first came out, I had a hard time figuring out how it even worked (the poor release date docs didn't help here either).

Basically, at the bottom of a pull request, you now see this

Clicking the "Add your review" button takes you to the diff page (first gripe: why does it move you to the diff page?), and opens this dialog

"OK", you might think, "this is simple enough. A review is just a special comment box where I can approve or reject a pull request." (This is basically the feature that I've been wanting, the ability to approve or reject pull requests.) And if you thought that, you'd be wrong.

The simplest way I can describe a review, having played with it, is that it is a distinct method of commenting on pull requests and on lines of diffs of pull requests. Distinct, that is, from the methods that already exist in the GitHub pull requests feature. That's right. There are now two ways to comment on a pull request (or on a line in a pull request). There's the old way, which involves typing text into the box at the bottom of the main pull request page (or on a line, and then pressing "Add a single comment"), and the new way, which involves clicking a special button at the top of the diff view (and the diff view only) (or by clicking a line in the diff and clicking "Start a review").

How do these two ways of the extremely simple task of commenting differ from one another? Two ways. One, with the old way, when you comment on a PR (or line), the comment is made immediately. It's saved instantly to the GitHub database, and a notification email is sent to everyone subscribed to the PR. With the new way, the comment is not made immediately. Instead, you start a "review", which postpones all comments from being published until you scroll to the top and press a button ("Review changes"). Did you forget to scroll to the top and press that button? Oh well, your comments never got sent to anyone.

Now, I've been told by some people that delayed commenting is a feature that they like. I can see how fewer total emails could be nice. But if you just want a way to delay comments, why do you need distinct commenting UIs? Couldn't the same thing be achieved via a user setting (I highly suspect that any given person will either like or dislike delayed commenting universally)? Or with a checkbox next to the comment button, like "delay notifications for this comment"? You can probably guess by now which of the two commenting systems I prefer. But guess what happens when I press the "Cmd-Enter" keyboard shortcut that's been hard-wired into my brain to submit a comment? I'll give you a hint: the result does not make me happy.

The second distinction between normal, old-fashioned commenting and the new-fangled reviews system is that when you finalize a review, you can elect to "approve" or "reject" the pull request. This approval or rejection gets set as a special status on the pull request. This status, for me personally, is the only feature here that I've been wanting. It turns out, however, that it's completely broken, and useless.

Here's my problem. We have, at the time of writing, 382 open pull requests in SymPy. A lot of these are old, and need to be triaged. But the problem from my point of view is the new ones. When I look through the list of pull requests, I want to be able to know, at a glance, which ones are "reviewable". For me, this means two things

  1. The tests pass.

  2. No other reviewer (myself included) has already requested changes, which still need to be made by the PR author.

Point 1 is really easy to see. In the pull request list, there is a nice green checkmark if Travis passed and a red X if it failed.

The second point is a disaster. Unfortunately, there's no simple way to do this. You might suggest adding a special label, like "Needs changes", to pull requests that have been reviewed. The problem with this is that the label won't go away when the changes have been made. And to worsen things, people who don't have push access (in the list above, only two PR authors have push access, and one of them is me), cannot add or remove labels on pull requests.

Another thing that has been suggested to me is an external "review" service that sets a status for a review. The problem with this (aside from the fact that I couldn't find one that actually did the very simple thing that I wanted), is that you now have to teach all your pull request reviewers to use this service. You might as well forget about it.

Having a first-class way in GitHub for reviewers to say "I approve these changes" or "I don't approve these changes" would be a huge boon, because then everyone would use it.

So great right, this new Reviews feature is exactly what you want, you say. You can now approve or reject pull requests.

Well no, because GitHub managed to overengineer this feature to the point of making it useless. This completely simple feature. All they had to do was extend the status UI and add a simple "I approve/I reject" button. If they did that, it would have worked perfectly.

Here are the problems. First, the pull request list has no indication of review status. Guess which pull requests in the above screenshot have reviews (and which are positive and which are negative). You can't tell (for example, the last one in the list has a negative review). If they were actually treated like statuses, like the UI suggests that they would, you would at least see an X on the ones that have negative reviews (positive reviews I'm much less worried about; most people who review PRs have merge access, so if they like the PR they can just merge it). I would suggest to GitHub to add, next to the status checkbox, a picture of everyone who's made a review on the PR, with a green checkmark or red X to indicate the type of review. Also, add buttons (buttons, not just buried advanced search options) to filter by reviews.

OK, so that's a minor UI annoyance, but it gets worse. Next on the docket, you can't review your own pull requests. It's not allowed for some reason.

Now why would you want to review your own pull request, you might ask? Aren't you always going to "approve" your own PR? Well, first off, no. There is such a thing as a WIP PR. The author setting a negative review on his own PR would be a great way to indicate WIP status (especially given the way reviews work, see my next gripe). Secondly, the "author" of a pull request is just the person who clicked the "New pull request" button. That's not necessarily the only person who has changes in the pull request. Thanks to the magic of how git works, it's quite easy to have a pull request with commits from many people. Multiple people pushing to a shared branch, with a matching pull request for discussion (and easy viewing of new commits and diff) is a valid and useful workflow (it's the only one I know of that works for writing collaborative prose). For the SymPy paper, I wanted to use GitHub Reviews to sign off on a common branch, but since I'm the one who started the pull request, I couldn't do it.

Next gripe, and this, I want to stress, makes the whole feature completely useless for my needs: reviews do not reset when new commits are pushed. Now, I just outlined above two use-cases where you might want to do a review that doesn't reset (marking WIP, and marking approval, although the second is debatable), but both of those can easily be done by other means, like editing the title of the PR, or old-fashioned commenting. The whole point of Reviews (especially negative reviews), you'd think, would be to indicate to people that the pull request, as it currently stands, needs new changes. A negative review is like failing your "human" test suite.

But unlike your automated test suite, which reset and get a new go every time you push a change (because hey, who knows, maybe the change ACTUALLY FIXED THE ISSUE), reviews do not reset, unless the original reviewers explicitly change them. So my dream of being able to glance at the pull request list and see which PRs need changes has officially been piped. Even if the list actually showed what PRs have been reviewed, it would be a lie, because as soon as the PR author pushes a change, the review status becomes potentially outdated.

Now, given the rocky start that this whole feature has had, I figured that this was probably just a simple bug. But after I reported it to GitHub, they've informed me that this is in fact intended behavior.

To make things worse, GitHub has another feature with Reviews, called required reviews. You can make it so that every pull request must receive at least one positive review and zero negative reviews before it can be merged (go to the branch settings for your repository). This works similar to required status checks, which make it so that your tests must pass before a PR can be merged. In practice, this means you need zero negative reviews, since anyone with push access could just do a positive review before merging (although annoyingly, you have to actually manually do it; IMHO, just requiring zero negative reviews should be sufficient, since merging is implicitly a positive review).

Now, you can see that the above "feature" of reviews not resetting breaks the whole thing. If someone negative reviews a PR, that one person has to go in and change their review before it can be merged. And even if the author pushes new changes to fix the issues outlined in the review, the PR cannot be merged until the reviewer resets it. So this actually makes the reviewing situation worse, because now anyone who reviews a pull request at any point in time has to go through with it all the way to the merge. I can't go to a PR that someone requested changes for, which were later made by the author, and merge it. I have to ping the reviewer and get them to change their review first. Needless to say, we do not have this feature enabled for SymPy's repo.

I think I maybe see the intended use-case here. You want to make it so that people's reviews are not forgotten or ignored. But that's completely foreign to my own problems. I trust the SymPy community, and the people who have push access to do due diligence before merging a pull request. And if a bad change gets in, we can revert it. Maybe this feature matters more for projects that continuously deploy. Likely most of the code internal at GitHub works like that. But guess what GitHub, most of the code on GitHub does not work like that. You need to rethink this feature to support more than just your use-cases.

I think starting simple, say, just a simple "approve/reject" button on each PR, which just adds an icon, and that's it, would have been a better approach. Then they could have listened to the feedback on what sorts of things people wanted it to be able to do (like setting a status, or showing up in the search list, or "delayed commenting" if that's really what people want). This is how GitHub used to do things. It's frustrating to see a feature implemented that doesn't (yet) do quite what you want, but it's even more frustrating to see a feature implemented that does all the things that you don't want.

Summary

Yes, I'm a little mad here. I hope you enjoyed my rant. Here are what I see as the problems with the "Reviews" feature. I don't know how to fix these problems (I'm not a UI/UX guy. GitHub supposedly hires them, though).

  • There are now two distinct ways to comment on a PR (delayed and non-delayed). There should be one (say, with a checkbox to delay commenting).

  • If you insist on keeping delayed commenting, let me turn it off by default (default = the Cmd-Enter keyboard shortcut).

  • The reviews button is buried on the diff page. I would put it under the main PR comment box, and just reuse the same comment box.

  • Reviews should show up in the pull request list. They should be filterable with a nice UI.

  • Let me review my own pull requests. These can be excluded from required reviews (that makes sense to me). Beyond that, there's no reason this shouldn't be allowed.

  • Don't require a positive review for required reviews, only zero negative reviews. Merging a PR is implicitly positively reviewing it.

  • Allow reviews to reset when new commits are pushed.

I get that the last point may not be what everyone wants. But GitHub needs to think about UI, and defaults here. Right now, the UI looks like reviews are like statuses, but they actually aren't because of this.

I am dispirited to see GitHub release such a broken feature, but even the best trip up sometimes. I'm not yet calling "doom" on GitHub. Everyone has their hockey puck mice. I'm actually hopeful that they can fix these issues, and implement a feature that makes real headway into helping me solve one of my biggest problems on GitHub right now, the reviewing of pull requests.

Tuples

Today, David Beazley made some tweets:

There are quite a few good responses to these tweets, both from David and from others (and from yours truly). I recommend reading the the thread (click on the first tweet above).

Now to start off, I want to say that I respect the hell out of David Beazley. The guy literally wrote the book on Python, and he knows way more about Python than I ever will. He's also one of the most entertaining Python people you can follow on Twitter. But hey, that doesn't mean I can't disagree sometimes.

List vs. Tuple. Fight!

As you probably know, there are two "array" datatypes in Python, list and tuple.1 The primary difference between the two is that lists are mutable, that is you can change their entries and length after they are created, with methods like .append or +=. Tuples, on the other hand, are immutable. Once you create one, you cannot change it. This makes the implementation simpler (and hence faster, although don't let anyone tell you you should use a tuple just because it's faster). This, as Ned Batchelder points out, is the only technical difference between the two.

The the idea that particularly bugs me here is that tuples are primarily useful as "record" datatypes.

Tuples are awesome for records. This is both by design—since they have a fixed shape, the positions in a tuple can be "fixed" values, and by convention—if a Python programmer sees parentheses instead of square brackets, he is more likely to see the object as "record-like". The namedtuple object in the standard library takes the record idea further by letting you actually name the fields:

>>> from collections import namedtuple
>>> person = namedtuple('Person', 'name, age')
>>> person('Aaron', 26)
Person(name='Aaron', age=26)

But is that really the only place you'd want to use a tuple over a list?

Consider five other places you might encounter a tuple in Python, courtesy of Allen Downey:

In code these look like:

  1. Multiple assignments:

    >>> (a, b) = 1, 2
    

    (yes, the parentheses are optional here, as they are in many places where a tuple can be used, but this is still a tuple, or at least it looks like one ;)

  2. Multiple return values:

    For example, os.walk. This is for the most part a special case of using tuples as records.

  3. *args:

    >>> def f(*args):
    ...     print(type(args), args)
    ...
    >>> f(1, 2, 3)
    <class 'tuple'> (1, 2, 3)
    

    Arbitrary positional function arguments are always stored as a tuple.

  4. Return value from builtins zip, enumerate, etc.:

    >>> for i in zip(range(3), 'abc'):
    ...     print(i)
    ...
    (0, 'a')
    (1, 'b')
    (2, 'c')
    >>> for i in enumerate('abc'):
    ...     print(i)
    ...
    (0, 'a')
    (1, 'b')
    (2, 'c')
    

    This also applies to the combinatoric generators in itertools (like product, combinations, etc.)

  5. Dictionary keys:

    >>> {
    ...     (0, 0): '.',
    ...     (0, 1): ' ',
    ...     (1, 0): '.',
    ...     (1, 1): ' ',
    ... }
    {(0, 1): ' ', (1, 0): '.', (0, 0): '.', (1, 1): ' '}
    

This last one I find to be very important. You could arguably use a list for the first four of Allen Downey's points2 (or Python could have, if it wanted to). But it is impossible to meaningfully hash a mutable data structure in Python, and hashability is a requirement for dictionary keys.

However, be careful. Not all tuples are hashable. Tuples can contain anything, but only tuples of immutable values are hashable. Consider4

>>> t = (1, 2, [3, 4])
>>> t[2].append(5)
>>> t
(1, 2, [3, 4, 5])

Such tuples are not hashable, and cannot be used as dictionary keys.

>>> hash(t)
Traceback (most recent call last):
  File "<ipython-input-39-36822ba665ca>", line 1, in <module>
    hash(t)
TypeError: unhashable type: 'list'

Why is list the Default?

My second gripe here is this notion that your default ordered collection object in Python should be list. tuples are only to be used as "records", or if you suspect might want to use it as a dictionary key. First off, you never know when you'll want something to be hashable. Both dictionary keys and sets require hashability. Suppose you want to de-duplicate a collection of sequences. If you represent the sequences with list, you'll either have to write a custom loop that checks for duplicates, or manually convert them to tuple and throw them in a set. If you start with tuple, you don't have to worry about it (again, assuming the entries of the tuples are all hashable as well).

Consider another usage of tuples, which I consider to be important, namely tree structures. Say you wanted a simple representation of a Python syntax tree. You might represent 1 - 2*(-3 + 4) as

('-', 1, ('*', 2, ('+', ('-', 3), 4)))

This isn't really a record. The meaning of the entries in the tuples is determined by the first value of the tuple, not position. In this example, the length of the tuple also signifies meaning (binary vs. unary -).

If this looks familiar to you, it's because this is how the language Lisp represents all programs. This is a common pattern. Dask graphs use tuples and dictionaries to represent computations. SymPy expression trees use tuples and Python classes to represent symbolic mathematical expressions.

But why use tuples over lists here? Suppose you had an object like the one above, but using lists: ['-', 1, ['*', 2, ['+', ['-', 3], 4]]]. If you discover you need to use this as a dictionary key, or want to put it in a set, you would need to convert this to a hashable object. To do this you need to write a function that recursively converts each list to a tuple. See how long it takes you to write that function correctly.

Mutability is Bad

More to the point, however, mutability is bad. I counted 12 distinct methods on list that mutate it (how many can you remember off the top of your head?3). Any function that gets access to a list can mutate it, using any one of these methods. All it takes is for someone to forget that += mutates a list (and that they should copy it first) for code completely distant from the origin definition to cause issues. The hardest bug I ever debugged had a three character fix, adding [:] to copy a global list that I was (accidentally) mutating. It took me a several hour airplane ride and some deep dark magic that I'll leave for another blog post to discover the source of my problems (the problems I was having appeared to be quite distant from the actual source).

A Better "Default"

I propose that Python code in general would be vastly improved if people used tuple as the default ordered collection, and only switched to list if mutation was necessary (it's less necessary than you think; you can always copy a tuple instead of mutating it). I agree with David Beazley that you don't "sometimes need a read only list". Rather, you "sometimes need a writable tuple".

This makes more sense than defaulting to list, and only switching to tuple when hashability is needed, or when some weird "rule of thumb" applies that says that you should use tuple if you have a "record". Maybe there's a good reason that *args and almost all builtin and standard library functions return tuples instead of lists. It's harder to accidentally break someone else's code, or have someone else accidentally break your code, when your data structures are immutable.

Footnotes


  1. I want to avoid saying "a tuple is an immutable list", since "list" can be interpreted in two ways, as an English word meaning "ordered collection" (in which case, the statement is true), or as the Python type list (in which case, the statement is false—tuple is not a subclass of list). 

  2. Yes,

    >>> [a, b] = 1, 2
    

    works. 

  3.  
  4. One of the tweets from the conversation:

    This is similar to this example. But it turns out this one doesn't work:

    >>> t = (1,2, [3, 4])
    >>> t[2] += [5,6]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    

    I have no idea why. It seems to me that it should work. t[2] is a list and list has __iadd__ defined. It seems that Python gets kind of weird about things on the left-hand side of an assignment. EDIT: Here's why.