A Reasonable Approximation

Latest posts

Against boots theory

The reason that the rich were so rich, Vimes reasoned, was because they managed to spend less money.

Take boots, for example. He earned thirty-eight dollars a month plus allowances. A really good pair of leather boots cost fifty dollars. But an affordable pair of boots, which were sort of OK for a season or two and then leaked like hell when the cardboard gave out, cost about ten dollars. Those were the kind of boots Vimes always bought, and wore until the soles were so thin that he could tell where he was in Ankh-Morpork on a foggy night by the feel of the cobbles.

But the thing was that good boots lasted for years and years. A man who could afford fifty dollars had a pair of boots that'd still be keeping his feet dry in ten years' time, while the poor man who could only afford cheap boots would have spent a hundred dollars on boots in the same time and would still have wet feet.

This was the Captain Samuel Vimes 'Boots' theory of socioeconomic unfairness.

– Terry Pratchett, Men at Arms

This is a compelling narrative. And I do believe there's some truth to it. I could believe that if you always buy the cheapest boots you can find, you'll spend more money than if you bought something more expensive and reliable. Similar for laptops, smartphones, cars. Especially (as Siderea notes, among other things) if you know how to buy expensive things that are more reliable.

But it's presented as "the reason that the rich [are] so rich". Is that true? I mean, no, obviously not. If your pre-tax income is less than the amount I put into my savings account, then no amount of "spending less money on things" is going to bring you to my level.

Is it even a contributing factor? Is part of the reason why the rich are so rich, that they manage to spend less money? Do the rich in fact spend less money than the poor?

That's less obvious, but I predict not. I predict that the rich spend more than the poor in total, but also on boots, laptops, smartphones, cars, and most other things. There might be exceptions where rich people consume less of the thing than poor people - bus tickets, for example - but I think if you group spending in fairly natural ways, the rich will spend more than the poor in almost every group.

  • Maybe they spend less money on their daily wear boots, but own more pairs of shoes for different occasions. Or maybe they decide that they care about other things than lifetime cost for their daily wear boots, and spend more on those, too. (Being rich means they can afford to care about other things than lifetime cost.)

  • Apparently famous people often get comped meals, but I bet most of them still spend more money on food than I do.

  • I spent £500 on a laptop in 2013, and before that, £300 in 2008. If I'd gone for £200 laptops each time, maybe they would only have lasted two years each. But if I weren't a techno-masochist, maybe I'd realize that using old laptops actually kind of sucks, and I'd upgrade far more often. My work laptop, bought by people who want me to be maximally effective at my job, cost over £1000 and isn't going to last ten years.

  • Financial services are a case where I'd guess the rich and the poor spend money on very different things. I assume the rich don't have to pay to cash a cheque, and very rarely visit loan sharks. But the poor rarely have Amex Platinum cards ($550/year), or personal accountants. (Maybe it's unfair to count those because they save you money in other areas?)

  • Buying a house may be cheaper in the long run than renting a similar house nearby. But rich people tend to live in nicer houses and/or nicer areas.

Those are all guesses. I don't have good data on this, and I'd love to see it if you do.

For what data I do have, the first google result was this page from the UK's Office of National Statistics. Specifically, look at figure 4, "Indexed household income, total spending and spending by component by income decile, UK, FYE 2019".

They split households into ten income levels, and look at four categories of spending plus total spending. Each of those is a near-strictly increasing line from "poor people spend less" to "rich people spend more". (I see two blips: the 90th percentile of income spends slightly less on housing than the 80th, and the 70th spends slightly less on food and non-alcoholid drinks than the 60th. The other categories are transport, and recreation and culture. These four are the largest spending categories on average across all income levels. The graph also has disposable income, which I think is irrelevant for current purposes.)

(I repeat that this specific data is not strong evidence. The source for it is the living costs and food survey, which might have more detail. (Link goes to the previous year's version because that's what I could find.) Unfortunately it's not open access. It might be freely available if I register, but I don't care enough to try right now. In any case, we'd also want data from outside the UK.)

There will obviously be some exceptions. There will be some rich people who spend less money than some poor people. There will probably even be some rich people who spend less money than some poor people, and would not be rich otherwise. But as a general theory for why the rich are rich? I just don't buy it.

I believe boots theory points towards one component of socioeconomic unfairness. But boots theory itself is supposed to be a theory of why the rich are so rich. It's very clear about that. It's clearly wrong, and I predict that even a weakened version of it is wrong.

To be a little more precise, I think boots theory as written makes three increasingly strong claims, that we could think of as "levels of boots theory":

  1. Being rich enables you to spend less money on things. (More generally: having incrementally more capital lets you spend incrementally less money. Also, being rich is super convenient in many ways.) This phenomenon is also called a ghetto tax.
  2. Also, rich people do in fact spend less money on things.
  3. Also, this is why rich people are rich.

All of these levels have stronger and weaker forms. But I think a quick look at the world tells us that the first level is obviously true under any reasonable interpretation, and the third level is obviously false under any reasonable interpretation. The second I predict is "basically just false under most reasonable interpretations", but it's less obvious and more dependent on details. There may well be weak forms of it that are true.

It may be that most people, when they think of boots theory, think only of levels one or two, not level three. I don't know if you can read this quora thread that I found on Google. It asks "How applicable to real life is the Sam Vimes "Boots" Theory of Economic Injustice?" The answers mostly agree it's very applicable, but I think most of them are on level one or two. (The one talking about leverage seems like level three, if it's talking about boots theory at all. I'm not convinced it is.)

But it seems to me that boots theory is usually presented in whole in its original form. Its original form is succinct and well written. When people want to comment on it, they very often include the very same quote as I did. And the original form starts by very clearly telling us "this is a theory of why the rich are so rich". It is very obviously level three, which is very obviously wrong.

So I have a few complaints here.

One is, I get the impression that most people don't even notice this. They link or quote something that starts out by saying very clearly "this is a theory of why the rich are so rich", and they don't notice that it's a theory of why the rich are so rich.

(I wouldn't be too surprised (though this is not a prediction) if even the author didn't notice this. Maybe if you had asked him, Terry Pratchett would have said that no, obviously Sam Vimes does not think this is why the rich are so rich, Sam Vimes just thinks this is a good illustration of why it's nice to be rich.)

This disconnect between what a thing actually says, and what people seem to think it says, just bothers me. I feel the desire to point it out.

Another is, I think there's a motte-and-bailey going on between levels one and two. A quora commenter says it's "far more expensive to be poor than it is to be rich, both in a percentage of income respect and a direct effect". He gives examples of things that rich people can spend less money on, if they choose. He doesn't provide data that rich people do spend less money on these things. Another describes how being rich lets you save money on food staples by stocking up when there's a sale. He doesn't provide data that rich people do spend less money on food or even staples. You could certainly make the case that neither of these people is explicitly claiming level two. But I do think they're hinting in that direction, even if it's not deliberate.

And relatedly: if we want to help people escape poverty, we need to know on what levels boots theory is true or false.1 If we want to know that, we need to be able to distinguish the levels. If "boots theory" can refer to any of these levels, then simply calling boots theory "true" (or even "false") is uninformative. We need to be more precise than that. To be fair, the quora commenters make specific falsifiable claims, which is commendable. But the claims are meant to be specific examples of a general phenomenon, and the general phenomenon is simply "boots theory", and it's not clear what they think that means.

I advise that if you talk about boots theory, you make it clear which level you're talking about. But maybe don't use that name at all. If you're talking about level one, the name "ghetto tax" seems fine. If you do want to talk about levels two or three, I don't have a good altiernative name to suggest. But since I don't think those levels are true, I'm not sure that's a big problem.

  1. I'm not too confident about this, and I don't want to get too distracted with object-level claims about how to actually fight poverty. But my sense is that: to the extent that level two is true, giving someone money fairly reliably sets up positive feedback loops that help them save more money in future. To the extent that it's not true, these feedback loops don't come for free. Maybe we can seek out spending categories where it is true, or groups of people for whom it is true. Maybe we can teach people how to find and take advantage of these feedback loops. If even level one isn't true, we don't get these loops at all. Of course, maybe it's worth giving people money even if we don't get the feedback loops. 

Posted on 14 September 2020 | Comments

Classifying games like the Prisoner's Dilemma

Consider games with the following payoff matrix:

    Player 2  
    Krump Flitz
Player 1 Krump $(W, W)$ $(X, Y)$
  Flitz $(Y, X)$ $(Z, Z)$

One such game is the Prisoner's Dilemma (in which strategy "Krump" is usually called "Cooperate", and "Flitz" is usually called "Defect"). But the Prisoner's Dilemma has additional structure. Specifically, to qualify as a PD, we must have $Y > W > Z > X$. $Y > W$ gives the motivation to defect if the other player cooperates, and $Z > X$ gives that motivation if the other player defects. With these two constraints, the Nash equilibrium is always going to be Flitz/Flitz for a payoff of $(Z, Z)$. $W > Z$ is what gives the dilemma its teeth; if instead $Z > W$, then that equilibrium is a perfectly fine outcome, possibly the optimal one.

I usually think of a Prisoner's Dilemma as also having $2W > X + Y > 2Z$. That specifies that mutual cooperation has the highest total return - it's "socially optimal" in a meaningful sense1 - while mutual defection has the lowest. It also means you can model the "defect" action as "take some value for yourself, but destroy value in the process". (Alternatively, "cooperate" as "give some of your value to your playmate2, adding to that value in the process".) We might consider instead:

  • If $2W < X + Y$, then defecting while your playmate cooperates creates value (relative to cooperating). From a social perspective, Krump/Flitz or Flitz/Krump is preferable to Krump/Krump; and in an iterated game of this sort, you'd prefer to alternate $X$ with $Y$ than to get a constant $W$. Wikipedia still classes this as a Prisoner's Dilemma, but I think that's dubious terminology, and I don't think it's standard. I might offhand suggest calling it the Too Many Cooks game. (This name assumes that you'd rather go hungry than cook, and that spoiled broth is better than no broth.)
  • If $2Z > X + Y$, then defecting while your playmate defects creates value. I have no issue thinking of this as a Prisoner's Dilemma; my instinct is that most analyses of the central case will also apply to this.

By assigning different values to the various numbers, what other games can we get?

As far as I can tell, we can classify games according to the ordering of $W, X, Y, Z$ (which determine individual outcomes) and of $2W, X + Y, 2Z$ (which determine the social outcomes). Sometimes we'll want to consider the case when two values are equal, but for simplicity I'm going to classify them assuming there are no equalities. Naively there would be $4! · 3! = 144$ possible games, but

  • Reversing the order of everything doesn't change the analysis, it just swaps the labels Krump and Flitz. So we can assume without loss of generality that $W > Z$. That eliminates half the combinations.
  • Obviously $2W > 2Z$, so it's just a question of where $X + Y$ falls in comparison to them. That eliminates another half.
  • If $W > Z > • > •$ then $X + Y < 2Z$. That eliminates another four combinations.
  • If $• > • > W > Z$ then $X + Y > 2W$, eliminating another four.
  • If $W > • > • > Z$ then $2W > X + Y > 2Z$, eliminating four.
  • If $W > • > Z > •$ then $2W > X + Y$, eliminating two.
  • If $• > W > • > Z$ then $X + Y > 2Z$, eliminating two.

That brings us down to just 20 combinations, and we've already looked at three of them, so this seems tractable. In the following, I've grouped games together mostly according to how interesting I think it is to distinguish them, and I've given them names when I didn't know an existing name. Both the names and the grouping should be considered tentative.

Cake Eating: $W > • > • > Z$ (two games)

In this game, you can either Eat Cake or Go Hungry. You like eating cake. You like when your playmate eats cake. There's enough cake for everyone, and no reason to go hungry. The only Nash equilibrium is the one where everyone eats cake, and this is the socially optimal result. Great game! We should play it more often.

(If $X > Y$, then if you had to choose between yourself and your playmate eating cake, you'd eat it yourself. If $Y > X$, then in that situation you'd give it to them. Equalities between $W, Z$ and $X, Y$ signify indifference to (yourself, your playmate) eating cake in various situations.)

Let's Party: $W > Z > • > •$ (two games)

In this game, you can either go to a Party or stay Home. If you both go to a party, great! If you both stay home, that's cool too. If either of you goes to a party while the other stays home, you'd both be super bummed about that.

Home/Home is a Nash equilibrium, but it's not optimal either individually or socially.

In the case $W = Z$, this is a pure coordination game, which doesn't have the benefit of an obvious choice that you can make without communicating.

(Wikipedia calls this the assurance game on that page, but uses that name for the Stag Hunt on the page for that, so I'm not using that name.)

Studying For a Test: $W > X > Z > Y$ (two games)

You can either Study or Bunk Off. No matter what your playmate does, you're better off Studying, and if you Study together you can help each other. If you Bunk Off, then it's more fun if your playmate Bunks Off with you; but better still for you if you just start Studying.

The only Nash equilibrium is Study/Study, which is also socially optimal.

Stag hunt: $W > Y > Z > X$ (two games)

You can either hunt Stag or Hare (sometimes "Rabbit"). If you both hunt Stag, you successfully catch a stag between you, which is great. If you both hunt Hare, you each catch a hare, which is fine. You can catch a hare by yourself, but if you hunt Stag and your playmate hunts Hare, you get nothing.

This also works with $Y = Z$. If $Y > Z$ then two people hunting Hare get in each other's way.

The Nash equilibria are at Stag/Stag and Hare/Hare, and Stag/Stag is socially optimal. Hare/Hare might be the worst possible social result, though I think this game is usually described with $2Z > Y + X$.

See: The Schelling Choice is "Rabbit", not "Stag".

The Abundant Commons: $X > W > • > •$ (five games)

You can Take some resource from the commons, or you can Leave it alone. There's plenty of resource to be taken, and you'll always be better off taking it. But if you and your playmate both play Take, you get in each other's way and reduce efficiency (unless $X = W$).

If $2W > X + Y$ then you don't intefere with each other significantly; the socially optimal result is also the Nash equilibrium. But if $2W < X + Y$ then the total cost of interfering is more than the value of resource either of you can take, and some means of coordinating one person to Take and one to Leave would be socially valuable.

If $Y > Z$ then if (for whatever reason) you Leave the resource, you'd prefer your partner Takes it. If $Z > Y$ you'd prefer them to also Leave it.

An interesting case here is $X > W > Z > Y$ and $X + Y > 2W$. Take/Leave and Leave/Take are social optimal, but the Leave player would prefer literally any other outcome.

Take/Take is the only Nash equilibrium.

Farmer's Dilemma: $Y > W > X > Z$ (two games)

In this game, you can Work (pitch in to help build a mutual resource) or Shirk (not do that). If either of you Works, it provides more than its cost to both of you. Ideally, you want to Shirk while your playmate Works; but if your playmate Shirks, you'd rather Work than leave the work undone. The Nash equilibria are at Work/Shirk and Shirk/Work.

If $2W > X + Y$ then the socially optimal outcome is Work/Work, and a means to coordinate on that outcome would be socially useful. If $2W < X + Y$, the socially optimal outcome is for one player to Work while the other Shirks, but with no obvious choice for which one of you it should be.

Also known as Chicken, Hawk/Dove and Snowdrift.

Anti-coordination: $• > • > W > Z$ (two games)

In this game, the goal is to play a different move than your playmate. If $X = Y$ then there's no reason to prefer one move over another, but if they're not equal there'll be some maneuvering around who gets which reward. If you're not happy with the outcome, then changing the move you play will harm your playmate more than it harms you. The Nash equilibria are when you play different moves, and these are socially optimal.

Prisoner's Dilemma/Too Many Cooks: $Y > W > Z > X$ (three games)

Covered in preamble.

(I'm a little surprised that this is the only case where I've wanted to rename the game depending on the social preference of the outcomes. That said, the only other games where $X + Y$ isn't forced to be greater or less than $2X$ are the Farmer's Dilemma and the Abundant Commons, and those are the ones I'd most expect to want to split in future.)

A graph

I made a graph of these games. I only classified them according to ordering of $W, X, Y, Z$ (i.e. I lumped Prisoner's Dilemma with Too Many Cooks), and I drew an edge whenever two games were the same apart from swapping two adjacent values. It looks like this:



The lines are colored according to which pair of values is swapped (red first two, blue middle two, green last two). I'm not sure we learn much from it, but I find the symmetry pleasing.

A change of basis?

I don't want to look too deep into this right now, but here's a transformation we could apply. Instead of thinking about these games in terms of the numbers $W, X, Y, Z$, we think in terms of "the value of Player 2 playing Flitz over Krump":

  • $α = X - W$, the value to Player 1, if Player 1 plays Krump.
  • $β = Y - W$, the value to Player 2, if Player 1 plays Krump.
  • $γ = Z - Y$, the value to Player 1, if Player 1 plays Flitz.
  • $δ = Z - X$, the value to Player 2, if Player 1 plays Flitz.

These four numbers determine $W, X, Y, Z$, up to adding a constant value to all of them, which doesn't change the games. For example, Prisoner's Dilemma and Too Many Cooks both have $α < 0, β > 0, γ < 0, δ > 0$. A Prisoner's Dilemma also has $α + β < 0$ while Too Many Cooks has $α + β > 0$.

So what happens if we start thinking about these games in terms of $α, β, γ, δ$ instead? Does this give us useful insights? I don't know.

Of course, for these numbers to point at one of the games studied in this post, we must have $α - β = γ - δ$. I think if you relax that constraint, you start looking into games slightly more general than these. But I haven't thought about it too hard.

  1. My use of the phrase comes from Ellickson's Order Without Law. Part of why I'm writing this is to help clarify my thinking about that book. I don't mean to imply anything in particular by it, I just like the ring of it better than alternatives like "welfare maximizing". 

  2. Calling them your "opponent" assumes a level of antagonism that may not be present. 

Posted on 04 July 2020 | Comments

Short essays on various things I've watched

Sometimes I write things in places that aren't here. Sometimes I think those things are worth preserving. Some of those things follow, with minor editing, mostly on the subject of various movies that I've watched. Also two stage musicals, one TV show, one short story, and one music video. They were written over the past four years, so I can't necessarily talk intelligently about these things any more. When I name a thing, spoilers generally follow.

I don't really get The Hurt Locker

It seemed like it was going to be a film about a cowboy with a death wish, who should be removed from duty but isn't because of institutional dysfunction in the US army. Or something along those lines.

Instead it turned out to be… a film about a cowboy with a death wish?

Like there's that bit where he's just been a fucking idiot who could have gotten everybody killed

(not very specific so far)

And someone who I got the impression outranks him comes up to him like «oh you seem like hot shit. Just how hot shit are you? Yeah that's pretty hot shit» in a tone that I swear is subtextually «I'm about to rip you a new one»

and then the scene just ends. No one gets ripped a new anything.


His team mates realise how dangerous he is to be around. But they don't do anything about it, just get pissed at him. And also he's supposedly defused almost a thousand bombs. There's tension there, a question of how he hasn't died yet (did he used to be good at his job and recently something happened to make him so reckless?) but the film doesn't acknowledge it, it doesn't seem to think "being a fucking dangerous idiot cowboy" and "successfully defusing almost a thousand bombs" are at all incompatible?

This was a really well regarded movie. It won six Oscars, including best picture, best screenplay and best director. Part of me would be inclined to chalk that up to politics, but the movie isn't even especially political from what I could tell; I'm pretty sure it could have been made both more political and a better movie. So I don't think that's a complete explanation.

I suspect all the praise came for reasons that I missed either because I was distracted by this stuff or because I'm just not into the kind of movie it was trying to be or something. But that's really not very specific.

The Jungle Book (2016) is surprisingly pro-human?

I don't think it was intended, and I might be putting too much of myself into it. But like-

Right up until the climax, tool use is shown as almost unambiguously good. Mowgli bangs rocks together to get a sharp stone to cut vines to make rope so he can climb down a cliff face and knock honeycombs with a stick, armored in clothes he makes. He collects honey on an unprecedented scale. Then he saves an elephant from a pit.

(The music to that scene is "The Bare Necessities", a song about how you should just take life easy and all your needs will be met. Everything you need is right within your reach. The implication seems to be that "using tools to extend your reach" is just the obvious, natural, path-of-least-resistance thing to do?)

And the counterpoint isn't really there, though there's plenty of opportunity. Shere Khan says man isn't allowed in the jungle, but doesn't talk about why. Akela and Bagheera tell Mowgli to act more like a wolf and less like a man, his "tricks" are not the wolf way, but don't say what the problems are with manhood. Kaa shows fire being dangerous, but then with more realism shows fire being used to fight off Shere Khan and save Mowgli's life. Mowgli refuses to give fire to King Louie but doesn't explain his refusal. The elephant could have fallen into a human trap, but instead it was apparently just a pit. And the bees?

From the bees' perspective, I imagine Mowgli is probably terrifying. He knocks down honeycomb after honeycomb and he just keeps going. He's an existential threat to their society and there's nothing they can do about it, though they throw their lives away trying to stop him. I imagine.

But the movie doesn't show that or invite us to think about it. In a world where the animals all talk to each other, the bees are silent. When they sting, it's to teach us about Baloo's trickery and Mowgli's cleverness. They might as well be thistles.

The film has so many opportunities to show or at least tell us that man is destructive, man will kill everyone in the jungle. But it mostly doesn't seem to take them.

So when Mowgli goes to steal fire - well, we know what's going to happen, but only because we already know fire. The movie hasn't given us any particular reason to be scared of it. Also, the mythical associations of stealing fire are fairly pro-fire.

And yes, the fire spreads and Mowgli throws his torch away. But then he fights using some other things humans excel at: coalition-building, getting other people to fight for you, and making a wolf think you're part of its family.

(Here I do think I'm putting in too much of myself. I think the climax is meant to be about winning by combing animal things and man things. The coalition-building is introduced with the law of the jungle, "the strength of the pack is the wolf and the strength of the wolf is the pack". Then Bagheera says "fight him like a man". Then Mowgli says "dead tree" calling back to what Bagheera taught him in the beginning. And in the end Mowgli stays in the jungle. I think that's what is going for, but it's not what I got from it.)

And the fire is put out without too much trouble (when the elephants decide "actually, it would be better if this river went in this direction" and do some engineering works, not done by humans but very human of them), and we haven't actually seen it kill anyone.

So the film roughly seems to be about how the things that separate man from the animals are actually really great. And that's… not what I was expecting.

Review: The Book of Mormon

From memory, Team America was funny and it had decent music. But what made it so great was that while parodying action films, it still managed to be just legitimately good as an action film.

The Book of Mormon was funny and it had decent music.

How many levels of irony is Goodbye Mr A on?

The lyrics are overtly critical of Mr A, but subtextually feel critical of the singers, who dislike him for exposing their flaws to themselves. ("You claim science ain't magic and expect me to buy it"; "you promised you would love us but you knew too much" "so busy showing me where I'm wrong, you forgot to switch your feelings on". And like, "if love is subtraction, your number is up / your love is a fraction, it's not adding up" - I could believe that was written seriously, but I hope not.) And the first part of the video clearly makes out the Hoosiers to be the villains, partly the whole "kidnapping a superhero" thing and partly idk, they just look subtly villainous. Possibly related to camera angles.

But then they go on to take his place, and as far as I can see they do it fine? We only see two of their deeds, but they rescue a cat and stop a burglary. Wikipedia says they fight crime as incompetent antiheros, but I'm not seeing the incompetence. And they become immensely popular.

If we're meant to see them as incompetent, then it reads to me like a criticism of society-at-large; a claim that the singers' narcissism is widespread. But if we're not, then…?

Also relevant: if it's critical of society, or even just the singers, then by extension it's defensive of Mr A. And if Mr A is meant to be identified with the comic book character, it becomes a defense of objectivism, which feels implausible on priors to me. (But maybe it was more likely in 1997? Wizard's First Rule only came out a few years before, I think that was objectivist? By reputation that wikipedia doesn't immediately confirm.)

I've been idly wondering this for over ten years. Today I got around to actually rewatching/relistening to it. It didn't help much.

Superhero origin stories: Matilda and Wonder Woman

Wonder Woman has the overt message that ordinary people can't do very much (though not literally nothing). You need someone with super powers to beat up the villain and then everything will be fine. The love interest keeps trying to tell the hero that things are just not that simple, and he keeps getting proved wrong. This message is undercut by the historical realities of WWI.

Matilda has the overt message that ordinary people, even young children, can save the day if they stand up to villainy. Before she gets powers, Matilda tells us that even young kids can do something to make things right, and she turns her dad's hair green and glues his hat on his head. In the climax, the kids revolt against Miss Trunchbull, saying "she can't put us all in chokey". This message is undercut by the fact that all of this is completely ineffectual. Matilda's dad doesn't become any less abusive or criminal. Trunchbull can and does put them all in chokey. What saves the day is someone using super powers to scare Miss Truchbull into running away, and then someone using the Russian mafia to scare the dad into running away.

Face/Off as a retelling of the Gospel narrative

Admittedly I haven't really read the gospels, so maybe I'm completely off base here. But consider:

God (signified by Sean Archer) takes on the form of a sinner (signified by Castor Troy), in order to redeem the sinners. He's pretty successful. Most notably, when Sasha dies, it comes from acting as a force for good, sacrificing herself to save an innocent. She tells Archer not to let her son grow up like him. And so her son (named Adam, signifying Man) grows up in the house of God.

Meanwhile, the Devil (also signified by Castor Troy) attempts to seduce God's followers (Archer's family and colleagues) off of the righteous path, but he's ultimately unsuccessful. People learn to see past outward appearances to what really matters: we know him by his works (the story of his date with Eve). Evil contains the seeds of its own downfall, and Jamie stabs him with the butterfly knife he gave her.

Assorted other thoughts:

"Sean" is derived from John, meaning "God is gracious". Arrows are of course both straight and narrow, signifying the path of God; "Archer" doesn't just follow the path of God, he directs it.

Meanwhile, "Castor" is cognate to (or at least sounds a bit like) "caster", as in a witch; and "Troy" refers to the Trojan war, signifying violence and trickery. Castor's brother is named "Pollux", a decidedly ugly name that sounds a lot like "pox", the plague. According to Wikipedia, Castor and Pollux (brothers of Helen of Troy) spent some time being venerated with Christian saints; and one of them may even have been the basis of John (Sean) the Apostle.

"Adam", as I said before, signifies Man; Archer's daughter is named "Jamie", a gender-neutral name, to ensure we know that God loves his sons and daughters and nonbinary children equally.

Archer's blood type is O-, the "purest" blood type (i.e. fewest antigens in the cells), which gives to all. Troy's is AB (unspecified, so probably AB+), the most "contaminated" type, which takes from all and never gives back. (House has used this same metaphor.)

Canonically, I think Jesus is supposed to get crucified and stabbed with the holy lance. The film has both of these elements, but gets it a bit confused; while Troy is in the classic raised-arms position, Archer shoots him with a harpoon gun.

The final confrontation, in which Good defeats Evil once and for all, is kicked off in a church; symbolically, in The Church, the hearts of God's followers. The battle startles the doves in the church, but we never see a dove die: the peace in our hearts may be disturbed, but will always return.

It's hard to miss the symbolism of Archer breaking out of a hellish prison.1 (The part of Thomas is briefly played by Jamie, when she shoots him.) The prison being named Erewhon ("nowhere" spelled almost backwards) probably intends to take a position on some Christian point of contention that I'm not familiar with.

You never see the butterfly knife being opened or closed. The cuts are kind of jarring, as if to draw attention to this fact. I think it's trying to tell us something about temptation: we may think we can stray a little, and turn back before we go too far down the wrong path. But by the time we know we're going too far, we've already gone too far. (cf)

This is perhaps the most overtly anti-makeup film I remember seeing ever.

Don't think too hard about the fact that Archer's wife, Adam's adoptive mother, is named Eve.

Hot take: the Saw franchise as exploration of meritocracy

Members of the underclass are given tests. If they pass, they get a new life. Thus, the meritorious are elevated to the elite. Meritocracy!

(Once you've survived a Saw trap, food tastes nicer, your addiction vanishes, you quit the job you hate and get the one you've always dreamed of, puppies and rainbows and so on. I'm not sure how much this is actually shown in the series, but it's what Jigsaw is going for, and the series is much more interesting if it works.)

This would be pretty dumb, but not obviously dumber than some of the other stuff out there.2

Review: A Yellow Rose

(Someone on reddit was wondering whether books are worth it. Someone recommended Borges to them, and in particular this story. The version I initially read was a slightly different translation to the one linked. I didn't like the story, but it has the virtue of being considerably shorter than this review.)

I gotta admit, this does not sell Borges to me.

Fair warning, this is gonna come across as fairly dismissive/derisive/antagonistic. I could try to describe my reaction without coming across like that, but it would be hard to do, and especially hard to do honestly.

To me it feels like a short piece of philosophy ("it lay within… the world's contents"), wrapped in some prose.

I don't feel like the prose adds anything to the philosophy. Like for example I don't see it as an illustration of the thing the philosophy is meant to teach us. (If anything, the philosophy is trying to teach us that the prose can't illustrate it.) So if that's all it is, then I'd rather the philosophy be presented by itself, so that I could read it more easily, and comment on it without feeling like someone's going to tell me I'm missing the point.

Commenting on the prose by itself… stylistically, it uses too many words in too few sentences, and feels like a chore to read.

One thing that strikes me is that for Borges to write this story, he must have had the same epiphany as Marino and (maybe) Homer and Dante. Since those are the only people mentioned in relation to the epiphany, it seems that Borges is comparing himself to the three of them. Is that deliberate?

Another thing that strikes me is that if "the motionless and silent act that took place that afternoon" is meant to be "Marino having an epiphany", then we're told that: it happened "on the eve of his death"; that it was "the last thing that happened in his life"; but also that "it was neither that afternoon nor the next that Giambattista Marino died". This seems inconsistent, and given the focus on it, I assume it was intentional. But I have no idea why.

I don't know what the poetry means, or whether the meaning is important.3

I could well be missing something here, and whether I am or not I suspect it's partly just a question of taste.

Captain Fantastic is such a good movie

It's an exploration of culture. Ben and Leslie created their own culture and raised their kids in it. The kids have almost no exposure to American culture. Culture Fantastic isn't perfect, and Ben isn't a flawless exemplar of it. There are some things it does better than American culture, and other things it does worse. But it's fine.

But when Culture Fantastic meets American culture, they clash. American kids seem weak, uneducated, not formidable. Fantastic kids seem - weird, socially unskilled, vaguely autistic.

And this should be fine, but none of the adults really respect the other culture. Ben's not especially rude, he knows he's on foreign ground and tries to be respectful to people's faces except when it matters, he just sometimes fails because he's human and hurting. It probably helps that he was probably raised in American culture, too. But he clearly does not approve of the culture. Meanwhile, Leslie's parents give her a Christian funeral, despite her own wishes. Two couples threaten to have the Fantastic kids taken away, because in their eyes, not being raised in American culture means they'll never survive in the real world - by which they mean American culture.

(And no one particularly respects the kids' wishes. That's not a part of either culture. They mostly like Culture Fantastic and want to stay in it, but if they didn't… I think Ben would come around and let them leave, but not easily.)

And it's particularly resonant for me because damn it, I like Culture Fantastic. He's raising those kids to be much like I want to be, even if it causes friction with the world around me.

Sometimes I read things that feel like the author thinks my culture is bad and should be destroyed. My culture is not bad. You do not get to take it away from me.

Captain Fantastic comes down firmly on the side of "not bad", and vaguely-noncomitally on the side of "do not take away".

Hacksaw Ridge is three different films

Act 1: Stick to your principles.
Act 2: War is hell.
Act 3: The War Prayer, but unironically.

Once Upon a Time has terrible ethics

(Spoilers only up to season 2.)

Snow is way too forgiving of Regina. Fine, characters are allowed to be silly in boring ways. Snow finally makes the right decision and kills Cora, go her! I don't even mind that she feels guilty about it, though I'd much prefer if that was combined with "but I know it was the right thing to do anyway".

But the world of the show is like, nope! That was objectively wrong of you! Your heart is black now! Fuck you for wanting to protect your family, fuck consequentialism.

And because of her black heart, supposedly she's now inevitably going to go bad and tear apart her family. (That came from Regina, who may have been making it up, but… I'd give at least 3:1 that it pays off.4)

Other people are allowed to kill, either without their hearts getting black spots or at least without becoming the sort of people that destroy the people they love. But Snow gets held to a higher standard than everyone else, apparently because she's So Good?

So like, if you're the type of person who kills people, you're allowed to kill people. If you're not the type of person who kills people, but you decide that killing someone is the right thing to do anyway, you're not allowed, and if you do it you become Objectively Bad, unlike people for whom killing is just a thing they do.

And! I don't remember if we've ever definitely seen Snow kill someone yet (at the very least she shot someone with an arrow, it's possible he survived), but she led an army to overthrow two kingdoms! At least one of those kingdoms was her suggestion! People die in war! Even if she hasn't directly killed anyone, she has been responsible for lots of deaths! But those don't count because ???

(And it's not because it was indirect, because Cora was indirect too.)

(It could be because she did it with the candle, but that's not how the narrative is pushing.)

It's like only named characters matter. Not just in the sense that the narrative focuses on them, not just in the sense that the named characters only care about other named characters, those both feel like common mistakes that fiction makes. But as a fundamental moral truth of the universe of the show, which feels less common and more annoying.

There's a related thing where… we met Regina and she's evil, and then we see her backstory and she becomes sympathetic even when she's still evil. (And the same with Rumple, and to some extent with Cora.)

But when she's the Evil Queen she tells her people to kill an entire village and they just do it. Regina has sympathetic motives, but her enforcers are anonymous. We don't see whether they're doing it out of fear, sadism, loyalty, a twisted sense of the greater good, whatever. We've been given a glimpse into the lives of the subjects who hate her, but not the ones who serve her.

(Stanley Glass is an admitted exception, and there was her father, and a brief conversation with a henchman of "I hired you even though you were drunk" or something, but there's really not much.)

Her followers don't get backstories or redemption narratives or people refusing to kill them because There Must Be Another Way. They just get killed, and the narrative doesn't care.

(Compete aside, Regina blames Snow for turning her evil, while Snow credits Regina for turning her good. Neat twist on "I made you? You made me first.")

  1. Confession: I totally missed that when I first wrote this. 

  2. I haven't actually seen 3%. From memory, this was inspired by a reddit discussion where someone said good things about the show as an exploration of meritocracy. Based on wikipedia, I disagree. 

  3. In the version I initially read, the poetry was (I assume) untranslated from the original Spanish. But even translated, I still don't know what it means. 

  4. I stopped watching part-way through season 3, so I don't know if it did. 

Posted on 12 June 2020 | Comments


This is a toy language I've been designing, or at least implementing, for about a year.

It's another take on "Haskell with a Lisp syntax". I'm aware of prior art: Hackett, Axel and Liskell. I haven't looked closely at any of them, because doing that seemed like it might make me less likely to keep working on Haskenthetical.

I call it "toy" in the sense of… well, right now it's a toy like an RC plane is a toy. But my vague goal is to make it into a toy like the Wright flyer would have been a toy if it had been built in 2003. I'd like to get it, say, 80% of the way to being a "real" language. I have no intention or expectation of taking it the other 800% of the way. I have no intention or expectation of taking on responsibility-according-to-me here.

(And honestly, even the first 80% is super ambitious. I don't expect to get that far, it would just be nice to. If I never touch the project again after this, I won't consider my time wasted.)

If you're curious, the source code is available here1. If you have stack, stack build --fast should suffice to build and then stack exec -- haskenthe -h to run the executable.

So far I've implemented basic Hindley-Milner type inference, the ability to define new types, and pattern matching. The only built-in types are ->, Float (which is a Haskell Double under the hood), and String (a Haskell Text). Builtin functions are +, - and * (all of type -> Float (-> Float Float) and err! (of type -> String $a). I don't yet have division because I haven't decided how I want to handle division by 0. I don't expect I'll come up with anything particularly exciting there, but also I haven't felt the need for division yet.

(Actually, the types Either and , are also builtin, along with functions to work with them: constructors Left, Right and ,, and destructors either, car and cdr. But that's just because I added them before I added custom types, and I haven't bothered to remove them yet.)

I have a long list of things I'd like to include in future. Probably the ones that interest me most right now are macros2, extensible records, and compilation. I don't know how macros and compilation are supposed to fit together, but I have some vague ideas in my head. And clearly it's been done in the past, so I assume I can find out how.

Other things include IO, comments, imports, exhaustiveness checking and FFI. Maybe typeclasses, but I'm curious whether macros and lazy evaluation can make those less useful. Maybe lazy evaluation, but I'm on the fence about that.

Open variants (the sum-type version of extensible records) might be on that list too, but I'm not familiar with any prior uses of them so I guess there's probably something that makes them difficult? Maybe they're just not very ergonomic in actual use, in which case cool, they'll fit right in.

What I have so far isn't very interesting as a language, but it might be interesting enough to be worth writing about.

General language overview

There are seven types of expression right now. I'm going to assume you'll understand most of them just by name.

Literal values are Strings and Floats. Variables are bare words with very few restrictions. (Right now they can contain any combination of printable characters other than whitespace, ", ( and ); except that they can't start with something that would parse as a Float.) Lambdas have the syntax (λ (arg1 arg2 ...) expr), or if there's only one argument, (λ arg expr) is also acceptable3. Function calls, unsurprisingly, look like (func arg1 arg2 ...), where all of those are expressions.

There are two forms of let-binding. Syntactically they're similar: ([let/letrec] ((name1 expr1) (name2 expr2) ...) body-expr). let is nonrecursive, so that exprN can only refer to nameM for M < N. (You can reuse names, e.g. (let ((a 1) (a (+ 1 a))) a) gives 2.) letrec is recursive, so that any exprN can refer to any nameM. (I haven't implemented any checks to forbid you from reusing names here, but probably only the first or last use of a name would have any effect.)

Finally there's pattern binding with (if~ val pat if-match else). Pattern pat can be a literal string or float, or a variable name prefixed with $, or a constructor name possibly with other patterns as arguments. If the value matches the pattern, if-match gets evaluated, with any variables bound to the relevant parts of the pattern. Otherwise, else-match gets evaluated. For example:

(if~ 0 0 "zero" "nonzero") # "zero"
(if~ 1 0 "zero" "nonzero") # "nonzero"
(if~ (Just 3) Nothing "Nothing" "Just") # "Just"
(if~ (Just 3) (Just $x) x 0) # 3
(if~ Nothing (Just $x) x 0) # 3
(if~ (, 1 2) (, $a $b) (+ a b) (err! "impossible")) # 3

(I'm leaning towards # for comments when I get around to that. Also, I'm assuming here that the Maybe type has been defined.)

There's no equivalent of a case statement that matches the same thing against multiple patterns. For that you'd need nested if~, and there's no exhaustiveness checking (i.e. nothing that would say "you missed a possibility").

This is all typechecked, so that you get a compilation error if you try to multiply a string by a float or whatever.

You can add explicit type declarations to expressions, or to parts of patterns or to the variables in let or letrec bindings.4 A type declaration looks like (: expr type) and a type is either a bare name for a type constructor, or a $name for a type variable, or a type application like e.g. Maybe $a or -> (Maybe Float) String. The root of a type application has to be a constructor, not a variable.

(: 0 Float)
(: (λ x (+ x 1)) (-> Float Float))
(if~ (Just 3) (Just (: $x Float)) x 0)
(if~ (Just 3) (: (Just $x) (Maybe Float)) x 0)
(let (((: id (-> $a $a)) (λ x x))) (id 3))

If a type declaration is more specific than it could be, it constrains the type of the expression; if it's more general, that's an error5:

(let (((: x (Maybe $a)) Nothing)) x) # valid, (Maybe $a) is the inferred type
(let (((: x (Maybe Float)) Nothing)) x) # valid, Float is more specific than $a
(let (((: x (Maybe Float)) (Just "foo"))) x) # not valid, Float /= String
(let (((: x (Maybe $a)) (Just "foo"))) x) # not valid, $a is more general than String

Apart from expressions, the statements I've implemented so far are def for global definitions, and type to declare a new type.

Currently all the def statements get pulled together and brought into a single letrec around the top-level expression. (Each program currently is required to have exactly one of those.) So

(def foo ...)
(foo bar)
(def bar ...)

is sugar for

(letrec ((foo ...)
         (bar ...))
  (foo bar))

Type declaration introduces new types, constructors, and eliminator functions. For example,

(type (Maybe $a) Nothing (Just $a))

introduces three values into the environment: Nothing of type Maybe $a; Just of type (-> $a (Maybe $a)); and elim-Maybe of type (-> $a (-> (-> $b $a) (-> (Maybe $b) $a)))6. This last is the standard Haskell maybe function, but you get one for free whenever you declare a type.

Other type declarations would look like:

(type Bool False True)
(type (List $a) Nil (Cons $a (List $a)))
(type (, $a $b) (, $a $b))
(type (Either $a $b) (Left $a) (Right $b))
(type Unit Unit)
(type (Proxy $a) Proxy)
(type Void)

(I'm tempted to name the standard unit type and its value instead. That's a bad name for the type, which is not an empty set, but it's a decent name for the value. It would be a fine name for the void type, but that type isn't useful enough to deserve such a concise name.)

letrec has something that's either a bug or, generously, a "missing feature that looks an awful lot like a bug when you don't realize that you're expecting it to be there". The way the typechecking works, inside the bindings for letrec, you can only use each bound variable at a single type. So you can't do

(letrec ((id (λ x
               (let ((a (id 3))
                     (b (id "foo")))

because that uses id at types -> Float Float and -> String String. (Never mind that if you could it would be an infinite loop.) Haskell has this limitation too, though I'm not sure I've ever run into it naturally; I couldn't think of a non-contrived example.

In Haskenthetical, this applies across an entire binding group. So you also can't do this:

(letrec ((id (λ x x)))
         (some-float (id 3))
         (some-str (id "foo")))

But that example would work if you translated it to Haskell. What gives?

Well, since id doesn't depend on some-float or some-str, you could easily rewrite that example as

(letrec ((id (λ x x))))
  (letrec ((some-float (id 3))
           (some-str (id "foo")))

And it turns out that Haskell just does that transformation for you automatically. It figures out what depends on what and groups them in such a way as to impose the fewest possible restrictions. If you make that impossible by adding some contrived mutual references, you can make Haskell fail in the same way:

let id_ x = const (const x someFloat) someStr
    someFloat = id_ (3 :: Int)
    someStr = id_ ("foo" :: String)
in ...
-- error: Couldn't match type ‘[Char]’ with ‘Int’

(You actually only need to reference one of someFloat or someStr, because once id_ is used at a specific type, it no longer generalizes to a -> a in the body of the let.)

I haven't implemented this in Haskenthetical yet.


I don't think there's anything particularly exciting about the implementation, if you're familiar with such matters. But for those who aren't, and who want to hear about them from me, read on.

I parse the input text into a list of syntax trees using Megaparsec. The syntax tree only knows about a few types of token:

data SyntaxTree
  = STString Text
  | STFloat Double
  | STBare Text
  | STTree [SyntaxTree]

Then I parse each tree into a statement (or expression, but that's just a type of statement) by recognizing specific STBare values (at the head of an STTree) as needing special handling and passing everything else through to "assume this is a function getting called".

Typechecking is Hindley-Milner. When I wrote that essay, I said I didn't know how to implement HM typechecking. I have some idea now, and would describe it vaguely like this:

Recurse down the parse tree. At each step there are a few relevant types that you get to say "unify" with each other, roughly meaning "these are two different ways of writing the same type". Sometimes you look those types up in the environment, sometimes you just generate fresh type variables, and sometimes you generate fresh type variables and then add them to the environment. But as you go, you're building up a big list of constraints, pairs of types that unify. Also, each node gets a specific type assigned to it, which will generally be placed in a constraint. This stage is called "unification". For example, if you see the function call (foo bar), you'll recurse down to get types t1 for foo and t2 for bar, and you'll generate a fresh type variable t3 for the result. Then you'll say that t1 unifies with -> t2 t3.

When you've finished, you loop back over the list of constraints, and build up a substitution. Any time you see "this type variable should be the same as this other type", you add that to the substitution, and you make that substitution in the remaining constraints before looking at them. If you see two types that should be the same but the non-variable parts of them don't match up, that indicates a type error in the program. This stage is called "solving". For example, if we have the constraint that types -> $a String and -> (Maybe Float) String unify, then whenever we see type variable $a in future we can replace it with Maybe Float; if the second one had instead been -> (Maybe Float) Float, then those don't match up and the program doesn't typecheck.

In the end, you apply your substitution to the type of the program as a whole that you got from unification, and that's the ultimate type inferred for the program. If there are any type variables left, the program doesn't fix them. (An example of this would be if the program was simply Nothing.)

Of course it's more complicated than that. For example, let and letrec need you to run solving during the unification phase. Also, declared types need to be treated specially so that you can reject if the user declares Just 3 as Maybe $a.

Aside, a thing I don't fully understand: I haven't tried timing it, but this implementation looks to me like it's something like O(n²) in the size of the input. It's supposed to be roughly linear. I'm not sure if I'm missing something or if there's just a more efficient algorithm.

Anyway, that's roughly how I do it. I take this approach mostly from Write You a Haskell (notably chapter 7, section "constraint generation"7, but also other chapters were useful for other parts of Haskenthetical). But I had to figure out how to handle letrec myself, because the language implemented there uses fix instead8. I also took a lot from Typing Haskell in Haskell, especially pattern matching. (I hadn't discovered it by the time I implemented letrec.) Neither source implements explicit type declarations9, so I had to figure out those for myself too. I'm not convinced I did a very good job.

Finally, evaluation: for the most part that's fairly straightforward. For example, when we evaluate a variable, we look up its value in the environment. When we evaluate a let, we evaluate something, add it to the environment under the relevant name, and go on to the next thing. There are a few types of values that we need only when evaluating:

  • A closure is the thing that gets returned when we evaluate a λ expression. It captures a snapshot of the current environment, the name of the argument, and the body expression. If a λ has multiple arguments, it returns nested closures.
  • A builtin is a regular Haskell function of type Val -> Either Text Val (plus a name to distinguish them). Builtins and closures are ultimately the only things that can be called as functions.
  • A Thunk is an unevaluated expression, with a copy of its environment. They get evaluated as soon as anything returns them. Currently they're used in two places. letrec needs them because we can't evaluate bindings before adding them to the environment or we'd get infinite recursion. Type eliminators are builtin values, but the Val they return is a Thunk (with empty environment) to avoid the Haskell file Env.hs from having to reference Eval.hs.
  • A tag is just a symbol (a Haskell Text under the hood) paired with a list of other values. Constructors wrap their arguments in a tag, and eliminators and pattern matching compare those symbols. There's no way to look at or manipulate the symbol directly in Haskenthetical, but I'd be curious to explore that direction.

I'll mention a couple other things that might be of note. These probably require more background knowledge of Haskell to make sense.

Firstly: I have the data type

data Pass = Parsed | Typechecked
type Ps = 'Parsed
type Tc = 'Typechecked

which some types use as a parameter, like

data TVar (p :: Pass) = TV !(XTV p) Name
type family XTV (p :: Pass)
type instance XTV Ps = NoExt -- data NoExt = NoExt
type instance XTV Tc = Kind  -- the kind of a Haskenthetical type

This lets us use a slightly different type TVar in different parts of the codebase. When we've merely parsed the program, we have no way to tell the kind of a type variable, so we have NoExt there. When it's been typechecked, the kind is known, so we include it. If there was a pass in which type variables simply shouldn't exist, we could write

type instance XTV NoTVarPass = Void

and we wouldn't be able to use a TVar in that pass at all.

This technique is called "trees that grow", and I copied it directly from GHC. I'm not currently using it everywhere I could, for no principled reason that I can recall. There's a chance it'll be more trouble than it's worth at the level I'm working at. An annoying thing about it is that you can't use a regular deriving clause, so I have

deriving instance Eq (TVar Ps)
deriving instance Eq (TVar Tc)
deriving instance Show (TVar Ps)
deriving instance Show (TVar Tc)
deriving instance Ord (TVar Ps)
deriving instance Ord (TVar Tc)

which kind of sucks10.

Secondly: builtin functions are kind of a pain to write manually. For example, either was previously defined as Builtin $ Builtin' "either" heither where

rbb :: Name -> (Val -> Either Text Val) -> Either Text Val
rbb name func = Right $ Builtin $ Builtin' name func

heither :: Val -> Either Text Val
heither l = rbb "either.1" $ \r -> rbb "either.2" $ \case
  Tag "Left" [v] -> call l v
  Tag "Right" [v] -> call r v
  _ -> Left "final argument of either must be an Either"

(Builtin is a constructor of type Val containing a Builtin, and Builtin' is the only constructor of type Builtin. These names do not spark joy.)

It works, but it always felt like I should be able to do better. I spent a while trying to figure that out and now the value is simply heither where

heither :: Val
heither = mkBuiltinUnsafe $ do
  l <- getArg "either"
  r <- getArg "either.1"
  e <- getArg "either.2"
  pure $ case e of
    Tag "Left" [v] -> call l v
    Tag "Right" [v] -> call r v
    _ -> Left "final argument of either must be an Either"

I dunno if this is much better, honestly, but there we are. It needs ApplicativeDo; I never managed to either figure out a Monad that could do this, or prove that no such monad exists. (There's no Monad instance for the specific type that I use to implement this, because to write join for that monad you'd need to be able to extract the inner [w] from ([w], r -> ([w], r -> a)) without having an r to pass to the outer function, and that's not a thing that even makes sense to be able to do11. But there might be a different type that enables what I'm trying to do and does admit a Monad instance.)

So that's where it's at right now. Feel free to point out ways that it sucks, although not-sucking isn't the point. I'm also interested in pointers to how I might implement some of the things on my future list (I'm aware of Implementing a JIT Compiled Language with Haskell and LLVM), or other cool things I may like to put on that list, or even things you might happen to like about Haskenthetical.

  1. I don't think I'll try for hygienic macros, despite recent events. My only experience with those has been in the small amount of Racket I've worked on, and I didn't manage to get my head around them. 

  2. I want unicode in languages to be more mainstream. There are good reasons why it's not, but at least some of those are chicken-egg problems. For example, most people aren't set up to easily write in unicode, but that's partly because most people never have to. Fortunately, I'm in a position where I can ignore all the good reasons not to do something. 

  3. While writing this I realized that while you can attach them to λ params as well, those currently aren't typechecked at all. 

  4. But the error doesn't seem to work for type declarations in pattern bindings. That's another thing I noticed while writing this. 

  5. Gee, you ever think maybe there's a reason Haskell doesn't use Lisp syntax? I feel like Lisp syntax kind of needs variadic applications to be readable, but Haskell semantics don't go well with those. I'm hoping to solve this disconnect with macros. 

  6. Be aware that the implementation of let on that page doesn't work. It's been fixed in the repository, but not on the website. 

  7. It's possible to implement fix in Haskenthetical without letrec, so maybe I didn't need to figure it out. I could have just waited until I get macros and then implemented letrec in terms of fix

  8. THIH does have them for binding groups (like found in let and at the top level), but not expressions. That made me wonder if those weren't in the Haskell 98 report, like how Elm doesn't have them. But they're there: §3.16, "Expression Type-Signatures". 

  9. If it annoys me too much, I can enable UndecidableInstances and do

    deriving instance Eq (XTV p) => Eq (TVar p)
    deriving instance Show (XTV p) => Show (TVar p)
    deriving instance Ord (XTV p) => Ord (TVar p)

  10. You could actually get somewhere by passing in undefined, as long as the inner [w] doesn't depend on the outer r and everyone involved is careful about strictness. I don't recommend this. 

Posted on 19 May 2020 | Comments

Chris Masterjohn on Coronavirus, Part 2

Update (May 8th 2020): Some small updates to the interferon and vitamin D sections.

Update (29th April 2020): I've made a significant update to the vitamin D section in response to version 3 of the report.

Background on this in part one.

General info: Interferon

Interferon is an antiviral that our immune system naturally releases when it detects invasion. But SARS-CoV and MERS-CoV (the virus that causes MERS) manage to avoid this response. The trigger is the presence of double-stranded RNA, which viruses need but our own cells don't. (We make single-stranded RNA from DNA, and then proteins from single-stranded RNA. We never need to copy our RNA. But viruses do need to copy their RNA, and apparently that means viral RNA needs to be double-stranded.) SARS-CoV and MERS-CoV hide their double-stranded RNA inside "double-membrane vesicles" to avoid detection.

Update May 8th 2020: I've seen multiple sources saying SARS-CoV-2 is a single-stranded RNA virus, and wikipedia says SARS-CoV and MERS-CoV are too. I'm a bit confused by this, and frustratingly Chris doesn't cite this part. From a quick google, the abstract of this paper sounds like a single-stranded RNA virus can still have double-stranded RNA, or produce double-stranded RNA at some point, or something? Note that Chris doesn't say that the viruses are double-stranded RNA viruses, just that they have double-stranded RNA.

They also have a bunch of other ways to limit production of interferon, and on top of that they limit the response to the interferon that does get produced.

This all sounds like "so we should ramp up interferon". But that's probably a bad idea. During early stages of the virus, interferon is suppressed, so the virus can replicate quickly. But when the infection is established, macrophages generate a lot of interferon, leading to a cytokine storm. In a mouse trial, deleting the genes for the main interferon receptor made the subjects very resistant to SARS-CoV, but very vulnerable to mouse hepatitis virus and influeza A, compared to normal mice. (0% died to a dose of SARS-CoV1 that killed 85% of normal mice; but 100% died to a dose of the other viruses that killed 10-20% of normal mice.)

(Question: "During the replication of the virus, macrophages are recruited to the lung". These are what release the interferon. What is a macrophage and what is recruiting them and why?)

We don't yet know that any of this applies to SARS-CoV-2 as well, but it seems likely. So high levels of interferon might be slightly helpful in early stages, but make later stages much worse. Thus, Chris recommends avoiding them.

(Question: LW user CellBioGuy is bullish on inhaled interferon pretreatment, which I take it means "inhaling interferon before the disease gets bad". Does this square with Chris' recommendations? It wouldn't surprise me if inhaling interferon increases your levels in the short term but not the long term, which is exactly what we want. On the other hand, he links a paper whose abstract says SARS-CoV-2 "is much more sensitive [than SARS-CoV] to type I interferon pretreatment", so maybe this is just a case of "generalizing from SARS-CoV didn't work here".)

On April 25, Chris wrote more about this on his mailing list. My own summary: in hamsters infected with SARS-CoV-22, eliminating the type 1 interferon response increased the amount of virus in various body parts including the lungs; but dramatically reduced lung damage. Chris notes that this is a bit weird, but takes it as supporting his hypothesis, and I'm inclined to agree.

Every-day optional add-on: Garlic or stabilized allicin

Allicin is the thing we care about here, garlic is just one way to get it. Garlic doesn't actually contain any, though. Garlic contains alliin. Alliin is converted to allicin when the garlic is crushed and left at room temperature in the open air for ten minutes.

(Who decided to name those things with an edit distance of one, and how do we stop them from naming anything ever again?)

Alternatively, we can mix garlic powder with water, and let that sit at room temperature in the open air for ten minutes. That gives us a more reliable dose, since garlic cloves vary in size. Or we can just take stabilized allicin supplements, which is a still more reliable dose. Most garlic extract won't work, and "potential allicin" is unreliable. Meals containing garlic won't work, because allicin isn't robust to heat or acids.

180mg allicin daily makes people less likely to get colds. (I note that Chris here seems more confident than the authors of that review, who only found one study matching their inclusion criteria and who say at different times that the evidence is "poor-quality" and "moderate". Looking at the excluded studies, two seem worth noting. Both were excluded because they grouped colds and influenza together. Andrianova 2003 found that a specific brand of garlic tablets "allicor" reduced morbidity from acute respiratory diseases 1.7-fold. So that seems to support Chris' recommendation. Nantz 2012 used aged garlic extract, and found that it didn't reduce the incidence of (colds + flu) but did reduce the number and severity of symptoms. It's not clear to me whether aged garlic extract even contains allicin, but these results seem a little confusing to me whether it does or not.)

We also see antiviral effects in vitro against six different viruses: "herpes simples virus type 1, herpes simplex virus type 2, parainfluenza virus type 3, vaccinia virus, vesicular stomatitis virus, and human rhinovirus type 2". It seems to work by damaging the lipid envelope of those viruses. So it might also work against coronaviruses, which also have a lipid envelope.

Separately, allicin has antibacterial effects, which work by the same mechanism as zinc and copper inhibit SARS-CoV enzymes, so maybe it also inhibits those enzymes. (Note, this is not the same mechanism as how copper surfaces destroy SARS-CoV.) And it inhibits papain, so maybe it also inhibits papain-like proteases. (It sounds like papain-like protease 2 could be targeted by both these effects, and there are other papain-like proteases that could only get targeted by the second?)

Chris recommends this as an optional daily add-on, because although it's never been tested directly against coronaviruses, it's plausible and seems to be safe.

If you get sick, optional add-on: Echinacea

I've never heard of this. Apparently it's a herb. It "has been used for the common cold", which isn't saying much (what hasn't?) but the citation suggests it was also at least somewhat effective. But this seems to be the only evidence Chris provides that it has any positive effects at all; Wikipedia is skeptical.

The mechanism of its effect seems to be boosting the immune system, so we might worry that it does so by increasing interferon. But instead it seems to work by increasing "inducible nitric oxide synthase" (iNOS). That doesn't seem to be protective against contracting SARS, but in mice it helps protect against the long-term damage that SARS does to the lungs.

Chris thinks this is less important than all of the preceding compounds, because "there is no clear evidence echinacea will offer specific protection against COVID-19". He no longer thinks it's safe long-term (though he hasn't updated the in-depth section of his report to reflect that), so he recommends only taking it when you get sick.

Everyday optional add-on: Vitamin C

Vitamin C supplements seem to prevent colds, but not necessarily cure them. Intravenous vitamin C reduced mortality of acute respiratory distress syndrome by half; but none of those patients had SARS. Studies have shown conflicting effects of vitamin C on interferon levels, variously increasing and decreasing it.

Chris recommends getting enough vitamin C "to support normal immune function", which means taking supplements if and only if you don't get enough from your diet. He thinks the chance of it increasing interferon is too risky to take high doses.

In the first version of the report, Chris considered this an optional add-on. He no longer includes it in his list of recommendations, but the sections where he explained his recommendation are still in the report, unchanged. I'm not sure what's up with that.

Another thing that I'm not sure what's up with. In one section, Chris says: Normally we get cysteine by eating protein. NAC helps us get more cysteine into our cells, and the extra cysteine is good for the immune system and lung health. In another section, he says NAC "is a precursor to glutathione, which is critical for lung function." He's not very detailed about either of these claims, so I'm not sure how to put them together, and a quick glance at wikipedia doesn't really help. (There's totally room in my lack-of-understanding for them both to be true, it's just a bit weird that we have two different descriptions of its effects.)

Apart from being generally good for the lungs, the reason for recommending it - if he does recommend it - seems to be a single case where large amounts of NAC helped someone to recover from pneumonia caused by H1N1. There's no reason to think it might help prevent COVID-19, but it might help mitigate the damage.

Limit: Vitamin A

Vitamin A is vital to the immune system, and you shouldn't get deficient. But its active metabolite is all-trans retinoic acid. In many rat studies, that's increased ACE2: "in the heart of rats whose blood pressure was raised by constricting their aortas, in the heart and kidney of spontaneously hypertensive rats, in the kidney of rats with glomerular sclerosis (to much higher levels than even healthy control rats), and in rat tubular epithelial cells subject to hypoxia-repurfusion." Since that effect seems fairly consistent, there's a significant risk that increased levels of vitamin A would increase levels of ACE2, which (as discussed in part 1) seems likely bad. So Chris recommends getting only enough vitamin A to avoid deficiency.

(Question: how easy is it to avoid publication bias / confirmation bias / etc. here? If some people found situations where ACE2 didn't get raised by Vitamin A, would that result have been published and would Chris have found it? He does mention studies like that for vitamin D, so that's at least evidence for yes and yes.)

Supplement to a point: Vitamin D

Like vitamin A, vitamin D runs the risk of increasing ACE2 levels. We have three rat studies supporting this, two rat studies contradicting it, and one human study contradicting it.

The supporting rat studies show that: "Calcitriol, the active metabolite of vitamin D, increases ACE2 mRNA and protein in rat pulmonary microvascular endothelial cells treated with lipopolysaccharide, synergizes with diabetes to increase ACE2 protein in the renal tubules of rats, and in the brains of both hypertensive and healthy rats."

Of the contradicting evidence, the human study and one of the rat studies looked at serum ACE2, i.e. ACE2 which has been shed from cells and is circulating (I guess in the bloodstream). Serum ACE2 won't help the virus gain entry to cells, and might help protect against it. (No citation or explanation for this, but I guess the reasoning is that if the relevant part of the virus binds to serum ACE2, it won't subsequently be able to bind to ACE2 on the cell wall.) Serum ACE2 might not correlate with docked ACE2. Additionally, the rat study showed that vitamin D depletion had no effect on serum ACE2, but that doesn't mean an abundance of vitamin D would also have no effect; and the human study only showed a statistically significant difference in people with stage 5 kidney disease (with less severe kidney disease, the difference wasn't statistically significant; in healthy controls there was no difference).

The final rat study was looking at injury response of rat kidneys; the injury in question would normally increase ACE2 levels but vitamin D (or at least calcidiol, its partially activated metabolite) reduces that effect. But this seems to be caused by vitamin D making the kidneys more resilient to injury, not by directly suppressing ACE2. So it probably isn't relevant here.

Weighing up, vitimin D seems likely to increase ACE2, and in versions one and two of the report Chris recommended against supplementing it (but still against becoming deficient).

Update (April 29th 2020): The rest of this section is based on info added in version three of the report, which Chris released less than an hour after I published.

However, vitamin D may affect some of the factors that predict a severe or lethal case of COVID-19. The factors Chris mentions are: "low lymphocytes, a high ratio of neutrophils to CD8+ T cells, and high interleukin-6 (IL-6)." (These are all in the bad direction, i.e. low lymphocytes predicts worse disease progression.)

There's lots of confusing and somewhat contradictory studies here. Some of it is to do with CD4 T cell counts, which I'm not sure what the relevance is. But it seems there's at least some reason to think that vitamin D supplementation may sometimes increase lymphocyte counts and/or decrease the ratio of neutrophils to lymphocytes. (I guess that last is good for at least one of those factors whether it incseases the numerator or denominator, as long as it doesn't decrease CD8+ T cells.)

There's also lots of studies on the effect of vitamin D on IL-6 in specific contexts. "Middle-age and older adults": four studies, no effect. "Hemodialysis patents": four studies, no effect. "Obese and overweight patients": eight studies, no effect. "Diabetes": five studies, no effect. "Heart failure": reduction in one study, not in another. "Diabetic kidney disease": three studies, reduction in all. "Ventilator-associated pneumonia": one study, reduction. Notably, this last is most relevant to COVID-19.

Chris sums up: "A reasonable interpretation of these studies is that vitamin D does not affect IL-6 under conditions of chronic low-grade inflammation, but does lower IL-6 during acute and highly inflammatory conditions." (I note two reasons to be unsure of this. Firstly, that with this many conditions being tested, and so few positives, we should probably make sure we're not accidentally p-hacking. Secondly, that the the pneumonia study involved "intramuscular injection of 300,000 IU vitamin D" which may not tell us much about oral vitamin D supplements of less than 1% that amount.)

But we also have observational studies between vitamin D levels and COVID-19 severity. Chris has written about one of them publicly3; the other is here, though I think all the specific numbers in the report are from the first one. The summary is that given a case of COVID-19, case severity is strongly correlated with low levels of vitamin D in the bloodstream.

We haven't established causality here, and there's reason to think it goes the other way (inflammation depletes vitamin D).

We have no information on high levels, and Chris is worried about a U-shaped curve. I note a complexity penalty: if the correlation is monotonic in the region of the space that we've explored, expecting it to shift direction in the space we haven't explored requires some justification. Is there a reason it doesn't shift in the space we explored, or just bad luck?

Chris doesn't really address this. But the space we haven't explored here is [anything above the level associated with the lowest all-cause mortality] ("in all relevant studies except a single outlier that was only published as a conference abstract"). Plus, if we assume the patients mostly weren't supplementing vitamin D, then maybe the space we haven't explored here is also [anything above the highest natural levels humans seem to reach], and it's not too surprising if things shift direction in that space. So it's not as suspicious as it looks at first.

Anyway. Based on all this, Chris now recommends aiming for a specific level of vitamin D in the bloodstream: 30 ng/mL. He thinks the ideal is to test your levels every 2-4 weeks, but the home testing kit he links to seems to cost $65/test plus shipping, and he hasn't convinced me it's worth that much. Failing that, he recommends supplementing 1000 IU/day.

(I also note that, by recommending against both supplementation and deficiency, Chris was always implicitly predicting a U-shaped curve - just with the inflection point at a lower level. By comparison, different institutions generally recommend daily intakes around 600 IU/day. So supplementing 1000 IU, on top of whatever you get from food and sunlight, seems like give-or-take doubling his previous recommendation.)

Update May 8th 2020: Chris has now also written about the second study as well as two others that don't change his recommendation. We now have a little data about patients with high vitamin D levels - not enough to draw strong conclusions, but at least one of them did have a severe case. He says these studies make the link between vitamin D and severity seem to be more robust but weaker - that is, he's more confident that there is a link, but the link seems less strong than previously.

Limit: Calcium

And balance it with phosphorus. There's a complex system keeping them balanced in the blood, and they often have opposite effects. Notably, a high calcium:phosphorus ratio suppresses fibroblast growth factor 23, and a low calcium:phosphorus ratio increases it. FGF23 suppresses ACE2, so a high calcium:phosphorus ratio might increase ACE2. Chris recommends limiting calcium supplements to 1000 mg/day, and matching supplemental calcium with supplemental phosphorus 1:1.

Avoid: Pelargonium Sidoides

This is also called Umcka. I don't think I've previously heard of it under either name. It has many of the same components as elderberry, but only a small amount of caffeic acid. Caffeic acid was what gave us most reason to think elderberry would work, so Umcka seems inferior to elderberry.

Umcka does have some antiviral effects, including against HCoV-229E, but the mechanism for that is unclear. In cell cultures it increaes iNOS like echinacea; but also interferon, so if that generalizes it would be risky to take. It also increases neutrophil activity; those are part of the immune system, so naively we might think that was a good thing, but high neutrophil counts seem to make SARS worse.

So basically this seems like it offers nothing we can't get from elderberry and echinacea, and is too risky. So Chris recommends avoiding it.

Avoid: Honeybee Propolis

This hasn't been found to have significant direct antiviral properties against any viruses. It increases interferon responses in mice and chickens, so it may be somewhat helpful against some viruses, but it's too risky to use here. Chris recommends avoiding it.

Avoid: Monolaurin

Monolaurin seems to damage lipid envelopes, as well as the membrane of cells without cell walls, so SARS-CoV-2 is probably vulnerable to it. But so are human T cells. The risk of impairing those seems unacceptably high, and Chris recommends avoiding monolaurin.

There's an argument that it hurts only pathogens "because it is effective against yeast and the bacteria that cause bacterial vaginosis, but not against lactobacillus", but Chris thinks that's just because lactobacillus has a cell wall, and there are probiotics which don't have anything like that which would probably be vulnerable too.

(I think the idea that this would only hurt pathogens seems absurd, but what do I know. The citation for that clause doesn't seem to present the argument, so I'm not sure who's making it or if Chris is representing it accurately, but people have been to occasionally say absurd things.)

  1. In the first reference to this study, Chris says this was a dose of SARS-CoV-2. I think that was just a typo/brain fart. Though actually, it wasn't SARS-CoV either, it was MA15 which is a strain of SARS-CoV adapted to infect mice. Chris doesn't mention this, but to be fair the paper calls it SARS-CoV. I guess the idea is that it's a close enough model as we ever really get in this kind of work? An alternative is mice engineered to express human ACE2

  2. This was actual SARS-CoV-2. Unlike mouse ACE2, hamster ACE2 is similar enough to human ACE2 that they can get a severe infection going from the regular virus. 

  3. He also briefly discusses some of the studies mentioned above, in that post. 

Posted on 28 April 2020 | Comments

In my culture: the responsibilities of open source maintainers

If you maintain an open source project, what responsibilities do you have towards your users? Some recent drama (that I won't link to) reminded me that there are large differences in how people answer that question.

(In the drama in question, this wasn't the only thing at issue. But it was a relevant question.)

I thought I'd take a stab at describing my best guess as to how we answer it in my culture: in the culture that exists only in my mind, but that I think (and hope) many copies of me would implement, if we had that opportunity. (That culture doesn't attempt to exclude people who aren't copies of me, so it does need to be robust to attack. In my culture, we do not just assume in defiance of all evidence that everyone is friendly and trustworthy.)

Some of this will probably seem obvious to many readers, like "in my culture, murder is considered bad". Probably different bits to different readers. I'm interested in discovering which bits seem obvious to almost everyone, and which bits are controversial.

A lot of it follows from how I think about responsibility in general. But if you start to think "extending this response to this other situation, you'd get this, and that's a terrible idea"… in my culture, we don't immediately assume from this that I'm endorsing a terrible idea. Instead we check. Maybe I disagree that that's how it extends. Maybe I hadn't thought about this, and you can change my mind about the initial response. Maybe I just straightforwardly endorse a terrible idea: in that case, it'll be much easier to push back once you've gotten me to admit to it.

I do not intend, in this essay, to discuss whether any particular person or group is living up to the standards I outline here. I may do that in future. But how likely that is, and what that follow-up looks like, depends on whether the responses to this essay suggest a lot of people agree with my culture.

I think there are at least three important limitations to this essay. One is that I've never been a maintainer of an open source project that had users other than myself (that I knew of), though I've made small contributions to a few. As such, I don't really know what the experience is like or how my culture deals with its challenges, much like I don't really know how my culture deals with the challenges of living in Antarctica. I can make educated guesses, but that's all they are. I'm not going to explicitly flag them as such in the main body. (I'm already using the words "in my culture, we…" far too much. I'm not changing it to "in my culture, my educated guess is that we…") Ideally, I wouldn't write this essay because there was already a conversation taking place between people who knew what they were talking about. Unfortunately, as far as I've seen there mostly isn't, so I offer what I can.

Another is that I don't talk at all about the responsibilities of users, which is also an important part of the question. I'd like to, but… this essay has been knocking around in my head for at least a year and a half, I've made at least one previous attempt to write it that I gave up on, and I'm worried that if I don't publish it quickly I never will. I hope and believe that even with this omission, it is better for this essay to be published than to not be published.

(I also omit the question "what is this responsibility thing anyway?", but that seems less important to me right now. I've probably also overlooked important parts of the things I do talk about, but that too may not be a big deal.)

And a third is that without specific examples, what I've written is less constrained than what's in my head. It may well be possible to read this and agree with what I've written, and then to discover that your culture has a much stricter or much looser conception of responsibility than mine does.

With all of that out of the way: there are three questions I'm going to be focusing on here. When is responsibility taken on; what does it entail; and how do we react if people fail at it?

When is responsibility taken on?

In my culture, taking on responsibility requires active involvement, but not explicit consent. It's possible to take on responsibility through negligence - to draw responsibility towards yourself without realizing that's what you're doing. In my culture, we're sympathetic towards people who do that, but we don't consider that this absolves their responsibility.

In my culture, merely making something available does not make you responsible for it. If you want to put something online and ignore anyone who tries to use it, you can do that. You are allowed to shithub.

In my culture, you take on more responsibility for a project if…

  • If you start encouraging people to use your project. If someone happens to stumble across a repository you never specifically shared, you have no responsibility for it. If you see them describe a problem they have, and you say "oh hey, I had that problem too, you might want to check out my repo" you have a little. If you create a website for your project where you describe it as "the best way to solve this problem", you have more.

  • If you have many users. If you shirk your responsibilities, most of the harm done is to your users1. With fewer users, the harm from not-acting-responsibly is lower, and so the responsibilities themselves are lower.

  • If your users are invested in your project. If they can't easily stop using it, you have more responsibility to treat them well.

    Your users' exit costs are probably low if you make a video game, or "libpng but 30% faster" or "find but with a nicer command line syntax". In that case your users can probably just play a different game, or easily replace your tool with libpng or find.

    They're higher if you make a programming language, where migrating away requires rewriting a codebase. Or a document editor that can only save in its native binary format or pdf, so that there's no way to edit the documents without your tool.

  • If you have the ability to accept responsibility. Handling responsibility takes time and it takes multiple skill sets. One person hacking on their side project simply cannot act in the same ways as a team of two full-time developers plus twenty volunteers. That team can't act in the same ways as the Mozilla Foundation.

  • If you act like you're accepting responsibility. If you have a history of ignoring issues and pull requests, people should probably pick up on this. If you tell people you're going to break backwards compatibility, they shouldn't expect you to maintain it. Words like "production ready" increase your level of responsibility. Words like "alpha" decrease it, as do version numbers below 1.0.

A common thread here is that responsibility is related to justified expectations. "Expectations" in both the moral sense and the probabilistic sense. If someone can make a compelling argument "this person morally ought to accept responsibility here", or a compelling argument "I predict based on past behaviour that this person will act as though they've accepted responsibility here", then in my culture, that person has some degree of responsibility whether they like it or not.

Accordingly, in my culture you can disclaim responsibility in advance, simply by saying that you're doing this. Note that a pro forma warranty disclaimer in your LICENSE file isn't sufficient here. Instead you should say it at the entry points to your project - probably the README and the website (if there is one). Something like…

Note that while I think this project will be useful to many people, I am not devoting much attention to it, and I'm prioritizing my own use cases. Bugs may be fixed slowly if at all. Features may be removed without warning. If this isn't acceptable to you, you should probably not rely on the project; or at least you should be willing to fork it, if necessary.

Or even:

Note that while I think this project will be useful to many people, I have no interest in accepting responsibility for other people's use of it. Feel free to submit issues and pull requests; but I may ignore them at whim, and my whim may be arbitrary and capricious.

This won't usually be necessary. But if you feel in danger of taking on more responsibility than you'd like, you can do this.

If you have responsibility, what does that entail?

In my culture, if we've taken on responsibility and now we want to discharge it, we do so with care. We give advance notice, and we try to find replacement maintainers. If we can't find a replacement maintainer, we still get to quit. But we try.

In my culture, we acknowledge that different people have different needs and goals, and not every project will be suitable for all users. We try to help users figure out whether our projects will be suitable for them before they get invested. We're open about the limitations of our projects, and about the design goals that we explicitly reject.

In my culture, we don't need to reply to every critic. But we do try to notice common threads in criticism, and address those threads. ("Address" doesn't mean that we necessarily try to solve them in a way that will satisfy the critics. It simply means we give our thoughts about them. We try to do that in such a way that even if the critics aren't happy with what we're doing, they at least feel like we've heard them and understood them.)

In my culture, we accept that some critics will be unkind, unfair and unproductive. This sucks, but it's a reality of life right now. We distinguish these critics from others. We are free to ignore them, and to ban them and delete their posts from the spaces we control. We do not use their actions to justify shirking our responsibilities to other critics.

In my culture, we also take care to check whether we're unfairly rounding people off as being unkind, unfair and unproductive. We don't require perfection from our critics. We try to hold ourselves and our supporters to the standards we expect of our critics. We moderate our spaces, but we try not to silence all criticism from them.

In my culture, we still introduce bugs, because we're still human. We try not to, of course. But we accept fallibility. When we fail, we do our best to fix it. We put more effort into avoiding more catastrophic bugs.

In my culture, we do not accept incompetence or indifference. These are signs that a person should not have taken on responsibility in the first place. We expect people to know the limits of their ability and of how much they care.

In my culture, we can set boundaries around how our future actions are constrained. We distinguish public APIs (where we mostly try to maintain backwards compatibility) from private APIs (where we mostly don't, and if people expect us to we point to the word "private"). We may reject bugfixes that we expect to cause too much ongoing future work, in hope of finding a better fix in future.

In my culture, sometimes we hurt people deliberately because the alternatives are worse. When we do, we own it. For example, sometimes we decide to remove a feature that people were relying on. We don't make that decision lightly. When we do it, we explain why we decided to do it, and we apologize to the people who were relying on it. We don't try to minimize their pain. We try to suggest alternatives, if we can; but we try not to pretend that those alternatives are any more suitable than they actually are. We give advance notice, if we can. We can be convinced to change our minds, if new information comes to light.

Of course, the degree to which we feel bound by all of this depends on the degree to which we've actually taken on responsibility. And note that in all of this, there's very little requirement for positive action. If we don't want to include a feature, we don't need to write it, or even merge the PR if someone else writes it. If we don't want to go in the direction our users want us to, we don't have to go there. We just have to make it clear that that's what we're doing.

What if someone fails to act responsibly?

That is, if someone has taken on responsibility, whether intentionally or not, and then has failed to act like someone who's taken on responsibility… how do we respond?

In my culture, we mostly respond by acting as though this will happen in future.

We don't treat this as an indelible black mark against all aspects of the person. We'll still accept their contributions to other open source projects, for example. (Though perhaps not technically complex contributions, that will likely need a lot of maintenance from the same person going forward.) We'll accept their conference talks. We'll stay friends with them.

But we won't rely on their projects, because we don't expect their projects to be reliable. We warn others about this, but if others decide to use their projects anyway, we consider that none of our business.

We accept that people have a right to run projects as they see fit. We acknowledge that our conception of "responsibility" is a mere social convention. But social conventions are important. We provide an opt-out for this particular social convention, but if you don't opt out explicitly, we'll make it explicit for you.

We give second chances. If they want to take on responsibility in future, and if they seem to understand how they failed in the past, and they say they'll try to do better, we give them a shot, because that's reason to think it won't happen in future. If they still fail, and want a third chance, we take their second failure into account too.

We may fork a project, even if the maintainer doesn't approve, but we'll rarely attempt a hostile takeover from the outside. We'll make sure our fork is clearly distinguished from the original project.

  1. To the extent that this isn't true, I think most of the responsibility comes from something other than "being an open source maintainer". For example, if you're making a web scraper, I would suggest you have a responsibility to make it well-behaved by default - sleep between requests, stop after a certain number of failures, and so on. But that responsibility comes from the product, not the process. You can't avoid it with a disclaimer. And so it's not the focus of this essay. 

Posted on 13 April 2020 | Comments

Chris Masterjohn on Coronavirus, Part 1

Update (April 28th 2020): Part 2 published. Also a minor update to the ACE2 section.

Update (April 5th 2020): Chris released version two of his recommendations on April 2nd. There are some fairly major revisions. In particular, Chris now thinks Coronavirus infects the throat, not just the lungs. He was already making recommendations just in case this turned out to be true, so he hasn't specifically added anything, but it's still an important update. The other thing of note is that he no longer recommends using copper spray or taking echinacea until you get sick, except (for the copper) before and after potential exposure. I've updated this post accordingly. You can still read version 1.

Chris Masterjohn is a nutritionist who has some advice on supplements to take to help protect against Covid-19, and some to avoid. The raw advice is available for free, but the full report with explanation and references costs $10. I bought a copy. (The public advice has not been updated to reflect version 2.)

Should we trust him? On the one hand, "nutritionist" is not a profession I necessarily hold much respect for (it's not a protected term). Nor do I tend to think highly of people emphasizing that they have a PhD. Also, his website looks super sketchy to my eyes. Also also, something that costs money gets fewer eyeballs than something free, and so mistakes are less likely to be discovered.

(Only the last one of those should be considered a problem with the report; the rest are just priors.)

On the other hand, Chris previously turned me on to zinc so he has some credibility with me. Also, although I'm out of my depth in actually evaluating the science, I do think I have a decent bullshit detector, and the report is superficially very convincing to me.1 I get the sense that he actually really knows what he's doing, is weighing the evidence appropriately and making claims that are justified by it. He admits to uncertainty, and some of his recommendations are "this probably won't help, but just in case"; but he does speak precisely enough to be falsifiable. This doesn't mean he's right, of course. But I think he's worth some people paying some attention to.

My intention here is mostly to summarize his reasoning, in more detail than is on the linked page but less detail than is in the full report. You can make up your own minds. Much of it I haven't even checked how actionable it is right now (i.e. are these things available in stores or for delivery, and how much do they cost). I do have some questions and commentary of my own that I've scattered about, possibly not always well-flagged.

The report makes fifteen recommendations about fifteen supplements (some supplements are in multiple recommendations, and some recommendations are about multiple supplements):

  • Four everyday "essentials": elderberry, nutritional zinc, ionic zinc, and copper.
  • Two everyday "potential add-ons": garlic or stabilized allicin, and vitamin C. Chris takes allicin, and plans to take a low-dose supplement of vitamin C if he runs out of fresh food.
  • Two things to use before and after every potential exposure: copper spray, and more ionic zinc.
  • Two additional "essentials" only when you get sick: again, copper spray and more ionic zinc.
  • One "optional add-on" only when you get sick: echinacea. Chris used to take this daily, but has since decided against that.
  • Four "things to limit or avoid": these are much less pithy. "Don't take high doses of vitamins A or D" (but don't get deficient either); "Limit Calcium and Don't Use Calcium Supplements That Aren't Balanced by Phosphorus"; "Don't Use Monolaurin"; "Don't Use High-Dose Vitamin C, Pelargonium Sidoides (Umcka), or Bee Propolis".

"Potential exposure" is quite broad, including:

  • Going into public spaces to perform essential errands.
  • Accidentally encountering someone you don’t live with face-to-face within six feet, or making physical content with such a person.
  • Touching any surfaces that someone you don’t live with may have touched in the last nine days without any personal protective equipment.
  • Incorrect use of personal protective equipment (for example, taking gloves off your hands in any other way than explained here or reusing a mask without following these guidelines).
  • Putting your fingers into your mouth or nose, regardless of whether you washed them beforehand.

I think he's mostly expecting these won't happen very often. Personally, I'm doing most of them about once a week to go shopping, and the last one probably several times a day.

It includes any physical contact with someone you don't live with, coming face-to-face with such a person within six feet, and putting your fingers in your mouth or nose even if you washed them first.

In this post I'm going to focus on elderberry, zinc and copper; my current plan is to do the rest in future posts.

So far, I've followed this advice to the extent of:

  • I've stopped taking daily vitamin D supplements, and advised my partner to do the same. (Somewhat to my embarassment, as they'd only started taking them on my recommendation. I've been taking it 5x/week for years.) He says not to supplement with vitamin D at all, but he also says to get normal amounts from sunshine and vitamin D-rich foods, and we're not doing that. So maybe we should take one a week instead of one a day.
  • I bought a bulb of garlic, but I haven't done anything with it yet.
  • I took one of the zinc capsules that came free with my last order of zinc lozenges. Then I had a bad gastrointesinal day. Possibly unrelated, but they also had something of an unpleasant smell that I don't know whether it was normal. I haven't tried again. Maybe I will. They're 50mg, which is more than Chris recommends at one time.
  • I recently started developing cold symptoms, so I would have been taking ionic zinc anyway. But I've probably taken more than I would have done otherwise. (I'm down to about 40 lozenges and they're currently out of stock, so I may not use them daily.)

If you're going to follow any of it yourself, you should at least read his public post, and ideally also the full report.

General info: ACE2

SARS-CoV-2 (the virus that causes the disease Covid-19) is not like the common cold or the flu, and things which work against those may not be helpful here. On the other hand, it's very like SARS-CoV (the virus that causes SARS). The genomes are 80% identical, and "the majority of its proteins are 85-100% homologous, with an average homology of 87%".

(Question: That seems like a really low average, given that range? Though I guess it includes the ones below 85%.)

The two main things the report focuses on are ACE2 and interferon.

ACE2 is an enzyme used in regulating blood pressure. SARS-CoV-2 enters our cells by docking to ACE22. It has this in common with just two other human coronaviruses, SARS-CoV and HCoV-NL63. So heightened levels of ACE2 could increase susceptibility to SARS-CoV-2.

(Question: "The binding of SARS-CoV to ACE2 is thought to downregulate ACE2 expression, leading to a loss of the anti-proliferative and anti-fibrotic actions of ACE2 in the lung, thereby contributing to the lung damage that occurs in SARS." That seems to point in favor of increasing levels of ACE2, so that it getting downregulated is not such a big deal. How to weigh these two concerns? I haven't read it closely, but this sounds like NL63 only affects ACE2 on infected cells, and doesn't so much "downregulate" it as "obliterate" it. So my super tentative take is: even if we start with high levels, that may not be sufficient to be protective if the infection gets bad; it would be more important to focus on stopping it from getting bad. But followup questions would be: what percent of cells express ACE2? What percent of those get infected, in a (light/moderate/bad) infection? When we do things that increase/decrease ACE2, does that increase the amount of ACE2 on individual cells, or the number of cells with any ACE2, or both? If a cell has lots of ACE2, does it do more of the sorts of things that ACE2 is good for?)

Update 28th April 2020: Chris has written more about this, see section "ACE2: Friend or Foe?" He points out that differing ACE2 levels are likely to have much more effect on viral replication (which is an exponential process) than on normal bodily functioning (which isn't).

I'll talk about interferon in a future post, because it's not relevant to anything in this post.

Daily essential: Elderberry

"In rhesus monkey kidney cell culture, elderberry has virucidal, anti-plaque, anti-replication and anti-attachment activity toward HCoV-NL63". Most of the effect seems to come from caffeic acid. That binds directly to ACE2, which suggests elderberry would be similarly effective against SARS-CoV and SARS-CoV-2.

(Question: is that in vitro? Doesn't seem very reliable in general, but maybe the specific mechanism makes it moreso.)

As a bonus, elderberry is also effective against avian infectious bronchitis virus through compromising the lipid envelope. Since all coronaviruses have one of those, that effect might generalize to SARS-CoV-2.

Other foods include caffeic acid, but only black chokeberries have anything like a comparable amount. And elderberry extract is the thing that's been actually studied, so that's what Chris recommends.

There are studies using elderberry in humans to treats colds, the flu, and cardiovascular disease, but Chris doesn't mention their results. He just uses them to determine a safe dose.

Daily essential: Nutritional zinc

This is zinc from food or most supplements (including tablets or capsules).

Zinc "inhibits at least two proteins required for SARS-CoV replication, papain-like protease-2 and helicase." So it probably inhibits the homologous proteins in SARS-CoV-2. (In version 1 Chris thought it also inhibited the protein helicase, but he no longer believes that.)

(Question: how similar do proteins need to be for this to be a decent guess? This suggests proteins called "homologous" might be only 40% similar in sequence. If I guess that "homologous" means this is likely to be a decent guess; and that these ones are >85% similar (based on base rates of similarity between the viruses)… that suggests it's probably a pretty good guess? But I'm not at all cofident in this thinking.)

So we should try to deliver zinc to the relevant tissues. What are those?

The infection would begin somewhere between the nose or throat (where the virus mostly enters our body) and lungs (where it primarily infects), wherever the virus first encounters ACE2.

(Question: this seems to assume that the virus doesn't first infect something outside of this path, and then move on to the lungs. Is that a safe assumption? I would guess so.)

In version one, Chris thought the virus wouldn't find any ACE2 until it reached the lungs. To quote the previous version of this post:

There are two papers trying to answer this question, and they give different opinions. …

Hamming et al (2004)3 suggests that the virus wouldn't find any ACE2 until it reached the lungs. They did find ACE2 in the oral and nasal mucous membranes, but on the wrong side to be any use to the virus.

Xu et al (2020)4 argues that ACE2 is highly expressed through the surface of the mouth, especially the tongue.

The two used different methods. Xu had better resolution, down to single-cell. But Hamming could tell where on the cells the ACE2 was expressed. The second thing matters, and the first doesn't. Xu's results are what Hamming would have predicted, they just aren't relevant. (The symptoms are also what Hamming would predict: a cough suggests lung infection, and is present. A runny nose or other cold symptoms would suggest throat infection, but they aren't present.)

But now there's direct evidence that the virus infects the throat too. (Note that the "not-yet peer reviewed" paper is now peer reviewed.)

So it seems that although the main infection site is the lungs, the infection starts in the mouth, nose or throat. We can specifically target those with ionic zinc, for which see below.

We don't have a specific mechanism to target the lungs with zinc. We just have to take it orally in the normal way (that is, in food or supplements) and let it be distributed there.

Chris recommends 7-15mg of zinc four times a day, away from sources of phytate ("whole grains, nuts, seeds, and legumes") which can inhibit zinc intake.

(At one point he says 10-15mg, and at one point 7-10, but I think this is just bad proofreading. Mostly he says 7-15.)

Conventional wisdom says we can't absorb nearly that much, but Chris thinks we just need more dakka: "the relevant studies have been limited to less than 20 mg/d. Supplementation with 100 mg/d zinc sulfate has been shown to more than double total zinc absorbed from 4.5 to 10.5 mg/d, while decreasing the fraction absorbed from 43% to 9%."

At such high doses, side effects have been observed. "Zinc at 50 mg/d has been shown to lower superoxide dismutase, and at 85 mg/d increased self-reported anemia. Both of these could probably have been averted by proper balancing with copper, which is addressed in the section below. However, the increased need for copper at high zinc intakes reflects increased expression of metallothionein, which can bind to many positively charged metals besides copper." I confess I'm not sure what this "however" is meant to mean. It kind of sounds like "so we still probably shouldn't go that high", but then we go that high anyway. I'm a bit concerned about this.

(I also confess that I have no idea what superoxide dismutase is.)

If you take zinc, you should balance it with copper.

Multi essential: Ionic zinc

This specifically means zinc from the kind of lozenges that work for a cold, or failing that from an ionic zinc throat spray.

(The mechanism that makes this work against a cold will not help with SARS-CoV-2.)

It delivers ionic zinc to the mouth, nose and throat tissues, like we couldn't do with the lungs. Chris recommends using one lozenge a day preventatively; plus several a day on the first sign of symptoms; plus an extra one before and after any potential exposure.

(Question: to be effective against the cold, this delivers ionic zinc to the surfaces of these tissues. Here we want it on the inside. Will that work?)

Daily essential: Copper

Copper surfaces work great against coronaviruses. This knowledge is not super helpful, since we are not copper surfaces.

It does suggest that copper ions in our cells might be toxic to the virus. But this has never been well studied.

Like zinc, copper inhibits papain-like protease 2 of SARS CoV. But it's much less effective at it.

The main reason to take copper is to keep the zinc-to-copper ratio acceptable. It should be between 10:1 and 15:1. (At one point he says 2:1 - 15:1, but again, I think that's bad proofreading.)

Essential if you get sick, before and after exposure: Copper spray

He doesn't actually have loads of detail on this in the report. That was less of a problem in version 1, when he considered it a hedge. Now that he calls it essential, it feels lacking. Fortunately, he's actually written about it publicly in more detail. (Though I think still less detail than the report usually contains.)

He recommends also using an ionic copper spray for the throat infections. This is similar to his recommendation for ionic zinc as well as nutritional zinc, but he doesn't say anything about copper lozenges. I assume those don't exist and/or wouldn't work for some reason. The goal in this case is to get the copper to the surface of the relevant tissues, in hopes that this destroys the virus in much the same way copper surfaces do. It's not clear whether he has specific evidence for this effect (even against other viruses), or just expects it to work based on theory (and if so how robust he considers that theory). Based on what he's written I don't think I'd put this in "essentials" (even acknowledging that "essential" is already an exaggeration).

He doesn't recommend using it daily. Partly because he thinks the effects wouldn't last long enough for that to be worthwhile. (Question: why is zinc different here?) But also partly because he's not convinced it's safe long term. Stop if you notice irritation.

He recommends a specific brand (named in the above link) that seems to have some evidence for short-term safety.

  1. There are some things that kind of smell like bullshit to me. Most notably, I feel like at times, the report goes into a lot of detail on things that aren't super relevant, like the renin-angiotensin system that ACE2 plays a part in. As far as I've seen so far, the precise mechanics of that don't really matter. Meanwhile, a lot of the important claims are speculative - necessarily so, because things are moving too fast to have good evidence here, but speculative all the same. In combination, this can kind of feel like… "throw a lot of impeccably researched, uncontroversial and unimportant science at the reader; then try to sneak in the difficult bits under the radar"? I'm sure there's a term for this that I'm forgetting.

    I don't actually think this is what's happening. My sense is that it's more likely to be bad editing, and I'm not even confident it's that. But it seemed important to note.

    I've also noticed some inconsistencies that I chalk up to bad proofreading. 

  2. Some papers talk about "ACE2 receptors". I currently think they're being imprecise, and there's no such thing.

    By comparison, the common cold is mostly caused by rhinoviruses, and most of those dock to ICAM-1. Some colds are caused by coronaviruses, but those dock to aminopeptidase N or sialic acid. The flu docks to sialic acid. So if something protects against those by preventing them from docking, it's likely to have no effect on Covid-19. 

  3. Hamming, I. et al. Tissue distribution of ACE2 protein, the functional receptor for SARS coronavirus. A first step in understanding SARS pathogenesis. J. Pathol. 203, 631–637 (2004). 

  4. Xu, H. et al. High expression of ACE2 receptor of 2019-nCoV on the epithelial cells of oral mucosa. Int. J. Oral Sci. 12, 8 (2020). 

Posted on 29 March 2020 | Comments

My Bet Log

I occasionally make bets about future events. I think I probably should do that more often than I currently do. Given that, I think it's good practice to have a public record of the bets I've made and their outcomes, when available.

I'll try to keep this updated going forward, if I continue to make bets. Currently I'm down £72.46 with one win and one loss.

(This has nothing to do with the matched betting that I previously wrote about. There I was earning money from sign-up bonuses. Here I'm trying to earn money by being better at predicting the future than other people.)

In order of date I made the bet:

Undecided: NO on Trump leaving before the end of his first term.

I placed this bet on 20-Oct-2019, on Betfair, after a friend said on Facebook that the odds seemed very favorable to him. The odds at the time were 1.3, giving an implied probability of ${1.3 - 1 \over 1.3} = 30\%$ that he'd leave. I bet £50 against this, giving me £15 profit if he doesn't. (Minus commission, so £14.25.) As of 19-Mar-2020 the odds are 1.12, for an implied probability of 11% that he'll leave. (I could cash out now, taking my £50 back immediately and £7.56 minus commission when the market closes1. I'm not inclined to do that.)

I lost: NO on a large increase in working from home.

LessWrong user Liron said:

I also bet more than 50% chance that within 3 years at least one of {Google, Microsoft, Facebook, Amazon} will give more than 50% of their software engineers the ability to work from home for at least 80% of their workdays.

I took him up on this for $100 each. We confirmed on 02-Jan-2020. I don't think I'd heard the term "coronavirus" at the time.

This cost me $100 (£92.69 over Paypal, which I think included £2.99 transaction fee). But I got a bunch of LW karma for it. Swings and roundabouts.

Undecided: YES on someone contracting Covid-19.

In a Telegram group chat I'm in, I bet another member €30 each that at least one person in the chat would contract Covid-19 by 28-Feb-2021. (We made the bet on 28-Feb-2020.)

I wasn't confident of winning at the time, and I'm still not. Confirmed infection rates are lower than I'd anticipated - for example, Hubei has had just over 0.1% of its population confirmed as having had the disease at one point, and Italy just over 0.05%.

I think an ambiguous result is more likely than I used to, because people who get mild cases of the disease often aren't tested.

I won: DECLINE in Trump's odds of being re-elected.

On 04-Mar-2020, I noticed that the betting markets' odds on Trump's re-election had gone up since Covid-19 became a thing. (According to Election Betting Odds, rising from about 40% mid-november to about 56% at the time.) At the time, the number of officially diagnosed cases in the US was very small. I predicted that as the official number rose, and as it became more widely known that the CDC and FDA had basically fucked up, the odds would drop again.

The obvious objection was: "if you can predict that happening, so can anyone. Why haven't the odds changed already?" I didn't have an answer for that, but the stock markets had seemed to react slowly to the news2 so maybe it would happen here too. (And the next obvious objection was: "oh, so you're expecting the same thing to work twice in a row?" I didn't have an answer for that either.)

Less obvious objections that I realized/someone pointed out after I'd made the bet:

  • Bush's popularity rose after 9/11.
  • Covid-19 seems to play very nicely into Trump's anti-immigrant, anti-globalist narrative.

In any case, I went ahead. Again on Betfair, I bet £100 against Trump at odds of 1.76. The odds started to drop pretty soon after that. Two weeks later, on 18-Mar-2020, I cashed out (by betting £110.26 for Trump at odds of 2.1, but actually by pressing the "cash out" button). I've claimed my initial £100 back, and no matter who wins I'll earn £21.29. (£20.23 after commission, but nothing if the market is voided.)

Undecided: BIDEN to win the 2020 election.

On 05-Sep-2020, I put £30 on Biden at the same market as my previous Trump bets, at odds of 1.98 (implied probability ~51%). At the time, FiveThirtyEight, Metaculus and Good Judgment Open all gave Biden at least 65% to win (with different caveats). I have no strong opinion; my bet was on the basis that I think I trust those predictors more than I do the market. If the market gets more in line with what they're saying, I may cash out, whether I'm up or down at the time.

Note that Betfair settles the market according to projected Electoral College votes. A faithless elector, or an incumbent refusing to stand down, don't affect the result. My interpretation is that 538 and Metaculus are also trying to forecast that, so that's not the source of the difference. (I can't see any fine print on GJ Open.) I can think of a few other possible sources:

  • People might misunderstand the Betfair rules. (Or the Metaculus ones, but that's close to 538's forecast, the fine print is much more obvious there, and it has a comment thread.)
  • I think 538 is modelling under the assumption that Biden stays in the race until the election. If he doesn't, I think including if he dies, I lose my bet. But the Metaculus and GJO forecasts aren't conditional on that (they don't mention Biden at all).
  • If a Trump victory would raise the value of GBP compared to a Biden victory, the Betfair market will exaggerate Trump's chances of victory. Lowering the value of USD might work, too. Or if people just want to hedge against a Trump victory.

But I don't really expect that any of these explains it.

  1. I can't take any profit now, because if he gets assassinated the market is void. 

  2. This didn't do me any good. I hadn't sold any stocks at the time, and I still haven't. 

Posted on 19 March 2020 | Comments

Tapping Out In Two

I'm one of those people who feels personally called out by xkcd's "Duty Calls" ("someone is wrong on the internet").

(Not as much as I used to. At some point I stopped reading most of the subreddits that I would argue on, partly for this reason, and Hacker News, for unrelated reasons, and now I don't do it as much.)

As pastimes go, there's nothing particularly wrong with internet arguments. But sometimes I get involved in one and I want to stop being involved in one and that's not easy.

I could just, like, stop posting. Or I could let them say something, and then respond with something like, "yeah, I'm still not convinced, but I don't really have time to get into this any more". But then the other person wins, and that's terrible. It potentially looks to onlookers like I stopped because I couldn't find something to say any more. And in at least some cases, it would feel kind of rude: if they've put a lot of thought into their most recent post, it's a bit dismissive to just leave. In the past, when people have done that to me, I've disliked it, at least some times. (Well, at least once. I remember one specific occasion when I disliked it, and there may have been others.)

Another thing I could do is respond to their most recent post, and then say "and now I'm done". But that feels rude, too, and I certainly haven't liked it when people have done it to me. (Why not? It puts me in the position where I don't know whether to respond. Responding feels petty, and kind of a waste of time; not responding feels like they win, and that's terrible.) If they do reply, then I'm shut off completely; I can't even make a minor clarification on the order of "no, you somehow interpreted me as saying exactly the opposite of what I actually said".

So I don't like those tactics. There are probably mental shifts I could make that would bring me more inclined towards them, but… about a year ago I came up with another tactic, which has seemed quite helpful.

What I do now is say something to the effect of: "after this, I'm limiting myself to two more replies in this thread."

This has various advantages. It feels less rude. It doesn't look like I'm quitting because I have no reply. It helps the conversation reach a more natural conclusion. And it also feels a lot easier to do, partly for the above reasons and partly for the normal reasons precommitment helps me to do things.

For some quantitative data, I went over my reddit history. It looks like I've used it ten times. (I don't think I've ever used it outside of reddit, though I initially thought of this after an argument on Facebook.)

  1. "(This doesn't seem super productive, so I'm going to limit myself to two more relies in this thread.)" This was my third comment, and we each made one more afterwards.
  2. "Limiting myself to two more replies in this thread." This was my third comment, and afterwards, I made two more and they made three more. Their final two replies got increasingly rude (they'd been civil until then), but have subsequently been deleted. This is the only time I ran into the two-more-comments limit. Also, someone else replied to my first comment (in between my final two comments in the main thread) and I replied to that as well.
  3. "I find myself getting annoyed, so I'm limiting myself to two more replies in this thread." This was my fourth comment, and afterwards we made each made one more. (This was the only person who questioned why I was continuing to reply at all. A perfectly reasonable question, to which I replied "not going down this rabbithole".)
  4. "(Limiting myself to two more replies in this thread.)" This was my third comment, and afterwards we made two more each. If their final comment had come after I hit the limit, I would have been tempted to reply anyway. (Uncharitably paraphrased: "oh, we've been talking past each other, this whole time I've been assuming X and you've been assuming Y, which is very silly of you" / "well I explicitly said I was assuming Y and not X in my very first post in this thread, and even if we assume X and not Y your behaviour still makes no sense".) All of their comments in this thread have since been deleted.
  5. "Pretty sure this is all a big waste of time, so I'm limiting myself to two more replies in this thread." This was my sixth comment linearly, but only second in reply to this specific user, and I'd previously made another two in different subthreads below my first comment. Afterwards, the person I'd been most recently replying to didn't make any more comments. But someone else replied to this comment, and I replied to them; and two other people replied in those other subthreads, and I replied to both of them as well (but one of those replies was just a link to the other).
  6. "Limiting myself to two more replies in this thread." This was my fifth comment, and neither of us replied afterwards, though I did quickly edit this comment to add a clarification that arguably could have counted as its own comment. Someone else replied to one of my comments higher up the chain, and I replied to them.
  7. "(I don't want to get sucked into this, so I'm limiting myself to two more replies in this thread.)" This was only my second comment, and neither of us replied further.
  8. "(Limiting myself to two more replies in this thread.)" But in their next reply, they made a mistake that I had previously explicitly pointed out. So instead of replying in depth, I said, "It's quite annoying that you continue to misunderstand that, so I'm out." They replied saying "I think you're struggling to justify your initial comments and that's why you're "out", but that's fine. I accept your unconditional surrender." I didn't reply further. Since it was /r/science, and I was quite miffed, I reported this comment as unscientific to see if it would help me feel better. I don't remember if it worked. The comment did not get removed.
  9. "I think I'm going to reply at most twice more." This was my fifth comment, and neither of us made any more afterwards. Their first and final comments are still there, but their other three have been deleted.
  10. "I'm going to limit myself to two more replies after this." This was my third comment, plus I'd made one in another subthread below my first comment. Afterwards we each replied twice more.

In thread #5, I made 11 comments in total; but in others, my max comment count was 6. This feels like I mostly did a decent job of not getting sucked in too deeply. And since I generally got the last word in (threads #2 and #7 were the only exceptions), I think (though I don't specificly remember) I rarely had the feeling of "if I don't reply now then they win and that's terrible". Thread #7 is the only one I think I still feel lingering resentment about. From before I thought of this tactic, I can think of at least two arguments where I didn't get the last word and I still feel lingering resentment. (One is the argument that triggered me to think of this tactic.)

So generally this seems like a success. Ideally we'd compare to a world where I didn't think of this tactic, but I'm not really sure how to do that. (We could go over my reddit arguments where I didn't use it, but that's clearly going to have sampling bias. We could go over my reddit arguments from before I thought of it, but we can't know when I would have used it or how things would have progressed. Possible experiment: going forward, each time I want to limit my future replies, I toss a coin for whether I actually do it in that comment. Keep track of when I did this. I am unlikely to actually run this experiment.) For a way it plausibly might not have been a success: I suspect it's the case that having limited my total investment, I spent more effort on many of these comments than I would have otherwise. If these arguments would have ended just as quickly in any case, then this tactic caused me to spend more time and effort on them.

I'm a bit surprised that I only ran into the two-more-comments limit once. A part of me would like to interpret that along the lines of: once I started putting more effort into my comments, I demolished my opponents' arguments so thoroughly that they accepted defeat. But this seems unlikely at best.

I will say that none of these comment chains felt like collaborative discussions. Some of them started that way, but by the end, they all just felt like "I am right and you are wrong". (This is not very virtuous of me, I admit.) My thinking had been that this tactic would be most valuable in collaborative discussions. But it seems I don't have many of those on reddit, at least not ones that I spend much time on. So, no direct evidence on that yet.

I'm not sure how to handle replies to other people, or replies in subthreads other than the main one. A strict wording would suggest that I should count subthreads against the limit, but I haven't so far and it hasn't caused me problems. Even a weak wording would suggest that I should count replies to other users against the limit, but I've only had one of those and didn't reach the limit whether you count them or not.

I'd ideally like to have a get-out clause like "…unless I actually decide that replying after that is worth my time". But I'm not quite sure that's the "unless" that I want. (Plus it seems kind of rude, but it's not like I'm being super polite as-is.) Anyway, I haven't needed that clause yet.

Posted on 05 December 2019 | Comments

The Effect pattern: Transparent updates in Elm

I don't like writing tests in Elm1. Naively, it seems like the fact that all Elm functions are pure would make it easy. But I've still never really been able to test the things I really want to.

It's possible that I'm not very good at writing tests in general. But even if that's true, I don't think it's the whole story.

I have another hypothesis, which is: much of the reason is that although Elm functions are pure, in the sense of "given the same input, they return the same output" - they aren't necessarily transparent, in the sense of "you can look at the output and verify that it's correct".

To recap2, I would describe the Elm architecture as consisting of:

  • Model - a type describing the data you're interested in. This is usually mostly transparent. There are some opaque blobs where components haven't exposed their internals, but I think those mostly aren't the kind of opacity that gets in the way3.

  • Msg - a type describing the messages your app passes around. Again, mostly transparent, with opaque blobs where components haven't exposed their messages. I feel like this opacity is maybe a bit more annoying, but still not very.

  • update : Msg -> Model -> (Model, Cmd Msg) - a function describing the logic of your app (or whatever it is you're architecting). The Model and Msg are still transparent, but the second return value is a Cmd Msg which is completely opaque, and which is the main source of the problem.

  • view : Model -> Html Msg - a function describing the rendering of your app. Html Msg is also arguably opaque, but you can inspect it using elmer4, or (as I discovered while writing this post) the Test.Html modules in the test package. I'm not sure if you can inspect it as deeply as you might want, I haven't explored these much yet.

Now, in my experience5, a lot of the logic in an app is contained in the way the update function chains. "After receiving this message, we send a command which eventually results in receiving that message, and then we want to put some value in the model." And we can't test those chains, because Cmd is opaque. So that's part of why I haven't liked writing Elm tests.

But I think I've found a method that will help, at least somewhat. You don't need to rewrite anything except your update function, and I actually think it's a mild improvement even if you ignore testing. I call it the Effect pattern.

We've implemented this pattern at work, in a couple of places. When I first did so, the first two tests that I wrote both revealed bugs in my code. (I think I'd known about both of those bugs but forgotten them. I wasn't consciously expecting the tests to fail.) Neither of those tests needed to be updated when I fixed the bugs. I think this is a good sign, but it's not very strong evidence. Beyond that, I mostly have theory.


I'm going to demonstrate with a simple app. It'll be buggy, and the bug will be hard to test for. Then I'm going to refactor the app to make it more testable, and write a test. Then I'm going to fix the bug, and verify that the test now passes. I think the app is too simple to make the benefits really obvious, but hopefully I can convince you anyway6.

So here's the app. It has a number, and a button to increase the number by one, and a button to multiply it by two. When you press one of those buttons, the number gets "saved to the server"; normally that would be an http request, but we emulate that by just sending a time delayed response. There's a message telling you whether the number has been successfully saved yet.

Actually, most of the app supports increasing the number by any integer, and multiplying it by any integer. That's not implemented in the view though. The Watsonian explanation for this is that the developer has a habit of overengineering things. The Doylist explanation is that it makes the interesting bits more interesting, but not significantly more complicated; and the view code is boring.

The bug is: if you press buttons quickly, the "saved" message will temporarily be wrong. You've sent two messages to the server, and one hasn't come back yet; the number that's been saved is not the number on display. Silly, but I think it works as a demonstration.

(There are surely also bugs related to the size of numbers, but I'm not interested in those right now. Additionally, a real life server would have other problems like "sometimes requests will fail" and "sometimes requests will be handled out of order"; the second in particular is hairy, but I haven't emulated these possibilities.)

Here's the update function:

update : Msg -> Model -> (Model, Cmd Msg)
update msg model =
    case msg of
        IncBy n ->
            ( { model | count = model.count + n, saved = False }
            , fakeSaveCount (model.count + n)
        MultBy n ->
            ( { model | count = model.count * n, saved = False }
            , fakeSaveCount (model.count * n)
        Saved ->
            ( { model | saved = True }, Cmd.none )

You can probably guess roughly what Model and Msg look like just from that.

As written, how would you test for the bug? I think the way to do it is to write a wrapper update1 msg model = Tuple.first <| update msg model that ignores the Cmd. Then test that

    |> update1 (IncBy 1)
    |> update1 (IncBy 1)
    |> Saved
    |> .saved

is False. And that works, but you need to know exactly what messages get sent in response to IncBy 1. In this case that's easy. In more involved cases, you'll need to know it for every Msg constructor, and you're going to make a mistake.

Here's how I'd rewrite for testability:

type Effect = ESaveCount Int

update : Msg -> Model -> (Model, Cmd Msg)
update msg model =
    let (newModel, effs) = updateE msg model
    in (newModel, Cmd.batch <| List.map (\e -> runEffect e model) effs)

updateE : Msg -> Model -> (Model, List Effect)
updateE msg model =
    case msg of
        IncBy n ->
            ( { model | count = model.count + n, saved = False }
            , [ESaveCount (model.count + n)]
        MultBy n ->
            ( { model | count = model.count * n, saved = False }
            , [ESaveCount (model.count + n)]
        Saved ->
            ( { model | saved = True }, [] )

runEffect : Effect -> Model -> Cmd Msg
runEffect eff _ =
    case eff of
        ESaveCount n -> fakeSaveCount n

That is, we split it into two parts. One part updates the model, and describes the side effects transparently. The other part turns the transparent side effects into opaque ones.

Even if we're not testing this, I kind of like it. It's nice to have a complete list of possible side effects in one place, where it isn't mixed in with model updates. But I do also want to be able to test it. Obviously, now we test updateE instead of update. Is that any better? I think so.

Here's a simple example of something we can now test that we couldn't before: "whenever update changes model.count, it makes a server request to save the new value; and whenever it makes a server request to save a value, that value is now in model.count." (We may need to write fuzzers for our Msg and Model to do this.) To do this, we need to trust that runEffect works properly, that ESaveCount actually generates a server request; but I think that's a fairly okay thing to trust.

But that's small fry. I spoke above about chaining the update function, and that's harder. If you're going to do it properly, you still need to know what messages get sent in response to every Effect constructor, and you can't learn that from calling the function. But I still think this is an improvement: if done right, I would expect (and my limited experience agrees) that typically Effect will have many fewer constructors than Msg, and for each one, the transformation to messages is fairly simple.

Here's a helper function that you can use to test updateE:

    : (state -> msg -> model -> (state, (model, List effect)))
    -> (state -> effect -> model -> (state, List msg))
    -> state
    -> model
    -> List (List msg)
    -> (state, model)
runUpdates updateE runEffect initialState initialModel messages =
    let go = runUpdates updateE runEffect
    in case messages of
        [] -> (initialState, initialModel)
        [] :: moreMsgs -> go initialState initialModel moreMsgs
        (msg1 :: moreMsgs1) :: moreMoreMsgs ->
            let (newState1, (newModel, effs)) =
                    updateE initialState msg1 initialModel
                (newState2, newMsgs) =
                        (\e (s, m) -> runEffect s e newModel
                            |> Tuple.mapSecond ((++) m)
                        (newState1, [])
            in go newState2 newModel ((moreMsgs1 ++ newMsgs) :: moreMoreMsgs)

On a high level, the way it works is this: you pass it your regular updateE function and a mocked runEffect function, together with an initial model and a list of messages to send. The messages get sent, one at a time, to the model. Any effects caused by updateE are handled by the mock runEffect, which returns a list of additional messages to be sent in future. We keep running until there are no more messages.

There are two complications. One is that we also thread through some arbitrary state, that can be updated both by updateE and runEffect7. We'll see the value of that later, but if you don't need it,

stateless : (a -> b -> c) -> (() -> a -> b -> ((), c))
stateless f () a b = ((), f a b)

is a helper function that lets you forget about it.

The other complication is that we pass in a nested list of messages. That's to give us some control over timing. Any messages returned by runEffect will be handled after the current "batch" of messages, but before the next "batch". So if you want to test the sequency "run msg1, then msg2, then handle the results of msg1", you would pass in [[ msg1, msg2 ]]. If you want to test the sequence "run msg1, handle its results, then run msg2", you would pass in [ [msg1], [msg2] ].

I have a feeling this won't always be enough control, and in future the mocked runEffect will need to return List (List msg) in place of List msg. (With the first item being messages to run immediately, the second being items to run between the current and the next batch, and so on. You'd get the current behaviour by returning [ [], msgs ].) But it suffices for now.

And here's one way to test the app:

type alias TState = (Int, List Expectation)

testSave : Test
testSave =
        mockUpdateE : TState -> Msg -> Model -> (TState, (Model, List Effect))
        mockUpdateE (n, exps) msg model =
            let (newModel, effs) = updateE msg model
                newState = case msg of
                    Saved ->
                        ( n - 1
                        , (newModel.saved |> Expect.equal (n - 1 == 0))
                            :: exps
                    _ -> (n, exps)
            in (newState, (newModel, effs))

            : TState
            -> Effect
            -> Model
            -> (TState, List Msg)
        mockRunEffect (n, exps) eff model =
            case eff of
                ESaveCount _ -> ( (n+1, exps), [Saved] )
    test "Doesn't pretend to be saved" <| \() ->
        let ((_, exps), _) =
                runUpdates mockUpdateE mockRunEffect (0, []) initialModel
                    [[IncBy 1, IncBy 1]]
        in Expect.all (List.map (\e () -> e) exps) ()

I admit, this is pretty ugly. But I think it's conceptually quite simple. The state keeps track of two things: how many save requests are currently "in flight", which gets updated as we step through; and a list of assertions, which we verify at the end. Every time we send a request (with ESaveCount), we increase the in-flight count. Every time we receive a Saved message, we decrease it, and add a new assertion to the list: model.saved should be True iff there are no requests remaining in-flight.

You can see this version of the app here. Note that to avoid the hassle of playing with Test.Runner, I've replaced the Test with an Expectation by commenting out the test "..." line (but nothing underneath), and put the result in the view. You can remove the second IncBy and check that it now passes (because if there's only one IncBy, the bug doesn't exhibit).

Now to fix the bug, and see what effect that has on the tests. If you can't fix a bug without changing the tests you wrote for it, that's a bad sign about your tests.

The test uncovered the bug by keeping track of some additional state. So the obvious thing to do is to move that state into the model. Count the number of requests in-flight, and only count as "saved" if there are none8. In fact, we no longer need saved as a field at all; as written in the test, saved is true iff there are no in-flight requests.

(This doesn't work in the real world, where requests can fail. I think it's fine as a demonstration.)

Removing saved actually does mean we need to rewrite the test. But only minorly, and we can do that before fixing the bug. All we need is to replace the reference newModel.saved with isSaved newModel where isSaved = .saved. Then when we fix the bug, we rewrite that as isSaved m = m.inFlight == 0, and the test passes with no more changes. We'll want to use that function in the view anyway. (Avoiding this kind of thing is one reason people like to write opaque models.)

You can see the fixed version of the app here.

However… now that we've put that state in the model, although the test still works, it's looking a bit silly. We've got some state in the tests that should always be identical to the state in the model; why not just test that the model is updating the state correctly?

So here are some other tests that we could write (but that I'm not going to, for time reasons):

  • We could test that, given the same messages as above, the model doesn't become isSaved until the very end of runUpdate. We can still do that with runUpdates; our state parameter is now List Model, we use mockUpdateE to keep track of every intermediate model (and mockRunEffect no longer needs to deal with state at all), and then finally verify that isSaved is false for all but the last. (This test we could have written with the old model. As described it fails if, in future, Saved starts having other effects - maybe it wants to send something to a port. We can get around that by doing more bookkeeping, but we could also simply decide that mockRunEffect will pretend it doesn't have those effects.)

  • We could test that any update which returns an ESaveCount also increases inFlight; and further, that any messages sent in response to ESaveCount will decrease inFlight. I think this test is most valuable if it uses a mocked runEffect that's also used in other tests.

Really though, I'm not sure what's best. I do think the Effect pattern will help.

Further thoughts

We haven't used this much yet, so there's a lot left to explore. Here are some thoughts I've had, but not developed very far, in no particular order.

Nomenclature. We've gone from update to update, updateE and runEffects. I currently think runEffects is an okay name, but I feel like with no historical baggage, the name update would have gone to updateE. Then what would update be called? I left it as-is partly to avoid stepping on historical toes, and partly because I don't know. runUpdate? That would give us the pattern "update functions update the model and return some kind of what-next value, run functions produce Cmds". (Of course, runUpdates violates that pattern. Maybe that should become foldUpdate…?)

Also, we'll want a helper function to convert updateE and runEffects into the correct form. What do we call that function? The only sensible suggestion I have is liftUpdate, but I'm not sure I like it.

Randomness. The test I've demonstrated was deterministic. So have all the tests I've written for Effect so far. (At least one used a fuzzed input, but that doesn't feel like it counts.) To get randomness I imagine you'd need to put a Seed in the state parameter of runUpdates, and then use Random.step (maybe with Test.Runner.fuzz to get you the Generator that you need).

Simpler test helpers. runUpdates is quite complicated, and as mentioned, I suspect it'll need to be even more so. I think most tests probably won't use all of its features. What simpler variants of runUpdates are worth creating (and what should they be named)? An obvious candidate is a variant with no state. Maybe also a variant that returns a list of models instead of just the final model.

Full state machine. In Haskell we use quickcheck-state-machine, which I really like. Could we do something similar in Elm? I think it would be possible, though probably looking quite different. Elm doesn't have all of the type-level features that QSM relies on, but it also doesn't have all of the features that QSM needs to support.

Should runEffect see the model? Reasons yes: it simplifies the Effect constructors; it reduces total amount of code; it allows one stateless function to work with both updateE and runEffects. Reasons no: it gives runEffect more rope to hang you with (more likely to be complicated, more likely to diverge in the codebase and test suite). We went with yes because in one of our components, many constructors would have needed many parameters. But we probably could have refactored a little so that many constructors would have needed only one parameter each.

Can you reuse mocked runEffect functions? I touched on this above. That is, will you be able to write just one or two per component and use those in most of the tests for that component? (If not directly, then just wrapping them in some state-updating code that doesn't change the Msg they return.) Or will each test need a separate one? I certainly hope you can reuse them. If not, that might diminish a lot of the value. The fewer you have, the easier it is to keep them all correct when the real one changes.

(What if you have a potentially large number of functions, but selected by a configuration parameter? I think depending on how this is handled, it could be closer to either end of the scale.)

Composability. A component using this pattern can be embedded in one that doesn't, no problem. You just don't get any of the benefits in the parent.

What if you embed a non-Effect component in an Effect component? This forces you to put some Cmd in your Effect type. How do you test that? I'm not sure you reasonably can. So far I think we've just avoided testing those parts of our components.

What if you embed an Effect component in another Effect component? In the codebase, I imagine your parent's Effect type has a constructor ChildEffect Child.Effect which in turn calls Child.runEffect. That lets you test the parent component, but… you may need to mock runEffect for every child component, and that doesn't sound fun. (If the mocked functions do turn out to be fairly reusable, maybe not so bad.)

Also, if you want to provide an Effect component in a library, you may want the Effect type to be opaque (like Model and Msg often are today). But then if you want people to be able to test your component in theirs, maybe that makes you responsible for providing suitable mocked runEffect functions.

Unconventional updates. I've been assuming your existing update function has type Msg -> Model -> (Model, Cmd Msg). But a lot don't. A common pattern is something like Msg -> Model -> (Model, Cmd Msg, MsgToParent) where MsgToParent is something that the parent may need to act on. (In our case, it's often just global state that needs to be sent back up in turn.) How do you handle this kind of thing?

In the codebase, I think the correct decomposition for this is still to change only the Cmd Msg return value, since that's the only opaque one, and for runEffect to be the same as it would be otherwise. (You could give it MsgToParent as an extra parameter, but like with Model that won't give you any extra power. By default I wouldn't do it, but I don't think it would be an obviously bad idea.) If you had two opaque return values… honestly I don't think I've seen this and I don't think I want to speculate about what to do in that situation.

In the test suite, you won't be able to use the existing runUpdates function. What to do instead? It might prove useful to have a runUpdates3 (or something) which can handle this case, but I'm not sure exactly how that would work.

Another possibility would be using wrappers to put your updateE in the right shape. For example, maybe you can do type EffectOrMsg = Effect Effect | MsgToParent MsgToParent, and then combine the final two results of updateE. I don't know if that would always get you what you want.


You can take your existing function update : Msg -> Model -> (Model, Cmd Msg), and decompose it into two others:

  • updateE : Msg -> Model -> (Model, List Effect)
  • runEffects : Effect -> Model -> Cmd Msg

defining the new Effect type as whatever makes this decomposition work nicely. I think this is a slight improvement in the codebase; and independently of that, I think it helps you to write better tests.

  1. I'm not a fan of writing Elm code in general, but that's for mostly unrelated reasons

  2. I'm not going to recap in detail. This post is mostly targeted at people who already know Elm. I kind of hope others can get something from it too, but I'm not trying hard for that. 

  3. I'm somewhat philosophically opposed to this kind of opacity, even if practically I don't think it's a big problem. But my philosophical objections aren't relevant here. 

  4. I'm confused about how Elmer works. It uses kernel code, which isn't supposed to be available to mere mortals. The elm.json file says its package name is "elm-explorations/elmer", which would explain how it can use kernel code; but the name typically points at a github repository, and that repository doesn't exist. Has Elmer found some way to circumvent the kernel code restrictions? It seems to be something to do with the elmer-test library on npm, but… maybe I'm blind, but I can't even find the source code for that online? It'll be in my filesystem at work somewhere. Maybe some day I'll investigate further.

    I wish Elmer had a README section like "oh, you thought this was impossible? Surprise, sucker! Here's how it works!" I've seen many projects that I wish had that. 

  5. My vast experience of writing one Elm app, in one company, mostly with other developers who have approximately the same amount of experience. 

  6. Or, if it happens that I'm wrong about the benefits, I hope I can explain myself well enough that someone can understand why I'm wrong, and then convince me

  7. My initial version only exposed state to runEffect, because we didn't need it for updateE on the tests we've written so far at work. But I couldn't see a way to avoid it, or something like it, for this example.

    I weakly guess that if the state updates generated by updateE depend on the message, that's already a bad sign. We need it for the test I'm about to write, but that's precisely because the model doesn't keep track of everything it needs to. 

  8. I had planned to take a different approach, storing both "current count" and "the count saved on the server" (which would be placed in the model through a new argument on the Saved constructor). But that has its own bugs: suppose the count starts at 0 and quickly goes 1, 0, 1. Then we'll get messages Saved 1, Saved 0, Saved 1. After the first Saved 1, we'll think we're saved until Saved 0 arrives. 

Posted on 20 October 2019 | Comments

A reckless introduction to Hindley-Milner type inference

(I've been editing this post on and off for almost a year. I'm not really happy with it, but I suspect I never will be.)

Several months ago I gave a talk at work about Hindley-Milner type inference. When I agreed to give the talk I didn't know much about the subject, so I learned about it. And now I'm writing about it, based on the contents of my talk but more fleshed out and hopefully better explained.

I call this a reckless introduction, because my main source is wikipedia. A bunch of people on the internet have collectively attempted to synthesise a technical subject. I've read their synthesis, and now I'm trying to re-synthesise it, without particularly putting in the effort to check my own understanding. I'm not going to argue that this is a good idea. Let's just roll with it.

I'm also trying to tie in some quasi-philosophy that surely isn't original to me but I don't know if or where I've encountered it before.1


When people write software, sometimes it doesn't do exactly what we want. One way to find out is to try running it and see, but that's not ideal because any complicated program will have way too many possible inputs to test. (Especially when you consider that inputs include things like "amount of free space on disk" and "time taken for a web server to respond to a request".) So it would be nice if we could mathematically prove whether our software does what we want, without actually running it. Can we do that?

That's not a very well-defined question, but we can ask more precise versions of it. Here's a well-known one: given some possible input to our software, we might want to prove that our software will eventually stop running. Can we prove that?

That question is known as the halting problem, and the simple answer is that we can't, not in general; the halting problem is undecideable. But the full answer is more complicated.

To solve the halting problem, we want a program that, when shown another program and some input to be fed to that program, satisfies three different conditions:

  1. It will always return an answer.
  2. The answer will always be either "yes, this always terminates" or "no, sometimes this doesn't terminate".
  3. The answer is always correct.

And that's not possible. But we can compromise on any of the three. We can make a program that sometimes doesn't return an answer, or one that sometimes gets the answer wrong. But perhaps most interestingly, we can make a program that sometimes says "I don't know".

And when you allow that answer, you can create a language on which the halting problem is decideable. You can write a program that will tell you truthfully whether any program written in that language will terminate; and for any other program, will say "I don't know". (Perhaps expressed in words like "syntax error on line 1".)

Now, the halting problem is tricky. It turns out that if you create a language like that, there are a lot of interesting things that programs written in that language just won't be able to do; the language will necessarily be Turing incomplete.2 But there are also lots of interesting things that they can do. To give three examples of such languages3:

  • Regular expressions are really useful for certain operations on strings, but that's about all they're good for.
  • SQL is really useful for working with databases. According to some people on stack overflow, the ANSI SQL-92 standard was Turing incomplete and the ANSI SQL-99 standard is Turing complete. (No mention of the SQL-96 standard that came between these, but reading between the lines, probably Turing incomplete.) If I understand correctly, the feature required to make SQL-99 Turing complete4 is one I've literally never used; so for my purposes, it may as well be Turing incomplete.
  • Coq is used for proving math theorems. It's an interesting one because when you write your program, you have to also provide a proof that your program terminates. (I think this is slightly false, but again, good enough for the point I'm making.)

So although these languages can't do everything, they can still be incredibly useful in their domains. More useful than a more general purpose language might be. One reason for this is that being able to prove non-termination is a useful property of the language. If you had to write a SQL query in C, it would be all too easy to write some C code that would accidentally loop forever.

I'm trying to illustrate here something that seems to me important, which is that there's a tradeoff between what I'll call expressiveness and legibility. A programming language is expressive if you can easily write many interesting programs in it5; it's legible if you can easily say many interesting things about the programs you've written in it. And I claim that the most expressive programming languages won't be the most legible, and vice-versa; though there will certainly be languages which are neither expressive nor legible. This tradeoff seems fundamental to me, and I expect that some approximation of it has been proven as a theorem.6

I haven't defined these very well, but hopefully some examples will help. I will also clarify that both of them are highly dimensional; and that "raw computational power" is one of the things that expressiveness can point at, but not the only thing; and "human readability" is not really one of the things that legibility points at, but many things that increase legibility will also increase human readability.

  • Perl-compatible regular expressions can classify sets of strings that normal regular expressions can't. But they're harder to make time and space guarantees about. And it's possible to prove whether two regular expressions are equivalent, but that's not possible in general for PCREs (proof: PCREs can encode CFGs; CFGs can't be proved equivalent).

  • Under certain assumptions, Haskell's monadic IO lets you look at the type of a piece of code and know that it won't depend on external state. In return, a function can only bring in external state if its caller allows it to (which requires having permission from its own caller, and so on).

    The assumptions in question are false (partly because unsafePerformIO exists), but I've been able to get away with pretending they're true (partly because unsafePerformIO is punishable with excommunication).

  • Custom operators (at least as implemented in Haskell and Elm) are equivalent to named functions, and don't gain or cost much in terms of legibility and expressivity. They simply make code more or less readable. But operator overloading, at least when combined with dynamic typing, gains expressivity at the cost of legibility (you no longer know that a + b will do anything remotely like an addition).

  • Macros make it easier to do things like create DSLs, reduce boilerplate, and set compile-time config options. But they mean that a function call might not look like one, or vice-versa; expressions might get evaluated many times, or not at all; and the code might perform differently depending on the phase of the moon when it was compiled.


So we've got this tradeoff, and in our programming language design we try to navigate it. We try to find kinds of legibility that can be bought for little cost in expressiveness. Or more precisely, we try to find kinds of legibility that we care about, and that can be bought for little cost in kinds of expressiveness that we care about.

And Hindley-Milner type systems are a tradeoff that's proved fairly successful, both in direct use and as inspiration. At my company7, we use Elm8, which runs on an approximately HM type system. (I don't think it's pure HM, due to extensible record types.) We also use Haskell9, which runs on a type system that extends HM in many directions. Haskell's system is more expressive and less legible, but still successful. (I'll mostly be using Elm for examples in this post, and not extensible records.) ML and OCaml are other notable languages based on HM, though I haven't used either.

The legibility HM offers is, roughly, the ability to prove that a program typechecks. I'm not going to clarify exactly what that means, but we probably all have a decent idea. It's the thing that lets the Elm compiler say "no, that program is trying to add a string to an int, bad program", while the Python interpreter doesn't know that's going to happen until it's too late. The Elm compiler will refuse to compile your program unless it can logically prove that it will typecheck.

More precisely, what HM offers isn't type checking but the more general type inference. (And beyond that, type inference in roughly linear time.) Type inference doesn't just tell you whether a program typechecks, but what its type is; a program fails to typecheck iff no type can be inferred for it.

What this means is that there's no need to supply type annotations. And indeed, in Elm you can get away without them, except I think for extensible records. In Haskell you sometimes can't, because Haskell loses some of the legibility that HM offers.

(We typically do supply type annotations, but that's because they're useful. Partly as documentation for humans, partly to help pinpoint errors when our programs fail to typecheck.)

And so in an HM system you get no runtime type errors. And although not all runtime errors are type errors, in many cases they could be. For example, an array out-of-bounds exception isn't a type error. But when designing a language, you can decide that array out-of-bounds exceptions won't exist, any array lookup will return either a value from the array or null. If type errors are possible, you've just eliminated one source of errors by pushing them somewhere else, and possibly somewhere harder to debug. But in HM, you've eliminated one source of errors by pushing them somewhere more visible, where they can be ruthlessly executed.

Elm actually tries to promise no runtime errors, period, provided you stay inside Elm. On one level, I think that's a fairly minor imposition on language design, something you get "for free" by deciding that none of the built-in functions you provide will ever throw a runtime error. On another level, it seems completely impractical to decide for example that cons will return a meaningful value if it can't allocate more memory. I'm not aware that Elm even tries to handle those errors.

(Haskell doesn't try to promise the same thing, and allows functions to return undefined. This is another legibility-expressiveness tradeoff.)

So HM's legibility gain is: type inference, powerful type system, no runtime type errors, optionally no runtime errors at all. It's good.

Meanwhile, the expressiveness cost is that you need to write your programs in ways that the type inference algorithms can work with, which forbids some things that you might like to do.

For example, suppose you want to clamp a number to between -1 and +1. In Python, you could write that like

def clamp(x): sorted([-1, x, 1])[1]

and as long as sorted always returns a list of the same length it started with, that works fine10. But it only works because the Python interpreter allows you to be reckless with array indexing. Elm doesn't let you be reckless, and so Elm has no equivalent way to perform array lookup. If you tried to write the same function in the same way in Elm, the result in the compiler's eyes would not be a number but a Maybe number - AKA "either a number or Nothing". (Nothing is roughly equivalent to None in python or null in many other languages, but you have to explicitly flag when it's allowed.) When you actually run this code, you will always get a number and never Nothing. But the compiler can't prove that.

(Again, I stress that you will never get Nothing as long as your sort function always returns a list of the same length it started with. That's something you can prove for yourself, but it's not something the Elm compiler can prove. It's not even the sort of thing the Elm compiler knows can be proven. And so in turn, it can't prove that you'll never get a Nothing here.)

And then the Elm compiler would force you to account for the possibility of Nothing, even though there's no way that possibility could occur at runtime. One option is to pick an arbitrary result that will never be exposed. That works fine until the code goes through several layers of changes, an assumption that used to be true is now violated, and suddenly that arbitrary result is wreaking havoc elsewhere. Or in Haskell, your program is crashing at runtime.

To be clear, that's not significantly worse than what we get in Python, where the code can also go through several layers of changes that result in it crashing at runtime. But we were hoping for better.

And in this case "better" is easy enough, you can just write your function to avoid indexing into a list, and then it can return a number with no need for trickery. The point isn't that you can't do the thing. The point is that (a), even if the thing is safe, the compiler might not know that; (b), if you decide it's safe anyway and find some way to trick the compiler, the compiler no longer protects you; and (c), if you want to do it in a way the compiler knows is safe, you might need to put in some extra work.

For another example, HM type systems can't implement heterogenous lists. So this is really easy in python:

def stringify_list(l):
    return [ repr(x) for x in l ]

                ["here's a", "nested list", {"and": "maybe", "a": "dict"}],
                "it can even be passed itself, like so:",

but impossible in Elm. You can sort of get the same effect by creating a type with many constructors

type HeteroType = HTInt Int
                | HTString String
                | HTBool Bool
                | HTList (List HeteroType)
                | ...

but it's not quite the same, because it can only accept types you know about in advance. Also, it's a massive pain to work with.

For a third example: Haskell is known for its monads. But Elm has no equivalent, because an HM type system can't support generic monad programming. You can implement the generic monad functions for specific cases, so there's Maybe.map and List.map, but there's no equivalent of Haskell's fmap which works on all monads.

Hindley-Milner type systems

I've talked about the tradeoffs that HM type systems offer, but not what HM type systems actually are. So here is where I get particularly reckless.

This bit is more formal than the rest. It's based on the treatment at wikipedia, but I've tried to simplify the notation. I'm aiming for something that I would have found fairly readable several months ago, but I no longer have access to that version of me.

Also, this part is likely to make more sense if you're familiar with at least one HM-based language. That's not a design feature, I just don't trust myself to bridge that inferential gap.

For an HM system, you need a language to run type inference on, and you need types to run type inference with, and you need some way to combine the two. You could use the language with no type inference, if you didn't mind crashes or weird behaviour at runtime, when you made a mistake with typing. (Haskell allows this with a compiler option.11) And you could run type inference without caring about the semantics of the language, treating it as essentially a SuDoku, an interesting puzzle but meaningless. (Haskell supports this, too.) But by combining them, the semantics of the language are constrained by the type system, and runtime type errors are eliminated.


Types come in a conceptual hierarchy which starts with type constants. That's things like, in Elm, Int, Float, Bool, String, Date, (). It also includes type variables, which in Elm are notated with initial lower-case, like a and msg. (Though the type variables number, comparable and appendable are special cases that I won't cover here.)

Next in the type hierarchy is applied types. Here a "type function" is applied to arguments, which are type constants and/or other applied types. These are things like List Int, Maybe (List Float), Result () Date, and a -> String. (In that last one, the type function is the arrow; Haskell would allow you to write it (->) a String. Also, (->) is the only type that HM specifically requires to exist.) Notably, an applied type must have a specific named type function as its root; you can't have m Int, which you would need for generalised monads.

Type constants and applied types are monotypes. You get a polytype by optionally sticking one or more "∀"s in front of a monotype. So for example a -> Int is a monotype, but ∀a. a -> Int is a polytype. So is ∀a. ∀b. a -> Int -> b, which is written equivalently as ∀a b. a -> Int -> b. ∀b. a -> Int is also a polytype; since the quantified variable doesn't show up, it's equivalent to the monotype a -> Int. We can do something like that to any monotype, so for simplicity we might as well decide that monotypes count as a special case of polytypes, not as a distinct set.

Type signatures in Elm typically have an implied "∀" over whichever variables it makes sense to quantify. (There's no syntax for explicitly writing the "∀".) So the type of List.map would be written

map : (a -> b) -> List a -> List b

but I'll be writing

map : a b. (a -> b) -> List a -> List b

for clarity. Because there's one place where Elm doesn't give an implied ∀, which is when you have scoped types. To demonstrate by example,

const : a b. a -> b -> a
const x = let foo : b -> a
              foo y = x
           in foo

const has a polytype here, but foo has a monotype, because (in context) its argument type and return type are constrained. If you tried to swap a and b in the type signature for foo, or rename either of them, the Elm compiler would complain.


The language has four kinds of expression, and each has a rule relating it to the type system. You need variables and constants, function calls, lambda expressions, and let statements.

Variables and constants

Variables and constants are things like True, 0.2, Just, "Hello", [], (), List.map. Each of these has a declared type, which in Elm is notated with :. So True : Bool, 0.2 : Float, Just : ∀a. a -> Maybe a, "Hello": String, [] : ∀a. List a, () : (), List.map : ∀a b. (a -> b) -> List a -> List b.

The rule that relates these to the type system is that type declarations imply type judgments. Mathematically it looks like

$$ \frac{x : π \quad π ⊑ μ}{x \sim μ}. $$

Reading clockwise from top left, this says: if you have a variable $x$ declared to have some polytype $π$, and if the monotype $μ$ is a specialisation of $π$, then $x$ can be judged to have type $μ$. ($π$ always denotes a polytype, and $μ$ always denotes a monotype.)

A type judgment, as opposed to a declaration, provides a type that an expression can be used as. A judgment is always as a monotype.

And type specialisation, denoted $⊑$, is the process of replacing quantified variables with less-quantified ones. So for example the type ∀a b. a -> b -> a might be specialized to ∀a. a -> String -> a, or to ∀b. Int -> b -> Int; and from either of those, it could be further specialised to Int -> String -> Int. Of course String -> Int -> String and List Float -> (Float -> String) -> List Float are valid specialisations too.

Thus: we have the type declaration [] : ∀a. List a, and we have (∀a. List a) ⊑ List Int, and so we can form the type judgment [] ~ List Int. We also have (∀a. List a) ⊑ List String, and so [] ~ List String. And [] ~ List (List (Maybe Bool)), and so on.

Function calls

Function calls are things like not True, (+ 1), List.Map Just. And the rule relating them to the type system is that function calls consume function types. This is the simplest of the rules. Mathematically it looks like

$$ \frac{f \sim μ → μ' \quad v \sim μ}{f v \sim μ'}. $$

Or: if $f$ can be judged to have a function type $μ → μ'$, and $v$ can be judged to have type $μ$, then the function call $fv$ can be judged to have type $μ'$.

Thus: we can infer the type judgment toString ~ (Int -> String), and we can infer 3 ~ Int, and so we can infer toString 3 ~ String.

Also, we can infer List.map ~ ((Int -> Maybe Int) -> (List Int -> List (Maybe Int))), and we can infer Just ~ (Int -> Maybe Int). So we can infer List.map Just ~ (List Int -> List (Maybe Int))

Lambda expressions

Lambda expressions are things like \x -> Just x, and in Elm they're used implicitly when something like const x y = x is turned into const = \x -> \y -> x. The type system rule is that lambda expressions produce function types. Mathematically:

$$ \frac{x : μ ⇒ e \sim μ'}{λx.e \sim μ → μ'}. $$

Or: suppose that the type declaration $x : μ$ would allow us to infer the judgment $e \sim μ'$. In that case, we could judge that $λx.e \sim (μ → μ)'$.

Typically $e$ would be some expression mentioning the variable $x$, but it's no problem if not. In that case, if you can get $e \sim μ'$ at all, you can get it assuming any $x : μ$, and so you have $λx.e \sim (\mathtt{Int} → μ')$ and $λx.e \sim (\mathtt{String} → μ')$ and $λx.e \sim (\mathtt{Result String (List (Maybe Float))} → μ')$ and so on.

Thus: given the declaration x : Int, we can infer the judgment [x] ~ List Int. And so we can infer the judgment (\x -> [x]) ~ (Int -> List Int).

Let expressions

Let expressions read like let x = y in a. Semantically, this is very similar to using a lambda expression, (\x -> a) y. But HM treats them differently in the type system, allowing a let expression to introduce polytypes. That permits code like

let f x = [x]
in (f "", f True)
-- returns ([""], [True])

If you tried to rewrite this as a lambda, you would get

(\f -> (f "", f True))(\x -> [x])

But type inference fails here, because there's no monotype declaration for f that allows a type judgment for (f "", f True). So the precondition for the lambda rule never obtains, and so in turn, no type judgment can be made for the expression \f -> (f "", f True).

Let expressions compensate for this deficiency, with the rule let expressions are like polymorphic lambda applications. (I don't have a good name for it.) Mathematically:

$$ \frac{a \sim μ \quad x : \bar{μ} ⇒ b \sim μ'}
        {(\mathtt{let}\ x = a\ \mathtt{in}\ b) \sim μ'} $$

Or: suppose that $a$ can be judged to have type $μ$, and that the declaration $x : \bar{μ}$ would allow us to infer the judgment $b \sim μ'$. In that case, we could judge that $(\mathtt{let}\ x = a\ \mathtt{in}\ b)$ has type $μ'$.

This introduces the notation $\bar{μ}$, which generalises a monotype to a polytype. How it works is: if $μ$ mentions a type variable $a$, and $a$ isn't quantified over in the surrounding context, then $\bar{μ}$ contains a "$∀a$".

Thus: we can infer (\x -> [x]) ~ (a -> List a), where a is a type variable unused in the surrounding context. That type generalises to ∀a. a -> List a. And given the declaration f : ∀a. a -> List a, we can infer (f "", f True) ~ (List String, List Bool). So in total, we can infer

$$ (\mathtt{let\ f\ x\ =\ [x]\ in\ (f\ "",\ f\ True)})
   \sim \mathtt{(List\ String,\ List\ Bool)}. $$

(It seems a little strange to me that the approach here is to first construct a meaningless type, and then quantify over it. Still, that's my understanding. It's of course possible I'm mistaken.)

Why do we need both let and lambda? Well, we can't replace lambda expressions with let expressions: they're not re-usable. (When you translate a let expression into a lambda expression, you actually generate a lambda applied to an argument. There's no way to translate a lambda expression by itself into a let expression.) Meanwhile, I'm not entirely sure why we can't make lambdas polymorphic in the same way let expressions are. I think the answer is that if we tried it, we'd lose some of the legibility that HM offers - so let can be more powerful in the type system because it's less powerful in the language. But I'm not sure exactly what legibility would be lost.


There's an interesting thing about the system I just described: it may or may not be Turing complete.

The problem is that there's no specified way of doing recursion. A function can't call itself, and it can't call any other function that can call it.

But a fixed-point combinator allows recursion, and might be included in the initial set of variables. Failing that, the proper recursive types can be used to define one. (Elm and Haskell allow us to define such types12.)

Failing both of those, we can introduce a new kind of expression

$$ \frac{x : μ ⇒ a \sim μ \quad x : \bar{μ} ⇒ b \sim μ'}
        {(\mathtt{letrec}\ x = a\ \mathtt{in}\ b) \sim μ'}. $$

This is much the same as let, but makes the variable x = a available when evaluating a. It's only available as a monotype when evaluating a, and still doesn't get generalised to a polytype until evaluating b.

(Elm and Haskell provide letrec as let and don't provide simple let at all.)

But if an HM language doesn't provide the appropriate variables or types, and doesn't implement letrec or something similar, it won't be Turing complete. Legibility gain, expressivity cost.

Wrapping up

And modulo some small details, that's the entirety of a Hindley-Milner type system. If you have a language with those features, and a suitable set of types, you can perform type inference.

What we have is a set of rules that allows us to construct proofs. That is, if we look at a program written in this language, we would be able to construct a proof of its type (or lack thereof). But I already said HM is better than that: it lets us mechanically construct a proof, in (roughly) linear time.

I confess, I'm not entirely sure how to do that. The outline is obvious, recurse down the parse tree and at each step apply the appropriate rule. But since a constant can be judged as one of many types, you need to keep track of which types are acceptable. Wikipedia hints at how it works, but not in a way that I understand particularly well.

Elm and Haskell both support many things not covered so far. To look at some of them briefly, and occasionally getting even more recklesss,

  • It seems obvious, but both allow you to evaluate the language, something I haven't touched on much. And it does need to be touched on, because there's more than one way to do it. Haskell uses a lazy evaluation model, while Elm is strict.

  • Both have ways to introduce new types. That doesn't change what we've seen, but it does separate the languages into two parts. One part describes the types used in a program and one part implements the semantics of a program.

  • Both also support case statements along with destructuring, like

    mHead : Maybe (List a) -> Result Bool a
    mHead ml = case ml of
        Just (a::_) -> Ok a
        Just _ ->     Err True
        Nothing ->    Err False

    To implement these, you'd want to add a fifth class of language expression. But I think it would be possible in theory to write a "thin" first-pass compiler to translate these statements into the existing language. By "thin" I mean to do this in such a way that we don't lose any of the legibility guarantees we care about.13 (For example, if this compiler turned $n$ bytes of code in a case statement into more than $O(n)$ bytes of code in the base language, or if it ran in more than O(n) time, this condition would fail.)

    If I'm right about that, then case statements neither make the language more expressive nor less legible, at least in one important sense.

  • (By comparison, if-then-else statements are also another class of language expression, but one which can obviously be thinly compiled down to the existing ones.)

  • In the type system, Elm supports record types, which are a lot like tuples but with nicer syntax. I believe these too could be thinly compiled down. But it also supports extensible records, which are more complicated. On one level you can think of a type like {a | x : Int, y : Int} like a tuple ∀a. (a, Int, Int). But then this tuple needs to be unpacked and manipulated when you pass it into a function expecting an {a | x : Int}.

    I believe this is unresolvable, and extensible records represent an extension of Elm from HM. (But one with fairly minor legibility costs, in comparison to the expressiveness gains.)

  • Haskell supports typeclasses, which are a way of allowing functions to operate on multiple different types. (For example, the show function can be applied to a String, an Int, a (), a [Float], ….) Elm doesn't, but simple typeclasses can be emulated with only a little added verbosity.

Another thing I'll say is that I've been talking about legibility and expressivity of a language. But a type system is itself a language, and may be more or less legible and expressive. I don't have a strong intuition for how these interact.

There's a lot more I could add to this post. Some things that I omitted for brevity, some that I omitted because I don't know enough about them yet14, and some that I omitted because I don't know about them at all. I don't know what a sensible cutoff point is, so I'm just going to end it here.

From writing my original talk, and subsequently this blog post, I think I understand HM type systems much better than I used to. Hopefully you think the same. Hopefully we're both correct. If you see any inaccuracies, please point them out.

  1. While writing this essay I came across the talk Constraints Liberate, Liberties Constrain. From the title and the context I encountered it, it sounds like it's on the same subject. But I haven't watched it, because it's in the form of a 50 minute video. 

  2. If the halting problem is decideable on a language, the language is Turing incomplete. I don't know whether the reverse is true: are there Turing incomplete languages on which the halting problem is still undecideable? I'm mostly going to assume not. At any rate, I don't think I'm going to discuss any such languages. 

  3. To nitpick myself: these aren't just languages for which you can prove termination, they're languages which never terminate, at least not for finite inputs. I don't offhand know any languages which are Turing incomplete but have the ability to loop forever, though such a thing can exist. 

  4. Specifically, it looks to me like SQL-99 without recursive common table expressions is Turing incomplete. I've only ever used nonrecursive CTEs. 

  5. I've subsequently discovered that wikipedia uses the same name for this concept. 

  6. I think this is related to the way that ZF set theory can encode Peano arithmetic. Thus, ZF is more expressive than PA. But because ZF allows you to construct objects that PA doesn't, there are more things you can say about "all objects in PA" than about "all objects in ZF". So PA is more legible than ZF. I don't understand the Curry-Howard correspondence, but I think that's related too. 

  7. "My company" is a phrase which sometimes means "the company I own or run" and sometimes "the company I work for". Here it means the latter. I don't know an unambigous way to phrase that which I don't find slightly awkward, so instead I'm using a super-awkward footnote. But, y'know. Ironically, or something. 

  8. We use Elm 0.18. 0.19 is a fairly significant version change, but I think not different enough to be relevant for this post. 

  9. Specifically GHC, which offers many extensions over Haskell proper. Whenever I refer to Haskell, I'm really talking about the language that GHC implements. 

  10. At any rate, it works fine when you pass it a number. If you pass it something else, it might do anything. 

  11. Well, sort of. It still performs type inference, it just allows it to fail. I'm not sure if "no type inference at all" would work for Haskell; but I do think it would work for a pure HM system, if you count things like "3 is of type Int" as a raw fact, non-inferred. 

  12. Minor brag: I myself contributed the Elm implementation on that page. 

  13. I think it might look something like this:

    mHead ml =
        if *isJust ml && (*fromJust ml (\_x -> *isCons _x)) then
            *fromJust ml (\_x -> *fromCons _x (\a _ -> Ok a))
        else if *isJust ml then
            *fromJust ml (\_ -> Err True)
        else if *isNothing ml then
            Err False

    functions marked with a * can be hidden from the language user. Additionally, *fromJust, *fromCons and *fail would be able to throw runtime errors. These don't violate Elm's "no runtime errors" policy, because the compiler would only generate them in contexts where it could prove they wouldn't throw. (In the case of *fail, when it could prove that code branch was unreachable, so it could also just not bother.)

    I'm very much spitballing here. I wouldn't be surprised to discover that the approach I've described is completely unworkable. 

  14. Not that that stopped me from writing this entire post. 

Posted on 05 May 2019 | Comments

"Now here's why I'm punching you..."

Related: be nice, at least until you can coordinate meanness.

A premise of this post is that punching people is sometimes better than the alternatives.

I mean that literally, but mostly metaphorically. Things I take as metaphorical punching include name calling, writing angry tweets to or about someone, ejecting them from a group, callout posts, and arguing that we should punch them.

Given that punching people is sometimes better than the alternatives, I think we need to be able to have conversations about when "sometimes" is. And indeed we can and do have those conversations. Many words have been spilled on the subject.

But I think it's probably a good idea to try to avoid having those conversations while actually punching people.

Here's what I mean. Alice thinks that punching Bob is better than the alternatives. But she thinks that if she just starts punching, Carol and Dave and Eve might not understand why. Not even if she tells them what Bob has done. She thinks punching Bob is better than the alternives, but she thinks the reasons for that are slightly complicated and haven't previously been articulated very well, at least not in a way that makes them common knowledge.

So she writes an essay in which:

  1. She proposes a theory for when punching people is better than the alternatives. (She readily admits that the theory is not complete, nor is it intended to be, but it covers part of the space.)

  2. She describes the situation with Bob, and how the theory justifies punching him.

  3. She punches Bob.

I think this could be a mistake. I think she should maybe split that post into at least two parts, published separately. In the first part, she proposes the theory with no mention of Bob. Then, if Carol and Dave and Eve seem to more-or-less agree with the theory, she can also publish the part where it relates to Bob, and punch him.

I think this has a few advantages.

  • Suppose Alice can't convince anyone that the theory holds. Then Bob is kept out of things entirely, unless Alice wants to go ahead and punch him even knowing that people won't join in. In that case, people know in advance that Alice is punching under a theory that isn't commonly subscribed to.

  • Suppose the theory is sound, and also justifies punching Fred. Then someone can link to the theory post separately, without implicitly bringing up the whole Bob thing. This is especially good if the theory doesn't actually justify punching Bob, but it's somewhat good regardless.

  • Suppose Bob disagrees with some part of the argument. When he gets punched, he's likely to be triggered or at least defensive. That's going to make it harder for him to articulate his disagreement. If it comes split up, the "thing he has to react to while triggered" may be smaller. (It may not be, if he has to react to the whole thing; but even then, he may have seen the first article, and had a chance to respond to it, before getting punched.)

  • Suppose that splitting-things-up like this becomes a community norm. Now, if Alice just wants to come up with excuses to punch Bob, it's harder for her to do that and get away with it, harder for her to make it look like an honest mistake.

It might seem even better to split into three posts: theory, then application ("and here's why that justifies punching Bob"), and then wait for another post to actually punch him. But since "arguing that we should punch Bob" is a form of punching Bob, splitting those two out isn't necessarily possible. At best it would be "theory, then application and mild punching, then full-strength punching". It's more likely to be worth it if there's a big difference between the two levels. "Here is why I think I should kick Bob out of the group" is considerably weaker than "I hereby kick Bob out of the group". But "here is why I think you all should stop trusting Bob now" is not much weaker than "you all should stop trusting Bob now".

However, I don't think this is always unambiguously a good thing. There are some disadvantages too:

  • The initial post is likely to be drier, less compelling, without concrete examples.1 And perhaps harder to evaluate, especially for less abstract thinkers.
  • You can't really remove the initial post from its context of "Alice thinks we should punch Bob". You can hide that context, but that doesn't remove its influence. For example, if there are cases similar to Bob's that would be covered by the same theory, Alice's post is likely to gloss over the parts of the theory that relate to them-but-not-Bob, and to focus too much on the parts that relate to Bob-but-not-them.

  • Suppose the theory is sound, but the facts of the case don't support punching Bob. Splitting the posts adds more opportunity for sleight-of-hand, such as using a term to mean different things in different places. This would be harder to notice in a split post than a monolithic post, if each part is internally consistent.

  • It may be harder to write this way, which may cause some better-than-the-alternatives punching to go unperformed.

  • It's slower. Sometimes that's probably neutral-to-good. But often, if punching someone is better than the alternatives, it's because they're currently hurting other people. If punching them will make them stop, then ideally we want to punch quickly.

I'm not sure how all these factors really shake out, and I expect it'll vary from case to case. So I don't want to offer a blanket suggestion. I think my advice is: if you're thinking of writing one of those all-in-one posts, consider splitting it up. It won't always be the right thing to do, but I think it's an option to bear in mind. Here are some questions to ask that might sway you in one direction or the other:

  • How hard are you punching?2 If someone googles Bob, will they find your punches? (At least once, Scott Alexander used a pseudonym for a person he was punching; this seems like a useful tool.)
  • If the punching is delayed, does anything bad happen?

  • Does the theory apply more generally than it needs to for this specific case? Thinking of similar cases might help, especially real ones but also fictional. (If you can think of lots of real cases, the value of having a reference post for the theory goes up, and its value as a reference post goes up if it has less baggage.)

(As an aside: I want to note that a post which looks like an all-in-one might not be. It may be recapping previously established theory. Common knowledge is rarely absolutely common, so I suspect this will usually be a good idea.)

  1. See for example, this post. (Though the reason I don't have examples here is different. My motivating example hasn't been written yet3, and I didn't go looking for others. Still, I expect the effects of not having examples are similar.) 

  2. And not just you personally, but your audience. If your audience is large and vicious, then no matter how gently you yourself punch someone, they're going to experience a lot of pummelling. 

  3. And there's a decent chance it won't ever, given my track record. 

Posted on 16 October 2018 | Comments

The Mathematics of Matched Betting

I've been doing a little bit of matched betting lately1. The idea is that you place two opposite bets on the same event, calibrated so that your profit will be the same no matter which bet wins. If you do this entirely with your own money, your profit will (under reasonable assumptions) be negative. But bookmakers often offer free bets; you can use matched betting to extract most of the amount of that free bet as actual money.

This post isn't advice about how to get into matched betting. That market is probably saturated; if you want to learn, I used this guide and it served me well. (However, if anyone is inspired to try it by this post, I have a referral code for smarkets: safto14. It looks like if you sign up with that and bet $£20$, both you and I will receive $£10$ in risk-free bets. I swear that's not why I'm posting this. I might not even get around to using it.)

(Um, but after that I feel obliged to give at least some safety information. So here goes: if you're not a UK citizen currently in the UK, this may be a bad idea. Don't use credit cards to deposit funds; it seems they interpret that as a cash transaction and charge fees. Start small; that way there's less at risk in case you do something silly like use a credit card to deposit funds. Probably don't expect to earn lots of money this way, either in total or per-hour.)

Instead, I want to go into the math behind it, in more depth than I've seen in other places. None of this math is complicated, but some of it is useful, and I haven't found it anywhere else. (I even deliberately went looking.)

A simple example

(If you've seen one example of matched betting, you've seen them all, and you can skip this particular one.)

You have a $£10$ free bet at a bookmaker. You find a football game, say Manchester Utd versus Liverpool, that you want to bet on. The bookmaker offers odds of $4$ on Liverpool, and you bet your $£10$ on them.

(A note on odds: the usual convention in gambling seems to be to use decimal odds. Odds of $x$ mean that your potential winnings are $x-1$ times your stake. Thus, odds of $4$ mean a bet of $£10$ has the potential to pay out $£30$. If you're used to odds notated $a:b$ or (equivalently) $a/b$, then the decimal odds are given by $a/b + 1$.)

So if Liverpool wins, you'll earn $£30$; if they lose or draw, you lose nothing. You then look up the same match at a betting exchange. An exchange allows you to take both sides of a bet, which a bookmaker won't. The exchange offers odds of $4.3$ to lay Liverpool; this means that you win your bet in the exchange only if Liverpool doesn't win. You accept a stake of $£6.98$, which means your own stake is $£23.03$.

Now if Liverpool wins the match, the bookmaker pays you $£30$ and you lose $£23.03$ in the exchange, for a net profit of $£6.97$. And if Liverpool loses, you earn $£6.98$ in the exchange and lose nothing at the bookmaker, for a net profit of $£6.98$. You've turned a $£10$ free bet into almost $£7$ of actual money.

(I'm ignoring for now the commission that the exchange will usually collect when you win a bet on them. With $2\%$ commission, you would instead accept stakes of $£7.01$, wagering your own $£23.13$; if Liverpool doesn't win, you would earn $£7.01 · 0.98 = £6.87$, which is also what you'd earn if Liverpool does win.)

Before bookmakers will give you a free bet, you'll usually have to place a bet with them using your own money. You lose a small amount of money on this bet, but you can use the same principles to ensure that you lose the same amount no matter who wins. You might lose around $£0.50$ on a $£10$ qualifying bet, in which case you end up with around $£6.50$ profit when all's said and done.

This has been a very brief introduction to matched betting. Now, into the math. I'm going to be focusing on two kinds of bet: qualifying bets, which are usually known as just bets, and free bets, where you don't lose anything if your back bet loses. I'm also going to ignore rounding; let's just pretend that the sterling is infinitely divisible.

Some definitions

We can think of a paired bet as having six parameters, $(O_b, O_l, S_b, S_l, C_b, C_l)$. These are three parameters for each of a pair of back and lay bets.

$O_b, O_l ≥ 1$ are the odds on the back and lay bets. It's typically safe to assume $O_b < O_l$; otherwise, modulo commission, you could make a profit even on your qualifying bets2. They can't be less than $1$ because we're using decimal odds; that would correspond to a probability below $0$.

$S_b, S_l ≥ 0$ are the stakes on the back and lay bets. Note that $S_l$ is the stake offered by the other party to your lay bet; it's (roughly) the amount you stand to win on that bet, not the amount you stand to lose. This may seem strange, but it's the convention used.

And $C_b, C_l ∈ [0, 1]$ are the commission charged on your winnings on each side. Usually $C_b = 0$: bookmakers don't charge commissions, they make money by offering low odds. The two exchanges I've used have $C_l = 2\% = 0.02$ (Smarkets) and $C_l = 5\% = 0.05$ (Betfair).

I'm also going to introduce the symbol $C^+ = (1 - C_l)(1 - C_b)$. If you passed $£1$ through your bookmaker and exchange, and they each charged commission and nothing else, you would have $£C^+$ left at the end. $C^+$ isn't enough for us to fully understand a matched bet, we need the individual back and lay commissions as well, but it'll be convenient shorthand.

Now let $R_{xy}$ (where $x,y ∈ \{b,l\}$) be your return on side $y$ if your bet on side $x$ wins. So for a qualifying bet (which is just a regular bet, outside the context of matched betting), we have:

$$ \begin{align*}
     R_{bb} &= S_b (O_b - 1) (1 - C_b)     \\
     R_{bl} &= - S_l (O_l - 1)             \\
     R_{lb} &= - S_b                       \\
     R_{ll} &= S_l (1 - C_l).
\end{align*} $$

For a free bet, the only change is

$$ R_{lb} = 0. $$

So your profit is $R_{bb} + R_{bl}$ if your back bet wins; and $R_{lb} + R_{ll}$ if your lay bet wins.

And now we can say that a matched bet is simply a paired bet, where your profit is the same in either case. I won't need to talk about simple paired bets from now on; all bets are matched. When I talk about a "free bet" or "qualifying bet", those are matched bets too.3

Incidentally, the six prameters are over-determined. Most commonly we want to learn $S_l$ given the other five; but knowing any five of them will fix the value of the sixth.

Optimal lay stake

The first question we'll ask is, given $O_*$, $C_*$ and $S_b$, what must $S_l$ be to make our bet a matched bet? Or in other words, what $S_l$ should we choose to eliminate all risk?

We need

$$ R_{bb} + R_{bl} = R_{lb} + R_{ll} $$

which after some (perhaps silly-seeming) substitution and rearrangement gives

$$ S_l = { R_{bb} - R_{lb} \over R_{ll}/S_l - R_{bl}/S_l }. $$

This looks circular, but when we substitute for the values of $R_{**}$, $S_l$ disappears from the right hand side. For a qualifying bet, this gives

$$ S_l = S_b { (O_b - 1)(1 - C_b) + 1 \over O_l - C_l }, $$

and for a free bet,

$$ S_l = S_b { (O_b - 1)(1 - C_b) \over O_l - C_l }. $$

A thing to note here is that $O_l$ and $C_l$ only appear in the term $O_l - C_l$. In other words, the effect of lay commission is to decrease the effective lay odds in the most natural way. It would be nice if this happened in other contexts too, but unfortunately I haven't come across it. The $O_l - C_l$ term is common, but it's usually accompanied by another $O_l$ and/or $C_l$ somewhere else in the expression.


Next, we'll want to know how much profit we make. This is given by $R_{lb} + R_{ll}$, where we calculate $R_{ll}$ using the lay stake we just found. But since both of these terms are proportional to $S_b$, we'll find it more convenient to think in terms of profit per unit of back stake,

$$ P = { R_{lb} + R_{ll} \over S_b }. $$

Under a qualifying bet, this is

$$ P_q = C^+ { O_b + C_b/(1 - C_b) \over O_l - C_l } - 1, $$

and for a free bet, it's

$$ P_f = C^+ { O_b - 1 \over O_l - C_l }. $$

We can look at these functions graphically:

(Graph of P_q(O_b, O_l)) (Graph of P_f(O_b, O_l))

(all images link to larger versions)

each line represents a contour of the function, a set of points that all have the same profit. The sets of contours look superficially similar, but they're generally steeper for a free bet, and they get cut off on the bottom edge instead of the left edge. In both cases, profit increases with $O_b$ and decreases with $O_l$.

We can reparameterise in terms of $O_b$ and $σ = O_l - O_b$, the spread between the back and lay odds. Since $O_l ≥ O_b$, we only need to consider $σ ≥ 0$. This gives us

$$ \begin{align*}
    P_q &= C^+ { O_b + C_b/(1 - C_b) \over O_b + σ - C_l } - 1  \\
    P_f &= C^+ { O_b - 1 \over O_b + σ - C_l }.
\end{align*} $$

(Graph of P_q(O_b, σ)) (Graph of P_f(O_b, σ))

These are slightly more distinct. Looking at these graphs, it seems that for a qualifying bet, having low $σ$ is more significant than having high $O_b$; but for a free bet, having high $O_b$ is more significant than having low $σ$. If so, that suggests you might want to be looking for different sorts of bets at each stage. (It is so, and we'll make it more precise later.)

We can also look at $P_f - P_q$, the difference in profit between a qualifying bet and a free bet. This isn't particularly useful to compare bets: you place qualifying bets to get free bets, and you place free bets to get money, and if you're doing pure matched betting, I don't think you'll ever be asking yourself should I place this bet free or as a qualifier? Still, the difference is

$$ P_f - P_q = 1 - { (1 - C_l)(1 - 2C_b) \over O_l - C_l }. $$

The more $O_l$ grows, the worse a qualifier becomes relative to a free bet. This is another suggestion that you should be looking at different sorts of bets for your qualifiers and your free bets.


One more thing is important when making a matched bet: lay liability. This is how much you stand to lose on the exchange where you make your lay bet. (It's only important for boring real-world reasons like liquidity and the possibility of doing something stupid, but those are still important.) You need to have this much money in your account at the exchange, which means you need to be able to spare it from your bank account for a week or so. Low-liability bets are also safer if something goes wrong, which makes them a good choice for early dabblers in matched betting.

Liability is simply given by $-R_{bl} = S_l (O_l - 1)$, which is

$$ S_b (O_l - 1) { (O_b - 1)(1 - C_b) + 1 \over O_l - C_l } $$

for a qualifying bet and

$$ S_b (O_l - 1) { (O_b - 1)(1 - C_b) \over O_l - C_l }  $$

for a free bet.

(Graph of liability(O_b, O_l), qualifying bet) (Graph of liability(O_b, σ), free bet)

(I made the graphs in $σ$ as well, but they're too boring to include inline)

Unlike profit, liability increases with both $O_b$ and $O_l$. But it increases arbitrarily with $O_b$, and asymtotically with $O_l$; it's bounded above by roughly $S_b O_b$ for a qualifying bet and $S_b (O_b - 1)$ for a free bet. (If the graphs were extended further, as they stretch upwards the lines would become ever more vertical, but they'd always stay separate. To the right, the lines would become ever more horizontal, all of them converging on $O_l = 1$.)

Improving on a free bet

Matched bet calculators aren't hard to find, and what I've given so far is nothing that they can't do for you. But they don't tell you everything you might want to know. Let's look at a bet, and see how we might find a better bet. Since the two types have different behaviours, we'll treat them separately.

To maximise profit, we usually need to consider that $S_b, C_b$ and $C_l$ are fixed, and find the dependence of $P$ on $O_b$ and $O_l$. For a free bet, that means we want to maximise the term

$$ P_f ∝ {O_b - 1 \over O_l - C_l}. $$

This tells us a few things. The first is that we want high back odds and low lay odds. We already knew that, and it's not very helpful; we expect back and lay odds to more-or-less rise and fall together. It also tells us that adding a constant to both odds will increase profit; odds of 5 and 6 will be better than odds of 4 and 5. (This, too, we could have deduced before; or we could have seen it on the graph of $P_f(O_b, σ)$.)

But consider what happens when $σ = 0$. Then the term in question is

$$ { O_b - 1 \over O_b - C_l } $$

which, as $O_b$ ranges from $1$ to $∞$, takes all values in $[0, 1)$. But when $σ > 0$, the possible values are exactly the same; high $σ$ changes the $O_b$ that gives you any particular profit, but it doesn't make any profit value available or unavailable.

What that means is: given any free bet, we can construct another free bet with equal profit but $σ = 0$, not changing $S_b$ or $C_*$.

Or: given odds $O_b, O_l$, we can calulate the odds $O'$ that would give you the same profit, if you could find these odds for both a back and a lay bet.

In turn, that tells you that if you want to improve your profits, you can ignore bets with $O_b < O'$. (Because for those bets, $P_f(O_b, σ) < P_f(O', σ) ≤ P_f(O', 0)$. The first inequality comes from adding a constant to both odds, and the second comes from reducing $O_l$.) This is a useful thing to know, that matched bet calculators don't tell you.

To find $O'$, we set

$$ { O_b - 1 \over O_l - C_l } = { O' - 1 \over O' - C_l } $$

and deduce

$$ \begin{align*}
    O' &= { O_l - O_bC_l \over 1 + O_l - O_b - C_l } \\
       &= O_b { 1 - C_l + σ/O_b \over 1 - C_l + σ }.
\end{align*} $$

The expression with $σ$ isn't exactly simpler, but I think it's more aesthetically pleasing. (Consider that $1-C_l$ is approximately as fundamental as $C_l$ itself.) Graphically:

(Graph of P_f(O_b, Ol)) (Graph of P_f(O_b, σ))

We can also calculate $O'$ simply as a function of profit, and vice versa:

(Graph of P_f(O'))

$$ P_f = C^+ { O' - 1 \over O' - C_l }                 \\
   O' = { C_lP_f - C^+ \over P_f - C^+ } $$

$P_f$ approaches an asymtote at $C^+$, but slowly. With $C_b = 0, C_l = 0.02$, extracting $80\%$ of a free bet is only possible if $O_b ≥ 5.36$. For $90\%$, you need $O_b ≥ 12.03$. Such bets are somewhat rare in my experience, and typically have high spread.

We can go more general. Given a profit, we can calculate the level curve of all bets which generate that profit; the case $σ=0$ gives us only a single point on that curve. The curve divides bet-space into two regions, so that it's easy to see whether a bet gives more or less than this amount of profit.

(Earlier we saw this level curve graphically, for certain specific profits. Now we find the explicit formula for the curve, which I secretly already used to draw the graphs.)

We already have

$$ \begin{align*}
    P_f &= C^+ { O_b - 1 \over O_l - C_l }    \\
        &= C^+ { O_b - 1 \over O_b + σ - C_l },
\end{align*} $$

and it's just a matter of rearranging these:

$$ O_b C^+ = P_f (O_l - C_l) + C^+\\
   O_b (C^+ - P_f) = P_f (σ - C_l) + C^+. $$

These two equations can be used to find $O_b$ in terms of $O_l$ or $σ$, and vice-versa. Both are very simple at heart: they're linear relationships, that could be rearranged to the form $y = mx + c$.

Looking more closely at the second one, notice that $C^+$ is the upper bound on profit. So the term $C^+ - P_f$ can be thought of as how much profit is being left on the table, compared to what you could hypothetically get if odds of $∞$ were a thing. The less profit you leave behind, the less $σ$ has to change to compensate for a given change in $O_b$. In other words, when profit is high, the level curve on the graph of $P_f(O_b, σ)$ becomes shallower, as we saw above.

Improving on a qualifying bet

For a qualifying bet, we can't quite do the same thing. If we temporarily assume $C_b = 0$, then the term we want to maximise is

$$ P_q + 1 ∝ {O_b \over O_l - C_l}. $$

This doesn't work the same as the equivalent term for a free bet. If you keep $σ$ fixed and consider profit as a function of $O_b$, then this function acts differently depending on the sign of $σ - C_l$. If $σ ≤ C_l$, then regardless of $O_b$ you get more profit than is ever possible with $σ > C_l$.

This isn't immediately practically important, because $σ > C_l$ is a pretty safe assumption. But it's mathematically significant. For a free bet, setting $σ$ to $0$ doesn't rule out any profit levels, so we could ask "how would we get this particular profit with $σ = 0$?" If we try to ask that for a qualifying bet, the answer is typically that we can't. So the approach we used for a free bet doesn't work on a qualifying bet.

We also can't set $O_b$ to its best possible value, because it can go arbitrarily high. But we can try setting it to its limiting worst value ($O_b = 1$). We find $σ'$ such that

$$ { O_b + C_b/(1 - C_b) \over O_b + σ - C_l }
   = { 1 + C_b/(1 - C_b) \over 1 + σ' - C_l }, $$

which gives us

(Graph of σ'(O_b, σ))

$$ σ' = { σ + (O_b - 1)(1 - C^+) \over 1 + (O_b - 1)(1 - C_b) }. $$

Now we know that any bet with a spread less than $σ'$ will give better profit than the bet we started with. Unfortunately, I think this still isn't as good as what we got for a free bet, for three reasons.

  1. For a free bet, we had an easy negative test: some bets (those with $O_b < O'$) could be ruled out on a glance, but verifying them took more work. Here, the test is positive: some bets (those with $σ < σ'$) can be accepted on a glance, but verifying the others takes more work.4

    In practice, I expect the positive test will almost alway be inconclusive, meaning you still need to do the more difficult check on every bet. (I haven't done enough betting myself, while writing this, to say from experience.)

  2. My workflow is to find a plausible-looking back bet and then see how it would be matched. With a free bet, I can run the easy test without looking for the match. For a qualifying bet, I need to find both sides of the bet before I can run the easy test.

  3. Qualifying bets often must be placed at a minimum odds (on the back side) in order to count. That typically rules out the lowest-spread bets (see below digression).

Still, this is what we have. Following a similar theme as before, we can calculate $σ'$ and $P_q$ as functions of each other:

(Graph of P_q(σ'))

$$ P_q = { 1 - C_l \over 1 - C_l + σ' } - 1 \\
   σ' = { 1 - C_l \over P_q + 1 } + C_l - 1. $$

(Note that these equations don't contain $C_b$. That's not because we're assuming it's $0$: when you set $O_b = 1$, $C_b$ completely disappears from the equation for $P_q$.)

Interestingly, the bounds of $P_q$ don't depend on commission at all. As $σ'$ grows, $P_q$ always approaches an asymtote at $-1$, which isn't surprising: you can't quite lose your entire back stake, but you can get arbitrarily close to that, even with no commission.

On the other edge of the graph, we always have $P_q(O_b=1, σ'=0) = 0$. (That may not be clear on this image, but it's easy to see algrebraically.) That's because at $O_b = O_l = 1$, both bets are completely one-sided. On the back side you have a chance of losing money, but no way to win it; on the lay side you have a chance of winning money, but no way to lose it. In particular, if the back bet wins, you make no profit or loss on either bet, so commission is irrelevant. And so the lay stake is calibrated for your lay winnings, after commission, to cancel out your back loss. (But if someone is willing to give you free maybe-money, you might as well ask for as much maybe-money as they're willing to give you.)

And again, given profit, we can calculate the level curve of bets which return that profit. Unsurprisingly, we find another linear relationship; it comes to

$$ O_bC^+ + C_b(1 - C_l) = (Pq + 1)(Ol - Cl) \\
   O_b(1 - C_b - Λ) + C_b = Λ(σ - C_l), $$


$$ Λ = { P_q + 1 \over 1 - C_l }. $$

I'm afraid I can offer no particular interpretation of what $Λ$ means, though I observe that we can substitute it into a previous equation, $σ' = 1/Λ + C_l - 1$. Note that if $Λ ≥ 1 - C_b$, equivalently if $P_q + 1 ≥ C^+$, then $σ$ and $O_b$ start to move in opposite directions: for fixed profit, $σ$ goes up as $O_b$ goes down. At this point, you get more profit with low $O_b$ as well as with low $σ$, which would be convenient if it was ever realistically going to happen.

(It turns out that $P_q + 1 ≥ C^+$ precisely when $σ ≤ C_l + C_b/(1 - C_b)$. I noted above that if $C_b = 0$, the possible values of $P_q + 1$ depend on the sign of $σ-C_l$. This is the same result, generalized to all values of $C_b$.)

A digression on odds

Note that in general, you can expect spread to be lower at lower odds. That's because odds are more sensitive to evidence when they're high than when they're low.

There's a technical interpretation of that, but I'm just going to illustrate by example. Consider the probabilities $1/5$ and $1/6$. These are complementary to $4/5$ and $5/6$ - the two pairs of probabilities encode the same information. Converting to decimal, probabilities $1/5$ and $1/6$ are decimal odds $1.25$ and $1.2$; and probabilities $4/5$ and $5/6$ are decimal odds $5$ and $6$.

So the difference between the odds $1.25$ and $1.2$ is, in a very important sense, the same as the difference between $5$ and $6$. But when it comes to betting, the spreads of $0.05$ and $1$ are very different.

The takeaway from this is that for qualifying bets, you should be looking at bets with low odds. High odds have better returns, but the effect of low spread is much more significant, and low spread comes with low odds.

The effects of commission

I want to explore one more question: how does profit depend on commission? For this, we'll keep $C_b$ and $O_b$ fixed, and explore how $O_l$ and $C_l$ affect profit.

Conveniently, the term we want to maximise is the same,

$$ \begin{align*}
    P_q + 1 ∝ { 1 - C_l \over O_l - C_l }  \\
    P_f ∝ { 1 - C_l \over O_l - C_l }.
\end{align*} $$

So if we find the same lay bet on two different exchanges, we can compare them without regard for the back bet we'd be matching.

The two exchanges I've used have $C_l$ of $0.02$ and $0.05$, so they give equal profits when

$$ { 0.98 \over O_S - 0.02 } = { 0.95 \over O_B - 0.95 } $$

where $O_S$ is the odds offered on Smarkets and $O_B$ is the odds offered on Betfair. This rearranges to

$$ 98·O_b = 95·O_S. $$

Since $98/95 ≈ 1.03$, it's better to use Betfair than Smarkets if the offered odds are roughly $3%$ lower, which happens to be the difference in commission. So for example, odds of $6$ on Betfair correspond to roughly $6.19$ on Smarkets.

It should be easy to take a bunch of equivalent bets on the two sites, and compare to see which seems likely to give better profit in general. I started doing that, but then I looked at three football games and got bored and stopped.

They all had exactly the same odds on all three positions (win/draw/win), even when they fluctuated slightly pre-game. (I did look at one game as it began, and the two sites didn't quite stay in sync then. But betting while odds are fluctuating a lot is a bad idea.) This suggests that Smarkets typically offers better profits. But Betfair is a more popular site, which probably has advantages; in particular, it's plausible that a large bet would be harder to fully match on Smarkets.

And that's it. There are plenty more interesting questions you could ask5, but I'm going to stop there.

Something that would be nice would be a calculator that can make use of this. The online calculators all seem pretty crap: they only tell you profit, lay stake and lay liability, and only for one bet at a time. Being able to compare bets seems like it would be a pretty important feature, but I haven't seen it anywhere. (Some of them have features to work with more complicated types of bets than I've looked at, but I don't care about that. Here's one that's no worse than any other.) I've also seen an excel calculator, which had the neat feature of automatically adding bets to a spreadsheet. But again, only one bet at a time; plus, I don't have excel, and don't keep track of my bets in a spreadsheet. (I use ledger, though it's not a great fit.)

I've written a command-line tool that can show you multiple bets at a time for comparison purposes. It also shows you, for a free bet, the lowest possible back odds to improve on your highest-profit bet ($O'$); or, for a qualifying bet, the highest possible spread that won't improve on it ($σ'$).

But the interface isn't very good, I think partly because of the limits of the command line and partly because of a questionable design decision (see the README). And it can't show graphs, which I think would be nice. If you want to use it anyway, it's on github.

If I were to dedicate more time to the project, I currently think I'd start again in Javascript. I think I have a vague idea of how a decent one could work. But right now, as far as I can tell there are no good calculators.

  1. Well, I haven't done any for a few months now. This post took a long time to write. 

  2. I think maybe sometimes bookmakers will offer improved odds as a special offer, and maybe then sometimes you'll find $O_b > O_l$. But it's not a case I'll particularly consider in this post. 

  3. I'm also assuming that all free bets are "stake not returned". In a standard bet, if you win, you get your stake back and also collect your winnings. If you win a stake not returned free bet, you collect your winnings; but you don't also get to keep the stake, because that wasn't yours in the first place. If you have a "stake returned" free bet, that effectively increases the odds on the back side by $1$. I've not yet encountered one of these, myself.

    Another type is the "risk-free" bet, which I won't go into here partly because I'm not 100% sure what it means. But I think that "£10 in risk-free bets" allows you to make a bet of more than £10, and if you lose, you get £10 back. I think the way to treat it is as putting £10 into a free bet (stake not returned) and the remainder of your stake into a qualifying bet, and so by default you should put in no more than the risk-free amount. 

  4. Shape of the "more profitable" space

    Another way to look at this is by the shape of the "more profitable" space on the graphs of $P_q(O_b, σ)$ and $P_f(O_b, σ)$, as seen here. On the $P_f$ graph, $O'$ carves up the "less profitable" space without entering the "more profitable" space. On the $P_q$ graph, $σ'$ carves up the "more profitable" space without entering the "less profitable" space. There's no equivalent of $O'$ for a qualifying bet, or of $σ'$ for a free bet. 

  5. A thing I'd like to explore at some point, is qualifying bets where you only win a free bet under certain circumstances. For example, you bet on a team to win a football match, and you win a free bet iff your team is winning at the end of both halves. I think you can still guarantee a profit with the right combination of bets, but there are more variables involved. (On the exchange, you lay your team to win, like normal. That way you turn a profit if they win both halves, and lose a small amount otherwise. So you also lay them to win both halves - that's not a standard win/lose/draw bet, but I think it's generally available. If you pick the right stakes, you can turn a profit in both cases. Though probably not exactly the same profit in both cases, because you can't easily predict exactly how much the free bet will be worth.) 

Posted on 02 June 2018 | Comments

Pareto improvements are rarer than they seem

this is surely not an original insight, but I haven't seen it before

A Pareto improvement is where you make one party better off and no parties worse off.

Suppose Adam has a rare baseball card. He assigns no intrinsic value to baseball cards. Adam likes Beth, and somewhat values her happiness. Beth collects baseball cards, and would happily pay $100 for Adam's card.

If Adam just gives Beth his baseball card, is that a Pareto improvement? Naively, yes: he loses the card that he doesn't care about, and gains her happiness; she gains the card. Both are better off.

But I claim not, because if Adam has the card, he can sell it to Beth for $100. He would much prefer doing that over just giving her the card. But if Beth has the card, he can't do that. He assigns no intrinsic value to the card, but he can still value it as a trading chip.

Now suppose Adam has the baseball card but Beth also has a copy of that card. Then Beth has less desire for Adam's card, so this situation also isn't a Pareto improvement over the original. By giving something to Beth, we've made Adam's situation worse, even though Adam likes Beth and values her happiness.

And I think situations like this are common. The ability to give someone something they want, is a form of power; and power is instrumentally useful. And the less someone wants, the less able you are to give them something they want1.

For a closer-to-reality example, the reddit comment that sparked this post said:

bringing Platform 3 back into use at Liverpool Street Underground Station was denied because the platform would not be accessible. Neither of the platforms currently in use for that line is accessible, so allowing Platform 3 to be used would be a Pareto improvement

The model here is that there are two parties, people who can access the platforms at Liverpool St and those who can't. If Platform 3 is brought back into use, the first group gains something and the second group loses nothing.

But I think that if Platform 3 is brought back into use, the second group loses some power. They lose the power to say "we'll let you bring back Platform 3 if you give us…". Maybe Platform 3 can be made accessible for $1 million. Then they can say "we'll let you bring it back if you make it accessible", but they can't do that if it's already back in use.

And they lose some power to say "if you ignore us, we'll make things difficult for you". Maybe it would take \$1 trillion to make Platform 3 accessible. If Platform 3 remains out of use, people are more likely to spend \$1 million to make their building projects accessible, because they've seen what happens if they don't. Conversely, if Platform 3 comes back, people are more likely to exaggerate future costs of accessibility. "If I say it costs \$1 million, I'll have to pay. If I say it costs \$10 million, maybe I won't."

I haven't researched the situation in question, and I expect that the actual power dynamics in play don't look quite like that. But I think the point stands.

(My original reply said: "If it's easier to turn an unused inaccessible platform into a used accessible platform, than to turn a used inaccessible platform into a used accessible platform - I don't know if that's the case, but it sounds plausible - then opening the platform isn't a Pareto improvement." That still seems true to me, but it's not what I'm talking about here. There are lots of reasons why something might not be a Pareto improvement.)

This doesn't mean Pareto improvements don't exist. But I think a lot of things that look like them are not.

Update 2018-02-02: some good comments on reddit and LessWrong. Following those, I have two things in particular to add.

First, that I like /u/AntiTwister's summary: "If you have the power to prevent another party from gaining utility, then you lose utility by giving up that power even if you are allies. There is opportunity cost in abstaining from using your power as a bargaining chip to increase your own utility."

Second, that there is a related (weaker) concept called Kaldor-Hicks efficiency. I think that a lot of the things that look-like-but-aren't Pareto improvements, are still Kaldor-Hicks improvements - meaning that the utility lost by the losing parties is still less than the utility gained by the winners. In theory, that means that the winners could compensate the losers by giving them some money, to reach a Pareto improvement over the original state. But various political and practical issues can (and often do) get in the way of that.

  1. This feels like it generalizes far beyond questions of Pareto efficiency, but I'm not sure how to frame it. Something like game theory is more competitive than it appears. Even when no two players value the same resource, even when all players genuinely want all other players to do well, players still have an incentive to sabotage each other. 

Posted on 27 January 2018 | Comments

Improving goimports

Epistemic status: Pure dilettantism. I have never used Go. This might make me unqualified to say what I'm about to say. But it's okay, because I use the word "seems" a lot.

In Go, if you have an unused import, your program fails to compile. This has made a lot of people mildly annoyed and been sometimes regarded as a bad idea, but not universally.

The devs decline to add a compiler flag to change this behaviour, "because compiler options should not affect the semantics of the language and because the Go compiler does not report warnings, only errors that prevent compilation". This strikes me as reasonable, if not the decision I personally would make.

Instead, they suggest this pattern:

import "unused"

// This declaration marks the import as used by referencing an
// item from the package.
var _ = unused.Item  // TODO: Delete before committing!

- that is, add a line of code to trick the compiler into thinking that you're using the import, even though you're not. (Unless you forget to delete the trick line, which the compiler won't verify for you.)

This does not strike me as a very good substitute. With a compiler flag, I could turn it on while debugging and turn it off for production to make sure I had no unused imports. I could use a commit hook to keep unused imports out of the repository. By tricking the compiler, there's no particularly easy way to do this. (I suppose I could put the string "FAKEIMPORT" is a comment on those lines, and grep for that string. This is still not a great solution.)

I also refer to the devs' opinion that "if it's worth complaining about, it's worth fixing in the code. (And if it's not worth fixing, it's not worth mentioning.)" I claim that there's no fundamental difference between

import unused


import unused
var _ = unused.Item

Neither affects the semantics of the program, if we ignore that the first one doesn't compile1. If one is worth complaining about, so is the other. But the devs are sending mixed signals. It seems the first is worth complaining about, because the compiler complains. But it seems the second is not, because the devs recommend it. This should be a sign that something about this whole situation is less than ideal2.

(For myself, I would say that both are worth complaining about in production, and neither is worth complaining about when debugging. But the trick works equally well in both instances.)

There is another solution, in the form of a tool called goimports. The idea is that you don't write imports at all. If you have a symbol that isn't imported, it searches your filesystem for an appropriate package and adds an import line. If you have an unused import, it deletes it.

But word on the street is that some names are ambiguous, supplied by multiple packages, and goimports has no way to choose which one you mean. So if you comment out a line, and then uncomment it, goimports might add back a different import than the one you originally used. This, too, seems less than ideal.

All of which is to say: although I don't use Go, it seems to me that Go has a problem, and that the existing solutions aren't perfect.

I propose a new solution, which could be implemented as two new modes for goimports to run in or as one or two completely new tools.

In the first mode, this tool acts like goimports, but more conservatively. Instead of removing unused imports, it merely comments them out. And instead of searching your filesystem for packages, it merely searches your comment lines, and uncomments them if there's a match.

So if you're debugging, and comment out the only use of a package, this tool will comment out the import for you. When you uncomment that use, the import will be added back in, but without the ambiguous naming problem. At no point do you have to trick the compiler, so you don't have to remember to stop tricking the compiler.

In the second mode, this tool checks for commented out import lines, and tells you whether it found any (or optionally deletes them). It can be called in commit hooks, to prevent such lines from cluttering up a repository.

This seems to me like it would be an improvement on the status quo.

  1. At least, they don't look like they do. Not knowing Go, it's conceivable that they could. If the second can have effects, then the devs' recommendation seems particularly bad. In that case, if a var _ = unused.Item line shows up in a codebase, it's probably a mistake - but you can't be sure without checking, which takes much longer. 

  2. I want to clarify three things. First, I do not think the devs are wrong to recommend using this trick. Second, I do not think the second version is just as bad as the first. The second version has a more obvious smell, which is valuable, because bad code that smells bad is less bad than bad code that smells good. Third, I said the situation is less than ideal, and that's what I meant. I do not think the situation is terrible. 

Posted on 13 January 2017 | Comments

The Parable of the Clock

The monk Dawa had a clock that had stopped, and he was content. When he wished to know the hour, he would glance at the clock, and discover that it was noon.

One day a visiting friend commented on the clock. "Why does your clock say that the hour is noon, when I am quite sure that it is six in the evening?"

Dawa found this unlikely, for the hour had always been noon in his experience. But he had been instilled with the virtues of curiosity and empiricism. If the hour is noon, I desire to believe it is noon. If the hour is six in the evening, I desire to believe it is six in the evening. Let me not become attached to beliefs I may not want. Thus fortified, he sought out other clocks.

The time was indeed six in the evening. In accordance with the virtue of relinquishment, and gently laughing inside at his past foolishness, Dawa serenely set his broken clock forwards by six hours.

Posted on 22 December 2016 | Comments

Data surgery is not rocket science

This is a story of me failing to do something, and some thoughts on how I might have succeeded.

A friend had a problem. He'd been recording some audio on his phone when the battery died, leaving him with a .m4a file that couldn't be played. He had a look at the contents and it contained a bunch of data, so he offered $40 if someone could recover the audio for him.

This seemed like a fun challenge that would exercise skills there isn't much call for. I didn't necessarily expect to succeed, but I decided to give it a go. (I only remember doing something like this once before, when I had a corrupt FAT partition that I managed to fix with some bit-flipping.)

To help, the friend provided two small files from the same app: one a successful recording, and one corrupt like the target file.

The simplest thing was to simply try playing the broken files with mplayer, just in case. It didn't work, and gave an error message saying (among other things) "moov atom not found".

The next thing was to look at all the files in a hex editor, which in this case was "hexdump -C piped into less" because I don't think I have a dedicated hex editor installed. I quickly noticed that the good recording had the bytes moov at location 0x1d, while the corrupt recordings both had the bytes free there.

I also noticed that all three files had the bytes mdat at location 0xc95, followed by some low-entropy data, and then some apparently high-entropy data that seemed to go on to the end of the file. I guessed that this was the actual audio data, while the start of the good recording was a valid audio header.

00000000  00 00 00 18 66 74 79 70  6d 70 34 32 00 00 00 00  |....ftypmp42....|
00000010  69 73 6f 6d 6d 70 34 32  00 00 0c 79 66 72 65 65  |isommp42...yfree|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000c90  00 3f 3f 3f 3f 6d 64 61  74 01 40 22 80 a3 7f f8  |.????mdat.@"....|
00000ca0  85 2d 2d 2d 2d 2d 2d 2d  2d 2d 2d 2d 2d 2d 2d 2d  |.---------------|
00000cb0  2d 2d 2d 2d 2d 2d 2d 2d  2d 2d 2d 2d 2d 2d 2d 2d  |----------------|
00000da0  2d 2d 2d 2d 2d 2d 2d 2d  2d 2d 2d 2d 2e ff f1 0a  |------------....|
00000db0  5a 5a 5a 5a 5a 5a 5a 5a  5a 5a 5a 5a 5a 5a 5a 5a  |ZZZZZZZZZZZZZZZZ|
00000eb0  5a 5a 5a 5a 5a 5a 5a 5a  5a 5a 5a 5d e0 e2 14 b4  |ZZZZZZZZZZZ]....|
00000ec0  b4 b4 b4 b4 b4 b4 b4 b4  b4 b4 b4 b4 b4 b4 b4 b4  |................|
00000ed0  b4 b4 bc 00 e2 36 2e 70  5b 0c 09 88 8b b1 2a ae  |.....6.p[.....*.|
00000ee0  9d 55 6c 14 4c 2a 2a 55  45 44 01 c5 35 93 14 a0  |.Ul.L**UED..5...|

The start of the target file. hexdump prints a * to indicate that several lines have been skipped which were identical to the one above. The audio data seems to start around position 0xed2.

So that gave me another simple thing to try. I took the first 0xc95 bytes from the good file, and byte 0xc95 onwards from the small corrupt file, and concatenated them.

head -c 3221 tinytest_notbroken.m4a > fixtest-1.m4a
tail -c +3222 tinytest_broken.m4a >> fixtest-1.m4a

After this, fixtest-1.m4a played in mplayer. It printed lots of warning messages while it did so, but whatever.

So I did the same thing with the target file, and recovered two seconds of the original audio. I could report to my friend that it started with him saying "um".

This is what I was expecting, since the good recording was only two seconds, and it would presumably have had the length in the header. But now I needed to work out how to lengthen it.

I played around with ffmpeg, but it didn't seem to have an option for "ignore the audio duration stated in the header", so it seemed I would have to fix the header myself.

ftyppm42 seemed like it might be a file type indicator, so I googled that. A few clicks led to the MP4 registration authority, which suggested that the relevant standard was called "ISO/IEC 14496-14". So I then googled "ISO/IEC 14496-14 track header", and found a pdf of the relevant standard. Unfortunately this is a high-context document, I didn't particularly understand it, and it didn't help me very much.

I also found wikipedia on MP4, which pointed me to ISO base media file format, ISO/IEC 14496-12. Google then gave me a pdf of that standard, which was much more detailed, helpful, and lengthy.

I didn't attempt to understand it all. But I searched for "mdat", and shortly after one of the hits, I found the following description of a data structure:

aligned(8) class MovieHeaderBox extends FullBox(‘mvhd’, version, 0) {
    if (version==1) {
        unsigned int(64) creation_time;
        unsigned int(64) modification_time;
        unsigned int(32) timescale;
        unsigned int(64) duration;
    } else { // version==0
        unsigned int(32) creation_time;
        unsigned int(32) modification_time;
        unsigned int(32) timescale;
        unsigned int(32) duration;
    template int(32) rate = 0x00010000; // typically 1.0
    template int(16) volume = 0x0100; // typically, full volume
    const bit(16) reserved = 0;
    const unsigned int(32)[2] reserved = 0;
    template int(32)[9] matrix =
        { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
        // Unity matrix
    bit(32)[6] pre_defined = 0;
    unsigned int(32) next_track_ID;

Promising! My header contained the following two lines:

00000020  00 00 00 6c 6d 76 68 64  00 00 00 00 d4 6b 4f 0d  |...lmvhd.....kO.|
00000030  d4 6b 4f 0d 00 00 03 e8  00 00 08 17 00 01 00 00  |.kO.............|

That's mvhd from the description, followed by four null bytes, followed by two identical 32-bit ints - the creation and modification times would naturally have been the same - and then two more ints.

timescale was 0x3e8, decimal 1000, which per spec means 1000 time units pass in a second. duration was 0x817, decimal 2071, indicating a track 2.071 seconds long. So presumably I need to edit this value.

What to set it to? As long as it's sufficiently long, it's not important. So I set it to 0x10000817, using emacs (in hexl-mode) as my editor, and tried again. No dice, it still cut off at two seconds.

So I searched "duration" in the same file, and found two more places to edit. One soon after the bytes tkhd, which used the same timescale as the mvhd; and one soon after the bytes mdhd, which had its own timescale. In this case the timescale was 0xac44, decimal 44100.

But adding 0x1000000 to both of these durations didn't help either.

(After one of these three edits - I don't remember which - mplayer started reporting the file as being 74 hours long. But it still cut out after two seconds.)

At this point I was up late and out of ideas. But I realized that I could just ask my friend to record another long track, and then use the header from that to do the job. So I left him instructions, plus some notes on where I'd got to, so someone else could pick up where I'd left off if that didn't work.

Then while I was asleep, someone else came along and fixed it for him before he tried that. I don't know how.

But here are some thoughts on how I might have proceeded, if my friend hadn't been able to record the longer audio.

For one thing, I might have been able to generate my own long file in ffmpeg. But attempting this now, I can't make it work. I concatenate several copies of the good recording, and get a file that starts with these three lines:

00000000  00 00 00 18 66 74 79 70  4d 34 41 20 00 00 02 00  |....ftypM4A ....|
00000010  69 73 6f 6d 69 73 6f 32  00 00 00 08 66 72 65 65  |isomiso2....free|
00000020  00 07 93 5e 6d 64 61 74  01 40 22 80 a3 7f f8 85  |...^mdat.@".....|

It plays fine. But when I try to mix it with the target file, it doesn't play. It complains about the missing moov atom, even though that's also missing in the working concatenated file. I'm not sure what's wrong.

Failing that, I could have recorded the long file myself in the same app my friend used, to spare his attention for things he actually wanted to be doing. (Which, after all, was the reason he was willing to pay someone else to fix the file.) I could also, for curiousity's sake, have recorded another short file, and attempted to find more durations by comparing the headers.

But perhaps the simplest thing would have been to take a completely different approach from the beginning. It turns out that other people have encountered this problem before, and paved the way for those like me. For example, someone on facebook posted this page, which teaches you to fix these files using a piece of software called "faad.exe". More research reveals that this is open source, and even available in portage.

(I also find references to FAAD if I google "fix corrupt m4a file".)

It looks like this was a case of "more haste, less speed". At least the haste was fun, but probably not worth the $40 I might have earned by being thorough.

Posted on 17 December 2016 | Comments

The Sally-Anne Fallacy

I'd like to coin a term. The Sally-Anne fallacy is the mistake of assuming that somone believes something, simply because that thing is true.1

The name comes from the Sally-Anne test, used in developmental psychology to detect theory of mind. Someone who lacks theory of mind will fail the Sally-Anne test, thinking that Sally knows where the marble is. The Sally-Anne fallacy is also a failure of theory of mind.

In internet arguments, this will often come up as part of a chain of reasoning, such as: you think X; X implies Y; therefore you think Y. Or: you support X; X leads to Y; therefore you support Y.2

So for example, we have this complaint about the words "African dialect" used in Age of Ultron. The argument goes: a dialect is a variation on a language, therefore Marvel thinks "African" is a language.

You think "African" has dialects; "has dialects" implies "is a language"; therefore you think "African" is a language.

Or maybe Marvel just doesn't know what a "dialect" is.

This is also a mistake I was pointing at in Fascists and Rakes. You think it's okay to eat tic-tacs; tic-tacs are sentient; therefore you think it's okay to eat sentient things. Versus: you think I should be forbidden from eating tic-tacs; tic-tacs are nonsentient; therefore you think I should be forbidden from eating nonsentient things. No, in both cases the defendant is just wrong about whether tic-tacs are sentient.

Many political conflicts include arguments that look like this. You fight our cause; our cause is the cause of [good thing]; therefore you oppose [good thing]. Sometimes people disagree about what's good, but sometimes they just disagree about how to get there, and think that a cause is harmful to its stated goals. Thus, liberals and libertarians symmetrically accuse each other of not caring about the poor.3

If you want to convince someone to change their mind, it's important to know what they're wrong about. The Sally-Anne fallacy causes us to mistarget our counterarguments, and to mistake potential allies for inevitable enemies.

  1. From the outside, this looks like "simply because you believe that thing". 

  2. Another possible misunderstanding here, is if you agree that X leads to Y and Y is bad, but still think X is worth it. 

  3. Of course, sometimes people will pretend not to believe the obvious truth so that they can further their dastardly ends. But sometimes they're just wrong. And sometimes they'll be right, and the obvious truth will be untrue. 

Posted on 09 April 2016 | Comments

Political Polarization in the US House of Representatives

I've created an interactive graph of historical levels of political polarization in the US House of Representatives. It would be tricky to embed in this blog, so I'm only linking it. Summary:

The x-axis on this graph is based on DW-NOMINATE left-right scores of each member of each U.S. House of Representatives from 1865 to 2015. This uses a member's voting record to measure the direction and extremity of their political views, regardless of party affiliation.

If a member's score on this axis is known, it's possible to predict their vote on any given issue with high confidence, given no other information about the member. Members whose votes are typically left-aligned receive negative scores, while members whose votes are typically right-aligned receive positive scores.

(However, see The Thin Blue Line That Stays Strangely Horizontal, which questions the validity of DW-NOMINATE.)

Background: I made this last year for a Udacity course, "Data Visualization and D3.js". I needed to submit a visualization for marking, and this was my submission. I'm grateful for feedback provided by Udacity and by some of my friends. Without that, the result would certainly have been worse.

The source is available on GitHub, including datasets and some python scripts I used to process them. The README also documents some of the design history.

I'm aware of one bug: in firefox (38.6.1 on linux), the legend appears to display the 5-95 and 25-75 percentile boxes identically. They're implemented as rects with fill-opacity: 0.3: the 25-75 boxes have two of these rects on top of each other. This is also how the paths on the graph itself are colored.

I assume there are other bugs.

Posted on 26 February 2016 | Comments

Officially faceblind

I realized a few years ago that I was at least somewhat faceblind/prosopagnosic. A while back I took an online test out of curiousity, and scored low. They said that if I was in London and interested in further tests, I should leave my email address. A few days ago I went in for those tests, and now I have a PhD student (Katie) also telling me I'm faceblind. Which makes it official, I guess.

Next she wants to run EEGs on me, which should be cool. That will help work out where my brain is going wrong, in the long chain between "photons stimulate nerve endings in my eyeballs" and "I recognize a face" (whatever that means). Also, apparently there's a phenomon which sounds to me like blindsight, where some prosopagnosics' brains are clearly reacting to faces on some level that doesn't reach their consciousness. She wants to learn more about that too.

What follows is discussion of the tests, my scores, and what they mean. I've been given a powerpoint with my scores reported as percentiles, along with absolute scores and average control scores. 2% or lower is counted as "impaired". Percentiles are only given as integers, or as "<1%". On the day, Katie also gave me some numbers in terms of standard deviations (σ). Under a normal distribution, 2.5% would be approximately -2σ, but I'm not sure any of these results will be normally distributed, so I don't know if σ scores really tell me anything.

A note: if you think you might be faceblind, and you'd be interested in getting more detailed tests, it might be a good idea not to read the below. I expect it wouldn't significantly bias the results if you did, except for one bit that I've rot13ed. But I don't trust myself to make that call. If you're in London, you can take the above test like me and see what happens. Otherwise I'm not sure how you'd go about getting more tests.

The object/face recognition tests were "memorise these things, then we show you a sequence of things and you have to say if each of these things was a thing in the first set". The things were houses, cars, horses, and bald women's faces. I was bad at all of these: 4% for cars, 2% for houses, and <1% for horses and women. (Average score was higher for women than horses, and my score was higher for horses than women, so I'm worse at women than horses. I think Katie told me I was somewhere between -5σ and -6σ for women. Under normality, -5σ is one in thirty million, but this is clearly not normal.) So it seems I have some level of general object agnosia, but more specific prosopagnosia on top of that.

I was 11% for reading emotions from eyes, which is a point for Team Phil Does Not Have Aspergers (some of my friends are divided about that). In fact, the average score is 26 and I scored 23, and there were a few cases where I said an answer, then thought "wait no it's this" and didn't say anything because I wasn't sure if I should. (I was speaking my answer and Katie was recording it. I had to choose from four emotions, so I'm not sure why this wasn't recorded by a computer like most of the other tests.) So plausibly I'm actually above 11%.

I was <1% at famous face recognition, recognising five out of fifty that I'd been exposed to, out of sixty in total. (I got Jvyy Fzvgu, Uneevfba Sbeq, Dhrra Ryvmnorgu, Ebova Jvyyvnzf, and surprisingly Ovyy Pyvagba.) It seems that controls tend to get above 40, so even counting that "exposed to" is vague, I did really badly at this. I think Katie said I was -9σ, which would be one in 10^19 under normality.

I'm <1% at the Cambridge Memory Test for Faces, which is the one I linked above. I actually scored worse in the lab than online. (47% versus 58%, IIRC, with a control average of 80%, and 60% indicating impairment. But the lab score I've been given is 34 against control of 58, so it's clearly been adjusted.) There could be any number of reasons for this, including "chance". But when I took it online, I often thought that one of the faces looked a little like Matt Damon, and chose that one. I like to think that "mistaking people for Matt Damon" is the way forward in face recognition.

I was somewhat okay at half of the Cambridge Face Perception Test. In this one, they showed me a face at an angle, and below it the same face face-on, six times, with varying degrees of modification. I had to order them according to how similar each was to the original face, within a minute. For half the tests, the faces were all upside down. For all of the tests, they all looked incredibly similar and my instinctive reaction was WTF.

On the upright test, I got <1%. On the inverted test, I got 7%. One strategy I used a few times was to focus on the lips, specifically on the size of the dip in the bow. I just ordered them according to that. I guess it helps, but I found it a lot easier for inverted faces.

Doing better at inverted would seem to suggest that I'm doing some kind of holistic face processing that goes wrong and blocks off later avenues for perception. Buuut, objectively I scored worse on the inverted faces, just not as much worse as average, so I'm not sure if this does suggest that. (And I'm not sure it is "objectively" - if all the faces had been assigned to the other condition, would my scores have changed?)

Hypothetically, high scores on both tests could indicate my problem is with memory, not initial perception. The low score here doesn't mean I don't have a problem with memory, but it does seem to hint that I do have a problem with initial perception. And I suspect the famous faces test points at me also having a memory problem.

Posted on 19 January 2016 | Comments

A Scream of Swifties

Inspired by Slate Star Codex

Explaining a joke is like dissecting a frog: it's one way to learn about frogs. If you want me to explain any of these, ask, and I will explain without making fun of you.

"I hear someone playing a triangle in the corridor," said Tom haltingly.

"We've got to overturn every last insect in this garden," said Tom flippantly.

"Goose feathers are annoyingly fragile," said Tom, breaking down.

"Anastasia gives me pins and needles," said Christian gratingly.

"I miss my submissive," René opined.

"I didn't do it, and nor did any of my siblings," Tom insisted.

"It's not much paint, it won't hurt to run your tongue up it," said Tom metallically.

"I'm so sick, even my flu has the flu," said Tom metallurgically.

"It's pitch black and I can hear a monster doing arithmetic," said Tom gruesomely.

"Man City don't hold a candle to the real heros of Manchester," said Tom manually.

"I just bought Manchester United," said Tom virtuously.

"Lancelot told me I was his favourite!" said Tom, surprised.

"I don't think this tube of semen is entirely straight," said the incumbent.

"I can fit inside my GameCube!" said Tom inconsolably.

"In a former life, I was a priest in pre-Columbian Peru," said Tom inconsolably.

"I need a name for my squid-and-flatfish restaurant," said Tom inconsolably.

"I make a living as the red tellytubby," said Tom, apropos.

"I'm doing crunches so I can get a six-pack," said Treebeard absently.

"I'm half-fish and made of lithium," said Treebeard limerently.

"Figure three plots counts of close-ups on male versus female genitals," said Tom pornographically.

"My breasts don't have enough room in this corset," said Victoria, double depressed.

"Bring me the head of my enemy," said Emacs vicariously.

"I have affirming the consequent, base rate neglect, and now also ad hominem," said Tom, getting cocky.

"We lost the treaty, so we had to ratify it again," said Tom, resigned.

"I'm in the group supporting Shiva's wife," said Tom with satisfaction.

Posted on 01 January 2016 | Comments

Unix pro tip: Print and pipe

Sometimes you (or at least, I) want to run a command for its output, but also want to pipe it through another command. For example, see the results of a find but also count how many hits it got. I've sometimes lamented that there's no easy way to do this. But the other day I had a flash of insight and figured out how:

find . | tee /dev/stderr | wc -l
proc1 | tee /dev/stderr | proc2    # general case

(I'm pretty proud of this. I don't know if it's original to me, but I discovered it independently even if not.)

tee will print the output of proc1 to both stdout and stderr. stderr goes to the terminal and stdout goes to proc2.

You can make it more convenient with an alias:

alias terr='tee /dev/stderr | '
find . | terr wc -l

(Putting a pipe in an alias seems to work in both zsh and bash.)

If you want to concatenate the streams, to pipe them to another process, you can use subshells:

proc1 | ( terr proc2 ) 2>&1 | proc3

but note that stderr output from proc2 will also get sent to proc3, unless you send it somewhere else. I haven't yet thought of a use for this.

There are potential issues with buffering here. I'm not aware that tee makes any promises1 about which order it writes the streams in. It's going to be interlacing them while it writes, so that it doesn't need to keep a whole copy in memory. So (if the input is large enough) proc2 will be receiving input before it's finished being written to stderr, and might start writing output, and then the output streams can get interlaced.

For some values of proc2, commands which start printing before they've finished reading, this is inevitable. But I think useful proc2s are likely to be aggregators2 - by which I mean, commands which can't do anything until they've finished reading all their input3. In my tests so far, those have been safe, but that doesn't prove much.

We can do a more reliable check with strace:

find . | strace tee /dev/stderr | wc -l

By the looks of things, tee will read into a buffer, then write it to stdout (the pipe), then write it to the specified target (stderr, which goes to the terminal), and repeat to exhaustion. But the important thing is, it doesn't close any file descriptors until it's finished writing everything, and then it closes the target before it closes stdout. If this is consistent amongst tee implementations - and it seems sensible4 - then aggregators almost definitely won't interlace their output with the output from proc1. I don't want to say "definitely", because there might be other stuff going on that I haven't accounted for. But at any rate, tee will finish writing before the aggregator starts.

Anyway, I see this as being the sort of thing that you're likely use manually, not in an automated process. So if the output does get interlaced a little, it's probably not that big a deal.

  1. Note how I didn't say guaranTEEs? You're welcome. 

  2. In fact, wc is the only proc2 I can currently think of, that I expect to be regularly useful. (And then just piping through cat -n does something similar, but it's not ideal.) numsum or perhaps some other tools from num-utils could conceivably be handy from time to time. 

  3. You can turn a non-aggregator into an aggregator with moreutils' sponge command, which is like cat but reads all its input before writing anything. 

  4. They could reasonably close their outputs in either order, which might make a difference. But I don't think it's usual for programs to explicitly close their std* streams, that gets taken care of by the surrounding infrastructure. So if tee explicitly closes its target, I'd expect that to happen before stdout gets closed. 

Posted on 07 October 2015 | Comments

London Cycle Hires and Weather

Preface: I wrote this report for Udacity's "Explore and Summarize Data" module. The structure is kind of strange for a blog post, but I'm submitting the finished report essentially unchanged.

One thing I will note. I find that the cycle hire usage doesn't change much throughout the year. Shortly after submitting, I read this article which finds that it does vary quite a lot. I'm inclined to trust that result more. It's intuitively sensible, and it looks directly at the number of rides taken, instead of looking at a proxy like I do.

Take this as evidence for how much to trust my other results.

My goal is to investigate usage of the London cycle hire scheme, and in particular how it varies with the weather. I'm running an analysis from July 2013 to June 2014.

I'm using two data sets here. Daily weather data comes from Weather Underground, using the weather station at London Heathrow airport.

(London City Airport is closer to the bike stations that I use, but the data from that airport reports 0 precipitation on every single day. The data from Heathrow seems to be more complete, and I expect it to be almost as relevant.)

I collected the cycle hire data myself, over the course of the year, by downloading CSV files from an unofficial API which now appears to be defunct. It has a granularity of about ten minutes. That's about 50,000 entries per docking station for the year, so for this analysis, I'm only using the data from four docking stations near my office.

All data and source code used for this project can be found in the git repository.

Exploring the weather data


plot of chunk temp.1v plot of chunk temp.1v

These variables measure the minimum, average, and maximum daily temperatures. The graphs all look similar, and overlap a lot. The shape is a little surprising, as I didn't expect the density graphs to be bimodal. It could potentially be caused by significant differences between summer and winter, with an abrupt shift between the two.


plot of chunk rain.1v plot of chunk rain.1v

According to the rain column, There are over 225 rainy days and only about 125 non-rainy days. But by far the most common bin for precip.mm is the leftmost one. Table of values of precip.mm:

##     0  0.25  0.51  0.76  1.02  2.03  3.05  4.06  5.08   6.1  7.11  7.87 
##   207    35    20     9    17    22    12     8    12     4     4     2 
##  8.89  9.91 10.92 11.94 13.97 
##     3     5     2     1     2

Although more than half of observations have rain == TRUE, more than half of them also have precip.mm == 0, which needs more investigation. Rainfall as measured by precip.mm versus as measured by rain:

plot of chunk rain.precip.2v plot of chunk rain.precip.2v

The two measures don't always agree. Sometimes rain is false but precip.mm is nonzero; and often rain is true but precip.mm is zero. Neither of those is surprising individually: if rain is only counted when the rainfall exceeds a certain threshold, then that threshold could be large (giving false/nonzero) or small (giving true/zero). But the combination suggests that that isn't what's going on, and I don't know what is.

This table counts the anomalies by turning precip.mm into a boolean zero/nonzero (false/true) and comparing it to rain:

##         FALSE TRUE
##   FALSE   119    9
##   TRUE     88  149

There are 88 instances of true/zero, 9 instances of false/nonzero, but the cases where they agree are the most common.

I find precip.mm to me more plausible here. I feel like fewer than half of days are rainy. This website agrees with me, saying that on average, 164 days out of the year are rainy (rain - 237, precip.mm - 158).


plot of chunk wind.1v plot of chunk wind.1v

These three measures of wind speed are all averages. wind is simply the average wind speed over a day. wind.max is the daily maximum of the average wind speed over a short time period (I think one minute). gust is the same thing, but with a shorter time period (I think 14 seconds).

Unlike with temperature, the three measures look different. All are right-skewed, although gust looks less so. There are several outliers (the isolated points on the box plots), and the quartiles don't overlap. The minimum gust speed (about 24) is almost as high as the median wind.max.

Exploring the bike data

Time between updates

plot of chunk dt.1v

There are a few outliers here. Not all the lines are visible due to rendering artifacts, but above 5000, we only have five entries:

##                     name        prev.updated             updated
## 46779    Earnshaw Street 2013-10-03 08:50:23 2013-10-13 09:20:28
## 46899  Southampton Place 2013-10-03 08:50:22 2013-10-13 09:20:27
## 46918       High Holborn 2013-10-03 08:50:24 2013-10-13 09:20:30
## 47049         Bury Place 2013-10-03 08:50:26 2013-10-13 09:20:32
## 175705 Southampton Place 2014-06-20 17:36:06 2014-06-30 08:30:03

The first four of these happened when my collection script broke and I failed to realize it. The other occurred when Southampton Place was taken out of service temporarily.

Let's zoom in on the lower ones:

plot of chunk dt.1v.left

There are several instances where the time between updates is unusually large, on the order of hours or days. The times of entries with between 2000 and 5000 minutes between updates:

##                     name        prev.updated             updated
## 32650       High Holborn 2013-08-31 15:10:07 2013-09-02 12:30:05
## 32660         Bury Place 2013-08-31 15:10:08 2013-09-02 12:30:07
## 32672  Southampton Place 2013-08-31 15:10:05 2013-09-02 12:30:04
## 32674    Earnshaw Street 2013-08-31 15:10:06 2013-09-02 12:30:05
## 38546       High Holborn 2013-09-14 22:39:00 2013-09-16 08:24:22
## 38719         Bury Place 2013-09-14 22:39:02 2013-09-16 08:24:23
## 38734  Southampton Place 2013-09-14 22:38:58 2013-09-16 08:24:20
## 38735    Earnshaw Street 2013-09-14 22:38:59 2013-09-16 08:24:21
## 84066         Bury Place 2013-12-27 15:40:08 2013-12-29 23:10:14
## 84069       High Holborn 2013-12-27 15:40:06 2013-12-29 23:10:13
## 84073  Southampton Place 2013-12-27 15:40:05 2013-12-29 23:10:11
## 84078    Earnshaw Street 2013-12-27 15:40:05 2013-12-29 23:10:12
## 84186    Earnshaw Street 2013-12-30 00:10:05 2013-12-31 13:10:07
## 84202       High Holborn 2013-12-30 00:10:06 2013-12-31 13:10:09
## 84269  Southampton Place 2013-12-30 00:10:05 2013-12-31 13:10:06
## 84330         Bury Place 2013-12-30 00:10:07 2013-12-31 13:10:11
## 89443  Southampton Place 2014-01-12 20:20:10 2014-01-14 18:40:07
## 89459       High Holborn 2014-01-12 20:20:13 2014-01-14 18:40:11
## 89467         Bury Place 2014-01-12 20:20:14 2014-01-14 18:40:16
## 89524    Earnshaw Street 2014-01-12 20:20:11 2014-01-14 18:40:09
## 121381   Earnshaw Street 2014-03-15 14:50:06 2014-03-17 01:50:04
## 121398      High Holborn 2014-03-15 14:50:07 2014-03-17 01:50:05
## 121444        Bury Place 2014-03-15 14:50:10 2014-03-17 01:50:07
## 121591 Southampton Place 2014-03-15 14:50:05 2014-03-17 01:50:04
## 133765      High Holborn 2014-04-11 16:59:37 2014-04-14 01:29:07
## 133900   Earnshaw Street 2014-04-11 16:59:36 2014-04-14 01:29:05
## 133961        Bury Place 2014-04-11 16:59:38 2014-04-14 01:29:08
## 134027 Southampton Place 2014-04-11 16:59:35 2014-04-14 01:29:05

It looks like these happened to all stations simultaneously, suggesting problems with either my collection script or the API, rather than problems with individual locations.

Entries with less than 60 minutes between updates, no longer on a log scale:

plot of chunk dt.1v.60

In the vast majority of cases, updates are approximately ten minutes apart. This encourages me to take a subset of the data (bikes.all -> bikes), considering only entries with d.updated less than 15 minutes. This eliminates many outliers in future graphs.

Date and time of update

plot of chunk date.time.1v plot of chunk date.time.1v

All times of day are approximately equally represented to within ten minutes, which is good. There are five noticeable troughs preceeded by spikes, but they probably don't signify much. Dates are a lot less uniform, however. Even apart from the ten-day period where my script was broken, many days have significantly fewer updates than typical, and some have none at all.

Number of days spent with a given number of active docks

plot of chunk ndocks.time.2v

It was common for every station to report less than a full complement of docks. At least two had a full complement for less than half the time (High Holborn and Bury place are unclear in that respect). This isn't surprising, since a bike reported as defective will be locked in, using up a slot but not being available for hire.

Journeys taken throughout the year

plot of chunk date.journeys.2v

The time of year makes very little difference to the number of rides. There appears to be a slight sinusoidal relationship, but it's very weak. (I didn't do a PMCC test because that assumes that any relationship is linear, which we would naively expect not to be the case here, and also doesn't look true from the graph.)

Journeys by weekday

plot of chunk weekday.journeys.2v plot of chunk weekday.journeys.2v plot of chunk weekday.journeys.2v

Fewer journeys are taken on weekends. The median number of bikes available doesn't change much throughout the week (5 on monday and friday, 4 on other days), but the distribution does. Saturday and Sunday have noticeably different shapes to the others. They have a single peak, while weekdays are somewhat bimodal, with a small peak where the station is full (probably when people are arriving at work).

(Since the stations have different numbers of docks, I did a graph of fullness rather than of number of bikes. The density plot doesn't show peaks exactly at 0 and 1 because of how the density window works, but histograms of num.bikes and num.spaces show that that's where they are. It would be difficult to use a histogram for this graph because there's no sensible binwidth.)

Change in number of bikes between updates

plot of chunk bikes.prevbikes.name.mv

##  Pearson's product-moment correlation
## data:  bikes$num.bikes and bikes$prev.num.bikes
## t = 2466.8, df = 173250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9859301 0.9861908
## sample estimates:
##      cor 
## 0.986061

There's very strong correlation between the number of bikes in adjacent entries. This is as expected, especially given what we saw about d.num.bikes previously. The colors here don't show any particular station-dependent trends.

Number of bikes at any given time