A Reasonable Approximation

Latest posts

The Sally-Anne Fallacy

I'd like to coin a term. The Sally-Anne fallacy is the mistake of assuming that somone believes something, simply because that thing is true.1

The name comes from the Sally-Anne test, used in developmental psychology to detect theory of mind. Someone who lacks theory of mind will fail the Sally-Anne test, thinking that Sally knows where the marble is. The Sally-Anne fallacy is also a failure of theory of mind.

In internet arguments, this will often come up as part of a chain of reasoning, such as: you think X; X implies Y; therefore you think Y. Or: you support X; X leads to Y; therefore you support Y.2

So for example, we have this complaint about the words "African dialect" used in Age of Ultron. The argument goes: a dialect is a variation on a language, therefore Marvel thinks "African" is a language.

You think "African" has dialects; "has dialects" implies "is a language"; therefore you think "African" is a language.

Or maybe Marvel just doesn't know what a "dialect" is.

This is also a mistake I was pointing at in Fascists and Rakes. You think it's okay to eat tic-tacs; tic-tacs are sentient; therefore you think it's okay to eat sentient things. Versus: you think I should be forbidden from eating tic-tacs; tic-tacs are nonsentient; therefore you think I should be forbidden from eating nonsentient things. No, in both cases the defendant is just wrong about whether tic-tacs are sentient.

Many political conflicts include arguments that look like this. You fight our cause; our cause is the cause of [good thing]; therefore you oppose [good thing]. Sometimes people disagree about what's good, but sometimes they just disagree about how to get there, and think that a cause is harmful to its stated goals. Thus, liberals and libertarians symmetrically accuse each other of not caring about the poor.3

If you want to convince someone to change their mind, it's important to know what they're wrong about. The Sally-Anne fallacy causes us to mistarget our counterarguments, and to mistake potential allies for inevitable enemies.

  1. From the outside, this looks like "simply because you believe that thing".

  2. Another possible misunderstanding here, is if you agree that X leads to Y and Y is bad, but still think X is worth it.

  3. Of course, sometimes people will pretend not to believe the obvious truth so that they can further their dastardly ends. But sometimes they're just wrong. And sometimes they'll be right, and the obvious truth will be untrue.

Posted on 09 April 2016 | Comments

Political Polarization in the US House of Representatives

I've created an interactive graph of historical levels of political polarization in the US House of Representatives. It would be tricky to embed in this blog, so I'm only linking it. Summary:

The x-axis on this graph is based on DW-NOMINATE left-right scores of each member of each U.S. House of Representatives from 1865 to 2015. This uses a member's voting record to measure the direction and extremity of their political views, regardless of party affiliation.

If a member's score on this axis is known, it's possible to predict their vote on any given issue with high confidence, given no other information about the member. Members whose votes are typically left-aligned receive negative scores, while members whose votes are typically right-aligned receive positive scores.

(However, see The Thin Blue Line That Stays Strangely Horizontal, which questions the validity of DW-NOMINATE.)

Background: I made this last year for a Udacity course, "Data Visualization and D3.js". I needed to submit a visualization for marking, and this was my submission. I'm grateful for feedback provided by Udacity and by some of my friends. Without that, the result would certainly have been worse.

The source is available on GitHub, including datasets and some python scripts I used to process them. The README also documents some of the design history.

I'm aware of one bug: in firefox (38.6.1 on linux), the legend appears to display the 5-95 and 25-75 percentile boxes identically. They're implemented as rects with fill-opacity: 0.3: the 25-75 boxes have two of these rects on top of each other. This is also how the paths on the graph itself are colored.

I assume there are other bugs.

Posted on 26 February 2016 | Comments

Officially faceblind

I realized a few years ago that I was at least somewhat faceblind/prosopagnosic. A while back I took an online test out of curiousity, and scored low. They said that if I was in London and interested in further tests, I should leave my email address. A few days ago I went in for those tests, and now I have a PhD student (Katie) also telling me I'm faceblind. Which makes it official, I guess.

Next she wants to run EEGs on me, which should be cool. That will help work out where my brain is going wrong, in the long chain between "photons stimulate nerve endings in my eyeballs" and "I recognize a face" (whatever that means). Also, apparently there's a phenomon which sounds to me like blindsight, where some prosopagnosics' brains are clearly reacting to faces on some level that doesn't reach their consciousness. She wants to learn more about that too.

What follows is discussion of the tests, my scores, and what they mean. I've been given a powerpoint with my scores reported as percentiles, along with absolute scores and average control scores. 2% or lower is counted as "impaired". Percentiles are only given as integers, or as "<1%". On the day, Katie also gave me some numbers in terms of standard deviations (σ). Under a normal distribution, 2.5% would be approximately -2σ, but I'm not sure any of these results will be normally distributed, so I don't know if σ scores really tell me anything.

A note: if you think you might be faceblind, and you'd be interested in getting more detailed tests, it might be a good idea not to read the below. I expect it wouldn't significantly bias the results if you did, except for one bit that I've rot13ed. But I don't trust myself to make that call. If you're in London, you can take the above test like me and see what happens. Otherwise I'm not sure how you'd go about getting more tests.

The object/face recognition tests were "memorise these things, then we show you a sequence of things and you have to say if each of these things was a thing in the first set". The things were houses, cars, horses, and bald women's faces. I was bad at all of these: 4% for cars, 2% for houses, and <1% for horses and women. (Average score was higher for women than horses, and my score was higher for horses than women, so I'm worse at women than horses. I think Katie told me I was somewhere between -5σ and -6σ for women. Under normality, -5σ is one in thirty million, but this is clearly not normal.) So it seems I have some level of general object agnosia, but more specific prosopagnosia on top of that.

I was 11% for reading emotions from eyes, which is a point for Team Phil Does Not Have Aspergers (some of my friends are divided about that). In fact, the average score is 26 and I scored 23, and there were a few cases where I said an answer, then thought "wait no it's this" and didn't say anything because I wasn't sure if I should. (I was speaking my answer and Katie was recording it. I had to choose from four emotions, so I'm not sure why this wasn't recorded by a computer like most of the other tests.) So plausibly I'm actually above 11%.

I was <1% at famous face recognition, recognising five out of fifty that I'd been exposed to, out of sixty in total. (I got Jvyy Fzvgu, Uneevfba Sbeq, Dhrra Ryvmnorgu, Ebova Jvyyvnzf, and surprisingly Ovyy Pyvagba.) It seems that controls tend to get above 40, so even counting that "exposed to" is vague, I did really badly at this. I think Katie said I was -9σ, which would be one in 10^19 under normality.

I'm <1% at the Cambridge Memory Test for Faces, which is the one I linked above. I actually scored worse in the lab than online. (47% versus 58%, IIRC, with a control average of 80%, and 60% indicating impairment. But the lab score I've been given is 34 against control of 58, so it's clearly been adjusted.) There could be any number of reasons for this, including "chance". But when I took it online, I often thought that one of the faces looked a little like Matt Damon, and chose that one. I like to think that "mistaking people for Matt Damon" is the way forward in face recognition.

I was somewhat okay at half of the Cambridge Face Perception Test. In this one, they showed me a face at an angle, and below it the same face face-on, six times, with varying degrees of modification. I had to order them according to how similar each was to the original face, within a minute. For half the tests, the faces were all upside down. For all of the tests, they all looked incredibly similar and my instinctive reaction was WTF.

On the upright test, I got <1%. On the inverted test, I got 7%. One strategy I used a few times was to focus on the lips, specifically on the size of the dip in the bow. I just ordered them according to that. I guess it helps, but I found it a lot easier for inverted faces.

Doing better at inverted would seem to suggest that I'm doing some kind of holistic face processing that goes wrong and blocks off later avenues for perception. Buuut, objectively I scored worse on the inverted faces, just not as much worse as average, so I'm not sure if this does suggest that. (And I'm not sure it is "objectively" - if all the faces had been assigned to the other condition, would my scores have changed?)

Hypothetically, high scores on both tests could indicate my problem is with memory, not initial perception. The low score here doesn't mean I don't have a problem with memory, but it does seem to hint that I do have a problem with initial perception. And I suspect the famous faces test points at me also having a memory problem.

Posted on 19 January 2016 | Comments

A Scream of Swifties

Inspired by Slate Star Codex

Explaining a joke is like dissecting a frog: it's one way to learn about frogs. If you want me to explain any of these, ask, and I will explain without making fun of you.

"I hear someone playing a triangle in the corridor," said Tom haltingly.

"We've got to overturn every last insect in this garden," said Tom flippantly.

"Goose feathers are annoyingly fragile," said Tom, breaking down.

"Anastasia gives me pins and needles," said Christian gratingly.

"I miss my submissive," René opined.

"I didn't do it, and nor did any of my siblings," Tom insisted.

"It's not much paint, it won't hurt to run your tongue up it," said Tom metallically.

"I'm so sick, even my flu has the flu," said Tom metallurgically.

"It's pitch black and I can hear a monster doing arithmetic," said Tom gruesomely.

"Man City don't hold a candle to the real heros of Manchester," said Tom manually.

"I just bought Manchester United," said Tom virtuously.

"Lancelot told me I was his favourite!" said Tom, surprised.

"I don't think this tube of semen is entirely straight," said the incumbent.

"I can fit inside my GameCube!" said Tom inconsolably.

"In a former life, I was a priest in pre-Columbian Peru," said Tom inconsolably.

"I need a name for my squid-and-flatfish restaurant," said Tom inconsolably.

"I make a living as the red tellytubby," said Tom, apropos.

"I'm doing crunches so I can get a six-pack," said Treebeard absently.

"I'm half-fish and made of lithium," said Treebeard limerently.

"Figure three plots counts of close-ups on male versus female genitals," said Tom pornographically.

"My breasts don't have enough room in this corset," said Victoria, double depressed.

"Bring me the head of my enemy," said Emacs vicariously.

"I have affirming the consequent, base rate neglect, and now also ad hominem," said Tom, getting cocky.

"We lost the treaty, so we had to ratify it again," said Tom, resigned.

"I'm in the group supporting Shiva's wife," said Tom with satisfaction.

Posted on 01 January 2016 | Comments

Unix pro tip: Print and pipe

Sometimes you (or at least, I) want to run a command for its output, but also want to pipe it through another command. For example, see the results of a find but also count how many hits it got. I've sometimes lamented that there's no easy way to do this. But the other day I had a flash of insight and figured out how:

find . | tee /dev/stderr | wc -l
proc1 | tee /dev/stderr | proc2    # general case

(I'm pretty proud of this. I don't know if it's original to me, but I discovered it independently even if not.)

tee will print the output of proc1 to both stdout and stderr. stderr goes to the terminal and stdout goes to proc2.

You can make it more convenient with an alias:

alias terr='tee /dev/stderr | '
find . | terr wc -l

(Putting a pipe in an alias seems to work in both zsh and bash.)

If you want to concatenate the streams, to pipe them to another process, you can use subshells:

proc1 | ( terr proc2 ) 2>&1 | proc3

but note that stderr output from proc2 will also get sent to proc3, unless you send it somewhere else. I haven't yet thought of a use for this.

There are potential issues with buffering here. I'm not aware that tee makes any promises1 about which order it writes the streams in. It's going to be interlacing them while it writes, so that it doesn't need to keep a whole copy in memory. So (if the input is large enough) proc2 will be receiving input before it's finished being written to stderr, and might start writing output, and then the output streams can get interlaced.

For some values of proc2, commands which start printing before they've finished reading, this is inevitable. But I think useful proc2s are likely to be aggregators2 - by which I mean, commands which can't do anything until they've finished reading all their input3. In my tests so far, those have been safe, but that doesn't prove much.

We can do a more reliable check with strace:

find . | strace tee /dev/stderr | wc -l

By the looks of things, tee will read into a buffer, then write it to stdout (the pipe), then write it to the specified target (stderr, which goes to the terminal), and repeat to exhaustion. But the important thing is, it doesn't close any file descriptors until it's finished writing everything, and then it closes the target before it closes stdout. If this is consistent amongst tee implementations - and it seems sensible4 - then aggregators almost definitely won't interlace their output with the output from proc1. I don't want to say "definitely", because there might be other stuff going on that I haven't accounted for. But at any rate, tee will finish writing before the aggregator starts.

Anyway, I see this as being the sort of thing that you're likely use manually, not in an automated process. So if the output does get interlaced a little, it's probably not that big a deal.

  1. Note how I didn't say guaranTEEs? You're welcome.

  2. In fact, wc is the only proc2 I can currently think of, that I expect to be regularly useful. (And then just piping through cat -n does something similar, but it's not ideal.) numsum or perhaps some other tools from num-utils could conceivably be handy from time to time.

  3. You can turn a non-aggregator into an aggregator with moreutils' sponge command, which is like cat but reads all its input before writing anything.

  4. They could reasonably close their outputs in either order, which might make a difference. But I don't think it's usual for programs to explicitly close their std* streams, that gets taken care of by the surrounding infrastructure. So if tee explicitly closes its target, I'd expect that to happen before stdout gets closed.

Posted on 07 October 2015 | Comments

London Cycle Hires and Weather

Preface: I wrote this report for Udacity's "Explore and Summarize Data" module. The structure is kind of strange for a blog post, but I'm submitting the finished report essentially unchanged.

One thing I will note. I find that the cycle hire usage doesn't change much throughout the year. Shortly after submitting, I read this article which finds that it does vary quite a lot. I'm inclined to trust that result more. It's intuitively sensible, and it looks directly at the number of rides taken, instead of looking at a proxy like I do.

Take this as evidence for how much to trust my other results.

My goal is to investigate usage of the London cycle hire scheme, and in particular how it varies with the weather. I'm running an analysis from July 2013 to June 2014.

I'm using two data sets here. Daily weather data comes from Weather Underground, using the weather station at London Heathrow airport.

(London City Airport is closer to the bike stations that I use, but the data from that airport reports 0 precipitation on every single day. The data from Heathrow seems to be more complete, and I expect it to be almost as relevant.)

I collected the cycle hire data myself, over the course of the year, by downloading CSV files from an unofficial API which now appears to be defunct. It has a granularity of about ten minutes. That's about 50,000 entries per docking station for the year, so for this analysis, I'm only using the data from four docking stations near my office.

All data and source code used for this project can be found in the git repository.

Exploring the weather data


plot of chunk temp.1v plot of chunk temp.1v

These variables measure the minimum, average, and maximum daily temperatures. The graphs all look similar, and overlap a lot. The shape is a little surprising, as I didn't expect the density graphs to be bimodal. It could potentially be caused by significant differences between summer and winter, with an abrupt shift between the two.


plot of chunk rain.1v plot of chunk rain.1v

According to the rain column, There are over 225 rainy days and only about 125 non-rainy days. But by far the most common bin for precip.mm is the leftmost one. Table of values of precip.mm:

##     0  0.25  0.51  0.76  1.02  2.03  3.05  4.06  5.08   6.1  7.11  7.87 
##   207    35    20     9    17    22    12     8    12     4     4     2 
##  8.89  9.91 10.92 11.94 13.97 
##     3     5     2     1     2

Although more than half of observations have rain == TRUE, more than half of them also have precip.mm == 0, which needs more investigation. Rainfall as measured by precip.mm versus as measured by rain:

plot of chunk rain.precip.2v plot of chunk rain.precip.2v

The two measures don't always agree. Sometimes rain is false but precip.mm is nonzero; and often rain is true but precip.mm is zero. Neither of those is surprising individually: if rain is only counted when the rainfall exceeds a certain threshold, then that threshold could be large (giving false/nonzero) or small (giving true/zero). But the combination suggests that that isn't what's going on, and I don't know what is.

This table counts the anomalies by turning precip.mm into a boolean zero/nonzero (false/true) and comparing it to rain:

##         FALSE TRUE
##   FALSE   119    9
##   TRUE     88  149

There are 88 instances of true/zero, 9 instances of false/nonzero, but the cases where they agree are the most common.

I find precip.mm to me more plausible here. I feel like fewer than half of days are rainy. This website agrees with me, saying that on average, 164 days out of the year are rainy (rain - 237, precip.mm - 158).


plot of chunk wind.1v plot of chunk wind.1v

These three measures of wind speed are all averages. wind is simply the average wind speed over a day. wind.max is the daily maximum of the average wind speed over a short time period (I think one minute). gust is the same thing, but with a shorter time period (I think 14 seconds).

Unlike with temperature, the three measures look different. All are right-skewed, although gust looks less so. There are several outliers (the isolated points on the box plots), and the quartiles don't overlap. The minimum gust speed (about 24) is almost as high as the median wind.max.

Exploring the bike data

Time between updates

plot of chunk dt.1v

There are a few outliers here. Not all the lines are visible due to rendering artifacts, but above 5000, we only have five entries:

##                     name        prev.updated             updated
## 46779    Earnshaw Street 2013-10-03 08:50:23 2013-10-13 09:20:28
## 46899  Southampton Place 2013-10-03 08:50:22 2013-10-13 09:20:27
## 46918       High Holborn 2013-10-03 08:50:24 2013-10-13 09:20:30
## 47049         Bury Place 2013-10-03 08:50:26 2013-10-13 09:20:32
## 175705 Southampton Place 2014-06-20 17:36:06 2014-06-30 08:30:03

The first four of these happened when my collection script broke and I failed to realize it. The other occurred when Southampton Place was taken out of service temporarily.

Let's zoom in on the lower ones:

plot of chunk dt.1v.left

There are several instances where the time between updates is unusually large, on the order of hours or days. The times of entries with between 2000 and 5000 minutes between updates:

##                     name        prev.updated             updated
## 32650       High Holborn 2013-08-31 15:10:07 2013-09-02 12:30:05
## 32660         Bury Place 2013-08-31 15:10:08 2013-09-02 12:30:07
## 32672  Southampton Place 2013-08-31 15:10:05 2013-09-02 12:30:04
## 32674    Earnshaw Street 2013-08-31 15:10:06 2013-09-02 12:30:05
## 38546       High Holborn 2013-09-14 22:39:00 2013-09-16 08:24:22
## 38719         Bury Place 2013-09-14 22:39:02 2013-09-16 08:24:23
## 38734  Southampton Place 2013-09-14 22:38:58 2013-09-16 08:24:20
## 38735    Earnshaw Street 2013-09-14 22:38:59 2013-09-16 08:24:21
## 84066         Bury Place 2013-12-27 15:40:08 2013-12-29 23:10:14
## 84069       High Holborn 2013-12-27 15:40:06 2013-12-29 23:10:13
## 84073  Southampton Place 2013-12-27 15:40:05 2013-12-29 23:10:11
## 84078    Earnshaw Street 2013-12-27 15:40:05 2013-12-29 23:10:12
## 84186    Earnshaw Street 2013-12-30 00:10:05 2013-12-31 13:10:07
## 84202       High Holborn 2013-12-30 00:10:06 2013-12-31 13:10:09
## 84269  Southampton Place 2013-12-30 00:10:05 2013-12-31 13:10:06
## 84330         Bury Place 2013-12-30 00:10:07 2013-12-31 13:10:11
## 89443  Southampton Place 2014-01-12 20:20:10 2014-01-14 18:40:07
## 89459       High Holborn 2014-01-12 20:20:13 2014-01-14 18:40:11
## 89467         Bury Place 2014-01-12 20:20:14 2014-01-14 18:40:16
## 89524    Earnshaw Street 2014-01-12 20:20:11 2014-01-14 18:40:09
## 121381   Earnshaw Street 2014-03-15 14:50:06 2014-03-17 01:50:04
## 121398      High Holborn 2014-03-15 14:50:07 2014-03-17 01:50:05
## 121444        Bury Place 2014-03-15 14:50:10 2014-03-17 01:50:07
## 121591 Southampton Place 2014-03-15 14:50:05 2014-03-17 01:50:04
## 133765      High Holborn 2014-04-11 16:59:37 2014-04-14 01:29:07
## 133900   Earnshaw Street 2014-04-11 16:59:36 2014-04-14 01:29:05
## 133961        Bury Place 2014-04-11 16:59:38 2014-04-14 01:29:08
## 134027 Southampton Place 2014-04-11 16:59:35 2014-04-14 01:29:05

It looks like these happened to all stations simultaneously, suggesting problems with either my collection script or the API, rather than problems with individual locations.

Entries with less than 60 minutes between updates, no longer on a log scale:

plot of chunk dt.1v.60

In the vast majority of cases, updates are approximately ten minutes apart. This encourages me to take a subset of the data (bikes.all -> bikes), considering only entries with d.updated less than 15 minutes. This eliminates many outliers in future graphs.

Date and time of update

plot of chunk date.time.1v plot of chunk date.time.1v

All times of day are approximately equally represented to within ten minutes, which is good. There are five noticeable troughs preceeded by spikes, but they probably don't signify much. Dates are a lot less uniform, however. Even apart from the ten-day period where my script was broken, many days have significantly fewer updates than typical, and some have none at all.

Number of days spent with a given number of active docks

plot of chunk ndocks.time.2v

It was common for every station to report less than a full complement of docks. At least two had a full complement for less than half the time (High Holborn and Bury place are unclear in that respect). This isn't surprising, since a bike reported as defective will be locked in, using up a slot but not being available for hire.

Journeys taken throughout the year

plot of chunk date.journeys.2v

The time of year makes very little difference to the number of rides. There appears to be a slight sinusoidal relationship, but it's very weak. (I didn't do a PMCC test because that assumes that any relationship is linear, which we would naively expect not to be the case here, and also doesn't look true from the graph.)

Journeys by weekday

plot of chunk weekday.journeys.2v plot of chunk weekday.journeys.2v plot of chunk weekday.journeys.2v

Fewer journeys are taken on weekends. The median number of bikes available doesn't change much throughout the week (5 on monday and friday, 4 on other days), but the distribution does. Saturday and Sunday have noticeably different shapes to the others. They have a single peak, while weekdays are somewhat bimodal, with a small peak where the station is full (probably when people are arriving at work).

(Since the stations have different numbers of docks, I did a graph of fullness rather than of number of bikes. The density plot doesn't show peaks exactly at 0 and 1 because of how the density window works, but histograms of num.bikes and num.spaces show that that's where they are. It would be difficult to use a histogram for this graph because there's no sensible binwidth.)

Change in number of bikes between updates

plot of chunk bikes.prevbikes.name.mv

##  Pearson's product-moment correlation
## data:  bikes$num.bikes and bikes$prev.num.bikes
## t = 2466.8, df = 173250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9859301 0.9861908
## sample estimates:
##      cor 
## 0.986061

There's very strong correlation between the number of bikes in adjacent entries. This is as expected, especially given what we saw about d.num.bikes previously. The colors here don't show any particular station-dependent trends.

Number of bikes at any given time