I'd like to coin a term. The Sally-Anne fallacy is the mistake of assuming that someone believes something, simply because that thing is true.
The name comes from the Sally-Anne test, used in developmental psychology to detect theory of mind. Someone who lacks theory of mind will fail the Sally-Anne test, thinking that Sally knows where the marble is. The Sally-Anne fallacy is also a failure of theory of mind.
In internet arguments, this will often come up as part of a chain of reasoning, such as: you think X; X implies Y; therefore you think Y. Or: you support X; X leads to Y; therefore you support Y.
So for example, we have this complaint about the words "African dialect" used in Age of Ultron. The argument goes: a dialect is a variation on a language, therefore Marvel thinks "African" is a language.
You think "African" has dialects; "has dialects" implies "is a language"; therefore you think "African" is a language.
Or maybe Marvel just doesn't know what a "dialect" is.
This is also a mistake I was pointing at in Fascists and Rakes. You think it's okay to eat tic-tacs; tic-tacs are sentient; therefore you think it's okay to eat sentient things. Versus: you think I should be forbidden from eating tic-tacs; tic-tacs are nonsentient; therefore you think I should be forbidden from eating nonsentient things. No, in both cases the defendant is just wrong about whether tic-tacs are sentient.
Many political conflicts include arguments that look like this. You fight our cause; our cause is the cause of [good thing]; therefore you oppose [good thing]. Sometimes people disagree about what's good, but sometimes they just disagree about how to get there, and think that a cause is harmful to its stated goals. Thus, liberals and libertarians symmetrically accuse each other of not caring about the poor.
If you want to convince someone to change their mind, it's important to know what they're wrong about. The Sally-Anne fallacy causes us to mistarget our counterarguments, and to mistake potential allies for inevitable enemies.
Posted on 09 April 2016
|
Comments
I've created an interactive graph of historical levels of political polarization in the US House of Representatives. It would be tricky to embed in this blog, so I'm only linking it. Summary:
The x-axis on this graph is based on DW-NOMINATE left-right scores of each member of each U.S. House of Representatives from 1865 to 2015. This uses a member's voting record to measure the direction and extremity of their political views, regardless of party affiliation.
If a member's score on this axis is known, it's possible to predict their vote on any given issue with high confidence, given no other information about the member. Members whose votes are typically left-aligned receive negative scores, while members whose votes are typically right-aligned receive positive scores.
(However, see The Thin Blue Line That Stays Strangely Horizontal, which questions the validity of DW-NOMINATE.)
Background: I made this last year for a Udacity course, "Data Visualization and D3.js". I needed to submit a visualization for marking, and this was my submission. I'm grateful for feedback provided by Udacity and by some of my friends. Without that, the result would certainly have been worse.
The source is available on GitHub, including datasets and some python scripts I used to process them. The README also documents some of the design history.
I'm aware of one bug: in Firefox (38.6.1 on Linux), the legend appears to display the 5-95 and 25-75 percentile boxes identically. They're implemented as rects with fill-opacity: 0.3; the 25-75 boxes have two of these rects on top of each other. This is also how the paths on the graph itself are colored.
I assume there are other bugs.
Posted on 26 February 2016
|
Comments
I realized a few years ago that I was at least somewhat faceblind/prosopagnosic. A while back I took an online test out of curiosity, and scored low. They said that if I was in London and interested in further tests, I should leave my email address. A few days ago I went in for those tests, and now I have a PhD student (Katie) also telling me I'm faceblind. Which makes it official, I guess.
Next she wants to run EEGs on me, which should be cool. That will help work out where my brain is going wrong, in the long chain between "photons stimulate nerve endings in my eyeballs" and "I recognize a face" (whatever that means). Also, apparently there's a phenomenon which sounds to me like blindsight, where some prosopagnosics' brains are clearly reacting to faces on some level that doesn't reach their consciousness. She wants to learn more about that too.
What follows is discussion of the tests, my scores, and what they mean. I've been given a powerpoint with my scores reported as percentiles, along with absolute scores and average control scores. 2% or lower is counted as "impaired". Percentiles are only given as integers, or as "<1%". On the day, Katie also gave me some numbers in terms of standard deviations (σ). Under a normal distribution, 2.5% would be approximately -2σ, but I'm not sure any of these results will be normally distributed, so I don't know if σ scores really tell me anything.
A note: if you think you might be faceblind, and you'd be interested in getting more detailed tests, it might be a good idea not to read the below. I expect it wouldn't significantly bias the results if you did, except for one bit that I've rot13ed. But I don't trust myself to make that call. If you're in London, you can take the above test like me and see what happens. Otherwise I'm not sure how you'd go about getting more tests.
The object/face recognition tests were "memorise these things, then we show you a sequence of things and you have to say if each of these things was a thing in the first set". The things were houses, cars, horses, and bald women's faces. I was bad at all of these: 4% for cars, 2% for houses, and <1% for horses and women. (Average score was higher for women than horses, and my score was higher for horses than women, so I'm worse at women than horses. I think Katie told me I was somewhere between -5σ and -6σ for women. Under normality, -5σ is one in thirty million, but this is clearly not normal.) So it seems I have some level of general object agnosia, but more specific prosopagnosia on top of that.
I was 11% for reading emotions from eyes, which is a point for Team Phil Does Not Have Aspergers (some of my friends are divided about that). In fact, the average score is 26 and I scored 23, and there were a few cases where I said an answer, then thought "wait no it's this" and didn't say anything because I wasn't sure if I should. (I was speaking my answer and Katie was recording it. I had to choose from four emotions, so I'm not sure why this wasn't recorded by a computer like most of the other tests.) So plausibly I'm actually above 11%.
I was <1% at famous face recognition, recognising five out of fifty that I'd been exposed to, out of sixty in total. (I got Jvyy Fzvgu, Uneevfba Sbeq, Dhrra Ryvmnorgu, Ebova Jvyyvnzf, and surprisingly Ovyy Pyvagba.) It seems that controls tend to get above 40, so even counting that "exposed to" is vague, I did really badly at this. I think Katie said I was -9σ, which would be one in 10^19 under normality.
I'm <1% at the Cambridge Memory Test for Faces, which is the one I linked above. I actually scored worse in the lab than online. (47% versus 58%, IIRC, with a control average of 80%, and 60% indicating impairment. But the lab score I've been given is 34 against control of 58, so it's clearly been adjusted.) There could be any number of reasons for this, including "chance". But when I took it online, I often thought that one of the faces looked a little like Matt Damon, and chose that one. I like to think that "mistaking people for Matt Damon" is the way forward in face recognition.
I was somewhat okay at half of the Cambridge Face Perception Test. In this one, they showed me a face at an angle, and below it the same face face-on, six times, with varying degrees of modification. I had to order them according to how similar each was to the original face, within a minute. For half the tests, the faces were all upside down. For all of the tests, they all looked incredibly similar and my instinctive reaction was WTF.
On the upright test, I got <1%. On the inverted test, I got 7%. One strategy I used a few times was to focus on the lips, specifically on the size of the dip in the bow. I just ordered them according to that. I guess it helps, but I found it a lot easier for inverted faces.
Doing better at inverted would seem to suggest that I'm doing some kind of holistic face processing that goes wrong and blocks off later avenues for perception. Buuut, objectively I scored worse on the inverted faces, just not as much worse as average, so I'm not sure if this does suggest that. (And I'm not sure it is "objectively" - if all the faces had been assigned to the other condition, would my scores have changed?)
Hypothetically, high scores on both tests could indicate my problem is with memory, not initial perception. The low score here doesn't mean I don't have a problem with memory, but it does seem to hint that I do have a problem with initial perception. And I suspect the famous faces test points at me also having a memory problem.
Posted on 19 January 2016
|
Comments
Inspired by Slate Star Codex
Explaining a joke is like dissecting a frog: it's one way to learn about frogs. If you want me to explain any of these, ask, and I will explain without making fun of you.
"I hear someone playing a triangle in the corridor," said Tom haltingly.
"We've got to overturn every last insect in this garden," said Tom flippantly.
"Goose feathers are annoyingly fragile," said Tom, breaking down.
"Anastasia gives me pins and needles," said Christian gratingly.
"I miss my submissive," René opined.
"I didn't do it, and nor did any of my siblings," Tom insisted.
"It's not much paint, it won't hurt to run your tongue up it," said Tom metallically.
"I'm so sick, even my flu has the flu," said Tom metallurgically.
"It's pitch black and I can hear a monster doing arithmetic," said Tom gruesomely.
"Man City don't hold a candle to the real heroes of Manchester," said Tom manually.
"I just bought Manchester United," said Tom virtuously.
"Lancelot told me I was his favourite!" said Tom, surprised.
"I don't think this tube of semen is entirely straight," said the incumbent.
"I can fit inside my GameCube!" said Tom inconsolably.
"In a former life, I was a priest in pre-Columbian Peru," said Tom inconsolably.
"I need a name for my squid-and-flatfish restaurant," said Tom inconsolably.
"I make a living as the red tellytubby," said Tom, apropos.
"I'm doing crunches so I can get a six-pack," said Treebeard absently.
"I'm half-fish and made of lithium," said Treebeard limerently.
"Figure three plots counts of close-ups on male versus female genitals," said Tom pornographically.
"My breasts don't have enough room in this corset," said Victoria, double depressed.
"Bring me the head of my enemy," said Emacs vicariously.
"I have affirming the consequent, base rate neglect, and now also ad hominem," said Tom, getting cocky.
"We lost the treaty, so we had to ratify it again," said Tom, resigned.
"I'm in the group supporting Shiva's wife," said Tom with satisfaction.
Posted on 01 January 2016
|
Comments
Sometimes you (or at least, I) want to run a command for its output, but also want to pipe it through another command. For example, see the results of a find but also count how many hits it got. I've sometimes lamented that there's no easy way to do this. But the other day I had a flash of insight and figured out how:
find . | tee /dev/stderr | wc -l
proc1 | tee /dev/stderr | proc2 # general case
(I'm pretty proud of this. I don't know if it's original to me, but I discovered it independently even if not.)
tee will print the output of proc1 to both stdout and stderr. stderr goes to the terminal and stdout goes to proc2.
You can make it more convenient with an alias:
alias terr='tee /dev/stderr | '
find . | terr wc -l
(Putting a pipe in an alias seems to work in both zsh and bash.)
If you want to concatenate the streams, to pipe them to another process, you can use subshells:
proc1 | ( terr proc2 ) 2>&1 | proc3
but note that stderr output from proc2 will also get sent to proc3, unless you send it somewhere else. I haven't yet thought of a use for this.
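To make the concatenated form concrete, here's a toy sketch (with printf, wc and sort standing in for proc1, proc2 and proc3):

```shell
# printf plays proc1, emitting three lines. Inside the subshell, tee
# copies them to stderr while wc -l (proc2) counts them on stdout.
# 2>&1 merges both copies, so sort (proc3) sees all four lines.
printf 'a\nb\nc\n' | ( tee /dev/stderr | wc -l ) 2>&1 | sort
```

sort is a convenient proc3 here because it makes the interlacing order irrelevant.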
There are potential issues with buffering here. I'm not aware that tee makes any promises about which order it writes the streams in. It's going to be interlacing them while it writes, so that it doesn't need to keep a whole copy in memory. So (if the input is large enough) proc2 will be receiving input before it's finished being written to stderr, and might start writing output, and then the output streams can get interlaced.
For some values of proc2 (commands which start printing before they've finished reading), this is inevitable. But I think useful proc2s are likely to be aggregators - by which I mean, commands which can't do anything until they've finished reading all their input. In my tests so far, those have been safe, but that doesn't prove much.
We can do a more reliable check with strace:
find . | strace tee /dev/stderr | wc -l
By the looks of things, tee will read into a buffer, then write it to stdout (the pipe), then write it to the specified target (stderr, which goes to the terminal), and repeat to exhaustion. But the important thing is, it doesn't close any file descriptors until it's finished writing everything, and then it closes the target before it closes stdout. If this is consistent amongst tee implementations - and it seems sensible - then aggregators almost definitely won't interlace their output with the output from proc1. I don't want to say "definitely", because there might be other stuff going on that I haven't accounted for. But at any rate, tee will finish writing before the aggregator starts.
Anyway, I see this as being the sort of thing that you're likely to use manually, not in an automated process. So if the output does get interlaced a little, it's probably not that big a deal.
Posted on 07 October 2015
|
Comments
Preface: I wrote this report for Udacity's "Explore and Summarize Data" module. The structure is kind of strange for a blog post, but I'm submitting the finished report essentially unchanged.
One thing I will note. I find that the cycle hire usage doesn't change much throughout the year. Shortly after submitting, I read this article which finds that it does vary quite a lot. I'm inclined to trust that result more. It's intuitively sensible, and it looks directly at the number of rides taken, instead of looking at a proxy like I do.
Take this as evidence for how much to trust my other results.
My goal is to investigate usage of the London cycle hire scheme, and in particular how it varies with the weather. I'm running an analysis from July 2013 to June 2014.
I'm using two data sets here. Daily weather data comes from Weather Underground, using the weather station at London Heathrow airport.
(London City Airport is closer to the bike stations that I use, but the data from that airport reports 0 precipitation on every single day. The data from Heathrow seems to be more complete, and I expect it to be almost as relevant.)
I collected the cycle hire data myself, over the course of the year, by downloading CSV files from an unofficial API which now appears to be defunct. It has a granularity of about ten minutes. That's about 50,000 entries per docking station for the year, so for this analysis, I'm only using the data from four docking stations near my office.
All data and source code used for this project can be found in the git repository.
Exploring the weather data
Temperature
These variables measure the minimum, average, and maximum daily temperatures.
The graphs all look similar, and overlap a lot. The shape is a little surprising, as I didn't expect the density graphs to be bimodal. It could potentially be caused by significant differences between summer and winter, with an abrupt shift between the two.
Rainfall
According to the rain column, there are over 225 rainy days and only about 125 non-rainy days. But by far the most common bin for precip.mm is the leftmost one. Table of values of precip.mm:
##
## 0 0.25 0.51 0.76 1.02 2.03 3.05 4.06 5.08 6.1 7.11 7.87
## 207 35 20 9 17 22 12 8 12 4 4 2
## 8.89 9.91 10.92 11.94 13.97
## 3 5 2 1 2
Although more than half of observations have rain == TRUE, more than half of them also have precip.mm == 0, which needs more investigation. Rainfall as measured by precip.mm versus as measured by rain:
The two measures don't always agree. Sometimes rain is false but precip.mm is nonzero; and often rain is true but precip.mm is zero. Neither of those is surprising individually: if rain is only counted when the rainfall exceeds a certain threshold, then that threshold could be large (giving false/nonzero) or small (giving true/zero). But the combination suggests that that isn't what's going on, and I don't know what is.
This table counts the anomalies by turning precip.mm into a boolean zero/nonzero (false/true) and comparing it to rain:
##
## FALSE TRUE
## FALSE 119 9
## TRUE 88 149
There are 88 instances of true/zero, 9 instances of false/nonzero, but the cases where they agree are the most common.
I find precip.mm to be more plausible here. I feel like fewer than half of days are rainy. This website agrees with me, saying that on average, 164 days out of the year are rainy (rain - 237, precip.mm - 158).
Wind
These three measures of wind speed are all averages. wind is simply the average wind speed over a day. wind.max is the daily maximum of the average wind speed over a short time period (I think one minute). gust is the same thing, but with a shorter time period (I think 14 seconds).
Unlike with temperature, the three measures look different. All are right-skewed, although gust looks less so. There are several outliers (the isolated points on the box plots), and the quartiles don't overlap. The minimum gust speed (about 24) is almost as high as the median wind.max.
Exploring the bike data
Time between updates
There are a few outliers here. Not all the lines are visible due to rendering artifacts, but above 5000, we only have five entries:
## name prev.updated updated
## 46779 Earnshaw Street 2013-10-03 08:50:23 2013-10-13 09:20:28
## 46899 Southampton Place 2013-10-03 08:50:22 2013-10-13 09:20:27
## 46918 High Holborn 2013-10-03 08:50:24 2013-10-13 09:20:30
## 47049 Bury Place 2013-10-03 08:50:26 2013-10-13 09:20:32
## 175705 Southampton Place 2014-06-20 17:36:06 2014-06-30 08:30:03
The first four of these happened when my collection script broke and I failed to realize it. The other occurred when Southampton Place was taken out of service temporarily.
Let's zoom in on the lower ones:
There are several instances where the time between updates is unusually large, on the order of hours or days. The times of entries with between 2000 and 5000 minutes between updates:
## name prev.updated updated
## 32650 High Holborn 2013-08-31 15:10:07 2013-09-02 12:30:05
## 32660 Bury Place 2013-08-31 15:10:08 2013-09-02 12:30:07
## 32672 Southampton Place 2013-08-31 15:10:05 2013-09-02 12:30:04
## 32674 Earnshaw Street 2013-08-31 15:10:06 2013-09-02 12:30:05
## 38546 High Holborn 2013-09-14 22:39:00 2013-09-16 08:24:22
## 38719 Bury Place 2013-09-14 22:39:02 2013-09-16 08:24:23
## 38734 Southampton Place 2013-09-14 22:38:58 2013-09-16 08:24:20
## 38735 Earnshaw Street 2013-09-14 22:38:59 2013-09-16 08:24:21
## 84066 Bury Place 2013-12-27 15:40:08 2013-12-29 23:10:14
## 84069 High Holborn 2013-12-27 15:40:06 2013-12-29 23:10:13
## 84073 Southampton Place 2013-12-27 15:40:05 2013-12-29 23:10:11
## 84078 Earnshaw Street 2013-12-27 15:40:05 2013-12-29 23:10:12
## 84186 Earnshaw Street 2013-12-30 00:10:05 2013-12-31 13:10:07
## 84202 High Holborn 2013-12-30 00:10:06 2013-12-31 13:10:09
## 84269 Southampton Place 2013-12-30 00:10:05 2013-12-31 13:10:06
## 84330 Bury Place 2013-12-30 00:10:07 2013-12-31 13:10:11
## 89443 Southampton Place 2014-01-12 20:20:10 2014-01-14 18:40:07
## 89459 High Holborn 2014-01-12 20:20:13 2014-01-14 18:40:11
## 89467 Bury Place 2014-01-12 20:20:14 2014-01-14 18:40:16
## 89524 Earnshaw Street 2014-01-12 20:20:11 2014-01-14 18:40:09
## 121381 Earnshaw Street 2014-03-15 14:50:06 2014-03-17 01:50:04
## 121398 High Holborn 2014-03-15 14:50:07 2014-03-17 01:50:05
## 121444 Bury Place 2014-03-15 14:50:10 2014-03-17 01:50:07
## 121591 Southampton Place 2014-03-15 14:50:05 2014-03-17 01:50:04
## 133765 High Holborn 2014-04-11 16:59:37 2014-04-14 01:29:07
## 133900 Earnshaw Street 2014-04-11 16:59:36 2014-04-14 01:29:05
## 133961 Bury Place 2014-04-11 16:59:38 2014-04-14 01:29:08
## 134027 Southampton Place 2014-04-11 16:59:35 2014-04-14 01:29:05
It looks like these happened to all stations simultaneously, suggesting problems with either my collection script or the API, rather than problems with individual locations.
Entries with less than 60 minutes between updates, no longer on a log scale:
In the vast majority of cases, updates are approximately ten minutes apart. This encourages me to take a subset of the data (bikes.all -> bikes), considering only entries with d.updated less than 15 minutes. This eliminates many outliers in future graphs.
Date and time of update
All times of day are approximately equally represented to within ten minutes, which is good. There are five noticeable troughs preceded by spikes, but they probably don't signify much. Dates are a lot less uniform, however. Even apart from the ten-day period where my script was broken, many days have significantly fewer updates than typical, and some have none at all.
Number of days spent with a given number of active docks
It was common for every station to report less than a full complement of docks. At least two had a full complement for less than half the time (High Holborn and Bury Place are unclear in that respect). This isn't surprising, since a bike reported as defective will be locked in, using up a slot but not being available for hire.
Journeys taken throughout the year
The time of year makes very little difference to the number of rides. There appears to be a slight sinusoidal relationship, but it's very weak. (I didn't do a PMCC test because that assumes that any relationship is linear, which we would naively expect not to be the case here, and also doesn't look true from the graph.)
Journeys by weekday
Fewer journeys are taken on weekends. The median number of bikes available doesn't change much throughout the week (5 on Monday and Friday, 4 on other days), but the distribution does. Saturday and Sunday have noticeably different shapes to the others. They have a single peak, while weekdays are somewhat bimodal, with a small peak where the station is full (probably when people are arriving at work).
(Since the stations have different numbers of docks, I did a graph of fullness rather than of number of bikes. The density plot doesn't show peaks exactly at 0 and 1 because of how the density window works, but histograms of num.bikes and num.spaces show that that's where they are. It would be difficult to use a histogram for this graph because there's no sensible binwidth.)
Change in number of bikes between updates
##
## Pearson's product-moment correlation
##
## data: bikes$num.bikes and bikes$prev.num.bikes
## t = 2466.8, df = 173250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9859301 0.9861908
## sample estimates:
## cor
## 0.986061
There's very strong correlation between the number of bikes in adjacent entries. This is as expected, especially given what we saw about d.num.bikes previously. The colors here don't show any particular station-dependent trends.
Number of bikes at any given time
The correlation also looks strong between the number of bikes at each station at any given time. Since they're all close to each other, that's not surprising. The time is a big factor, with large numbers of bikes in the stations during office hours, and few in the evening and early morning. There's a slight dip around 1pm, which could be related to people using them on their lunch breaks.
This graph gives an overview of global trends, but I mostly use the bikes at specific times. We can zoom in on those:
Number of slots available at 0930
(when I'm trying to arrive at work)
This is a proportional frequency plot: within each facet of the graph, the heights of the bins add up to 1. Only weekdays are considered.
About 40% of the time, Earnshaw Street has no spaces. That's actually less than I'd realized. It's directly outside my office, and I haven't even been checking it because I'd assumed it was always full.
And at 0940
(in case I'm running late)
If I'm late, I have slightly less chance of finding a docking station, but not much less.
Combining the two
Journeys taken on rainy vs. non-rainy days
Here, rain is the original variable in the dataset, and rain2 simply measures whether precip.mm is nonzero. We have graphs looking at d.num.bikes on each type of day, and tables comparing its mean absolute value.
## Source: local data frame [2 x 2]
##
## rain mean(abs(d.num.bikes))
## 1 FALSE 0.5160167
## 2 TRUE 0.4156172
## Source: local data frame [2 x 2]
##
## rain2 mean(abs(d.num.bikes))
## 1 FALSE 0.4824637
## 2 TRUE 0.4073405
## Source: local data frame [4 x 3]
## Groups: rain
##
## rain rain2 mean(abs(d.num.bikes))
## 1 FALSE FALSE 0.5184101
## 2 FALSE TRUE 0.4755501
## 3 TRUE FALSE 0.4351990
## 4 TRUE TRUE 0.4042656
Earlier I said I feel like precip.mm is more accurate than rain. Despite that, rain seems to be capturing something that precip.mm doesn't, because bike usage responds slightly more to it. This would seem to suggest that days where rain is true but precip.mm is zero have less bike usage than average; and indeed this is what we see.
Taking rain to be our measure, slightly over 70% of observations had no bikes added or removed on rainy days, and slightly under 70% on non-rainy days. The mean absolute difference is about 25% higher on non-rainy days.
Foggy versus non-foggy days
## Source: local data frame [2 x 2]
##
## fog mean(abs(d.num.bikes))
## 1 FALSE 0.4488018
## 2 TRUE 0.4568736
Journeys by temperature and wind:
##
## Pearson's product-moment correlation
##
## data: bikes$t and abs(bikes$d.num.bikes)
## t = 31.414, df = 173250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07057403 0.07993830
## sample estimates:
## cor
## 0.07525782
##
## Pearson's product-moment correlation
##
## data: bikes$wind and abs(bikes$d.num.bikes)
## t = -22.389, df = 173250, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05840721 -0.04901677
## sample estimates:
## cor
## -0.05371317
Unlike rain, it seems that fog, wind and temperature make approximately no difference. The mean absolute difference in number of bikes is about the same regardless of fog, and the correlation between that and temperature/wind is close to zero.
Number of bikes at any given time, depending on rain:
Rain reduces the variance, with fewer bikes during office hours and more outside of them.
Reformatting
With the data in the current format, not all the questions we want to ask are easy. For example: how does the number of bikes at one station correlate with another at any given time? I previously said it "looks strong", but that's pretty vague.
To answer questions like that, we need to be somewhat forgiving with our definition of 'any given time'. Updates don't necessarily happen simultaneously, so we need to bin them together.
I'm going to create bins ten minutes wide, and assign every observation to a bin. Then in each bin, we can ask how many bikes were at each station. Using this, we can check correlation between each station:
Correlations range between 0.703 and 0.758, and the scatter plots and density histograms all look pretty similar. Does the correlation depend on time? Let's go for 0930, 1800, midnight, and noon.
The correlations are almost all lower. That surprised me, but I think it's an example of Simpson's paradox. I note that the darkest points in the graph are at midnight, with no bikes in any station much of the time. Bikes are periodically moved in vans to account for anticipated demand; I assume that these stations are emptied most nights to prepare for people coming to work in the morning.
An interesting point is that the weakest correlation on any of the graphs is 0.149, between Earnshaw Street and Bury Place at 1800. But the strongest correlation at a specific time is 0.757, also between those two stations, at 0930.
We also see the density charts sometimes having very different shapes, especially at 0930 and 1800. But this seems to be at least partly to do with the way that ggpairs chooses the axes on its density plots. For example, here's 0930:
The troughs look a lot less significant now.
We can view a histogram of the total number of bikes available at different times:
We see heavy leftward skews overnight, with much flatter (but somewhat right-skewed) distributions during office hours, and gradual transitions between the two.
We can also check correlation between times more distant than a single tick. If I check the slots available when I leave the house, can I learn how many will be there when I arrive?
##
## Pearson's product-moment correlation
##
## data: at.0900 and at.0930
## t = 68.675, df = 1228, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8785868 0.9017383
## sample estimates:
## cor
## 0.8907389
This is good correlation! Does it depend on the rain?
##
## Pearson's product-moment correlation
##
## data: at.0900[rain] and at.0930[rain]
## t = 55.466, df = 816, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8737218 0.9025687
## sample estimates:
## cor
## 0.8890242
##
## Pearson's product-moment correlation
##
## data: at.0900[!rain] and at.0930[!rain]
## t = 39.748, df = 410, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8692649 0.9093735
## sample estimates:
## cor
## 0.8910456
Not much, if at all.
We can construct a model
##
## Call:
## lm(formula = at.0930 ~ at.0900, data = spaces.0900.0930)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.334 -1.502 0.561 1.708 16.477
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.43496 0.15556 -9.225 <2e-16 ***
## at.0900 0.97899 0.01426 68.675 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.802 on 1228 degrees of freedom
## (69 observations deleted due to missingness)
## Multiple R-squared: 0.7934, Adjusted R-squared: 0.7932
## F-statistic: 4716 on 1 and 1228 DF, p-value: < 2.2e-16
with an R² of 0.79, which is okay. But this isn't the best we can do, because it groups all stations together. Ideally we would create one model per station, with inputs from every station.
##
## Call:
## lm(formula = at.0930 ~ sp + hh + bp + es, data = spaces.tmp[spaces.tmp$name ==
## "Southampton Place", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.566 -1.857 -0.148 1.152 15.420
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.79625 0.47090 -1.691 0.0919 .
## sp 0.74101 0.04179 17.731 <2e-16 ***
## hh 0.05424 0.05307 1.022 0.3075
## bp 0.11811 0.05092 2.320 0.0210 *
## es 0.07909 0.04550 1.738 0.0832 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.17 on 296 degrees of freedom
## (17 observations deleted due to missingness)
## Multiple R-squared: 0.736, Adjusted R-squared: 0.7324
## F-statistic: 206.3 on 4 and 296 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = at.0930 ~ sp + hh + bp + es, data = spaces.tmp[spaces.tmp$name ==
## "High Holborn", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.4068 -1.1295 0.1503 1.2304 8.3941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.87268 0.29894 -9.610 < 2e-16 ***
## sp 0.08354 0.02653 3.149 0.00181 **
## hh 0.76021 0.03369 22.567 < 2e-16 ***
## bp 0.09533 0.03232 2.949 0.00344 **
## es 0.15937 0.02888 5.518 7.5e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.012 on 296 degrees of freedom
## (17 observations deleted due to missingness)
## Multiple R-squared: 0.8349, Adjusted R-squared: 0.8327
## F-statistic: 374.2 on 4 and 296 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = at.0930 ~ sp + hh + bp + es, data = spaces.tmp[spaces.tmp$name ==
## "Bury Place", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.3465 -1.3008 0.3121 1.4809 9.4211
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.28068 0.32734 -13.077 < 2e-16 ***
## sp 0.18778 0.02907 6.460 4.32e-10 ***
## hh 0.03132 0.03687 0.850 0.396253
## bp 0.91255 0.03538 25.796 < 2e-16 ***
## es 0.11197 0.03160 3.543 0.000459 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.201 on 295 degrees of freedom
## (18 observations deleted due to missingness)
## Multiple R-squared: 0.8969, Adjusted R-squared: 0.8955
## F-statistic: 641.5 on 4 and 295 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = at.0930 ~ sp + hh + bp + es, data = spaces.tmp[spaces.tmp$name ==
## "Earnshaw Street", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.8978 -1.4508 0.3118 1.3272 11.8323
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.98653 0.35005 -8.532 7.60e-16 ***
## sp 0.05579 0.03107 1.796 0.0735 .
## hh 0.03405 0.03945 0.863 0.3887
## bp 0.17361 0.03785 4.587 6.65e-06 ***
## es 0.83329 0.03382 24.638 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.356 on 296 degrees of freedom
## (17 observations deleted due to missingness)
## Multiple R-squared: 0.8579, Adjusted R-squared: 0.856
## F-statistic: 446.9 on 4 and 296 DF, p-value: < 2.2e-16
Southampton Place has regressed slightly, but the others have improved slightly. In particular, Bury Place gets an R2 of 0.89, which is pretty good. (It's important to note that this doesn't make our model worse for Southampton Place than the aggregate model was. The aggregate model was just overconfident on that station.)
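The per-station approach can be sketched similarly: one multivariate least-squares fit per station, with all four stations' 09:00 values as predictors. The sp/hh/bp/es layout mirrors the lm calls above, but the data below is entirely synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 09:00 spaces at the four stations (columns sp, hh, bp, es)
X = rng.uniform(0, 20, size=(300, 4))

# One station's synthetic 09:30 spaces: mostly driven by its own
# 09:00 value, with small contributions from the neighbours.
true_coef = np.array([0.74, 0.05, 0.12, 0.08])
y = -0.8 + X @ true_coef + rng.normal(0, 3, 300)

A = np.column_stack([np.ones(len(X)), X])     # prepend intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # like lm(y ~ sp+hh+bp+es)

resid = y - A @ coef
r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
print(coef.round(2), round(r2, 2))
```

In the real analysis this fit is repeated once per station, and the R2 values compared as in the summaries above.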
Final plots and summary
Plot 1
The total number of bikes available changes gradually throughout the day, with few bikes typically available at night, but often many available during the daytime. The distribution looks left-skewed from around 10:00 to 17:00, and right-skewed from around 19:00 to 07:30. The left skew is never as extreme as the right skew, but because the stations have different numbers of slots, that doesn't tell us much.
Plot 2
This time around, I restricted the graph to weekdays only. It's rare for the number of spaces to go up between 09:00 and 09:30. All four stations have similar usage patterns.
At 09:00, if there are five or fewer spaces available, it looks as though the most common single outcome at 09:30 is no spaces at all.
Points above the dotted black line are ones where more spaces were available at 09:30 than at 09:00. (Caveat: I've applied slight jittering, so points very close to that line are ones where the same number of spaces were available.) There are obviously far fewer of them. However, the top-left corner of the graph has a few points in it, while the bottom-right corner is empty. The number of spaces never goes down by more than eleven, but it goes up by as much as fifteen.
Plot 3
I used binning to calculate specific summary statistics. All stations show similar patterns: at night, there are few bikes available; during office hours, there are almost always some, and the 10th-90th percentile range sits a lot higher. The trough around 13:00 in the previous version of this plot no longer shows up, which makes me suspect it was simply an artifact of the smoothing method.
During the day, the number of bikes available is generally ranked by the number of docking slots at each station - so High Holborn has the fewest, and Bury Place has the most. When the bikes are taken around 18:00, High Holborn seems to lose them more slowly than the other stations. For Earnshaw Street and especially Bury Place, the 90th percentile lines suggest that those two stations were often completely full.
Reflection
I've learned a lot about how to fight ggplot when it doesn't do exactly what I want by default, and in particular about how to shape my data for it.
I feel like a data frame isn't an ideal structure for the data I have. The fact that I had to create prev.* and d.* copies of the columns that needed them seems suboptimal; ideally I'd be able to refer directly to offset rows in the data. (For example, there's currently no easy way to ask “what's the difference between the number of bikes now and 30 minutes ago?”) But I couldn't find anything that worked better. In particular, time series only allow one data type, so I would have had to fight to use them at all, and I don't know whether they would have been any more useful.
My data set itself isn't ideal, particularly in the amount of missing data. Unfortunately, I don't think any better historical bike record data is available. I think I have enough data to trust my conclusions.
In general, it seems that weather doesn't have much impact on bike usage. I checked rain, fog, temperature and wind speed, and only rain made a significant difference. But since the rainfall data seems to be internally inconsistent, I don't know how much we can learn from it. It would be useful to validate it from another source. We might also learn more with finer-grained weather data. For example, when predicting bike availability at a specific time, it doesn't help much if we know whether or not it rained at all on a given day; but it might help more to know whether it was raining at that particular time.
On the other hand, we can make pretty good predictions about future bike (and slot) availability just from current availability. An ambitious future project might be a prediction system. A user could specify a station and an arrival time, and the system could tell her how likely it would be that she could find a slot in that station and nearby ones, and suggest an earlier arrival time that would increase that chance.
One thing I didn't examine was public holidays. For example, we might ask whether, on plot 2 above, many of the points where spaces were freed up fell on holidays. (There are 85 points above the line, and at most 8*4 = 32 of them could fall on public holidays, but that's still potentially over a third of them.)
After initially submitting this report, I noticed a big problem. All timestamps were collected and reported in physical time, but bike usage patterns follow local clock time. So some of my graphs, particularly the later ones, mixed data from two different clock times (e.g. 09:00 and 10:00) as if they were the same. My first submission was rejected for unrelated reasons, and I've corrected the error in all subsequent versions.
Posted on 19 August 2015
(Content note: minor untagged spoilers for the solutions to certain logic puzzles. No major spoilers.)
(Epistemic note: it's impossible to reliably judge how easy or hard a puzzle is when I discovered its answer years ago, sometimes without working it out for myself. I'm about to do it anyway.)
There seems to be some kind of taxonomy of logic puzzles: how hard they are, and the sorts of strategies needed to solve them.
This might be too subjective to be worth talking about, but I suspect not, and the way to find out is to talk about it and see if people understand me, so without further ado:
Let's vaguely pretend that logic puzzles have a sort of fundamental particle which is the insight. You collect insights, and then you use insights to collect other insights, and eventually you win.
Now I'm going to describe five types of puzzle.
Deep: For example, the blue eyes puzzle. I consider this one to be actually pretty easy. You ask what happens with one blue-eyed person, then two, then three, and just proceed by induction. (Even if you don’t know the terms common knowledge and mutual knowledge and the difference between them, in the two-person case you can say “she tells Alice that Bob can see blue eyes” and in the three-person case you can say “she tells Alice that Bob knows that Carol can see blue eyes” and so on.) There's only one real insight, which you apply lots of times. Or just a couple of times until you get bored and assume/prove the pattern continues.
(I feel like there's a lot of similarity between solving a logic puzzle and proving a math theorem. Here, the math equivalent is proof by induction.)
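That induction can be written down as a tiny sketch, with the caveat that it encodes the known solution rather than discovering it (the function name and structure are mine, not standard):

```python
def departure_night(n_blue):
    """Night on which the blue-eyed islanders leave, by induction.

    Each blue-eyed islander sees n_blue - 1 blue-eyed others. If only
    those k people had blue eyes, they would all leave on night k; when
    night k passes and nobody leaves, everyone who sees k blue-eyed
    people concludes their own eyes are blue and leaves on night k + 1.
    """
    night = 1
    while night - 1 != n_blue - 1:  # "only the people I can see" still possible
        night += 1
    return night

# Base case plus induction: n blue-eyed islanders leave on night n.
assert [departure_night(n) for n in (1, 2, 3, 100)] == [1, 2, 3, 100]
```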
Wide: Cheryl’s birthday is also quite easy, but what difficulty it has lies in breadth rather than depth. Every line tells you something new, and you need to extract a different insight from each of them.
(Math equivalent: a theorem with a complicated statement but a proof where every line just reads "because X, Y".)
Trivial puzzles are the limiting case of either of the above. I don't know any famous ones, but the blue eyes puzzle with one blue-eyed person would count.
(Math equivalent: modus ponens.)
Verify: Then there are problems where as far as I know you kind of just have to intuit (or guess) the solution, and then demonstrate that it works. (The demonstration is a recursive problem that deserves its own classification, or possibly doesn't even count as a logic puzzle. Often it is just math.) The key insight is hidden away, with no hints as to what it might be. The fork in the road is a good example of this, I think, where the demonstration has a reasonable claim to being trivial.
It’s a very fuzzy and personal category, because there are patterns that you may be able to recognize which would move it towards wide.
(Math equivalent: a proof which introduces some new device that you don't know where it came from, but it turns out to be exactly what you need.)
Learn: And there are probably also problems where, if you happen to know the right thing and it comes to mind at the right time, this will be trivial (or wide or…). Otherwise you'll have to deduce the thing for yourself; good luck with that. I can't offhand think of any canonical examples.
This category may be even fuzzier and more personal, and probably overlaps a lot with verify, and is hard to detect except in hindsight. (If you can work out what you need to know, it moves towards wide or verify.)
Of course every puzzle has a thing which is the solution to that particular puzzle; and many puzzles rely on the ability to do basic arithmetic or other such skills. Those obviously "shouldn't count" towards classifying a puzzle as learn, but I don't have a principled way to distinguish them from things that should.
(Math equivalent: a proof in complex analysis that relies on an obscure theorem from number theory.)
I don't intend for these descriptions to be fully general. I think that they're pointing at real clusters in puzzlespace, but don't fully capture the essences of those clusters.
Even if they did, I expect these clusters are going to fail to capture a lot of variation, and attempting to shove things into them will sometimes be a silly thing to do. But let's make a start with them, and try to classify some problems that come to mind:
This transfinite epistemic logic puzzle challenge is mostly a combination of deep and wide, I think, but also has some learn, because I wouldn't recommend attempting it if you don't know about order types. I didn't find it overly challenging. (Except that, reading the solution, I no longer remember if I got the answer exactly right. I might have made a fencepost error.)
I haven't solved the sum and product puzzle, but as far as I've got, it's been wide. I'm pretty sure I could finish it with brute force, but I suspect there's an elegant way to do it which is either a verify or a learn, possibly involving number theory.
The hardest logic puzzle ever is the fork in the road on steroids. Given the solution, proving that that solution works would be a moderately challenging wide puzzle. Generating it from scratch seems to require multiple levels of verify and wide.
The hundred prisoners problem is very verify. The verification step is a combinatorics problem. (I feel like there are a lot of problems based on having a hundred prisoners and an intellectually perverse gatekeeper, and I feel like most of them are verify. Here's another example.)
This card trick/puzzle is a verify. (I'm pretty sure I have a solution, but I've only verified it in my head, and I didn't fully work out the details. My solution is very different to the posted one. Arguably, the posted solution has a learn on the pigeonhole principle and mine doesn't.)
The pirate game is a little deep and not much else.
Logic grid puzzles are wide. They differ from Cheryl's birthday in that every clue needs to get used multiple times, but each clue-application is still straightforward. That doesn't mean they're easy.
Here are some other features that we might want to talk about.
Logic puzzles often (not always) have a number of characters. Sometimes you know more than/just as much as the characters, and need to work out how each of them will act. Sometimes you know less, and need to work out what they know from how they act.
Sometimes you need to come up with a strategy rather than a specific answer. Sometimes there are multiple strategies. Sometimes there's a foolproof strategy; other times you just need to find one as effective as possible. If that's the case, then it might be difficult to work out exactly how effective a strategy is, even if it's obviously better than the naive one; and proving that it's optimal might be harder again. It took three years to prove optimality for the hundred prisoners problem.
Different puzzles require different "levels" of theory of mind, and I suspect there's a lot of variation in how well people can keep track of those without accidentally collapsing them. "Alfred knows that Bernard doesn't know" might become just "Bernard doesn't know"; and "Alice doesn't know that Bob knows that Carol can see blue eyes" might become just "I have no idea what's going on".
Another future direction to explore might be a taxonomy of video game puzzles. I think there'd be some similar themes, but for example, video games can have progression in a sense that doesn't exist in most of the puzzles I've described so far.
Two senses, in fact. You can progress within a level, when you change the board state. Different games/levels have different attitudes to (a) how many ways you can do this at any given moment, and (b) whether it's possible to make a level unsolvable.
But also a level can teach you something by holding your hand, and in future levels you can use that without having your hand held. So level ten would have seemed impossible if you'd started with it, but after levels one through nine it's just challenging enough to be fun. Logic puzzles can have this: the fork in the road teaches you a tool that applies to the hardest logic puzzle ever. But the level order in video games gets to be designed deliberately, whereas in real life I mostly encounter logic puzzles by accident, with no guarantee that I've already encountered any precursors.
(When I was playtesting Sokobond, Alan said that I approach puzzles differently to most people. I think I was thinking along the lines of: "okay, the target molecule will only fit here, which means that I need to move into place from there which means…". This kind of thinking comes naturally to me, and I think it serves me well, most of the time - though I have known myself to occasionally prove that this particular level can't possibly have a solution. It seems well-suited to puzzles with predictable mechanics and clear unwinnable states.)
Posted on 20 May 2015
Suppose you and I are farmers, owning adjacent fields. One day you have a brilliant idea. If we dig a ditch from the nearby river, between our fields, then irrigating our fields becomes a lot less work. It would cost two utils to dig the ditch - one util each - and we'd get five utils each from its existence.
You come to me with this suggestion. "That sounds great," I say, "but I don't think I'm going to bother."
You object that I'm being dumb not to take those four utils.
"No," I say, "I just think that if I don't help you, you'll do all the work yourself. You still get three utils if you dig it alone, so you'll do it even if I don't help you. And by not helping, I get five utils instead of four. Why would I pay a util to help you?"
(Unfortunately for you, I am a member of H. economicus and have a natural immunity to arguments about "fairness" and "not being a douchebag".)
The farmer's dilemma is game-theoretically equivalent to chicken. Both of us choose to either cooperate by digging the ditch ("swerve" in chicken), or defect by sitting at home ("straight" in chicken). If both of us cooperate ("C/C"), we both get an okay result. If both of us defect ("D/D"), we both get a terrible result. If one of us cooperates while the other defects ("C/D"), then the defector gets the best possible result for themselves, and the cooperator gets a result between C/C and D/D.
If you're cooperating and I'm defecting (or vice versa), then neither of us has any incentive to change strategy. I could start to cooperate, but then I'd just be giving you utility. You'd like that, but I wouldn't. And you could start to defect, but then you'd be throwing away utility. Neither of us would like that.
On the other hand, if we're both cooperating, then we both have an incentive to defect, as long as the other doesn't do the same; and if we're both defecting, we both have an incentive to cooperate, as long as the other doesn't do the same.
(Formally, there are two Nash equilibria, at C/D and at D/C. This distinguishes it from the prisoner's dilemma, which has an equilibrium at D/D.)
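That payoff structure is small enough to check mechanically. Here's a sketch using the numbers from the post: digging costs two utils split between however many dig, and a finished ditch is worth five utils to each farmer.

```python
# Payoffs from the post: the ditch costs 2 utils to dig (split between
# however many dig), and is worth 5 utils to each farmer once it exists.
def payoff(me, other):
    diggers = (me == 'C') + (other == 'C')
    if diggers == 0:
        return 0                       # D/D: no ditch, no value
    cost = 2 / diggers if me == 'C' else 0
    return 5 - cost

def is_nash(a, b):
    """Neither player gains by unilaterally switching strategy."""
    flip = {'C': 'D', 'D': 'C'}
    return (payoff(a, b) >= payoff(flip[a], b) and
            payoff(b, a) >= payoff(flip[b], a))

equilibria = sorted((a, b) for a in 'CD' for b in 'CD' if is_nash(a, b))
print(equilibria)  # [('C', 'D'), ('D', 'C')] - C/D and D/C, as claimed
```

C/C fails the check because either farmer gains a util by defecting, and D/D fails because either farmer gains three utils by digging alone.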
There are lots of ways this story can continue.
In one of them, you dig the ditch yourself. Months later, after harvesting and selling your crop, you sit in the pub complaining that being a farmer is such hard work, you've only come out with three utils of profit this year. Nobody's very sympathetic, because they're comparing you to me, and I've made a cool five utils. Because this is a thought experiment, there's no difference between us other than how we act. So obviously you're doing something wrong.
In another possible continuation, you threaten to burn some of my crops if I don't help you dig. Maybe I help you, maybe not; if not, maybe you were bluffing, maybe not; if not, maybe I call the police on you and you go to jail; or maybe I do help you, but I secretly recorded the conversation and leak it to the press later on… a lot of things can happen. Even if this works out great for you, it's at least a little villainous.
Instead of being so blunt, you might choose to convince everyone else to threaten me. Perhaps you dig the ditch, and then talk to our local government and convince them that you should be allowed to extract rent on it.
Another option is to tell me that if I don't help you dig, you'll spend an extra util to build a brick wall on my side of the ditch, so that it doesn't do me any good. If I believe that you'll do it, I'm likely to go ahead and help.
You can also tell me that if I don't help, you're not going to dig at all. Or even that you're simply not going to dig: if I'm going to be an asshole about it, I can dig the whole thing myself or go without. Once again, I'm likely to dig if I think you're not bluffing.
(Precommitment is a powerful strategy in many situations. But two can play at that game…)
The real world is usually more complicated than game theory. Here are some variations on the farmer's dilemma:
Maybe I have a bad back, and digging is more costly for me than for you. This may or may not change the Nash equilibria, and it may or may not change the amount of sympathy we each get in the various continuations.
Sometimes there are lots of farmers. In this case, the ditch might cost more to dig than the value it provides to any individual, so that nobody will want to dig by themselves; but little enough that you don't need literally everyone to dig before it becomes valuable, so that everyone still wants to defect individually.
Sometimes the ditch might be an agent in its own right. For example, a company might perform R&D only if someone funds them to do so; and everyone wants them to do it, but would prefer that someone else pays.
(They might not have an explicit funding agreement with anyone, but acausal trade, early adopters, and so on can play a similar role.)
(And having developed a super-awesome version of their product, they might also sell a cheaper version, where they've gone out of their way to disable some of the features. This is like building part of a brick wall against people who only contribute a little to the digging.)
Sometimes the ditch might become more valuable if more people help to dig.
Sometimes the ditch requires constant maintenance. We could model that as a sequence of games, where the payoff structure changes between iterations (and might depend on the results of previous games). The ditch might not become profitable until after several rounds.
Why am I talking about this? I think farmer's dilemma situations come up from time to time in online discussions, and I want to be able to say "let's not be too harsh on AcmeCorp here, they're cooperating in a farmer's dilemma and everyone else is benefiting from that". (I don't want to discuss the specific examples I have in mind because they're kind of mind-killy.)
Although the farmer's dilemma and chicken are game-theoretically equivalent, I think our intuitions about them are different. At any rate, mine are. I can think of two reasons for this. One is that game theory only considers utility up to positive affine transformations. The games "Global thermonuclear war", where every player loses a million utils, and "Global happiness project", where every player gains a hundred utils, are also equivalent. But in the real world, two people crashing their cars into each other is a worse outcome than two people failing to dig a ditch.
The other reason, which is kind of the same reason, is that game theory assumes you've decided to play. If neither of us plays chicken at all, we both get a better outcome than C/C. If nobody notices how valuable a ditch would be, you get the same outcome as D/D.
Another equivalent game is the snowdrift dilemma: the road is covered with snow, and we both want it cleared, but we'd both rather not clear it ourselves. My intuitions about this feel different again. You can't decline to play (except by living somewhere less snowy), but if you could, that would be better than C/C.
So describing a situation as chicken, or snowdrift, or a farmer's dilemma, all seem different. I don't know of an existing name that feels like a good fit for the farmer's dilemma. (For a while I thought the farmer's dilemma was a standard name, but I can't find anything about it online. Wikipedia redirects it to the prisoner's dilemma, but that has a very different structure.)
So it seems like a useful concept with no name. Now it has a name. You're welcome.
Posted on 05 May 2015
Sometimes at LW meetups, I'll want to raise a topic for discussion. But we're currently already talking about something, so I'll wait for a lull in the current conversation. But it feels like the duration of lull needed before I can bring up something totally unrelated is longer than the duration of lull before someone else will bring up something marginally related. And so we can go for a long time, with the topic frequently changing incidentally, but without me ever having a chance to change it deliberately.
Which is fine. I shouldn't expect people to want to talk about something just because I want to talk about it, and it's not as if I find the actual conversation boring. But it's not necessarily optimal. People might in fact want to talk about the same thing as me, and following the path of least resistance in a conversation is unlikely to result in the best possible conversation.
At the last meetup I had two topics that I wanted to raise, and realized that I had no way of raising them, which was a third topic worth raising. So when an interruption occurred in the middle of someone's thought - a new person arrived, and we did the "hi, welcome, join us" thing - I jumped in. "Before you start again, I have three things I'd like to talk about at some point, but not now. Carry on." Then he started again, and when that topic was reasonably well-trodden, he prompted me to transition.
Then someone else said that he also had two things he wanted to talk about, and could I just list my topics and then he'd list his? (It turns out that no I couldn't. You can't dangle an interesting train of thought in front of the London LW group and expect them not to follow it. But we did manage to initially discuss them only briefly.)
This worked pretty well. Someone more conversationally assertive than me might have been able to take advantage of a less solid interruption than the one I used. Someone less assertive might not have been able to use that one.
What else could we do to solve this problem?
Someone suggested a hand signal: if you think of something that you'd like to raise for discussion later, make the signal. I don't think this is ideal, because it's not persistent. You make it once, and then it's easy for people to forget, or simply not to notice.
I think what I'm going to do is bring some poker chips to the next meetup. I'll put a bunch in the middle, and if you have a topic that you want to raise at some future point, you take one and put it in front of you. Then if a topic seems to be dying out, someone can say "<person>, what did you want to talk about?"
I guess this still needs at least one person assertive enough to do that. I imagine it would be difficult for me. But the person who wants to raise the topic doesn't need to be assertive, they just need to grab a poker chip. It's a fairly obvious gesture, so probably people will notice, and it's easy to just look and see for a reminder of whether anyone wants to raise anything. (Assuming the table isn't too messy, which might be a problem.)
I don't know how well this will work, but it seems worth experimenting.
(I'll also take a moment to advocate another conversation-signal that we adopted, via CFAR. If someone says something and you want to tell people that you agree with them, instead of saying that out loud, you can just raise your hands a little and wiggle your fingers. Reduces interruptions, gives positive feedback to the speaker, and it's kind of fun.)
Posted on 14 April 2015
Content note: politics, gender politics.
For a while I've been vaguely aware of a petition to "stop taxing periods. Period." I didn't pay it much attention until today, but now I've looked at it and done way more research than I expected to.
According to the petition,
A 5 per cent tax rate has been placed on sanitary products, while exotic meats walk tax-free. HM Revenue and Customs justified this tax by classifying sanitary products as "non-essential, luxury" items.
At least the first sentence of this is true. Sanitary products have VAT of 5% imposed on them. Exotic meats (including horse, ostrich, crocodile and kangaroo) do not.
Sanitary products are covered by VAT notice 701/18, which reduces the rate on them from the standard rate (currently 20%) to 5%. It applies only to "any sanitary protection product that is designed and marketed solely for the absorption or collection of menstrual flow or lochia (discharge from the womb following childbirth)". That is, this reduction was introduced specifically to reduce tax on sanitary products.
Exotic meats are covered by 701/14, which covers food in general. Most food is zero-rated. There are further exceptions for some things, including chocolate covered biscuits (but not chocolate chip biscuits), which are standard-rated; exotic meats are not one of those things. What seems to have happened here is that the government decided that most food should be zero-rated, and then made a list of foods that shouldn't, and exotic meats didn't happen to make it on to the second list for whatever reason.
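For concreteness, here's what the three regimes do to an illustrative £2.00 net price (the price is invented; the rates are the ones discussed above):

```python
def vat_inclusive(net, rate):
    """Price after adding VAT at the given rate."""
    return round(net * (1 + rate), 2)

print(vat_inclusive(2.00, 0.20))  # standard rate: 2.4
print(vat_inclusive(2.00, 0.05))  # reduced rate (sanitary products): 2.1
print(vat_inclusive(2.00, 0.00))  # zero-rated (most food): 2.0
```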
I'm less sure about the second sentence: HM Revenue and Customs justified this tax by classifying sanitary products as "non-essential, luxury" items. More from the petition:
After the UK joined the Common Market in 1973, a 17.5% sanitary tax was introduced. It was justified when Parliament classified sanitary products as "non-essential, luxury" items.
I don't think this is true, but I think I can see how the story could have been Chinese-whispered into existence:
- When the UK joined the Common Market in 1973, we introduced VAT. The standard rate of VAT used to be 17.5%.
- My current belief is that until 2000 or 2001, sanitary products were standard-rated. It's not that there was a specific "sanitary tax", but VAT was a tax that applied to sanitary products.
- It seems to be widely assumed that VAT exemptions are given to essential items, and not to others. Since sanitary products were standard-rated, they must not have been considered essential, so they must have been considered luxury.
But: In 1973, the standard rate of VAT was 10%. I'd be very surprised if sanitary products were taxed at 17.5% in 1973.
VAT actually replaced purchase tax, which did only apply to luxury goods. So if sanitary products were taxed for the first time in 1973, it's because they weren't considered luxury.
Putting "non-essential, luxury" in quotes would seem to imply that it's a direct quote from something, but the first page of Google results for that phrase all seem to be inspired by the petition at hand. The second page has nothing helpful, and the third is back to this petition. I haven't found a source for this phrase.
In fact, I haven't found any official source that suggests that the government thinks sanitary products are non-essential. The evidence for this seems to be purely the fact that they have 5% VAT imposed on them.
But this assumes that VAT is not applied to essential products, which as far as I can tell just isn't true. Here is the entirety of the official-seeming evidence I've found which connects essential-ness to VAT status: a page on gov.uk which says "You pay 20% VAT most of the time - but less on essential items."
One reading of this is that all essential items, and only essential items, are taxed at less than 20%. By this reading, sanitary products are essential, being taxed at 5%.
On the other hand, when you look at the list of VAT exemptions, it includes gambling and antiques, which seem pretty non-essential; and it doesn't include clothes for adults, which are pretty damn essential. (If I don't wear clothes, I will get locked up. I'm forced to pay tax on the clothes that I'm forced to wear. This is almost Kafkaesque, in a really really minor way.)
Just in the healthcare genre, it doesn't include toothpaste or dental floss, most glasses or hearing aids, sticking plasters, paracetamol, or the creams I use for acne and eczema. (I haven't specifically researched most of these. I think I've looked in the places where I would find them if they were covered, but it's possible I've made a mistake.)
It does seem that most of the things in that list are things that most people think should be easily accessible. But if I showed someone that list without telling them what it was, I don't think they'd say "this is a list of things which are essential".
Wikipedia also doesn't use the word "essential".
If we ignore that one linked page, it just doesn't seem like the government classifies things as essential or not, and then assigns them a VAT status based on that classification. It just assigns VAT status. To ask whether the government considers something "essential" seems to be asking a null question. That's not a word that the government, as a body, knows.
There is still that one linked page, but currently I'm dismissing it as an anomaly. Gov.uk is intended to be accessible and accurate, and in this case I suspect that accessibility triumphed over accuracy, and/or someone just made a mistake while writing it.
I worry that I'm simply ignoring contrary evidence here, but consider: the government publishes a list of things. It doesn't put things on the list just for being essential, or keep them off the list for being non-essential. It doesn't correlate very well with what most people would consider essential. I can only find one place where the government comes close to describing it as a list of essential things, and that's in a descriptive context, not a binding one.
If it doesn't look like a duck, quack like a duck, or fly like a duck…
So. In 1973, the government probably did not consider sanitary products to be "non-essential, luxury" items. It just considered them to be things, and it taxed them the same as it taxed most things.
Since 2001, the government almost certainly doesn't consider sanitary products to be non-essential. It taxes them at a reduced rate compared to almost everything else, but at a higher rate than some other things.
Under EU law, the government isn't allowed to zero-rate them. We have other zero-rated things, but they were zero-rated before we joined the EU. The VAT on sanitary products is as low as it is permitted to be.
We might (or, quite likely, might not) be able to exempt sanitary products instead of zero-rating them. But there's an important difference. If you buy raw materials, produce a product, sell it, and have to pay VAT on the sale, then you can claim back the VAT that your supplier paid on the materials. If you don't pay VAT on the sale, you can't claim back that VAT. If sanitary products were exempt rather than zero-rated, the selling price might (or might not) go up because of this, depending on the cost of raw materials. It's almost certainly more complicated than this, but:
Suppose you buy £30 worth of cloth, of which £6 is VAT. You make a lot of tampons and sell them for £100. You have to pay £5 VAT on that sale, and can claim back the £6 VAT you paid on the cloth. Profit: £100 - £30 - £5 + £6 = £71. If tampons were VAT exempt, you'd need to sell them for £101 to make the same profit.
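As a sanity check, here's that toy calculation as code. This is only a sketch of the simplified model above (flat amounts, not real VAT accounting), and the `profit` helper is mine:

```python
def profit(sale_price, input_cost, input_vat, output_vat, can_reclaim):
    """Seller's profit: revenue, minus the cost of inputs, minus VAT owed on
    the sale, plus any input VAT reclaimed."""
    reclaimed = input_vat if can_reclaim else 0
    return sale_price - input_cost - output_vat + reclaimed

# Reduced-rated at 5%: pay £5 VAT on the £100 sale, reclaim the £6 of VAT
# paid on the cloth.
print(profit(100, 30, 6, 5, can_reclaim=True))   # 71
# Exempt: no VAT due on the sale, but no reclaim either.
print(profit(100, 30, 6, 0, can_reclaim=False))  # 70
# The exempt seller has to charge £101 to make the same £71 profit.
print(profit(101, 30, 6, 0, can_reclaim=False))  # 71
```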
(The past two paragraphs were written from the perspective that the seller pays VAT. Mostly we seem to talk about the buyer paying VAT. It's all equivalent, but I hope I wasn't too confusing.)
Having established (what I believe to be) the facts, here are my thoughts on the petition itself:
Firstly, it doesn't strike me as a particularly big deal.
Financially, we're talking about £150 a lifetime lost to taxation on sanitary products, assuming £5/month for 50 years. I don't think anyone is claiming it's a particularly heavy burden; it's not about money.
From a gender-politics perspective… it also just doesn't seem like a big deal. If sanitary products were still standard-rated, I could maybe kind of get behind this, even though I don't actually think that standard-rated implies non-essential. But they're at the lowest VAT rate the UK government is allowed to put them at. I just don't see this as a sign that we value crocodile meat more than female participation; or of male-focused agenda setting; or any form of institutional misogyny.
It seems like most of the conversation is fueled by outrage. I don't think there's much here to be outraged about, and I would prefer that the energy being put into this outrage go elsewhere.
Secondly, I don't really like the tactic of petitioning the UK government over this. The petition acknowledges that they can't do anything under current EU law. So the goal is to get the UK to get the EU to change its laws.
That seems… really unfair. I don't think I can argue clearly and succinctly for that position, which usually means I should spend more time clarifying my thoughts to myself. Instead of doing that, I'm going to argue by analogy. I don't think these analogies are the same in every way, or even in every relevant way; they just seem to me to have some structural similarities to the case at hand.
So, suppose I notice that I've been overcharged on my phone bill. I call up the company, and the customer service rep is very polite and helpful and refunds me. I tell her, "that's not good enough, I also want to be refunded for the price of this phone call to you, that I shouldn't have had to make". She tells me that she simply doesn't have the power to do that. So I tell her that she needs to complain to her managers until they give her that power, and I'm going to keep calling her every day until she can refund me for the cost of this phone call.
This isn't her battle. She's sympathetic to my plight, but she has other priorities. And I'm blackmailing her into fighting for me.
This petition feels like it has that kind of power dynamic.
Or: "hey man, I just need 20p for my bus fare, can you help?" "Sorry, I'd like to, but I have nothing smaller than a tenner." "So give me a tenner, don't be a tightass."
There's only so far that you can expect someone to go for you, and I feel like this petition is asking for more than that.
Posted on 22 February 2015
It's widely regarded that documentation is an important task that doesn't get as much attention as it deserves.
There are a lot of proposals for how to fix this, but I think they typically miss something. "Write the documentation before the code", for example. Sounds good, but a lot of the time, until I write the code, I don't know what it's going to do. Maybe I have a rough idea, but the details are going to evolve. And sometimes I'm just going to get things wrong. Oh, I can't frobnicate the spam here unless I also make the user pass in an egg - this is the sort of detail that would be easy to miss when I'm thinking about a function, but impossible to miss when I'm trying to frobnicate the spam without an egg.
And of course, having written a function and documented it (in whatever order) - you can subsequently rewrite the function without redocumenting it. This is the well-known problem where the code and the docs get out of sync.
You can't have a rule "don't edit the code unless you first edit the documentation", because a lot of code edits don't need doc changes. A more complicated rule like "don't edit the code unless you first edit the documentation, unless the documentation doesn't change" would probably just be forgotten. After a while, whenever I ask myself if the documentation is going to change, I'm just going to return the cached answer "no" without thinking. In any case it has the same problem: I don't know what I'm going to change until I change it.
So here's my half-baked proposal that might help to solve some of the problem of keeping code in sync with documentation:
If your code is in a git (or other VCS) repository, you can look at a function and see whether the code changed more or less recently than the docstring. It would be possible to write a tool which searches for such functions automatically.
Bam, done. The rest is just details. Edit code all you want, and if you forget to update the docs, this tool will remind you next time you run it.
Okay, some details are important. Here are some things to think about:
- Not every code edit needs you to touch the docstring, so you want a database of some sort, which can be used to mark "the docstring is up to date as of commit A, even though it hasn't been touched since commit B". Keep this database inside your repository. This is probably quite a large detail, with many details of its own.

  It might actually be better (certainly simpler) to have some way of marking this in-file. For example, in python:

  ```python
  def foo(bar):
      """Do a thing
      """  # 2014-01-07 11:25 (edit this comment to mark the docs up to date)
      pass
  ```

- Tracking a function across revision changes is probably really hard, in general. In the specific case of "the sort of revision changes that actually occur in real codebases", it might be possible most of the time.

  ((I would start by working out which lines constitute a function's code and docstring, and running git blame to see which was updated most recently. This might be good enough, but for example it breaks if you swap the order of two functions in the source file. One of them is going to be blamed entirely on that commit. It also breaks if you delete lines from the function body.

  A more complicated solution would be to track the function by its identifier. Look at past revisions of the file in its entirety, search for the function, and see whether it's changed. This could go wrong in so many ways, and even if it doesn't, it still won't catch you if you move a function from one file to another.))
- This probably works best if you run it frequently. You could have it in a hook: if there are unsynced changes, you aren't allowed to push unless you sync them (either by editing the docs, or marking them clean).
- This could be applied to classes as well as functions, but I'm not sure when you'd want to flag them. Most changes to code inside a method probably won't touch the class docstring, even if they touch the method docstring. Applying it to whole files would be even harder. In general, this will apply better to low-level documentation than high-level.
- You need custom support for each language and documentation style used, as well as for each VCS. In many cases, supporting a language will mean writing a parser for it. (In python, you can import a module and discover every class and function that it exposes, and their docstrings, but I don't know if you can find their source locations like that.)
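To make the core idea a bit more concrete, here's a minimal sketch of the first step for python: finding, for each function, which lines hold the docstring and which hold the code, so that something like `git blame -L` could then compare when each range was last touched. The helper name is mine, and it only handles simple top-level functions:

```python
import ast

def doc_vs_code_lines(source):
    """For each top-level function in `source`, report the line ranges of its
    docstring and of the rest of its body. A tool could feed these ranges to
    `git blame -L` and flag functions whose code was blamed more recently
    than their docstring. (Sketch only: ignores methods, nested functions.)
    """
    out = {}
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        doc = None
        # A docstring is a bare string constant as the first statement.
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            doc = (body[0].lineno, body[0].end_lineno)
            body = body[1:]
        code = (body[0].lineno, body[-1].end_lineno) if body else None
        out[node.name] = {"docstring": doc, "code": code}
    return out

src = '''def foo(bar):
    """Do a thing."""
    return bar + 1
'''
print(doc_vs_code_lines(src))
# {'foo': {'docstring': (2, 2), 'code': (3, 3)}}
```

The git side (mapping those line ranges to last-modified commits) is where the hard details live, as discussed above.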
Would this work? How much effort would it be to make? Would it be helpful, assuming it did work? For that matter, does something like it already exist?
Posted on 16 January 2015
In which I inexpertly channel Randall Munroe
Soon the seas will turn red with the blood of the human race, as the unspeakable terrors come from beyond the gate, which is Yog Sothoth, to devour all in their path! Ia! Shub Niggurath! Ia! Ia!
— Who will be eaten first?
If you were to mix the blood of the human race into the oceans, how red would they turn?
A quick fermi estimate: there are seven billion humans, with about eight pints of blood each, which is about five litres. So we have about 35 billion litres of blood to work with.
How about the oceans? They cover two thirds of the earth's surface. I know that the surface of the earth is $4πr^2$ where $r$ is 6,400 km. $(64 × 10^5)^2 = 2^{12} × 10^{10} = 4096 × 10^{10}$. $4π$ is approximately ten, so the surface area of the earth is about $4×10^{14}$ m². Two thirds of that is $2.5 × 10^{14}$. I don't know how deep they are, but 100m seems like a reasonable guess at an average. That gives us about $2.5 × 10^{16}$ m³ of ocean.
A cubic metre is a thousand litres, so we have about 25 billion billion litres of ocean. For every drop of blood, we have getting on for a billion drops of ocean.
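Redoing that arithmetic in code, with math.pi instead of the 4π ≈ 10 shortcut (same rough inputs, so this is still only a Fermi estimate):

```python
import math

people = 7e9
blood = people * 5                          # ~5 litres of blood each, in L

earth_radius = 6.4e6                        # metres
surface = 4 * math.pi * earth_radius ** 2   # ~5e14 m^2
ocean_volume = surface * (2.0 / 3) * 100    # 100 m guessed depth, in m^3
ocean_litres = ocean_volume * 1000

print(ocean_litres / blood)                 # about a billion to one
```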
I don't know how blood and water mix, but I'm confident that the oceans will not be running red with blood any time soon, on average. Sorry cultists.
Fact checking: Wolfram Alpha tells me that there are actually about 10¹⁸ m³ of ocean, which is 40 times what I guessed, which is okay for a Fermi estimate. The oceans are still not running red with blood.
NOAA, whoever they are, tell me that the average depth of the ocean is 4km, which is also 40 times what I guessed. That's not great for a Fermi input.
(Initially, I forgot the $4π$ factor in the surface area of the earth, putting me off by 400 times. I'm not sure how to judge myself on Fermi estimates when I get somewhat-reasonable inputs and do the math wrong.)
However! The quote specifically mentioned seas. I've cheated by counting oceans as seas. What if we exclude them? I'm going to do this just by listing the oceans and subtracting their volume from the total. We have:
It's possible that this double-counts the Southern ocean, since some people divide it up between the Pacific, Atlantic and Indian oceans. But I'm going to allow that, to make the seas as small as possible.
In total, this makes $1.29 × 10^{21}$ L of ocean. That's more than the $10^{18}$ m³ I gave earlier, but that was rounded. The actual number given is $1.332 × 10^{21}$ L. So there are $0.042 × 10^{21} ≈ 4 × 10^{19}$ L of seas in the world. This is actually pretty close to my initial guess for the oceans. Still no red.
This does not mean you are safe from the Elder Gods.
Posted on 09 August 2014
I like Python, but one of the things I dislike is the assert statement. Its simplest form provides no help if it fails:
assert x == y
will raise an exception so you can see where it failed, but you don't get to see what x or y were.
There's a longer form,
assert x == y, "%r != %r" % (x,y)
but this is verbose, and evaluates x and y twice. And if x and y are dictionaries nested three deep, it might not be easy to tell what's different between them.
I'm aware of two approaches that improve the situation. nose has a --failure-detail plugin that tries to automatically give you more detail. When
assert x == y
fails, it:
- Finds the source location of the failed assert,
- Reads and parses this line,
- Substitutes variables with their values,
- Reports the substituted line.
This is an awesome hack, and I love that it's possible, but I don't find it all that useful. You still need to play spot-the-difference with deeply nested data structures, but that would be pretty easy to fix. The deeper problem is that it also doesn't help with
assert frobnicate(3) == frobnicate(4)
because there are no variables to replace. (frobnicate is a variable, but IIRC it doesn't substitute functions. I don't remember the exact algorithm it uses.) I had a look at the code, and I don't think it would be possible, in general, to report the values on the LHS and RHS. You'd have to re-evaluate the expressions, and there's no guarantee they'd return the same thing the second time.
The second approach is to get rid of assert statements completely. In a unittest test, you do
self.assertEqual(x, y)
and if x != y it tells you what x and y are, with a helpful diff format for dicts, lists and sets.
This is great, but I just don't like writing asserts like that. So here's a new approach:
from bsert import bsert
bsert | x == y
How it works is that bsert | x returns a new object, _Wrapped(x); and _Wrapped(x) == y calls assertEqual(x, y). Other comparison methods are overloaded as well. Now we can do things like:
```
$ python
Python 2.7.5 (default, Dec 1 2013, 00:22:45)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from bsert import bsert
>>> bsert | 3 == 3
True
>>> bsert | 3 == 4
Traceback (most recent call last):
  ...
AssertionError: 3 != 4
>>> bsert | [3] + [4] == [3, 4]
True
>>> bsert | [3] + [4] == [3, 4, 5]
Traceback (most recent call last):
  ...
AssertionError: Lists differ: [3, 4] != [3, 4, 5]
Second list contains 1 additional elements.
First extra element 2:
5
- [3, 4]
+ [3, 4, 5]
?      +++
>>> bsert | {1: {2: 3, 4: 5}, 6: 7} == {1: {2: 4, 4: 5}, 6: 7}
Traceback (most recent call last):
  ...
AssertionError: {1: {2: 3, 4: 5}, 6: 7} != {1: {2: 4, 4: 5}, 6: 7}
- {1: {2: 3, 4: 5}, 6: 7}
?         ^
+ {1: {2: 4, 4: 5}, 6: 7}
?         ^
>>> bsert | 1 / 2 != 0
Traceback (most recent call last):
  ...
AssertionError: 0 == 0
>>> bsert | 1.0 / 2 != 0
True
>>> import time
>>> bsert | time.time() != time.time()
True
>>> bsert | time.time() == time.time()
Traceback (most recent call last):
  ...
AssertionError: 1399731667.416066 != 1399731667.416123
>>> bsert | [3] * 3 == [3,3,3]
True
>>> bsert | {1, 2, 3} <= { 1,2,3,4}
True
>>> bsert | {1, 2, 3} >= { 1,2,3,4}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "bsert.py", line 28, in __ge__
    self.assertGreaterEqual(self.wrapped, other)
  File "/usr/lib64/python2.7/unittest/case.py", line 950, in assertGreaterEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/usr/lib64/python2.7/unittest/case.py", line 412, in fail
    raise self.failureException(msg)
AssertionError: set([1, 2, 3]) not greater than or equal to set([1, 2, 3, 4])
>>> bsert | 3|8 == 11
True
>>>
```
There are a few limitations. For one, you can't use chained comparisons, and you won't get any kind of error if you try. The reason is that
bsert | 3 <= 5 <= 4
cashes out as
(bsert | 3 <= 5) and (5 <= 4)
so there's no way for bsert to know that there's another comparison going on. For two, you can't do
bsert | 3 in [1, 2, 3]
because there's no way to overload the in operator from the left hand side. (In this case, you at least get an AssertionError: 1 != 3 telling you you did something wrong, because a in someList basically does any(a == x for x in someList), and so it fails at bsert | 3 == 1. If you had a dict, set, or empty list on the right hand side, it would just return False and not raise an exception.)
Similarly, bsert | x is y, bsert | x and y and bsert | x or y don't work, because those operators can't be overridden at all. (Even if they did work, they're low precedence, so it would need to be e.g. (bsert | x) and y, which is horrible.) You also can't do
bsert | (3 == 4)
because that just returns _Wrapped(False).
I think all the other operators should work fine, if you're using them in ways that make sense. Most of them have higher precedence than |, so that for example
bsert | [3] + [4] == [3, 4]
cashes out to
(bsert | ([3] + [4])) == [3, 4]
The only exception is | itself, and I've added support so that _Wrapped(x) | y returns _Wrapped(x|y).
I don't necessarily recommend that you use bsert. I'm not sure that I will. But it's there.
I've put bsert on github, but it's also short enough that I might as well just post it inline:
```python
import unittest

class _Bsert(object):
    def __or__(self, other):
        return _Wrapped(other)

class _Wrapped(unittest.TestCase):
    def __init__(self, obj):
        # TestCase needs to be passed the name of one of its methods. I'm not
        # really sure why.
        super(_Wrapped, self).__init__('__init__')
        self.wrapped = obj

    def __eq__(self, other):
        self.assertEqual(self.wrapped, other)
        return True

    def __ne__(self, other):
        self.assertNotEqual(self.wrapped, other)
        return True

    def __le__(self, other):
        self.assertLessEqual(self.wrapped, other)
        return True

    def __ge__(self, other):
        self.assertGreaterEqual(self.wrapped, other)
        return True

    def __lt__(self, other):
        self.assertLess(self.wrapped, other)
        return True

    def __gt__(self, other):
        self.assertGreater(self.wrapped, other)
        return True

    def __or__(self, other):
        return _Wrapped(self.wrapped | other)

bsert = _Bsert()
```
Belated update:
On reddit, Liorithiel informs me that py.test can extract useful failure messages from assert statements. Like what nose does, but implemented differently, so that it can show the values of intermediate expressions in more detail than bsert can. (It rewrites the AST at import time, which is an even more awesome hack than nose's.) As far as I'm concerned, this knowledge makes bsert obsolete.
Meanwhile, obeleh on github has provided a patch which allows bsert to be used with boolean expressions, with different syntax. So
bsert(3 in [1,2,3])
is like assert 3 in [1,2,3], but with a slightly nicer exception message. (The old syntax still applies for expressions using comparison operators.) Now you get
AssertionError: bsert(3 in [1,2,3])
instead of just
AssertionError
It comes at the cost of a slightly longer traceback - I wonder if that can be worked around. And it doesn't provide intermediate expressions at all, so it's kind of a neat trick, but I don't know if it's useful. (Especially since the old traceback had the failing expression near the bottom anyway - there are cases where you'll see the exception but not the traceback, but we're getting into niche territory.) But that's pretty much how I felt about bsert in the first place, so I decided to go ahead and include it.
Posted on 10 May 2014
I don't expect anyone cares about what I think about the movies I've seen lately, so I'm bundling those thoughts into one post to be less annoying.
The Lego Movie
This is the most pure fun I've had watching a movie in a long time. It's going for funny, and it is, the whole way through. I saw the movie almost a month ago, and as I write this, remembering some of the jokes still makes me smile.
The attention to detail is also impressive. I think that if I rewatched this, I'd catch a bunch of stuff that I missed the first time. Going frame-by-frame would help at times.
Highly recommended.
Divergent
I found Divergent decent while watching it, but no more than that; and thinking about it afterwards only makes it worse.
It's pretending to be a failed-utopia story: at some point in the past, someone came up with a really dumb idea ("let's end conflict by separating everyone into five factions based on personality traits!") and apparently society rolled with it. Now the heroine, along with the audience, learns that it's a really dumb idea ("oh hey, it turns out that some people have more than one personality trait, guess we'll have to kill them"). So far, so Gattaca.
But in this case, just in case you weren't convinced that the faction idea was dumb, you get to see what happens when the faction system meets mind-control drugs. The answer is "pretty much what happens when mind-control drugs enter any society", but because it's happening to the faction system, it's the faction system that looks bad.
There's also a sort of Mary-Sue thing going on, but a viewer insert rather than an author insert. In the film world, people supposedly have only one of the traits brave/honest/friendly/selfless/clever, except for the rare divergents. So the message is "if you have two or more of these traits, you are rare and special and immune to mind-control drugs, and people will hate you for it", which is probably a message that will appeal to angsty teenagers.
Captain America: The Winter Soldier
This is an action film. It has people shooting people, and punching people, and jumping through glass windows. I have it on good authority that a lot of people enjoy that sort of thing.
There's a political-thriller plot in there, as well, but not a particularly interesting one. Director Fury recommends delaying the project to create a doomsday device that will only be used to kill bad people. Director Fury can't be trusted, because he recently hired a merc to hijack a ship owned by SHIELD (I never quite worked out why he did this). Also, someone has just killed Director Fury. Full speed ahead on the doomsday project!
There's the standard trope that when things seem hopeless for the heroes, we find out that they set things up in advance to control for this very circumstance. And the same with the villains. This keeps the tension high, or something.
Brain uploads are possible, and have been possible since the seventies or so, but no one really cares, and only one person has been uploaded. The upload commits suicide to try to kill the heroes, where any normal villain would have an escape plan, or at least some lines about how he's giving his life in the service of Evil; we don't get that here, because people aren't people unless they're made out of meat. (Actually, I assume that the Marvel universe has many people who aren't made of meat, but are still given personhood-status. Maybe the rule is "unless they look like us, except that they're allowed to be a different color"?)
(There's a similar thing with cryonics, but it's not clear whether normal humans can be suspended and revived, or just superhumans.)
The Winter Soldier thing feels tacked on. He's an assassin who is approximately as good at fighting as the hero is. It also turns out he's the hero's brainwashed childhood friend, returning from the first movie. The hero gets kind of sad about this, but they fight anyway. In the end, the Soldier saves the hero's life, but he only gets to do it because the hero was trying to save his life, and they don't have a big moment of reconciliation, it just kind of happens. I guess the idea is to set something up for the third movie, but in this one, there was no reason that character couldn't have just been someone new, or even completely removed from the film.
Really though, I think the main thing I'd like to change about the film is: to a first approximation, nobody gets hurt. People fight, and everyone who isn't a named character dies, and the named characters don't have a scratch on them despite getting repeatedly punched in the face. I know some people enjoy watching this, but I find it kind of boring, and I think that if we saw the hero sustain a split lip and a bloody nose, I might find it a lot more engaging because suddenly it might feel like he's actually in danger. (I can forgive that people don't get cut when they crash through glass windows. If they did, they would be a lot less willing to keep doing it.)
Of course, this film is only a 12A. If you show people getting hurt in fights, kids might not be allowed to watch it any more.
I did like how there was no romantic subplot.
Also, faceblind world problems: a minor character suddenly unleashes some ass kicking. Everyone looks at her like wtf. She presses a button, her face shimmers as a hologram disappears, and suddenly she looks exactly the same. (She then proceeds to remove her wig, so it's not too confusing.)
Noah
Everybody in this film fails ethics and/or metaethics forever.
Spoilers ahead, even if you've read the Bible. I can't be bothered to mark them all.
Noah thinks the world would be better off if all the people were dead, including himself and his family. He thinks his task is to save the animals and then to die.
I can accept the hypothesis that the rest of the world is evil. It's a stretch, but I think we don't see anyone outside Noah's family do anything good, so let's roll with it, and say that sure, they deserve to die.
What about Noah and his family? He explains this to his wife, Naameh, pointing out their flaws to her. Their youngest son, Japheth, seeks only to please. Or something like that. I agree that that's a character flaw, but if it's the worst you can say about someone, they're a pretty good person. I don't remember what he said about Shem and Ham - I think maybe Shem's flaw was being sexually attracted to his girlfriend Ila (in Shem's defence, Emma Watson), and Ham wanted a girlfriend too much - but it definitely wasn't "deserves to die" grade stuff. And Noah and Naameh would both kill to protect the children, which apparently makes them just like everybody else.
I'm not being entirely fair. The world is beautiful, humans ruined it once, and however good Japheth may be, we can't trust that his kids will be as good as him, and their kids will be as good as them, and so on. We can turn this into a parable about AI safety and the Löbian monster, but I don't think that was the intent.
Okay, fine. Humans need to die off. That's not Noah's problem. Noah's problem isn't even necessarily that he thinks God is talking to him, because that does happen. I'm not sure I can pinpoint exactly what Noah's problem is. Maybe it's that he seems to think that he and he alone is vital to God's plans.
When Ham tells Noah that he wants a wife, Noah tells Ham that God will provide everything they need. Later Noah goes to look for a wife for Ham. When he fails, he decides that God doesn't want Ham to have a wife, and that's when he works out that his family isn't meant to repopulate the Earth. When Ham finds a wife for himself (well, a scared lonely little girl, but in those days, what was the difference?), Noah abandons her to die.
Noah's thought process seems to be: if God wants something to happen, He will do it himself or He will work through Noah. So when Methuselah blesses Ila, and Ila subsequently becomes pregnant, the idea that this might have been God's will doesn't cross Noah's mind, and he proclaims that if the child is a girl (capable of repopulating the Earth), he will kill her.
(Incidentally, Methuselah blesses Ila when Ila is looking for Ham, who needs to be found fairly urgently. Their conversation wastes minutes, and after he blesses her, she goes and finds Shem (who is also looking for Ham) to make out with, leading to an implied offscreen shag. Ham is forgotten.)
This brings us to the problem that most of the other characters share: they are far too passive. No one apart from Noah is cool with the human race dying off, but when Noah announces his plans, they don't do very much about it. Naameh tells him that if he kills the child, she will hate him. Japheth sends out one bird at a time to try to find land before Ila gives birth, maybe so that they can run away from Noah? Shem and Ila build a small raft, with food for a month, and plan to desert the Ark. But they neglect to make any secret of this, and when Noah burns it, they're surprised.
I am probably a little too quick to say that fictional characters should be killed, in circumstances where that wouldn't fly in the real world. But in fiction, the stakes tend to be a lot higher. In this case, the stakes are the survival of the human race. The obvious thing to do is to kill Noah.
Don't run away on a raft. The Ark is safer. You are the last hope for the human race. Kill Noah.
When Noah has attempted to kill your children, and discovered that he can't do it, kill him anyway. He hasn't changed his mind about what God wants, he just thinks he's failed God, and do you trust him not to try again? Kill him. (Eventually, Noah does change his mind, when Ila suggests that maybe God deliberately left the choice to Noah. Now Noah believes he's not only vital to God's plans, but is permitted to shape them.)
Well, Shem eventually makes a cursory attempt, at least. But it's not good enough. NPCs, the lot of them.
There are two more characters. Tubal-Cain is the villain. He thinks that man should have dominion over nature. He also decides that since God has forsaken them, so much for ethics, and he'll kill anyone he wants.
Ham is the only one of Noah's sons to express any independence. His motivation is that he wants a wife. When Noah fails to find him one, he goes to find one himself. He rescues a girl from the human settlement, they run away back to the Ark together, and when she gets caught in a bear trap with the bad guys in hot pursuit, he tries to save her. Noah arrives to save him, and he trusts Noah to save her as well, which he doesn't.
When Tubal-Cain breaks onto the Ark, Ham finds him and becomes his protégé. Sort of like Anakin Skywalker being turned against the Jedi Council by Senator Palpatine. But when it comes down to it, Tubal-Cain and Noah are fighting, and Ham decides to kill Tubal-Cain instead. (At this point, Tubal-Cain tells Ham, he has become a man. Tubal-Cain returns the magical glowing snakeskin that he took when he killed Noah's father. It's the skin shed by the snake that tempted Eve. It's all very significant, presumably, but I don't know what it signifies.)
Meanwhile, God is classic Old Testament God. He's not very communicative, which causes Noah no end of confusion; He only ever intervenes at the last minute, because He loves a fight scene as much as the rest of us; and apparently "flood the world" was the best plan He could come up with for making everything cool again.
Apart from all the ethics, the film is still pretty bad. The script is uninspired. Tubal-Cain gets some villainous monologues and a rousing pre-battle speech, but they pretty much consist of him talking about how much killing he's going to do. I did enjoy the visuals, and Russell Crowe and Emma Watson do the best they can.
I do not recommend Noah.
Ocho Apellidos Vascos (Eight Basque Surnames / Spanish Affairs)
This reminded me a lot of Intouchables. That was a French feelgood comedy-drama which ended up making a lot of money. This is a Spanish feelgood romcom which is making a lot of money. Both are heavily based on the cultural divide between the main characters. Both are well-made, but very paint-by-numbers.
Apellidos features the boy and the girl meeting. The boy thinking there's something Special between them. The girl disagreeing. The two of them getting up to crazy antics. The girl changing her mind. The boy rejecting her, and then changing his mind. The assumption that these people, who first met a week ago, and now think they're in love, will live happily ever after.
In this case, the boy is Andalusian, and the girl is Basque. I don't know much about Spanish politics, and the film doesn't go out of its way to explain (why would it?), but I get the impression that there's a great deal of tension between the Basque people and everyone else in Spain; much like Northern Ireland and the rest of the UK, back in the days when the IRA was bombing pubs and cars and stuff (okay, I don't know too much about local politics either). Accordingly, just about every character in the film is incredibly casually racist. Sometimes that's played for laughs, and probably exaggerated; other times it seems entirely plausible. It was kind of distracting: I kept thinking there is no way you could show this if you substituted Basque for Black. But then, maybe you could if you went for Irish instead.
I found the subtitles hard to follow at times. There were some words which weren't translated (I think maybe these were spoken in the Basque language), and they were easy enough to pick up but required slightly more effort than normal. And there were a number of references to things I'm not familiar with, which I had to piece together from context; again, not difficult to do, but not effortless.
Mostly, I thought this film was funny, sweet, and forgettable.
Posted on 19 April 2014
|
Comments
I'm a fan of the board/card game The Resistance, but I feel like the base game is significantly unbalanced in favour of the spies. I think that if the spies play well, they will usually win regardless of how well the resistance plays; and I think that playing well simply involves not acting like spies. (To be precise, I think that if the spies always act in public the way they would if they weren't spies, the game becomes very easy for them; they don't need to execute complicated plots to try to throw suspicion on an innocent, and they don't need to subtly communicate with each other about which of them is going to throw a failure card down on this mission. This isn't necessarily easy for them, but it's fairly simple in principle, and I expect it becomes easier with practice.)
To test this, I intend to analyse a similar, simpler game which I call Resistance-B. "B" for "brainwashed", because in this game the spies don't know they're spies, and they don't know who the other spies are either. However, any time they go on a mission, their programming takes over and they sabotage it without knowing what they're doing. If there are multiple spies on a mission, they all sabotage it.
This probably wouldn't be very fun to actually play as a group (and you'd need a GM, or some other way to find out how many spies were on a mission without the spies themselves knowing). But it seems clear that, provided all players are good at acting, this is a variant which is strictly harder for the spies to win. Their strategy in this game is available in plain Resistance (always act like a resistance member in public, but sabotage all missions), but some options have been removed, and the resistance members know that this is their strategy.
This game is also a lot easier to analyse than the original. Everyone has the same information, so there's no need to vote on teams. In fact, we can think of it as a one-player game against a random opponent; all the opponent does is pick the spies, and the player needs to pick teams to execute three successful missions.
My hypothesis is that for many game sizes, the spies will have considerably greater than even odds of winning. To be less vague, I expect the spies to have a disadvantage in five-player games at least[1], but for six I'm not sure which way it would go, and by eight I expect the spies to be usually winning. (I also expect the game generally gets harder for the resistance as you add players, except that nine players is obviously easier than eight. Having one more exception wouldn't surprise me much, but two would. (If ten is easier than eight, that doesn't count as an exception, because ten is clearly harder than nine. However, it would also surprise me a little if ten was easier than eight.))
[1]. Although I don't remember specifically predicting this prior to beginning the analysis below. I got as far as "40% of the time, the resistance wins just based on the first round" before I started to write this post.
Note that if I'm wrong, that doesn't mean the original Resistance is unbalanced, although it's evidence against it (strength depending on just how badly wrong I am); but if I'm right, the Resistance certainly is unbalanced.
For people not familiar with the original game, here are the rules of Resistance-B. You have a specified number of players[2], some of whom are "resistance members" (good guys) and some of whom are "government spies" (bad guys). A third of the players, rounded up, are spies, but you don't know which. The game progresses over up to five rounds. In each round, you select a specified number of your players, and find out how many of them were spies. If none of them were spies, you win that round. If any of them were spies, you lose that round. Your goal is to win three rounds. The number of players selected in each round depends on the total number of players and the round number (the table is given on Wikipedia). In addition, for seven players and up, you win round four unless you select two or more spies, not one or more.
[2]. The way I'm presenting the game, these aren't real players. You are "the player", and the game contains fictional players whom you control.
Resistance-B is easy to analyse case-by-case for five players (three resistance, two spies). With no information, we select the starting team (two players) randomly. There are three equivalence classes of outcomes:
- No spies ($p = { 3 \choose 2 } / { 5 \choose 2 } = 0.3$). In this case, the resistance always wins. Mission three also takes only two members, so we can resend the same team for a guaranteed success. On each of missions 2, 4 and 5, send the two known-good players plus one of the remaining three, until we find the one who isn't a spy; we can afford the (at most two) failures along the way.
- Two spies ($p = 1 / { 5 \choose 2 } = 0.1$). Now we know exactly who the spies are. Easy win.
- One spy ($p = 0.6$, by elimination). We have players A through E, and precisely one of A and B is a spy, and precisely one of C, D, E is a spy. For mission two, our choices are (wlog) ABC and ACD. (CDE is stupid, because we gain no information and are guaranteed the mission will fail).
- ABC is a guaranteed fail, but
- If we get two failure cards ($p = 1/3$), we know C is a spy. We can't afford to fail any more missions, and we can't distinguish between A and B. So we send AD on mission 3; if it succeeds ($p = 1/2$) we win, if it fails ($p = 1/2$) we lose.
- If we get one failure card ($p = 2/3$), we know C is not a spy. One of AB, and one of DE, is a spy; we can't afford to send any more spies on missions, so we have a $1/4$ chance of winning.
So ABC gives us a $1/3$ chance of winning.
- ACD has a ${1 \over 2} \cdot {1 \over 3} = 1/6$ chance of succeeding, but if it does we know who the spies are and win. It has a ${1 \over 2} \cdot {2 \over 3} = 1/3$ chance of getting two failure cards; then we know B and E are good and A is bad, and we have a 50/50 chance of guessing which of C and D is the second spy, which wins. And there's a $1/2$ chance of getting one failure card.
In this case, $1/3$ of the time, AE are spies and BCD are not. $2/3$ of the time, the spies are B and one of CD. Again, we can't afford to fail any more missions. So we can choose either BCD or ACE to go on the remaining missions, and we'll have a $1/3$ chance of winning.
So ACD has a ${1 \over 6} + \left({1 \over 3} \cdot {1 \over 2}\right) + \left({1 \over 2} \cdot {1 \over 3}\right) = 1/2$ chance of winning.
So if there's one spy amongst AB, we select team ACD for mission two, and win half the time.
In total then, we win the five-player game 70% of the time. That's not surprising.
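The 70% figure can also be checked by brute force. Here's a sketch (the names and structure are mine) that solves Resistance-B as the one-player game described above, by recursing over the set of spy pairs still consistent with what we've observed, using exact rational arithmetic to avoid rounding:

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

PLAYERS = range(5)
MISSION_SIZES = (2, 3, 2, 3, 3)              # five-player mission sizes, rounds 1-5
ALL_PAIRS = tuple(combinations(PLAYERS, 2))  # 10 equally likely spy pairs

@lru_cache(maxsize=None)
def win_prob(candidates, round_no, successes, failures):
    # Probability the resistance wins from here, playing optimally.
    # `candidates` is the set of spy pairs consistent with observations so far.
    if successes == 3:
        return Fraction(1)
    if failures == 3:
        return Fraction(0)
    best = Fraction(0)
    for team in combinations(PLAYERS, MISSION_SIZES[round_no]):
        # Group the remaining spy pairs by how many spies this team contains;
        # each group is one observable outcome of the mission.
        outcomes = {}
        for pair in candidates:
            k = len(set(pair) & set(team))
            outcomes.setdefault(k, []).append(pair)
        value = Fraction(0)
        for k, pairs in outcomes.items():
            p = Fraction(len(pairs), len(candidates))
            value += p * win_prob(tuple(pairs), round_no + 1,
                                  successes + (k == 0), failures + (k > 0))
        best = max(best, value)
    return best

print(win_prob(ALL_PAIRS, 0, 0, 0))  # prints 7/10
```

This agrees with the case analysis: $0.3 + 0.1 + 0.6 \cdot {1 \over 2} = 0.7$.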
The six-player analysis could probably be done analogously without too much trouble, since there are still only two spies. But the mission sizes go 2-3-4-3-4, which means that if we select no spies in the first mission, it's not obvious that we win, and if we select one spy we also have the choice of selecting CDE for mission two… the decision tree gets a lot more complicated, and I'd rather try to solve this problem intelligently (to work for game sizes 7+ as well). I haven't done this yet, though; it's a job for future-me.
Posted on 29 March 2014
|
Comments
A photo from a (different) recent LW London meetup
Cross-posted to LessWrong
I wasn't going to bother writing this up, but then I remembered it's important to publish negative results.
LessWrong London played a few rounds of paranoid debating at our meetup on 02/02/14. I'm not sure we got too much from the experience, except that it was fun. (I enjoyed it, at any rate.)
There were nine of us, which was unwieldy, so we split into two groups. Our first questions probably weren't very good: we wanted the height of the third-highest mountain in the world, and the length of the third-longest river. (The groups had different questions so that someone on the other group could verify that the answer was well-defined and easy to discover. I had intended to ask "tallness of the third-tallest mountain", but the wikipedia page I found sorted by height, so I went with that.)
I was on the "river" question, and we did pretty badly. None of us really knew what ballpark we were searching in. I made the mistake of voicing a number that was in my head, even though I didn't know where it came from and didn't trust it: that the longest river was something like 1,800 miles long. Despite my lack of confidence, we became anchored there. Someone else suggested that the thing to do would be to take a quarter of the circumference of the Earth (which comes to 6,200 miles) as a baseline and adjust for the fact that rivers wiggle. I thought, that's crazy, you must be the mole. I think I answered 1,500 miles.
In reality, the longest river is 4,100 miles long, the third-longest is 3,900 miles, and the mole decided that 1,800 was dumb and he didn't need to do anything to sabotage us. (I don't remember what the circumference-over-four person submitted. I have a recollection that he was closer than I was, but I also had a recollection that circumference-over-four was actually pretty close, which it isn't especially.)
The other team did considerably better, getting answers in the 8,000s for a true answer of 8,600.
I'd devised a scoring system, where every player submits their own answer, and non-moles score proportional to $-\left|\log\left({\text{given answer} \over \text{true answer}}\right)\right|$; the mole scores the negative mean of everyone else's score. But after calculating it for a few people, we decided we didn't really care, and we probably wouldn't be playing enough rounds for it to become meaningful.
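In code, that scoring rule looks something like this (a sketch; the function names are mine):

```python
import math

def player_score(given_answer, true_answer):
    # Non-mole score: negative absolute log of the ratio to the truth.
    # Zero for a perfect answer; overestimating by a factor of k costs
    # exactly as much as underestimating by a factor of k.
    return -abs(math.log(given_answer / true_answer))

def mole_score(other_scores):
    # The mole scores the negative mean of everyone else's scores,
    # so the worse the group does, the better the mole does.
    return -sum(other_scores) / len(other_scores)
```

For the river question, an answer of 1,950 or 7,800 miles against a true 3,900 scores $-\log 2 \approx -0.69$ either way.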
Those questions weren't so great, because we felt there wasn't much you could do to approach them beyond having some idea of the correct answer. For round two we tried to pick questions more amenable to Fermi estimates: annual U.S. electricity consumption (sourced from Wolfram Alpha), and the number of pennies that could fit inside St. Paul's Cathedral. This round, we gave the correct answers to the moles.
I was on team Cathedral, and again did pretty badly. We started by measuring pennies using notepaper for scale, and calculating packing densities, to work out pennies per cubic metre. (I don't remember the answer, but we got pretty close.) But after that it was just a matter of knowing how large St. Paul's Cathedral was likely to be.
I had been stood outside St. Paul's Cathedral a few weeks back, but mostly facing in the wrong direction while tourists took photos of the group I was with. From that vague recollection I thought maybe it was about a thousand square metres at the base, and four stories so about twelve metres high? (Later I looked at a picture of the Cathedral, and realized that I was probably thinking of the dimensions of the entrance hall.) Someone else, who had actually been inside the cathedral, was giving much higher numbers, especially for the base, and someone else was agreeing with his general ballpark. And the other people weren't saying much, so I figured one of those two had to be the mole, and decided to not update very much in that direction. Those numbers were pretty reasonable, and mine were pretty bad. One of them was the mole (not the one I most suspected); I don't remember what he said his strategy was.
Again, I'm not too sure it was a great question; pennies-per-cubic-metre is a question of geometry rather than estimation, and the interior dimensions of St. Paul's Cathedral don't seem much more Fermi estimable than the river question.
The other team got very close to the answer I'd given the mole. Apparently they actually came up with a number off the top of someone's head that was pretty damn close. Embarrassingly, the answer I gave the mole was an order of magnitude too high… I'd sourced it from Wolfram Alpha ahead of time, but then someone asked me to clarify whether it was total energy usage, or just electricity usage. I looked again to check, and I think I used a different query, saw that it said "electricity usage", and didn't see that the number was different. The answer I actually gave was for energy usage.
The mole on that team reported that it wasn't really helpful to know the correct answer without any intermediate steps. That mechanic might be worth experimenting with further, but currently I think it doesn't add much, and it's a mild extra hassle when setting up.
I had fun, and hopefully in future I will put much less trust in my estimates of the dimensions of things, but I wouldn't say the session was a particular success. Not a failure either, just kind of "it happened".
Posted on 16 February 2014
|
Comments
A book sometimes cited on LessWrong as recommended reading is E.T. Jaynes' Probability Theory: The Logic of Science. I intend to write a series of posts reading this book, summarizing the key points, and solving the exercises. (There are no solutions in the book.)
The book has over 750 pages. This will take me a long time, if I finish at all. I'm not committing to finishing. For example, if this turns out not to be a thing worth doing, I hope that I will notice that and stop. I'm also not committing to any particular posting rate while I continue.
Preface
Jaynes starts by telling us the audience of his book: readers with some knowledge of applied math "at the advanced undergraduate level or preferably higher". Prior experience with statistics and probability is unnecessary and might be harmful.
Next we get a brief history of the book. The rules of probability theory have been known for centuries, but in the mid 20th century, these were discovered to be rules for statistical inference in general, unique and inescapable. You cannot violate them without producing absurdity. These discoveries interested Jaynes, and he started giving lectures on them, and eventually they turned into a book.
Jaynes contrasts his system of probability with that of Kolmogorov and de Finetti. Kolmogorov's system looks totally different, but Jaynes finds himself agreeing with it on all technical points, and considers it merely incomplete. De Finetti's system looks similar to Jaynes', but has very little agreement on technical matters. (I don't know what Kolmogorov's and de Finetti's systems actually look like.)
One point is touched on briefly, which I think will become a large theme of the book: infinite set paradoxes. Jaynes claims that these appear to come about because of failure to properly define the objects we're working with. In Jaynes' system, an infinite set can only be specified as a well-behaved limit of finite sets. Sometimes you can arrive at the same set in two different ways, and you can ask questions of it which depend on the limiting process used; but if the limiting process is specified, answers become unique and paradoxes disappear.
PTTLOS is sometimes hailed as a champion of Bayesian statistics over frequentist ones, and indeed we learn Jaynes' view on the dispute. He has always been Bayesian, but previously it was for philosophical reasons. Now he claims that theorems and worked-out examples demonstrate Bayesian superiority independent of philosophy. But he also says that neither is universally applicable, and that both methods fall out as special cases of Jaynes' approach.
Frequentist methods have certain problems; Bayesian methods correct these problems, but can only be applied with a certain amount of knowledge of the problem at hand. The "principle of maximum entropy" provides enough structure to use Bayesian methods when we lack knowledge.
Jaynes claims that probability theory also models certain aspects of human reasoning: someone who tells the truth to listeners who are reasoning consistently, and yet is not believed; or a policy debate where discussion of the issues causes society to polarize into two camps with no middle ground, rather than bringing about a consensus as might naively be expected.
The work is not intended to be merely abstract, and Jaynes gives an example of practical applications in safety concerns. We might discover that a substance is harmful at certain doses, and it might be natural to assume a linear response curve: half the dose is half as harmful. But knowing a little about biology tells us that this will be untrue in many cases; there will be a threshold level, below which the substance is eliminated as fast as it enters the body and causes no ill effects. Using a model which ignores the possibility of threshold levels can lead to false conclusions, however good our data is.
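The contrast between the two models is easy to write down. Here's a toy sketch (mine, not Jaynes'; the names, units and numbers are arbitrary and purely illustrative):

```python
def linear_response(dose, slope=1.0):
    # Naive model: half the dose is half as harmful, all the way down to zero.
    return slope * dose

def threshold_response(dose, threshold=5.0, slope=1.0):
    # Below the threshold, the substance is eliminated as fast as it enters
    # the body, and causes no ill effect at all.
    return max(0.0, slope * (dose - threshold))
```

Well above the threshold the two models roughly agree; the linear model's error is in extrapolating harm down to arbitrarily small doses.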
Finally in the preface, we are told the style of presentation of the work. Most chapters in part 1 start with verbal discussion of the nature of a problem, before getting into the math (which Jaynes considers the easy part, for many students). Part 2 is more advanced, and chapters open directly with math. Jaynes places an emphasis on intuitive understanding over (but not replacing) mathematical rigor.
Chapter 1: Plausible Reasoning
1.1. Deductive and plausible reasoning.
Suppose a policeman hears a burglar alarm, sees a broken shop window, and a man in a mask climbing out of that window with a bag of jewelry. The policeman will decide that the masked man is dishonest. But this cannot come from purely deductive reasoning: it may be that the man is the owner, on his way home from a masquerade party, and that someone else broke the window and he is merely protecting his own property.
Deductive reasoning follows from two familiar syllogisms: "A implies B; A; therefore B" and "A implies B; not-B; therefore not-A". We often don't have the necessary information to apply these.
There are two weaker syllogisms which we use in inferential reasoning: "A implies B; B is true; therefore A becomes more plausible", and symmetrically, "A implies B; A is false; therefore B becomes less plausible".
(These are related to fallacies of deductive reasoning which say: "A implies B; B; therefore A" and "A implies B; not-A; therefore not-B". But deductive reasoning has only the states "certain", "impossible" and "unknown". Inferential reasoning is more subtle, and lets us say "A becomes more plausible" without saying "A becomes certain".)
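Both weak syllogisms fall out of probability theory directly. A toy check (my own, with an arbitrarily chosen joint distribution in which "A implies B" holds, i.e. the world where A is true and B false has probability zero):

```python
from fractions import Fraction

# Joint distribution over (A, B), consistent with "A implies B".
joint = {
    (True, True):   Fraction(1, 4),
    (True, False):  Fraction(0),     # ruled out by A => B
    (False, True):  Fraction(1, 4),
    (False, False): Fraction(1, 2),
}

def prob(pred):
    return sum(p for (a, b), p in joint.items() if pred(a, b))

# "B is true; therefore A becomes more plausible":
p_A = prob(lambda a, b: a)                                       # 1/4
p_A_given_B = prob(lambda a, b: a and b) / prob(lambda a, b: b)  # 1/2

# "A is false; therefore B becomes less plausible":
p_B = prob(lambda a, b: b)                                       # 1/2
p_B_given_notA = (prob(lambda a, b: b and not a)
                  / prob(lambda a, b: not a))                    # 1/3
```

Learning B raises the plausibility of A from 1/4 to 1/2, and learning not-A lowers the plausibility of B from 1/2 to 1/3, exactly as the syllogisms say.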
The word "implies" is meant to indicate logical consequences, not physical ones. We might say: "if it will start to rain by 10 AM at the latest, then the sky will become cloudy before 10 AM". The rain at 10 AM can hardly be the cause of the clouds before 10 AM. Clouds cause rain, but uncertainly so, so we cannot say "clouds imply rain". "Rain implies clouds" is a true logical connection, but not (directly) a causal one.
And there is a still weaker syllogism which the policeman uses: "if A is true, then B becomes more plausible; B is true; therefore A becomes more plausible". Concretely, "if a man is a burglar, then it becomes more likely that he will wear a mask and climb out of a broken window carrying a bag of jewelry. He is wearing a mask and climbing out of a broken window carrying a bag of jewelry; therefore it becomes more likely that he is a burglar." The syllogism may seem weak abstractly, but we accept that the man is almost certainly a burglar.
(Jaynes does not here mention the natural sixth syllogism: "if A is true, then B becomes more plausible; A is false; therefore B becomes less plausible". But of course this is true too.)
Deductive reasoning is nice, because it never loses strength. An arbitrarily long chain of deductive inferences, if our premises are certain, will produce a certain result. Inferential reasoning does not permit this, but we will still be able to approach near-certainty in many cases, such as with the policeman and the probably-burglar.
1.2. Analogies with physical theories
Jaynes analogizes our study of common sense to that of physics. It's too complicated to learn everything all at once, but sometimes we produce a mathematical model that reproduces some features of our observations, and we consider this progress. We expect our models to be replaced by more complete ones in future. We don't expect that the most familiar aspects of mental activity will be the easiest to model.
1.3. The thinking computer
In principle, a machine can do anything that a human brain can do. Proof by existence: the human brain does it. If we don't know how to make a machine do something, it's because we don't know how to describe the thing in sufficient detail. This is also true of "thinking".
But as we study common sense, we're going to start to learn about thinking in more detail than we currently have. Whenever we make a mathematical model of some aspect of common sense, we can write a computer program to apply this model.
If we have two very different hypotheses, unaided common sense might be enough to choose one over the other. If we have a hundred similar ones, we need a computer, and we need to know how to program it. Our goal is to develop the theory that lets us program it correctly to solve such problems, in as much depth and generality as we can manage.
Talking about machines is also useful for keeping ourselves focused. The reasoning process actually used by human brains is complicated and obscure, and hard to talk about without becoming sidetracked. But we're not trying to explain or emulate a human brain. We're speaking normatively, not descriptively.
Up until now we've been asking how to build a mathematical model of human common sense. A better question would be: how can we make a machine carry out plausible reasoning, following clear principles that express an idealized common sense?
1.4. Introducing the robot
To keep our attention focused, we're going to invent an imaginary robot. We design its brain as we see fit, and we see fit to make it reason according to definite rules. We choose the rules according to desiderata that seem like they would be desirable in human brains: if a rational person discovered that they were violating one of these desiderata, they would wish to revise their thinking.
Our robot reasons about propositions, and for now we restrict it to unambiguous true-or-false propositions. We don't require their truth or falsehood to be easy, or even possible, to establish with certainty, but it must be clear what truth and falsehood actually mean. For example, Jaynes considers both of these propositions to be true:
$$\begin{aligned}
A ≡ & \text{Beethoven and Berlioz never met.}\\
B ≡ & \text{Beethoven's music has a better sustained quality than that of}\\
& \text{Berlioz, although Berlioz at his best is the equal of anybody.}
\end{aligned}$$
Our robot can think about proposition $A$, although its truth or falsehood probably can't be established today. But for now, $B$ is off-limits. Later we'll see whether this restriction can be relaxed, and our robot can help with propositions like $B$. ("See chapter 18 on the $A_p$ distribution.")
1.5. Boolean algebra
At this point Jaynes briefly introduces Boolean algebra. I assume the reader is familiar with it, but I note that we denote conjunction (logical AND) by $AB$, disjunction (logical OR) by $A+B$, and denial (logical NOT) by $\bar A$. $≡$ means "equals by definition".
Also, a brief reminder of certain identities:
- Idempotence:
$AA = A+A = A$.
- Commutativity:
$AB = BA$; $A+B = B+A$.
- Associativity:
$A(BC) = (AB)C = ABC$; $A+(B+C) = (A+B)+C = A+B+C$.
- Distributivity:
$A(B+C) = AB + AC$; $A + (BC) = (A+B)(A+C)$.
- Duality:
$\overline{AB} = \bar A + \bar B$; $\overline{A+B} = \bar A \bar B$.
If $A=B$, by which we mean that $A$ and $B$ are logically equivalent, then $A$ and $B$ must also be equally plausible. This seems obvious, but Jaynes notes that Boole himself got it wrong.
As usual, $A ⇒ B$ means "$A$ implies $B$", i.e. $\bar A + B$. Remember that this is a much narrower statement than it would be in ordinary language; it doesn't mean that there is any actual connection between $A$ and $B$.
1.6. Adequate sets of operations
We have four operations (AND, OR, NOT and IMPLIES) that we can apply to propositions. We can use these to generate many new propositions, but two questions occur to us. Firstly, are there propositions defined from $A$ and $B$ that we can't represent using these operations? Secondly, can we reduce the number of operations without losing any propositions that we can currently generate?
We answer the first question no: any logical function on $n$ variables (of which there are $2^{2^n}$) can be written as a disjunction of conjunctions of arguments and their negations. Each conjunction includes every argument precisely once, either purely or negated, and is true at precisely one point of its domain. For example, $AB\bar C + \bar A \bar B C + \bar A B \bar C$. There is one semi-exception, where we have zero conjunctions (the function which is constantly false), and this can be written $A\bar A$.
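The construction in that answer can be carried out mechanically: given any truth table on $n$ variables, build the disjunction of one all-variable conjunction per true row. A sketch (function names are mine):

```python
from itertools import product

def to_dnf(f, nargs):
    # One conjunction per input on which f is true; each conjunction mentions
    # every argument exactly once, purely or negated, and is true at precisely
    # that one point of the domain.
    true_rows = [v for v in product([False, True], repeat=nargs) if f(*v)]
    if not true_rows:
        # Zero conjunctions: the constantly false function, aka A(not-A).
        return lambda *args: False
    return lambda *args: any(all(arg == want for arg, want in zip(args, row))
                             for row in true_rows)
```

For example, `to_dnf(lambda a, b: a != b, 2)` agrees with XOR on all four inputs.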
We answer the second question yes: it is clear from the previous answer that IMPLIES is unnecessary, and from duality that we can do away with either (but not both) of AND and OR. We can't reduce either of the sets (AND, NOT) or (OR, NOT), but there are two operators which suffice by themselves: NAND, $A↑B = \overline{AB}$, and NOR, $A↓B = \overline{A+B}$.
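NAND's sufficiency is easy to exhibit: NOT, AND and OR can all be built from it alone (a sketch; underscored names are mine, to avoid Python keywords):

```python
def nand(a, b):
    return not (a and b)

def not_(a):
    # NOT from NAND: A NAND A = not(AA) = not-A.
    return nand(a, a)

def and_(a, b):
    # AND from NAND: negate the NAND.
    return not_(nand(a, b))

def or_(a, b):
    # OR from NAND, via duality: A + B = not(not-A . not-B).
    return nand(not_(a), not_(b))
```

The construction for NOR is symmetric.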
1.7. The basic desiderata
We now move on to our extension of logic. This will follow from the conditions to be discussed in this section. We don't call them "axioms" because we're not asserting that they're true, just that they appear to be desirable goals. In chapter 2 we'll discuss whether the goals are contradictory, and whether they determine any unique extension of logic.
Our robot must assign a degree of plausibility to each proposition that it thinks about, based on the evidence it has available. When it collects new evidence, it must update these degrees of plausibility. To store these plausibility assignments in the brain, they have to be associated with some physical quantity, such as a voltage level. This means there must be some kind of association between degrees of plausibility and real numbers. Thus we have
- Desideratum (I). Degrees of plausibility are represented by real numbers.
This is motivated by physical necessity, but appendix (A) shows that it's also a theoretical necessity.
Being more specific, two other properties of this representation will be useful. Firstly, that a greater degree of plausibility is represented by a greater real number. And secondly, continuity; this is difficult to state precisely yet, so we say it intuitively: an infinitesimally greater plausibility should correspond to an infinitesimally greater number.
The plausibility assigned to a proposition will often depend on whether some other proposition is true. We use the symbol $A|B$ to denote "the conditional plausibility that $A$ is true, given that $B$ is true", or "$A$ given $B$". We also have, for example, $A | BC$, the plausibility of $A$ given that $B$ and $C$ are true; and $A+B \mid CD$, the plausibility that at least one of $A$ and $B$ is true, given that both of $C$ and $D$ are true; and so on.
(I mildly dislike this notation, myself, but we're stuck with it. I think the problem is spacing: the way it's usually typeset, $A + B | CD$ looks like it should be parenthesised $A + (B|(CD))$ rather than $(A+B)|(CD)$. I prefer to have more whitespace around operators with low precedence.)
We're not going to attempt to define constructions like $A|B$ when $B$ is impossible; for example, if we write $A|BC$, we are implicitly assuming that $B$ and $C$ are compatible.
We want our robot to think in a way which is qualitatively like the way humans try to reason, as described by the above weak syllogisms and similar ones. So suppose the robot has prior information $C$ which gets updated to $C'$ such that the plausibility for $A$ is increased, but the plausibility of $B$ given $A$ does not change:
$$ A|C' > A | C; \\
B|AC = B|AC'. $$
This can never decrease the plausibility that both $A$ and $B$ are true:
$$ AB|C' \geq AB|C. $$
This doesn't say anything about how much the plausibilities change, but it gives a sense of direction. These qualitative requirements will be stated explicitly in chapter 2; for now we sum them up as
- Desideratum (II). Qualitative correspondence with common sense.
Finally, we wish our robot to reason consistently. By this we mean
- Desideratum (IIIa). If a conclusion can be reasoned out in more than one way, then every possible way must lead to the same result.
- Desideratum (IIIb). The robot always takes into account all of the evidence it has relevant to a question. It does not arbitrarily ignore some of the information, basing its conclusions only on what remains. In other words, the robot is completely nonideological.
- Desideratum (IIIc). The robot always represents equivalent states of knowledge by equivalent plausibility assignments. That is, if in two problems the robot's state of knowledge is the same (except perhaps for the labeling of the propositions), then it must assign the same plausibilities in both.
Desiderata (I), (II) and (IIIa) are structural requirements on the robot's brain, while (IIIb) and (IIIc) show how the robot should relate to the outside world.
It turns out that these are all the desiderata we need. There is only one set of mathematical operations for manipulating plausibilities which satisfies all of them; they uniquely determine the rules by which our robot must reason. These rules will be deduced in chapter 2.
Our robot's mental state about any proposition is going to be represented by a real number. A human's mental state is much more complicated: we judge propositions as being plausible, desirable, amusing, etc. A human's mental state might be better represented as a many-dimensioned vector of real numbers.
Unemotional propositions like "the refractive index of water is less than 1.3" can be represented with fewer dimensions than propositions like "your mother-in-law just wrecked your new car". The situations we encounter in real life are often the ones requiring many coordinates. This may help explain why human reasoning about such situations is hard to model; and why math and science (dealing with propositions that generate simple mental states) are so successful.
Many of these coordinates are not useful to us. We don't want our robot to get bored with a lengthy problem, or to get confused by emotional factors. But there is a large unexplored area of possible generalizations of the theory that we'll be developing, which could more accurately model actual human brains.
Ordinary language, if used carefully, does not need to be less precise than formal logic. But it's more complicated, giving it richer possibilities of expression. In particular, it has many ways to imply something without saying it, which are lost on formal logic. The claim "I believe what I see" and the retort "he doesn't see what he doesn't believe" would have the same meaning in formal logic, but convey opposite meanings in common language.
Another example is that the word "is" is not commutative in common language. This example is taken from a math textbook: consider a straight line in the plane, and an infinite set of points in the plane, and the projections of the points onto the line. Then the statements
- The projection of the limit is the limit of the projections
- The limit of the projections is the projection of the limit
are not considered equivalent. The projections may have a limit while the points themselves do not (but not vice versa). The first statement implicitly asserts that the points have a limit, and is true conditional on that premise; the second implicitly asserts only that the projections have a limit, and is false.
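A concrete instance of this (my own example, not the textbook's): take the points p_n = (1/n, (-1)^n) and project them onto the x-axis.

```python
# The projections 1/n converge to 0, but the points themselves keep
# jumping between the lines y = 1 and y = -1, so they have no limit.
points = [(1.0 / n, (-1) ** n) for n in range(1, 1001)]
projections = [x for x, y in points]
assert abs(projections[-1]) < 0.01       # projections settle near their limit 0
assert points[-1][1] != points[-2][1]    # the points are still alternating
```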
We can also distinguish between two different senses of the word "is". The epistemological sense, "the room is noisy", expresses something about the speaker's perception. The ontological sense, "there is noise in the room", asserts the physical existence of something. Mistaking one's thoughts and sensations for external realities is called the "mind projection fallacy", and causes much trouble in probability theory and elsewhere.
(This is Jaynes' way of reminding us that the map is not the territory. It is not meant to imply that one's thoughts and sensations are completely unrelated to the real world.)
We will not attempt to make our robot grasp common language.
1.8.2. Nitpicking
Sometimes people question the second strong syllogism, "A implies B; not-B; therefore not-A". But Jaynes is happy to use it, noting among other things that if we exhibit a counterexample to a supposed theorem, then the supposed theorem is considered disproved. A new logic might lead to results which Aristotelian logic can't talk about, and that's just what we're trying to create here. But if a new logic was found to disagree with Aristotelian logic, then we would consider that a fatal flaw in the new logic.
There are attempts to develop multiple-valued logics. But arguments presented in appendix A suggest that there is no new content in these. An $n$-valued logic applied to a set of propositions is equivalent to a two-valued logic applied to a larger set; or it is inconsistent.
Posted on 02 February 2014
There's a fairly well-known (as these things go) IOCCC entry ("westly.c") to calculate pi. It looks like this:
#define _ F-->00 || F-OO--;
long F=00,OO=00;
main(){F_OO();printf("%1.3f\n", 4.*-F/OO/OO);}F_OO()
{
_-_-_-_
_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_
_-_-_-_
}
This prints 3.141, but you could get more digits by increasing the size of the circle (and changing the printf call).
I recently decided to port this to python. Here's the result:
class F:pass
s=F();_=F();
a=[];3;d=[];
A=(lambda:a.
append(1) or
s);D=(lambda
:d.append(1)
or s);l=len;
_.__neg__=(#
(lambda: _))
_.__sub__=(#
lambda x: (D
() and A()))
s. __sub__=\
lambda x:A()
-_
_-_-_-_
_-_-_-_-_-_
-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_
-_-_-_-_-_-_-_
_-_-_-_-_-_
_-_-_-_
-_
print"%.3f"%(8.*l(a+d)/l(d)**2)
The rest of this post is spoilers, so stop reading if you'd like to figure out what's going on for yourself.
How it works
The original C version works by counting the area of the circle in variable F, and its diameter in variable OO. Then it calculates $ π = {4A \over d^2} $, and prints this to 3 decimal places.
In more detail, _ is #defined to the statement F-->00 || F-OO--;. Since F is never positive in this program, by default this decrements both F and OO. But _ mostly appears in the context of -_, which becomes -F-->00 || F-OO--;, which will only decrement F (since F is negative whenever this statement occurs).
So the diameter of the ascii circle is counted as the number of lines, and its area is counted as the total number of _s it contains.
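The same counting idea can be sketched in straightforward Python (a hypothetical helper, not part of either program):

```python
# Estimate pi by counting the characters in an ascii circle of diameter d:
# one row per line, int(2 * half_width) characters on each row.
def approx_pi(d):
    r = d / 2.0
    area = 0
    for row in range(d):
        y = row - r + 0.5                # vertical offset of this row's centre
        half_width = (r * r - y * y) ** 0.5
        area += int(2 * half_width)      # characters this row contributes
    return 4.0 * area / d ** 2
```

Truncating each row makes this undercount slightly, so small diameters come out below pi; the hand-drawn circles in these programs are instead tuned to hit the best integer area for their diameter.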
The python version uses a similar algorithm, but implemented differently. There's no preprocessor, so instead I use operator overloading. _ is a variable. Strings like _-_-_ count the total number of _s minus one into variable a, and count one into variable d. Then we use $ π = {8(a+d) \over d^2} $. 8 because it's only a semicircle, and $(a+d)$ because a is missing one _ per line.
How it does that is easy enough. _ has a __sub__ method which increments both a and d, and returns a new variable, s. s has a __sub__ method which just increments a and returns s. So a line like _-_-_-_ becomes s-_-_ becomes s-_ becomes s, and on the way it counts 1 into d and 3 into a.
We can also start a line with a -. -_ is defined to just return _, so it doesn't change anything. The top and bottom lines, -_, are just there for aesthetic reasons, they don't do anything at all.
As an implementation detail, a and d are actually lists. You can't call a += 1 in a python lambda, because python's lambdas are crippled (limited to a single expression, whereas a+=1 is a statement). So I have functions
A = lambda: a.append(1) or s
D = lambda: d.append(1) or s
which are slightly shorter, easier to lay out, and I think more in the spirit of the thing, than writing def A(): global a; a+=1; return s. (There's also some historical motivation. I originally used I = lambda x:x.append(1) or s and called I(a) and I(d). But I had difficulty fitting it into the square block, and in the course of experimentation ended up with what I have now. I could have gone with a and d as one-element arrays and def I(x): x[0]+=1; return s. But, whatever.)
Then in the final line, l(a+d) is just len(a+d), and summing arrays concatenates them so that's the same as len(a)+len(d). And l(d) is just len(d).
Here's an unobfuscated version of the header:
class F:pass
s = F()
_ = F()
a = []
d = []
A = lambda: a.append(1) or s
D = lambda: d.append(1) or s
l=len
_.__neg__ = lambda: _
_.__sub__ = lambda x: D() and A()
s.__sub__ = lambda x: A()
Incidentally, I only have one code block, class F:pass. When I decided to lay the header out in a rectangle, because yay geometric shapes, that line forced the width because I can't put anything before or after it. I wanted to go with F = type('', (), {}). But for some reason, that makes subtraction throw a TypeError:
>>> class F: pass
...
>>> x = F()
>>> x.__sub__ = lambda x: 3
>>> x-x
3
>>> G=type('',(),{})
>>> y = G()
>>> y.__sub__ = lambda x: 3
>>> y - y
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: '' and ''
I don't know why that happens.
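Most likely this is Python 2's old-style versus new-style class distinction: class F: pass creates an old-style class, whose instances are consulted directly for operator methods, while type('', (), {}) creates a new-style class, where the - operator looks up __sub__ only on the type, never on the instance. (Python 3 classes all behave the new-style way.) A sketch of the distinction, runnable in Python 3:

```python
# In Python 3 (and for Python 2 new-style classes), the - operator looks up
# __sub__ on the type, never on the instance.
G = type('G', (), {})
y = G()
y.__sub__ = lambda other: 3          # instance attribute: ignored by -
try:
    y - y
    subtraction_worked = True
except TypeError:
    subtraction_worked = False
assert not subtraction_worked

G.__sub__ = lambda self, other: 3    # class attribute: used by -
assert y - y == 3
```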
You might be wondering, if I'm using the same algorithm as the C version, why is my semicircle so much larger than the original circle? That's an annoying quirk of mathematics.
With a circle, we have $ π = { 4a \over d^2 } $. For fixed $d$, we need an area $ a = { πd^2 \over 4 } $. But our area is an integer. So for fixed $d$, we can get an upper and lower bound on $π$, and we choose to display one of these. As we increase $d$, the bounds get tighter, but the lower bound doesn't move monotonically up, and the upper bound doesn't move monotonically down. $ d=16 $ is the first time that either bound gives us three decimal places, with an area of 201.
I had to go with a semicircle, because of significant whitespace, so I instead need an area $ a = { πd^2 \over 8 } $. With a diameter of 16, I could choose between an area of 100 or 101. These give π as 3.125 or 3.15625, which aren't even correct to two decimal places. If I wanted three decimal places, I needed a diameter of 32 and an area of 402. (My semicircle is actually 34 lines tall, but as noted above, the first and last lines don't do anything.)
Pedants will note that π rounded to three decimal places should actually be 3.142. I could have achieved that with a diameter of 34 and an area of 454. With a full circle, we could go for diameter 17 and area 227.
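These numbers are easy to check (a throwaway sketch, using the same "%.3f" format as the programs):

```python
# Sanity-checking the quoted areas and diameters.
fmt = lambda x: "%.3f" % x
assert fmt(4.0 * 201 / 16 ** 2) == "3.141"  # circle, d = 16, area 201
assert fmt(8.0 * 100 / 16 ** 2) == "3.125"  # semicircle, d = 16: too low
assert fmt(8.0 * 101 / 16 ** 2) == "3.156"  # semicircle, d = 16: too high
assert fmt(8.0 * 402 / 32 ** 2) == "3.141"  # semicircle, d = 32
assert fmt(8.0 * 454 / 34 ** 2) == "3.142"  # semicircle, d = 34: rounds correctly
assert fmt(4.0 * 227 / 17 ** 2) == "3.142"  # circle, d = 17
```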
Posted on 18 January 2014
Cross-posted to Less Wrong.
It feels like most people have a moral intuition along the lines of "you should let people do what they want, unless they're hurting other people". We follow this guideline, and we expect other people to follow it. I'll call this the permissiveness principle, that behaviour should be permitted by default. When someone violates the permissiveness principle, we might call them a fascist, someone who exercises control for the sake of control.
And there's another moral intuition, the harm-minimising principle: "you should not hurt other people unless you have a good reason". When someone violates harm-minimisation, we might call them a rake, someone who acts purely for their own pleasure without regard for others.
But sometimes people disagree about what counts as "hurting other people". Maybe one group of people believes that tic-tacs are sentient, and that eating them constitutes harm; and another group believes that tic-tacs are not sentient, so eating them does not hurt anyone.
What should happen here is that people try to work out exactly what it is they disagree about and why. What actually happens is that people appeal to permissiveness.
Of course, by the permissiveness principle, people should be allowed to believe what they want, because holding a belief is harmless as long as you don't act on it. So we say something like "I have no problem with people being morally opposed to eating tic-tacs, but they shouldn't impose their beliefs on the rest of us."
Except that by the harm-minimising principle, those people probably should impose their beliefs on the rest of us. Forbidding you to eat tic-tacs doesn't hurt you much, and it saves the tic-tacs a lot of grief.
It's not that they disagree with the permissiveness principle, they just think it doesn't apply. So appealing to the permissiveness principle isn't going to help much.
I think the problem (or at least part of it) is, depending how you look at it, either double standards or not-double-enough standards.
I apply the permissiveness principle "unless they're hurting other people", which really means "unless I think they're hurting other people". I want you to apply the permissiveness principle "unless they're hurting other people", which still means "unless I think they're hurting other people".
Meanwhile, you apply the permissiveness principle unless you think someone is hurting other people; and you want me to apply it unless you think they're hurting other people.
So when we disagree about whether or not something is hurting other people, I think you're a fascist because you're failing to apply the permissiveness principle; and you think I'm a rake because I'm failing to apply the harm-minimisation principle; or vice-versa. Neither of these things is true, of course.
It gets worse, because once I've decided that you're a fascist, I think the reason we're arguing is that you're a fascist. If you would only stop being a fascist, we could get along fine. You can go on thinking tic-tacs are sentient, you just need to stop being a fascist.
But you're not a fascist. The real reason we're arguing is that you think tic-tacs are sentient. You're acting exactly as you should do if tic-tacs were sentient, but they're not. I need to stop treating you like a fascist, and start trying to convince you that tic-tacs are not sentient.
And, symmetrically, you've decided I'm a rake, which isn't true, and you've decided that that's why we're arguing, which isn't true; we're arguing because I think tic-tacs aren't sentient. You need to stop treating me like a rake, and start trying to convince me that tic-tacs are sentient.
I don't expect either of us to actually convince the other, very often. If it was that easy, someone would probably have already done it. But at least I'd like us both to acknowledge that our opponent is neither a fascist nor a rake, they just believe something that isn't true.
Posted on 04 January 2014
If you hang out on /r/scifi for long enough, you'll hear someone say that Star Wars is not science fiction. It's fantasy in space, or space opera, or whatever, but it isn't science fiction.
(The example that inspired this post, nine months ago, was this thread.)
And, well, they're wrong.
I want to call it a "no true Scotsman" fallacy. Unfortunately, there's no good definition of science fiction that I can point to and say "whatever Star Wars has or doesn't have, that you think makes it not science fiction, is not part of the definition".
Worse, the person in question usually has a definition of science fiction. (In this case, "science fiction embraces change".) And indeed, it's a definition that Star Wars does not attain.
But what this should tell us is not "Star Wars is not science fiction". It's "your definition of science fiction is broken". Consider:
"No Scotsman takes sugar with their porridge!"
"But my uncle Hamish takes sugar with his porridge."
"No, sorry, I was unclear. I define a Scotsman as a person of Scottish citizenship who does not take sugar with their porridge."
It's a bit ridiculous, isn't it?
Star Wars isn't as bad as that, for (at least) two reasons. Since there's no agreed-upon definition of science fiction, choosing your own is more reasonable than choosing your own definition of Scotsman. (Though, I wonder just how well-defined "Scotsman" is. Is the important factor citizenship, birth, personal identity? A mix? Is a "Scotsman" actually necessarily male, with "Scotsperson" being gender neutral? Still, it's better defined than science fiction.)
The other reason is that the proposed definition of "Scotsman" is a silly definition even to consider. Why are you talking about the conjunction of "Scottish citizenship" and "doesn't take sugar with their porridge"? Is there anything you can say about "people of Scottish citizenship who do not take sugar with their porridge", that doesn't follow trivially from what you know of "people of Scottish citizenship" and "people who take sugar with their porridge"? Whereas the proposed definition of science fiction is often quite a reasonable definition of something that we might want to talk about.
Just not a reasonable definition of science fiction: it doesn't include Star Wars.
I don't think that it's bad, in general, to take a poorly-defined concept and attempt to give it a rigorous definition. And sometimes we might find that in doing so, there are some things that we've been thinking of as frantles that don't seem to fit in very well with all the other things that we think of as frantles, and maybe we should stop calling them frantles. I think something like that happened with the question of whether or not 1 is a prime number, for example.
But I think that if you do that to science fiction, and eliminate Star Wars, you've done something wrong.
My reasoning for this is basically "Star Wars is obviously science fiction, duh". But a more compelling argument is: if you eliminate Star Wars, you're bound to be eliminating a whole load of other stuff. For example, if the Force disqualifies Star Wars for being too magical, then presumably Stranger in a Strange Land doesn't count either. It's been a while since I read the 2001 series, but I think David Bowman would disqualify those as well.
There are stories that I consider science fiction that I wouldn't make this complaint about. If someone's definition excluded Ted Chiang's Seventy-Two Letters, that would be fine. But when you say that Star Wars isn't science fiction, you're no longer speaking the same language as your audience, unless your audience is composed mostly of the sort of people who try to come up with rigorous definitions of science fiction.
It's also important to note that whether or not a work counts as science fiction is fundamentally unimportant. If you read a book, and you're not sure whether it counts as science fiction, it's not because you're ignorant about the book. Learning that it is or isn't science fiction won't teach you anything new.
But still. Communicating clearly is important, and if you talk about science fiction and don't intend to include Star Wars, you're failing at that.
Posted on 22 December 2013
Chicken is a well-known two-player game in which the players get inside different cars, drive towards each other, and hope that the other player swerves out of the way. If both players swerve, they both lose small amounts of charisma. If one player swerves and the other doesn't, the swerver loses a slightly larger amount of charisma, and the non-swerver gains a small amount. If neither player swerves, both lose a large (occasionally fatal) amount of HP, and a small amount of charisma.
It is sometimes said[citation needed] that there is an optimal strategy to playing Chicken: after pointing your car at your opponent, remove your steering wheel. Now you can't swerve, so she knows that she can either swerve or crash.
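The logic of the removed steering wheel can be made concrete with a toy payoff table. The numbers here are invented; only their ordering matches the description above:

```python
# Chicken with invented payoffs: higher is better.
payoffs = {
    ("swerve", "swerve"):     (-1, -1),      # both lose a little charisma
    ("swerve", "straight"):   (-2, 1),       # swerver loses a bit more; other gains
    ("straight", "swerve"):   (1, -2),
    ("straight", "straight"): (-100, -100),  # crash: a large (maybe fatal) HP loss
}

def best_response(their_move):
    # My best move, if their move is fixed and known to me.
    return max(("swerve", "straight"),
               key=lambda mine: payoffs[(mine, their_move)][0])

# If you have visibly removed your steering wheel, my best response is to
# swerve; but if I expect you to swerve, I do better by driving straight.
assert best_response("straight") == "swerve"
assert best_response("swerve") == "straight"
```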
Hearing this strategy, you head down to your local Chicken field, armed with your trusty screwdriver, and eager for a game. You quickly find an opponent. The two of you perform the traditional squaring-off ritual, enter your respective vehicles, and wheel around to face each other. You take your screwdriver in hand, smile sweetly at your opponent… and realise that she has had the same idea, and is already removing her steering wheel.
Are you still going to remove your own wheel? Or are you going to swerve?
It's easy to think that Chicken can turn into a game of "who can precommit first". We race to remove our steering wheels; if I win, then you have to swerve or crash, and if you win, then I have to swerve or crash.
But what if I put on a blindfold? Now if you remove your steering wheel, I won't know about it. What good does it do you then? Even if you always remove your steering wheel, assuming your opponent doesn't remove theirs first - and even if I know that you do that, and you know that I know, and so on - do I know that you're still going to do it when I can't see you do it? You'd like me to think that you will, but how do you convince me? And even if you do convince me, I can still remove my own steering wheel, and if I do it while blindfold I won't know if you beat me to it.
With a blindfold, I'm not precommitting not to swerve, although I can do that as well if I choose. I'm precommitting to ignore your precommitments. (Sort of. I'm precommitting not to receive any sensory input about your precommitments. I can still react to your precommitments, e.g. if both you and my model of you precommit despite me not seeing you do it.)
Of course, you can also put on a blindfold. So maybe we race to do that? But then I can tell you before the game that even if you put on a blindfold, I'm still removing my steering wheel. And you say there's no reason for me to do that, so you don't believe that I will. And I say, "try me". If I've said this in front of a large audience, the cost to me of swerving just went up. Moreso if some of the audience are potential future blindfold opponents.
And you can say the same thing to me. But this meta-level is different. If you've removed your steering wheel, you can't swerve. If you've blindfolded yourself, you can't see me precommit. Those are credible precommitments. There's no object-level reason for me to copy you after you've already done it. But just telling me of your precommitment is far less credible, you can always change your mind.
So if you tell me that you're going to ignore my precommitment to ignore your precommitment, I can tell you the same thing, hoping to change your mind.
Assuming neither of us claims to back down, what happens now? We face off against each other, and if I blindfold myself, I don't know whether you're going to do the same, and I don't know whether you're going to remove your steering wheel, and you know I don't know, and I know you know I don't know… and it seems to me that we're back at the original game.
(Incidentally, I've ignored the possibility that a steering wheel might be replaceable; or that a blindfold can be removed, or light enough to see through from the inside. These possibilities make your precommitments less credible.)
Posted on 07 December 2013
(Cross-posted to LessWrong.)
Human brains are bad at evaluating consequences. Sometimes we want to do something, and logically we're pretty sure we won't die or anything, but our lizard hindbrains are screaming at us to flee. Comfort Zone Expansion (CoZE) is an exercise that CFAR teaches to get our lizard hindbrains to accept that what we're doing is actually pretty safe.
Roughly it involves two steps. One: do something that makes our lizard hindbrains get pretty antsy. Two: don't get eaten as a result.
I organised a CoZE exercise for LessWrong London on September 1st. We had a total of eight participants; I think I was the only one who'd done any structured CoZE before.
My plan was: we meet at 11am, have a discussion about CoZE, play some improv games to warm up, and then head to the nearby mall for 1.30 pm. In reality, the discussion started closer to 12pm, with some people showing up part way through or not until it was finished.
After finishing the discussion, we didn't end up doing any improv games. We also became slightly disorganised; we agreed on a meeting time an hour and a half in the future, but then our group didn't really split up until about twenty minutes after that. I could have handled this better. We got distracted by the need for lunch, which I could have made specific plans for. (Ideally I would have started the discussion after lunch, but shops close early on Sunday.)
Report
My solo activities went considerably less well than I'd expected. My first thing was to ask a vendor in the walkway for some free chocolate, which annoyed her more than I'd expected. Maybe she gets asked that a lot? It was kind of discouraging.
After that I wanted to go into a perfume shop and ask for help with scent, because I don't know anything about it. I wandered past the front a couple of times, deciding to go when the shop was nearly empty, but then when that happened I still chickened out. That, too, was kind of discouraging.
Then I decided to get back in state by doing something that seemed easy: making eye contact with people and smiling. It turns out that "making eye-contact" is a two-player game, and nobody else was playing. After some minutes of that I just gave up for the day.
In my defense: I had a cold that day and was feeling a bit shitty (while wandering near the perfume shop I got a nose-bleed, and had to divert to the bathroom temporarily), and that might have drained my energy. I did also get a few minor victories in. The most notable is that I did some pull-ups on some scaffolding outside, and someone walking past said something encouraging like "awesome". (I'd like to do these more often, but the first time I tried there was ick on the scaffolding. There wasn't this time, so I should collect more data.)
[[I spoke with Critch from CFAR a few days afterwards, and he gave me a new perspective: if I go in expecting people to respond well to me, then when they don't, that's going to bother me. If I go in expecting to annoy people, but remembering that annoying people, while bad, doesn't correspond to any serious consequences, then it's going to be easier to handle. For any given interaction, I should be trying to make it go well, but I should choose the interactions such that they won't all go well. (Analogy: in a game of Go, the stronger player might give a handicap to the weaker player, but once play starts they'll do their best to win.)
He also gave me a potential way to avoid chickening out: if I imagine myself doing something, and then I try to do it and it turns out to be scary, then that feels like new information and a reason to actually not do it. If I imagine myself doing something, being scared and doing it anyway, then when it turns out to be scary, that no longer counts as an excuse. I haven't had a chance to try this yet.]]
Other people had more success. We'd primed ourselves by talking about staring contests a lot previously, so a few people asked strangers for those. I think only one stranger accepted. Trying to get high-fives was also common; one person observed that he sometimes does that anyway, and has a much higher success rate than he did in the mall. One person went into a high-end lingerie store and asked what he could buy on a budget of £20 (answer: nothing). And of course there were several other successes that I've forgotten. I got the impression that most people did better than me.
There was interest in doing this again. At the time I was hesitant but realised that I would probably become less hesitant with time. I've now reached a point where I, too, would quite like to do it again. We haven't got any specific plans yet.
Things to take away:
- Have some well-defined way to transition from "about-to-start" to "started".
- Having an audience makes some things much easier. This is potentially a way to escalate difficulty slowly.
- When I did CoZE at the CFAR workshop I was hanging out with someone, who encouraged me to actually do things as well as provided an audience. At the time I wondered whether the fact that we were together meant I did less, but I did far more with her than by myself.
- We didn't do anything beforehand to get in state, like improv games. I can't say whether this would have helped, but it seems like it might have done.
- We had a nonparticipant volunteer to be a meeting point if participants wanted to finish early. If she hadn't been there, I might not have quit twenty minutes early, but given that I did, it was nice to be able to hang out. This might be a good way to enable longer sessions.
An objection
During the discussion about ethics, someone brought up an objection, which I think cashes out as: there are good places to expand our comfort zones into, there are bad places, and there are lame places. A lot of the stuff on the recommended list of activities is lame (do we really need to be better at asking people for staring contests?), and it's not clear how much it generalises to good stuff. Under the novice-driver line of thought, bothering people is an acceptable cost of CoZE; but if we're using CoZE for lame things, the benefits become small, and maybe it's no longer worth the cost.
I guess this boils down to a question of how much the lame stuff generalises. I'm optimistic; for example, it seems to me that a lot of the lame stuff is going to be overcoming a feeling of "I don't want to bother this person", which is also present in the good stuff, so that particular feature should generalise. (It may also be that the lame stuff is dominated by the good stuff, so there's no reason to ever practice anything lame; but that seems a sufficiently complicated hypothesis to be low prior.)
(There's also a question of whether or not people are actually bothered by strangers asking for staring contests. My initial assumption was not, but after doing the exercise, I'm not sure.)
Posted on 18 November 2013
(Fri, 6 Dec 2013: Importing this post from its original home as a gist.)
This post is an attempt to calculate e to a given precision in bash, a challenge given in a job listing that I saw recently. I kind of got nerd sniped. I wrote this as I went along, so there may be inconsistencies.
First attempt
The obvious method to compute e is as the infinite sum of 1/n!, n from 0 to ∞. This converges quickly, but how far do we have to calculate to get the n'th digit of e? We can deal with that later.
We obviously need a factorial function.
fac() {
if [[ $1 -eq 0 ]]; then
echo 1
else
echo $(( $1 * `fac $(($1 - 1))` ))
fi
}
Since we only have integer division, we obviously can't calculate 1/2!. But note that x/a! + y/(a+1)! = (x(a+1) + y)/(a+1)!. We can use this to recursively calculate (sum 1/k! [0 ≤ k ≤ n]) as numer(n)/n!, where numer(0) = 1, numer(k) = k*numer(k-1) + 1.
numer() {
if [[ $1 -eq 0 ]]; then
echo 1
else
echo $(( $1 * `numer $(($1 - 1))` + 1 ))
fi
}
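The recurrence is easy to sanity-check; here's a quick sketch in Python rather than bash, purely for verification:

```python
# Cross-check: numer(n)/n! should equal the partial sum 1/0! + ... + 1/n!.
from math import factorial

def numer(n):
    return 1 if n == 0 else n * numer(n - 1) + 1

for n in range(10):
    partial = sum(1.0 / factorial(k) for k in range(n + 1))
    assert abs(numer(n) / factorial(n) - partial) < 1e-12
```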
So now we can calculate the partial sums. Since we still only have integer division, we need to multiply them by a power of 10 to get rid of the decimal point.
nthsum() {
echo $(( 10**($1-1) * `numer $1` / `fac $1` ))
}
Note that this fails for n=0 (10**-1 isn't allowed), but we can live with that.
So this kind of works:
$ for i in `seq 1 15`; do nthsum $i; done
2
25
266
2708
27166
271805
2718253
27182787
271828152
2718281801
27182818261
2252447557
-1174490000
104582974
1946803
Up to n=11, we accurately calculate the first (n-3) digits of e. For n=12 and above, we get integer overflows.
It doesn't look like we can go very far with this: the numbers we're working with are simply too large.
Second attempt
If you google "algorithm to calculate a specific digit of e", this paper comes up: http://eprints.utas.edu.au/121/1/Calculation_of_e.pdf. It provides a simple algorithm using (mostly) integer arithmetic, implemented in ALGOL. It's simple enough to translate into bash:
ecalc() {
let n=$1
echo -n 2.
for (( j = n; j >= 2; j-- )); do
coef[j]=1
done
for (( i = 1; i <= n; i++ )); do
let carry=0
for (( j = n; j >= 2; j-- )); do
let temp=coef[j]*10+carry
let carry=temp/j
let coef[j]=temp-carry*j
done
echo -n $carry
done
echo
}
This isn't quite accurate: the original algorithm calculates m such that m! > 10^(n+1), and the loops over j go from m to 2 instead of n to 2. This means the algorithm is inaccurate for small n. (For n ≥ 27, n! > 10^(n+1) so it works out okay; for 22 ≤ n ≤ 26, we have 10^(n-1) < n! < 10^(n+1) and the result is accurate anyway. It seems like the algorithm is unnecessarily conservative, but we might also find that m! > 10^(n-1) is insufficient for larger n.) For large n, we do unnecessary work, but get the correct result.
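For anyone who finds the bash arithmetic heavy going, here's the same algorithm transliterated into Python (same variable names, same caveats about small n):

```python
# n passes over the mixed-radix digits coef[2..n], emitting one decimal
# digit of e per pass.
def ecalc(n):
    out = ["2."]
    coef = [1] * (n + 1)            # coef[0] and coef[1] are unused padding
    for i in range(n):
        carry = 0
        for j in range(n, 1, -1):   # j = n down to 2
            temp = coef[j] * 10 + carry
            carry = temp // j
            coef[j] = temp - carry * j
        out.append(str(carry))
    return "".join(out)
```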
We can fix both these problems, but this algorithm isn't especially nice anyway. Its time complexity is O(n^2). Can we do better?
Third attempt
(Spoiler alert: this doesn't go so well.)
The same google search also gives this page: http://www.hulver.com/scoop/story/2004/7/22/153549/352 which hints at an algorithm without providing it explicitly. We can adapt our first attempt for this.
Write numer(n) as a_n, so a_0 = 1 and a_n = n * a_(n-1) + 1. This gives 1/0! + 1/1! + … + 1/n! = a_n / n!. We know that e = lim [n→∞] a_n / n!; but more than this, we can show that for any n ≥ 1, a_n / n! < e < (a_n + 1)/n!.
(Proof of this: (a_n+1) / n! = 1/0! + 1/1! + … + 1/n! + 1/n!. This is greater than e if 1/n! > 1/(n+1)! + 1/(n+2)! + …, which holds if 1 > 1/(n+1) (1 + 1/(n+2) (1 + … )). For n ≥ 1, RHS is ≤ 1/2 (1 + 1/3 (1 + … )) which we know is e-2 < 1.)
So if a_n / n! and (a_n + 1) / n! agree up to k decimal places, these must be the first k decimal places of e.
Moreover, we can extract specific decimal places while keeping to integer division: the fractional part of x/y is (x%y)/y, so the first decimal digit is int( (10*(x%y))/y ) or int( (x%y)/(y/10) ) (assuming y%10 = 0), and we can extract further digits by doing the same thing again.
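For instance, with plain 64-bit integers and the approximation a_4/4! = 65/24 (a toy sketch, not yet the real algorithm):

```shell
#!/bin/bash
# Extract decimal digits of x/y using only integer division:
# the fractional part of x/y is (x%y)/y, so each new digit is
# int( 10*(x%y) / y ), repeating with the new remainder.
x=65 y=24    # a_4 / 4! = 65/24 = 2.7083...
echo -n $(( x / y )).
for (( i = 0; i < 4; i++ )); do
    (( x %= y, x *= 10 ))
    echo -n $(( x / y ))
done
echo         # prints 2.7083
```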
This gives us an algorithm for calculating e to n decimal places, one digit at a time:
ecalc() {
    let a=1
    let b=1
    let d=0
    let k=1
    let n=$1
    while (( d <= n )); do
        while (( a/b != (a+1)/b || b%10 != 0 )); do
            let a=k*a+1
            let b*=k
            let k+=1
        done
        echo -n $(( a / b ))
        let d+=1
        let a%=b
        let b/=10
    done
    echo
}
Unfortunately, this only works up to three decimal places before we get overflows. The problem is that b only gets a new power of 10 every time k%5 = 0. Unfortunately 24!/10000 overflows, so we only get digits from k=5, 10, 15, 20. (In fact, ecalc 4 is also correct; but this seems to be just coincidence.)
We can delay the inevitable by keeping track of powers of ten explicitly: when we generate a new digit, if b%10 != 0, increment a counter and consider (10^powten * a)/b and (10^powten * (a+1))/b. This gives us a few more digits, but before long 10^powten * a overflows.
So, how to get around this? Why not just implement arbitrary-precision integers in bash?
It sounds crazy, but we don't actually need a complete implementation. The only operations we need are:
- Add one to a bigint.
- Multiply a bigint by an int.
- Divide a bigint by 10.
- Modulo a bigint by 10.
- Integer division of bigints, with small ratio.
- Modulo a bigint by another bigint, also with small ratio.
The latter two can be implemented with subtraction and comparison, so it shouldn't be too hard.
Let's represent a big integer as an array of numbers, each smaller than 2^32. Since bash can represent numbers up to 2^63 - 1, we can raise n up to 2^31 - 1 before overflows become a serious problem. As of 2010, e was only known up to about 2^40 digits, so this is an acceptable limit. But it's admittedly quite arbitrary, and there's no reason to hardcode it.
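Concretely, a bigint is an array of base-2^32 "digits", least significant first. An illustrative sketch (the value 10^10 is arbitrary):

```shell
#!/bin/bash
# Represent 10^10 -- too big for a single base-2^32 term -- as a bigint.
MAXINT=$(( 2**32 ))
n=10000000000
big=( $(( n % MAXINT )) $(( n / MAXINT )) )
echo "${big[@]}"                      # 1410065408 2
# Reconstruct to check: big[0] + big[1] * 2^32
echo $(( big[0] + big[1] * MAXINT ))  # 10000000000
```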
We'll want some way of taking an array of numbers and turning it into a bigint, if some of its elements are greater than MAXINT or less than 0. I don't think there's a convenient way of passing around arrays in bash, so let's use a register-based approach, and operate destructively on the variable $res. (This is for convenience: $res will be the output variable of other operations, and we expect normalisation to be the last thing they do.)
normalise() {
    local i
    for (( i = 0; i < ${#res[@]}; i++ )); do
        if (( res[i] >= MAXINT )); then
            let res[i+1]+=res[i]/MAXINT
            let res[i]%=MAXINT
        elif (( res[i] < 0 && ${#res[@]} > i+1 )); then
            let res[i+1]-=1
            let res[i]+=MAXINT
        fi
    done
}
This doesn't handle every case; for example, a term smaller than -MAXINT will break things. But it will be sufficient for our purposes.
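As a quick sanity check on the carry handling, here is normalise run on an illustrative out-of-range value (self-contained so it can be executed directly):

```shell
#!/bin/bash
MAXINT=$(( 2**32 ))
normalise() {
    local i
    for (( i = 0; i < ${#res[@]}; i++ )); do
        if (( res[i] >= MAXINT )); then
            let res[i+1]+=res[i]/MAXINT
            let res[i]%=MAXINT
        elif (( res[i] < 0 && ${#res[@]} > i+1 )); then
            let res[i+1]-=1
            let res[i]+=MAXINT
        fi
    done
}
res=( 5000000000 )   # one term, larger than 2^32
normalise
echo "${res[@]}"     # 705032704 1, i.e. 5000000000 = 705032704 + 1*2^32
```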
With this, addition and subtraction are easy. We only need addition of an int and a bigint, so will call this addi (i for integer) and operate on the variable $op1.
addi() {
    res=( ${op1[@]} )
    let res[0]+=$1
    normalise
}
Subtraction needs to be defined between two bigints, but we only need positive results.
sub() {
    local i
    res=()
    for (( i = 0; i < ${#op1[@]}; i++ )); do
        let res[i]=op1[i]-op2[i]
    done
    normalise
}
Multiplication and division follow similarly. (We only need to divide a bigint by 10, but allowing an arbitrary int is no harder.)
muli() {
    local i
    res=( ${op1[@]} )
    for (( i = 0; i < ${#res[@]}; i++ )); do
        let res[i]*=$1
    done
    normalise
}

divi() {
    local i
    res=( ${op1[@]} )
    for (( i = ${#res[@]}-1; i > 0; i-- )); do
        let res[i-1]+="MAXINT*(res[i] % $1)"
        let res[i]/=$1
    done
    let res[0]/=$1
    normalise
}
(We note that muli might break if the multiplicand is close to 2^32: if two adjacent terms in $res are sufficiently large, normalise might cause overflows. But we're assuming the multiplicand is at most 2^31 - 1, and testing indicates that this works fine.)
For modi, even though the result is a normal integer, we'll return it in $res like a bigint. The other option would be to echo it, but then we'd need to spawn a subshell to use it. (Test this yourself: compare echo 'echo hi' | strace -f bash to echo 'echo $(echo hi)' | strace -f bash. The first doesn't fork at all, because echo is a builtin command; but the second forks a subshell to run echo hi.) Forking isn't cheating, but it seems worth avoiding.
modi() {
    local i
    let res=0
    for (( i = 0; i < ${#op1[@]}; i++ )); do
        let res+="${op1[i]}%$1 * (MAXINT%$1)**i"
    done
    let res%=$1
}
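For example, taking the two-term bigint for 10^10 from the earlier illustration, modi should agree with 10^10 mod 7 = 4 (a self-contained sketch):

```shell
#!/bin/bash
MAXINT=$(( 2**32 ))
# Bigint mod small int: sum each term mod m, weighted by (2^32 mod m)^i.
modi() {
    local i
    let res=0
    for (( i = 0; i < ${#op1[@]}; i++ )); do
        let res+="${op1[i]}%$1 * (MAXINT%$1)**i"
    done
    let res%=$1
}
op1=( 1410065408 2 )  # the bigint for 10^10
modi 7
echo $res             # 4, since 10^10 mod 7 = 4
```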
For division and modulo, we need a ≤ operation; we can use its exit code for the return value. (We return 0 (true) if op1 ≤ op2, and 1 (false) otherwise.)
le() {
    local i
    local len=${#op1[@]}
    (( len < ${#op2[@]} )) && len=${#op2[@]}
    for (( i = len-1; i >= 0; i-- )); do
        if (( op1[i] > op2[i] )); then
            return 1
        elif (( op1[i] < op2[i] )); then
            return 0
        fi
    done
    return 0
}
Finally we can implement division and modulo. We'll just define a mod operator, which can store the division result in a variable $div.
mod() {
    local temp=( ${op1[@]} )
    let div=0
    res=( ${op1[@]} )
    until le; do
        let div+=1
        sub
        op1=( ${res[@]} )
    done
    op1=( ${temp[@]} )
}
So mod stores $op1 % $op2 in $res, and $op1 / $op2 in $div. Since we know $op1 / $op2 will always be less than 10, we could maybe get a slight speed improvement with a binary search, but I really doubt that's going to be a bottleneck.
It would be foolish not to test these. These Haskell functions (entered into GHCI, which only accepts one-line definitions) will help:
let x = 2^32
let splitint n = if n < x then (show n) else (show (n `mod` x)) ++ " " ++ splitint (n `div` x)
let unsplit s = sum $ map (\(a,b) -> b*x^a) $ zip [0..] $ map read $ words s
splitint turns an arbitrary-precision Integer into a string that we can copy into a bash array. unsplit does the opposite, taking a space-separated list of integers and turning them into an arbitrary-precision Integer. (Haskell is good for this because it has arbitrary-precision arithmetic. I originally tried this in Python, and wasted a lot of time trying to track down a bug in my bash code before realising that Python was wrong.) So we choose a few arbitrary large numbers and verify that everything works as expected. (Unsurprisingly, I caught a few bugs when I actually tried this.)
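(For values that still fit in 64 bits, a bash analogue of splitint makes a handy cross-check; this helper is my own, not part of the implementation:)

```shell
#!/bin/bash
# Bash analogue of the Haskell splitint, valid while n fits in 64 bits:
# emit base-2^32 terms, least significant first.
splitint() {
    local n=$1 out=()
    while (( n >= 2**32 )); do
        out+=( $(( n % 2**32 )) )
        (( n /= 2**32 ))
    done
    out+=( $n )
    echo "${out[@]}"
}
splitint $(( 2**40 ))  # 0 256, since 2^40 = 0 + 256 * 2^32
```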
Having implemented bigints to the extent necessary, we can hopefully extract more digits from e. Arithmetic is ugly now, so we'll split off some functions, all using variables $a and $b.
Some things to note:
- b_has_power_10 is faster than got_next_digit, and more likely to fail. So we test that first.
- To avoid repeating computations, echo_next_digit and reduce_a_b simply use the results of a mod b calculated in got_next_digit.
got_next_digit() {
    op1=( ${a[@]} )
    addi 1
    op1=( ${res[@]} )
    op2=( ${b[@]} )
    mod
    div1=$div
    op1=( ${a[@]} )
    mod
    (( div1 == div ))
}

echo_next_digit() {
    echo -n $div
}

b_has_power_10() {
    op1=( ${b[@]} )
    modi 10
    (( res == 0 ))
}

increase_a_b() {
    op1=( ${a[@]} )
    muli $1
    op1=( ${res[@]} )
    addi 1
    a=( ${res[@]} )
    op1=( ${b[@]} )
    muli $1
    b=( ${res[@]} )
}

reduce_a_b() {
    a=( ${res[@]} )
    op1=( ${b[@]} )
    divi 10
    b=( ${res[@]} )
}

ecalc() {
    a=(1)
    b=(1)
    d=0
    k=1
    n=$1
    while (( d <= n )); do
        until b_has_power_10 && got_next_digit; do
            increase_a_b $k
            let k+=1
        done
        echo_next_digit
        reduce_a_b
        let d+=1
    done
    echo
}
This still seems to act as O(n^2). Which, in hindsight, shouldn't be too surprising: arithmetic is O(log b); b grows as O(k!); and k grows to approximately 4n. (k! needs to have n powers of 5, which means n ≈ k/5 + k/25 + k/125 + … = k/4.) Since there are O(n) arithmetic operations, we should actually find that this is O(n^2 log n) if we look close enough. That's disappointing; and the algorithm is considerably slower than the previous one (7 minutes versus 40 seconds to calculate 100 digits), which is even more disappointing.
But maybe we can still salvage something. (Or maybe we're just doubling down on a sunk cost.) The bottleneck right now is probably the powers of 10 in b. There's an easy way to see this: ask for e up to 20 places. Then take the values of a and b, put them into Haskell, and see just how many digits we'd actually generated at this point.
It turns out to be about 130. (And this only took twelve seconds, so we're beating the second attempt considerably.)
So if we can extract all these digits without needing so many powers of 10 in b, we can do a lot better. We might even be able to beat O(n^2), if k grows slower than O(n). So let's try to do that.
Fourth attempt
We can't simply multiply a by 10 every time we'd like to divide b by 10. That would break the algorithm, for one thing: we'd have to keep track of what power of 10 to multiply a by, and only use it when checking to see if we've got the next digit, not in increase_a_b. (It's okay for b because b only ever gets multiplied, so it doesn't matter whether we do that before or after dividing by 10. But when we do a = k*a + 1, it matters that we haven't already multiplied a by 10.)
That's a minor problem. More severely, our division algorithm was designed for small ratios. If we know a/b < 10, it's okay to examine a, a-b, a-2b, … to see when we get below b. That won't work so well if a/b could be in the thousands.
Fortunately, we can improve division by using the digits we've already calculated. If we have e = 2.71828… and we haven't reduced a or b at all, then we know a / b = 2.71828…, 10a / b = 27.1828…, 100a / b = 271.828…, etc.
And we know (a-2b)/b = 0.71828, so 10(a-2b)/b = 7.1828…; and (10a - 27b)/b = .1828… so 10(10a - 27b)/b = 10(10(a - 2b) - 7b)/b = 1.828…; and so on.
In short, if we know that a/b = d_0 . d_1 d_2 d_3 … d_n …, then we can extract unknown d_(n+1) by:
let x = a
for i from 0 to n:
    x -= d_i * b
    x *= 10
d_(n+1) = x/b
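Concretely, with small integers (a toy run of the scheme above on 65/24, with digits 2, 7, 0 already known):

```shell
#!/bin/bash
# Recover d_(n+1) of a/b from the digits d_0..d_n already found.
a=65 b=24          # a_4/4! = 2.7083...
known=( 2 7 0 )    # d_0, d_1, d_2
x=$a
for d in "${known[@]}"; do
    (( x -= d * b, x *= 10 ))
done
echo $(( x / b ))  # 8, the next digit
```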
These operations are all reasonably fast. (It is, however, O(n), which means we're not going to beat O(n^2).) So, let's try it. We'll store digits of e in an array named e_digits.
next_digit_helper() {
    local i
    local tmp1=( ${op1[@]} )
    local tmp2=( ${op2[@]} )
    local x=( ${op1[@]} )
    local y=( ${op2[@]} )
    for (( i = 0; i < ${#e_digits[@]}; i++ )); do
        op1=( ${y[@]} )
        muli ${e_digits[$i]}
        op1=( ${x[@]} )
        op2=( ${res[@]} )
        sub
        op1=( ${res[@]} )
        muli 10
        x=( ${res[@]} )
    done
    op1=( ${x[@]} )
    op2=( ${y[@]} )
    mod
    op1=( ${tmp1[@]} )
    op2=( ${tmp2[@]} )
}

got_next_digit() {
    op1=( ${a[@]} )
    addi 1
    op1=( ${res[@]} )
    op2=( ${b[@]} )
    next_digit_helper
    div1=$div
    op1=( ${a[@]} )
    next_digit_helper
    (( div1 == div ))
}

found_digit() {
    echo -n $div
    e_digits[${#e_digits[@]}]=$div
}

ecalc() {
    a=(1)
    b=(1)
    e_digits=()
    d=0
    k=1
    n=$1
    while (( d <= n )); do
        until got_next_digit; do
            increase_a_b $k
            let k+=1
        done
        found_digit
        let d+=1
    done
    echo
}
Now this works, but it's even slower than the last attempt. We could improve things by reducing a and b as before when possible, but that's not going to gain us much.
There is one other thing we can do, though it seems potentially unsafe. There's a lot of repeated work involved in figuring out how many digits we've accurately calculated. If we guess in advance how high we need to take k, we can save ourselves a lot of work.
Fifth attempt
Recall that we have n digits if a_k/k! and (a_k+1)/k! agree up to n decimal places. The difference between these is 1/k!. If k! > 10^(n+1), then the decimal expansion of 1/k! will start with at least n+1 zeros. The only way a_k/k! and (a_k+1)/k! could disagree at or before the nth decimal place is if the digits of a_k/k! in positions n+1, n+2, … log_10(k!) are all 9. If additionally a_k/k! disagrees with e at the nth decimal place, it follows that the digits of e in positions n+1, n+2, …, log_10(k!) are all 0.
e is conjectured to be normal, which would mean that any arbitrarily long string of 0s can be found in its decimal expansion. But the probability of l 0s starting at a given position is 10^-l; so if we ask for, say, 100 digits and take k up to 100, then since log_10(100!) = 157, the probability of getting a digit incorrect is 10^-57, which I feel is acceptable. So let's ignore the problem for now.
extract_many_digits() {
    let d=0
    op1=( ${a[@]} )
    op2=( ${b[@]} )
    while (( d <= $1 )); do
        mod
        echo -n $div
        op1=( ${res[@]} )
        muli 10
        op1=( ${res[@]} )
        let d+=1
    done
}

ecalc() {
    a=(1)
    b=(1)
    k=1
    n=$1
    while (( k <= n )); do
        increase_a_b $k
        let k+=1
    done
    extract_many_digits $n
    echo
}
It's a lot faster, but still slightly slower than the second attempt. (And it has the same caveat: as written, it only works for sufficiently large n.)
I've now run out of ideas, which is a bit anticlimactic. (At least, I've run out of ideas that I think will do any good. We could get a more accurate bound on how high to take k; but that could be applied to our second attempt as well. We could reduce a as we go along; but that would make increase_a_b slower, probably gain us very little overall, and certainly not improve on O(n^2).)
So let's return to the second attempt.
Second attempt, take two
Recall that this is the algorithm we're using:
ecalc() {
    let n=$1
    echo -n 2.
    for (( j = n; j >= 2; j-- )); do
        coef[j]=1
    done
    for (( i = 1; i <= n; i++ )); do
        let carry=0
        for (( j = n; j >= 2; j-- )); do
            let temp=coef[j]*10+carry
            let carry=temp/j
            let coef[j]=temp-carry*j
        done
        echo -n $carry
    done
    echo
}
It seems foolish to rely on an algorithm without understanding it, so how does it work? The paper doesn't make it entirely clear, but what's going on is this:
We approximate e as 2 + 1/2! + 1/3! + 1/4! + … + 1/m!, where in our implementation m=n. Rewrite this as 2 + 1/2 (1 + 1/3 (1 + … 1/(n-1) (1 + 1/n) … )).
This in turn is 2 + 1/10 (1/2 (10 + 1/3 (10 + … 1/(n-1) (10 + 1/n (10)) … ))). Some of the 10/k terms are greater than 1; we refactor so that, for example, 1/2 (10 + 1/3 (10 + …)) becomes 1/2 (13 + 1/3 (1 + …)) and then 6 + 1/2 (1 + 1/3 (1 + …)). Eventually we have e = 2 + 1/10 (c + 1/2 (c_2 + 1/3 (c_3 + … 1/(n-1) (c_(n-1) + 1/n (c_n)) … ))) where each c_k < k. It follows that the 1/2 (c_2 + … ) term is less than 1, so c must be 7, the first decimal digit of e.
Rewriting again, e = 2.7 + 1/100 (1/2 (10c_2 + 1/3 (10c_3 + … 1/(n-1) (10c_(n-1) + 1/n (10c_n)) … ))). We apply the same procedure to get the second digit of e, and so on.
The algorithm's coef[j] takes the place of these c_j. To get each digit, we recalculate the c_j in one pass starting from the right; the digit is whatever term is left over outside of the 1/2 (…).
It's worth noting that this algorithm has the same probability of making mistakes as our fifth attempt. So it's probably worth thinking about this probability in more detail.
However, we note that if the calculated sequence of digits ends with d 9s, then only the final d+1 digits have a chance of being incorrect. If they are, then the 9s should all be 0s, and the one preceding them should be one higher. This is reassuring; it permits us to say "if you really can't tolerate the slightest inaccuracy, then ask for more digits than you need, and throw away the final ones as appropriate".
(We could automate the process, but that would require us to guess a new n, then run the algorithm again from scratch. If we want that behaviour, we really ought to write it as a separate program. In this sense, the fifth attempt was better, because you could store intermediate results and use them to get a head start on later ones.)
I also note that I've spent too much time on this already, and that the implementation as it stands chooses m larger than necessary (except for small n, which we shall soon fix), massively reducing the probability of an error. (For n>50 or so, the probability of an error is smaller than the probability of winning the lottery; if an error does appear, it seems more likely to be from a mistake somewhere else.) And whatever the actual probability of error is, it was originally small enough that the authors of the paper didn't notice, and after cursory examination I haven't been able to find any instances where the original algorithm made a mistake. (Digit 327 looked promising, being followed by three 0s; but it turned out okay in the end.)
So while I'd like to go into more depth on this issue, I won't do so at this point.
It remains to fix the algorithm for small n. We simply calculate to at least 22 decimal places' worth of precision. This is a little slower than necessary, but the small-n case hardly seems worth optimising.
ecalc() {
    let n=$1
    let m=n
    (( n < 22 )) && m=22
    echo -n 2.
    for (( j = m; j >= 2; j-- )); do
        coef[j]=1
    done
    for (( i = 1; i <= n; i++ )); do
        let carry=0
        for (( j = m; j >= 2; j-- )); do
            let temp=coef[j]*10+carry
            let carry=temp/j
            let coef[j]=temp-carry*j
        done
        echo -n $carry
    done
    echo
}
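As a spot check, the fixed function can be run directly (reproduced here so the block is standalone):

```shell
#!/bin/bash
# The fixed spigot: use at least m=22 terms regardless of n.
ecalc() {
    let n=$1
    let m=n
    (( n < 22 )) && m=22
    echo -n 2.
    for (( j = m; j >= 2; j-- )); do
        coef[j]=1
    done
    for (( i = 1; i <= n; i++ )); do
        let carry=0
        for (( j = m; j >= 2; j-- )); do
            let temp=coef[j]*10+carry
            let carry=temp/j
            let coef[j]=temp-carry*j
        done
        echo -n $carry
    done
    echo
}
ecalc 10   # 2.7182818284
```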
We could at this point try to calculate the value of m actually needed. We could even use our arbitrary-precision arithmetic to do it; we haven't implemented logarithms, but we can get an upper bound using the inequality log(sum(a_i * (2^32)^i)) < sum(log(a_i) + i*log(2^32)).
But this would impose O(n^2 log n) startup cost, so is decidedly not a good tradeoff. There may well be better ways to approximate log_10(m!), but again, I've spent too much time on this.
This has been interesting to write, even if more than half of it turns out to be useless.
Posted on 16 June 2012
(Fri, 6 Dec 2013: Importing this post from its original home as a gist.)
The recent post on Hacker News about #! semantics surprised me. I had always assumed that a shebang line like
#! /usr/bin/prog -a -b
would be equivalent to calling
$ /usr/bin/prog -a -b <file>
- but instead, it's
$ /usr/bin/prog '-a -b' <file>
This could only surprise me because I hadn't found out the hard way, so maybe it's not a big deal. Most scripts that I write don't have even a single argument on the shebang line. Most of the rest are Perl scripts, and perl is clever when it comes to parsing a single argument that looks like multiple arguments:
$ echo hi there | perl '-a -n -l -e print $F[0]'
hi
But this behaviour does have consequences, especially with the use of higher-order commands such as sudo, nice and env. For example, the following shebang lines will not work as intended:
#! /usr/bin/sudo -u phil sh
#! /usr/bin/nice -n 19 sh
#! /usr/bin/env perl -n
(Scripts using sudo and nice in a shebang seem unlikely to be distributed, but might find use in site-local maintenance scripts. env can be used to make a script more portable, in case a program isn't in a consistent location across systems.)
So I got to thinking about a program that would act like env for this purpose, but splitting its arguments on whitespace, or even doing full shell-like parsing of quotes.
Of course, such a program already exists: its name is shell. sh accepts the -c option to pass a shell expression on the command line. If this expression comes from a shebang line, word-splitting will be performed just like when typing directly into a shell. As a bonus (arguably), you even get to use things like pipelines, output redirection, shell built-in commands, and forking to the background, all in the shebang line of a script.
There is one downside: normally with a shebang line you can think of the script name and any arguments as being implicitly appended. This no longer holds: sh -c takes an expression, not a program name, and expressions don't take arguments in the same way that programs do. Instead you need to access these arguments through shell variables $0 through $9, $* and $@.
Alas, my first tests failed. It seems that -c requires its argument to be, well, a separate argument, so it's not much use with a shebang. (This is the case when sh is linked to Bash. Perhaps other shells are different, but if it doesn't work in Bash's sh-emulation mode, it probably can't be considered portable.)
So I went ahead and wrote a small script to get this behaviour. I even improved on what I could have done with sh -c: by default the script name and arguments are implicitly appended, but passing -c at the start of the first argument disables this.
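The actual script is linked at the end of this post; its core behaviour can be sketched like this (a hypothetical reconstruction of my own, not the real cmd — the details are assumptions):

```shell
#!/bin/bash
# Sketch of a cmd-like wrapper. When used in a shebang, $1 is everything
# after "cmd" on that line (as a single string), $2 is the script path,
# and the rest are the script's arguments.
expr=$1
shift
case $expr in
    -c\ *)
        # -c mode: run the expression as-is; the script path and its
        # arguments become $0, $1, ...
        exec sh -c "${expr#-c }" "$@"
        ;;
    *)
        # Default mode: word-split the expression and append the script
        # path and arguments, like a normal interpreter line would.
        exec sh -c "$expr \"\$@\"" sh "$@"
        ;;
esac
```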
I named this script /usr/bin/cmd, so for example the following shebang lines are now possible, and do what you would expect:
#! /usr/bin/cmd sudo -u phil sh
#! /usr/bin/cmd nice -n 19 sh
#! /usr/bin/cmd perl -n
But you can also do things like
#! /usr/bin/cmd grep -v '^#' | perl
to strip comments from the input before you process it. Or perhaps
#! /usr/bin/cmd -c perl "$0" | xargs grep "$@"
to generate a list of filenames in perl and search them for a pattern given on the command line. On a similar note,
#! /usr/bin/cmd -c perl "$0" "$@" | xgraph -nl -m &
might save having a separate file just to pipe the output of a perl script into xgraph.
I have a lisp program which expects an S-expression as input, but I can use
#! /usr/bin/cmd (echo \(&&perl -lne'chomp,print"($_)"'&&echo \)) | sbcl --script
and now it expects plain text. (It could be cleaner, but I wanted to keep it less than 80 characters, and semicolons don't interact well with parentheses and emacs' paredit-mode. This example is probably a little more extreme than is sensible.)
There are also some pitfalls to cmd. If you have a system which does split shebang lines, I think normal behaviour will still work, but anything fancy - any use of -c, or shell processing - will fail. I don't think it would even be possible to port, unless there's some way to tell where the shebang arguments stop and the command-line arguments begin. (You could work this out in most cases by checking which argument names an executable file with cmd on the shebang, but that seems fragile.)
You need to be careful in -c mode to quote properly. Otherwise it will seem to work, but break mysteriously when an argument contains a literal space or wildcard. I think "$0" and "$@" are the constructs you want in almost all cases: everything else I've tried fails, and I haven't found anywhere that these fail. (Perhaps an option to cmd which would cause it to replace % or something with "$0" "$@" would be a good idea.)
If you want to be portable, you also need to worry about the length of your shebang lines. 127 bytes seem to be accepted on all modern systems, but I'll admit that I don't recognise (except perhaps in passing) many of the names in the originally linked article. (But if you want to be portable, you also want to wait until cmd is installed as standard on most systems. This might take a while.)
One pitfall that seems to have been avoided: I was worried that perl (which performs its own parsing of the shebang line) would be too clever for cmd, and recognise switches intended for programs later in a pipeline. This doesn't seem to happen:
#!/usr/bin/cmd -c perl "$0"; echo -l
print "hi";
produces output "hi-l", as intended. But I don't know exactly what rules perl uses, so there may be edge cases.
And I'm sure there are problems that I haven't anticipated. Which is the worst kind of problem.
Ultimately, I'm not sure how much use cmd will be. But until a few days ago, I don't think I'd ever thought about using a shell script in a shebang line, so I guess there's territory yet to explore. I'd be interested in hearing more ideas on the subject.
If you're interested, cmd can be found at http://github.com/ChickenProp/cmd.
Posted on 27 July 2010