Stats for Perverts, Vol 1
In short: Keep track of the number of people who RSVP "I'm Going" and the number of people who actually show up, and then use a spreadsheet program to find the equation relating one to the other.
I tracked RSVPs and attendance for the last 33 BEHIVE erotic hypnosis munches, and I can now expand on my initial findings with 8 datapoints (where I concluded you should ignore the maybes). There is a strong relationship between the number of Yes RSVPs and the total number of attendees, R square = 0.85. That means that 85% of the fluctuation in munch sizes can be predicted by the RSVPs, and this is highly statistically significant (p < 0.00001), meaning it's very unlikely that the relationship is just a coincidence.
But the picture tells the story better. You can easily see the relationship, where the more RSVPs we had the more attendees.
Furthermore, the trendline gives us an equation: attendees = 0.866 x Yes RSVPs + 4.75. Or you could round it off to 0.9 x Yes RSVPs + 5 for simplicity. Once you have an equation like this, it's easy to make a prediction, even on a calculator app: 20 RSVPs translates to about 22 people, 30 RSVPs translates to about 31.
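If you'd rather not punch it into a calculator every time, here is a minimal Python sketch of the same arithmetic. The function name predict_attendance is just something I made up for illustration; the slope and intercept are the ones from the BEHIVE trendline above, so swap in your own.

```python
def predict_attendance(yes_rsvps, slope=0.866, intercept=4.75):
    """Predicted head count from the number of "I'm Going" RSVPs.

    The default slope and intercept are the BEHIVE trendline values;
    replace them with the numbers from your own chart.
    """
    return slope * yes_rsvps + intercept


for rsvps in (20, 30):
    # Round to whole people, since you can't seat 0.07 of a person.
    print(rsvps, "Yes RSVPs ->", round(predict_attendance(rsvps)), "people")
```

This prints 22 people for 20 RSVPs and 31 people for 30 RSVPs, matching the calculator version.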
You'll notice this isn't so far from the rule of thumb that I proposed in the last post: Predict that all of the yeses, and none of the maybes, will show up. In fact that rule also has a pretty good R square fit, 0.75. So we'd be fairly safe using that rule for our events. BUT that's BEHIVE munches - the equation for your event might be very far from "all of the yeses". You need the data.
By the way, Maybes are still utterly worthless!! With this larger dataset, the correlation is now R square = 0.18, which seems like it might be better than nothing, but in fact it is not!! With 33 datapoints it is not even close to statistical significance, p = 0.32, which means if you correlated any two random things in the world - price of fig jam to sales of fucksaws, number of shark attacks to length of Divney posts - you could easily get an R square of 0.18. Ignore the maybes!
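If you want a feel for what "you could easily get this by coincidence" means, one way is a shuffle test (statisticians call it a permutation test). To be clear, this is not how I got the p-values in this post, and the numbers below are made up for illustration, not BEHIVE data. The idea: scramble the attendance column so it has nothing to do with the maybes any more, see how often the scrambled data gives an R square at least as big as the real one, and call that fraction your rough p-value. The function names are my own invention.

```python
import random


def r_squared(xs, ys):
    """Squared correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def shuffle_test_p_value(xs, ys, shuffles=10000, seed=0):
    """Fraction of random shuffles of ys that match or beat the real R square."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = r_squared(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        if r_squared(xs, shuffled) >= observed:
            hits += 1
    return hits / shuffles


# Made-up example numbers, NOT the BEHIVE data.
maybes = [5, 9, 4, 12, 7, 10, 6, 8]
attended = [22, 19, 25, 24, 18, 27, 21, 20]
print("R square:", round(r_squared(maybes, attended), 2))
print("rough p-value:", shuffle_test_p_value(maybes, attended))
```

A large p-value here means shuffled nonsense does just as well as your real data, which is exactly the situation where you should ignore the predictor.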
How can you get your own equation and graph like this? You can use Excel, but I'm going to step through using Google Sheets because it's free.
- First, put all the I'm Going RSVPs in one column of a spreadsheet
- Then put the actual attendances in the column next to it
- Go to the Insert menu and choose Chart
- A panel will appear on the right. Change Chart Type to Scatter
- Click on one of the points in the graph and scroll down in the righthand panel. Click to select Trendline.
- Keep scrolling in the panel, check Show R², and under Label, choose Use Equation
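If you're more comfortable in code than in a spreadsheet, the same trendline can be sketched in plain Python. This is the standard least-squares line fit, which I'm assuming is what the spreadsheet's linear trendline does under the hood. The fit_line name and the example numbers are made up for illustration; they are not the BEHIVE data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept, plus R square."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_square = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_square


# Made-up example numbers, NOT the BEHIVE data.
yes_rsvps = [10, 14, 18, 22, 25, 30]  # first spreadsheet column
attended = [13, 17, 20, 25, 26, 31]   # second spreadsheet column

slope, intercept, r_square = fit_line(yes_rsvps, attended)
print(f"attendees = {slope:.3f} x Yes RSVPs + {intercept:.2f}")
print(f"R square = {r_square:.2f}")
```

Paste your own two columns into the lists and you should get the same equation the chart label shows, give or take rounding.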
When it comes to actually phoning restaurants, though, how much can you trust it? Well, I can get an idea of how off it can be by looking at how high the points are above or below the trendline. That tells me that using this equation, I'll usually only be off by 3 people or fewer (though with exceptions!), and I'm just as likely to overestimate as to underestimate. But your uncertainty may be different, especially if you have many fewer datapoints to work with. I recommend waiting until you have numbers for at least 6 of your events to make this kind of chart. I plan to write a future post about how to get an idea of the uncertainty in your prediction in a more quantitative way.
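Here is a rough sketch of that "how far are the points from the line" check in Python. The equation is the BEHIVE one from above, but the event numbers are made up for illustration, and the misses function is a name I invented. It counts how many events landed within 3 people of the prediction and how the misses split between over- and underestimates.

```python
def misses(yes_rsvps, attended, slope=0.866, intercept=4.75):
    """Actual minus predicted attendance for each event.

    Positive means more people came than predicted (an underestimate);
    negative means fewer came than predicted (an overestimate).
    """
    return [a - (slope * r + intercept) for r, a in zip(yes_rsvps, attended)]


# Made-up example numbers, NOT the BEHIVE data.
yes_rsvps = [12, 18, 20, 25, 30, 33]
attended = [16, 19, 24, 25, 33, 32]

errors = misses(yes_rsvps, attended)
within_3 = sum(abs(e) <= 3 for e in errors)
underestimates = sum(e > 0 for e in errors)
overestimates = sum(e < 0 for e in errors)

print("misses:", [round(e, 1) for e in errors])
print(f"{within_3} of {len(errors)} events within 3 people")
print(f"{underestimates} underestimates, {overestimates} overestimates")
```

If almost all of your misses fall on one side of zero, the equation you are using probably doesn't fit your event and it's time to refit the line with your own data.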
There are a lot of other things that might help predict munch turnouts. How many people came last time? Is there a steady trend of growth or shrinkage over time? Does it change by the season? What about weekends with other important events going on? The method I described above is technically called a linear regression, but I think it should be called "line fitting" - you are relating one set of points to another by finding the equation of a straight line that passes as close to the points as possible. When you throw in more predictors like that, it gets a lot more complicated, becoming what is called multiple regression, and I plan to write about how to do that properly too. But you can get surprisingly far with a simple equation like this.
It's tempting to get frustrated as an organizer when people flake on their RSVPs, and both overestimating and underestimating the reservation size can cause a lot of problems. I say, don't get mad, get modelling! And treat every event like an experiment that helps you get better and better at these predictions.