I live in an archipelago in the Salish Sea, which is itself a finger of the vast Pacific Ocean. The islands are in the far northwest corner of the United States, and also in the far southwest corner of Canada. Neatly tucked between the mainland and Vancouver Island, half of the islands belong to Canada—the Gulf Islands—and, due to a combination of hapless bureaucrats drawing lines on maps in places they had never been, and also to the death of a British pig at the hands of an American farmer back in 1859, half of the islands belong to the United States—the San Juans. Among the American islands, four are served by Washington State Ferries, another dozen are inhabited by humans, and hundreds more, some so tiny that they are submerged at high tide, are inhabited only by seals and sea lions, otters, and various birds of sea and shore. My family and I had been coming to these islands for more than fifteen years when, in 2022, we finally had the resources and freedom to move here.
The county seat of San Juan County is on San Juan Island itself, in Friday Harbor. The courthouse is here, as are many environmental non-profits; the ferry landing is right in town, so you can visit without a car. There is a glorious national historical park on the island, which includes the sites of the Pig War that resulted in the U.S. laying claim to these islands—and there are visible remnants of the people who came before the Europeans. In all of these islands, the Coast Salish people left behind middens—trash heaps—which are visible in many of the coastal banks, and at low tide, their long-abandoned oyster beds are visible as half circles raised above the surrounding cobbles.
San Juan is one of the ferry-served islands of the (collective) San Juan Islands, the other three being Orcas, Lopez, and Shaw.
Orcas is the most mountainous of these islands, with glorious inland lakes in the massive Moran State Park, and is famous for attracting both artists and celebrities. Its town, Eastsound, is smaller than Friday Harbor, and quite sweet.[1] Because Eastsound is a long way from the ferry landing, it draws fewer day trippers from the mainland than Friday Harbor does.
Lopez is laid back and friendly, and flat enough that it attracts the greatest number of bicyclists. People literally wave as they pass one another on the road. Town is smaller still than Eastsound.
Shaw is the smallest of the ferry-served islands. The one and only store on the island just celebrated its 100th anniversary; for many years it was run by nuns. Right next door, there is now a second business on the island. It is a tap room.
The waters around these islands were once full of salmon, orcas, and other species of whales. All are still here, although their numbers have dwindled. The kayaking is gorgeous, as is the hiking and biking. At summer solstice, the sky has sunlight for nearly 18 hours, and the clear blue skies, low humidity, and highs in the 70s throughout most of the Summer often feel like perfection. At winter solstice, however, the time between sunrise and sunset is just over eight hours, and storms out of the Pacific bring high winds, driving rains, and king tides that can make travel by land or sea treacherous. The sky feels low, heavy, and menacing. Winter is a gray season, even though we are in the rain shadow of the Olympic Mountains and receive only half the rain of nearby Seattle.
As you might imagine, the number of people on these islands swells in the Summer—people come from Seattle, certainly, but we have visitors from all over the world. Year-round residents of the county, according to the 2020 census, number only about 18,000 people. We are distributed primarily among the three largest islands—San Juan has nearly half, Orcas a third, and Lopez, which is about half the size of the other two and the least densely populated, has only one sixth of the population of the whole county. We should expect, therefore—and my experience biking and hiking on Lopez for many years with my family supports this expectation—that there’s not a whole lot of human activity on Lopez. Not a lot of business, and not a lot of trouble.
It was thus with some surprise that, on the earliest ferry out of San Juan on a Thursday morning in early Summer, I saw the population of the ferry, in both people and cars, appear to double when we made our stop at Lopez. We had left the dock in Friday Harbor, on San Juan, at 5:35am. The ferry was heading to the mainland—Anacortes—and so was everyone on the boat. Islanders go to the mainland for medical appointments and to see friends or shows, to pick up appliances or stock up on lumber or go to SeaTac—the airport. Many islanders report that the best part of any off-island trip is coming home. I have often found this to be true.
What could explain a ferry-going population leaving Lopez that seemed equal to the one leaving San Juan, when the population of Lopez is a third that of San Juan? My intuition that something was off was informed both by my experience in these islands and by my rough knowledge of some of the numbers that I’ve shared here. But my intuition was also based on observations that were untested and unquantified. It’s not that people and cars aren’t countable—of course they are—but I had neither the time nor the inclination to conduct a careful count, so I don’t know. Maybe my sense of things was off. Certainly there were more cars down below, on the car deck, but how many more? Was it really twice as many? Was the number of cars that had boarded at Lopez actually surprising? I tucked my sense away, not forgetting it, but not putting too much credence in it, for later retrieval and assessment.
Later that morning, I walked into a small independent lighting store in the charming town of La Conner, where the owner was restoring an antique lamp. I told him that I was coming from the islands, where there are no services such as the ones that he offers, and he said, “yes, we get a lot of island business here.” He paused then, reflecting, and added, “especially from Lopez these days. I don’t know what that’s about.”
Call that observation number two. It’s hearsay, it’s anecdotal, it’s unverifiable. But unbidden, I had heard something that was consilient[2] with my observations on the ferry earlier that day. We humans are always looking for pattern; sometimes we miss it when it’s there, more often we attribute meaning to things that have none. Coincidences happen. And our brains tend to be verificationist once we’ve got an idea in our heads—I think there’s something going on on Lopez, so I am more likely to remember conversations like this one, and more likely to throw out as irrelevant any suggestions to the contrary. I am hyper-conscious of this tendency, so I am, I think, less likely to make spurious connections than is the average bear, but nobody is fully able to rise above their bias. All of that said, consilient observations are precisely the kind of proto-data that we look for when considering whether or not there is a pattern to be discussed or revealed.
The next day, I was reading our weekly paper, The Journal of the San Juan Islands, which always publishes the Sheriff’s Log for the county from the previous week. Usually, the Sheriff’s Log is a sedate mixture of speeding violations, reports of dogs at large, and occasional calls about possible domestic violence or trespass, most of which turn out to be nothing. And usually, they are mostly from San Juan and Orcas, with an occasional report from Lopez. This week, however, out of 35 total incidents reported, fully 43% came from Lopez. Lopez, remember, has only one sixth of the population of the county—that’s about 17% of the population generating 43% of the Sheriff’s reports.
Call that observation number three. Unlike my sense that the number of people boarding the ferry at Lopez was out of proportion to what I expected, this time I could quantify my sense that something was off.
Specifically, what I could do with the Sheriff’s Log was calculate whether the observed numbers—that is, how many actual events there were from each island—were significantly different from the expected numbers—that is, how many events we would expect from each island.
There is an elegant class of statistical tests—wait, don’t leave, this is good stuff, I promise—which are simple and easy to use, called “goodness of fit” tests. Goodness of fit tests assess whether something that you have observed directly—some actual measurement from reality—is a match for what you expected. You might ask: How do I know what I expect?
Knowing what to expect is part of the art of science, specifically of pattern recognition, and of hypothesis generation, and of experimental design, and of statistical analysis. Learning to bring to explicit consciousness what it is that you expect is one way that we move from intuition to rational conclusion. Intuition is generally implicit—I feel like there’s something wrong here, I’m going to turn around, things feel off—and there is nothing wrong with that. But when you have an opportunity to check your intuition against measurable outcomes, you should do it. Among other things, this will help you refine your intuition in the future. The more explicit you can be about what you think is wrong, why your spidey sense is on full alert, how you think you’re being misled or manipulated, the greater the chance that you can avoid a bad outcome yourself, communicate to others what you understand, and have an ever more accurate understanding of the world.
Because the Sheriff’s Log revealed what seemed to me a strangely high number of events from Lopez, I dusted off the most famous of goodness-of-fit tests, the Chi-Square, and ran it on the data. You can do this too. It is, really and truly, not that hard. And it is so empowering.
I won’t walk through all of the analysis here,[3] but will briefly describe the logic.
The actual data (observed values) were: 15 out of 35 Sheriff’s Log events happened on Lopez; 20 occurred elsewhere in the county.
To calculate my expected values, I began with the assumption that the only thing that should affect how many events happen on a given island is its population.[4] That is: events will be evenly distributed across the population. From U.S. Census records, we know that only 17% of the population of the county lives on Lopez. Thus, our expected number of events on Lopez is 17% of 35 total events. Seventeen percent of 35 (just multiply 0.17 by 35) is 5.95, or about 6.
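For readers who would rather let a computer do the multiplication, here is a minimal Python sketch of the expected-value step. It assumes, as the text does, that events are spread across the county in proportion to population. The 17% share and the 35 total events come from the Sheriff’s Log discussion above; `expected_counts` is a hypothetical helper, not part of any library.

```python
# Hypothetical helper: splits a total across groups in proportion to population,
# following the assumption stated in the text (events track population only).
def expected_counts(total_events: int, shares: dict[str, float]) -> dict[str, float]:
    """Return total_events * share for each group; shares must sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return {group: total_events * share for group, share in shares.items()}


# 17% of the county lives on Lopez; everyone else (83%) is grouped together.
shares = {"Lopez": 0.17, "elsewhere": 0.83}
expected = expected_counts(35, shares)

for group, count in expected.items():
    print(f"{group}: {count:.2f} expected events")
```

This gives 5.95 expected events for Lopez and 29.05 for the rest of the county. Together they add back up to the 35 observed events, which is what a goodness-of-fit comparison requires.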
Six events were expected to happen on Lopez. But 15 were observed to happen there.
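Now there’s a little Chi-Square magic that you do. For each category (here, Lopez and everywhere else in the county), you take the square of the difference between the observed and expected values and divide it by the expected value, and then you add those pieces together. That no doubt sounds complicated, but again, it really isn’t. The total is then compared against the Chi-Square distribution to see how likely a value that large would be if chance alone were at work. In this case, the statistic comes to about 16.5, and with one degree of freedom (two categories, minus one), the probability of getting a value that large by chance is well under one in a thousand. The result is staggeringly significant. Which is to say that those numbers coming out of Lopez, at least for this one week in June of 2024, were way out of whack.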
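Here is the same “Chi-Square magic” as a short Python sketch that uses only the standard library. For each category it computes (observed minus expected) squared, divided by expected. It then adds the pieces and converts the total into a p-value. The helper names are hypothetical. The p-value shortcut used here (`math.erfc`) is valid only for one degree of freedom, which is what two categories give you.

```python
import math


def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Sum of (observed - expected)^2 / expected over every category."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_one_df(statistic: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    With one degree of freedom the statistic is a squared standard normal, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


total = 35
observed = [15, 20]                      # Lopez, elsewhere (from the Sheriff's Log)
expected = [0.17 * total, 0.83 * total]  # proportional-to-population assumption

stat = chi_square_statistic(observed, expected)
p = p_value_one_df(stat)                 # two categories -> 2 - 1 = 1 degree of freedom
print(f"chi-square = {stat:.2f}, p = {p:.1e}")
```

If SciPy is installed, `scipy.stats.chisquare(observed, expected)` should return the same statistic, and it handles any number of categories. Like any Chi-Square test, this one assumes that the events are independent of one another and that no expected count is very small. A common rule of thumb is at least 5; Lopez’s 5.95 just clears it.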
Now we’ve got numbers to go with our intuition. Those numbers had felt wrong to me—felt really, very wrong—but I did just a little math, and now I have quantitative back-up. Is something up? Yes, something is up. I don’t yet have any insight as to what it is that is up, but I do now know that what I’m actually observing is significantly different from what I should expect to observe.
What are statistics for? In modern times, too often, statistics are used to confuse, muddle, and scare the populace into doing something. Given that they are used this way, it’s even more important to have some grasp on their power, and their execution. With the ubiquity of computers, statistical tests have gotten more and more complicated. When people had to do all the math by hand, there was a constraint on complexity, and we thus had tests that were less powerful, to be sure, but also far harder to game, harder to hide statistical chicanery in. Many—perhaps most—modern scientists who use statistics have a specialist statistician on whom they rely to do their stats for them. This means that in many cases, not even the scientist who “did the research” really knows what happened to her data between generating it and having results to share with the world. Take data, put it in a black box, and nod sagely at the fancy-sounding result that comes out the other side.
Personally, I don’t want to entrust the analysis of data that I worked hard to collect to someone using models and software that they can’t explain in English to me. I also don’t want to entrust the analysis of society to such models and software. If something feels off—if it feels like you’re spending way more at the grocery store than you used to, but are assured that the economy is booming; if you’re told at the doctor’s office that some measurement that they did of your blood means that you need to start taking prescription drugs right away, but you feel healthy—do what you can to check the math, or the analysis, of the people who are trying to convince you of something that feels wrong. Maybe you’ve changed your buying habits, and food prices are stable—or maybe the prices of eggs and fruit and meat really are climbing faster than in previous years. Maybe you do have a condition heretofore unknown to you which will benefit from medical intervention—or maybe the system that says that a level of X means you need to take Y is flawed. Especially if their solution requires a second fix for their first fix—drugs to deal with side effects of the other drugs they’ve got you on—seriously question if they have any idea what they’re doing. All too often, they do not.
What are statistics for? What statistics are supposed to be for is to remove bias from our interpretation of the world. Along with the scientific method, statistics allow us to accurately describe pattern, and to assess how likely that pattern is to have happened by chance. Put another way, statistics allow us to quantify coincidence, and minimize magical thinking as explanation for what we see around us. As humans walking around a complex world, we try to make sense of what we see, and try not to be misled by our own observations and presumptions that there is pattern when perhaps there isn’t any. In this job, statistics are our friend. Just having familiarity with goodness-of-fit tests, and the Chi-Square test in particular, can make you far more self-sufficient, and allow you to hone your intuition. There are a lot of situations where the Chi-Square test can’t be used, but also a lot where it can be.
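One way to make “quantify coincidence” concrete is to ask directly how often chance alone would produce what was observed. The sketch below swaps in an exact binomial calculation in place of the Chi-Square test used above. It assumes, as before, that each of the 35 log entries independently had a 17% chance of coming from Lopez, and it adds up the probability of 15 or more landing there. `binomial_upper_tail` is a hypothetical helper built on Python’s `math.comb`.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: each of the 35 log entries independently has a 17% chance
# (Lopez's population share) of coming from Lopez.
prob = binomial_upper_tail(15, 35, 0.17)
print(f"P(15 or more of 35 from Lopez) = {prob:.1e}")
```

Under those assumptions the result is a small fraction of one percent, which points the same way as the Chi-Square test. Fifteen Lopez entries out of 35 would be a remarkable coincidence if population were the only thing at work.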
One more thing: The events out of Lopez in the Sheriff’s Log were not just quantitatively unexpected, but qualitatively unusual as well. While San Juan had a number of traffic violations and a couple of thefts from businesses, and Orcas had some animals-at-large and domestic disputes that resolved quickly, Lopez had several incidents in which individuals were sufficiently “suspicious” or “argumentative” or “yelling during the early morning hours” that police were called. Lopez also had home burglaries and an ongoing investigation into a death. It is more difficult—and more subjective—to conduct statistical analysis on qualitative data, and I will not attempt that here. But what these left me with was more consilience—more pieces of evidence that suggested that something was not quite right on Lopez.
At the time of this writing, my story doesn’t have an end. It is not, therefore, a very good story. Something unusual was going on on Lopez in early Summer 2024, but I do not have a good bead on what that something might be. I’ve heard a few hypotheses, all of which are a bit vague. What explains the unusual level of activity, police and otherwise, on Lopez? Some blame the ferry system. No, suggest others, it’s the lack of affordable housing. Actually, say still others, it’s all that newly built affordable housing. Presumably this is not an exhaustive list of possibilities. Perhaps none of these even come close to describing what is going on on Lopez. But the list is a good start: It’s the boats. It’s the lack of housing. It’s the abundance of housing.
Imagine if I took what I do know—something unusual is happening on Lopez—and attached it to one of these possibilities, making a construction that sounded like this:
“Higher numbers of police reports on Lopez due to the housing shortage should prompt locals to push harder for housing solutions.”
That last sentence is a hypothesis masquerading as a result. The thing I know is that there were higher numbers of police reports than expected; I did the statistics. I did not, however, in any way test or try to explain why there were larger numbers of police reports.
First you establish that there is a pattern that needs to be explained. Then you get down to trying to explain it. Explaining it is difficult, though; sometimes even impossible. So very often, in journalism and in the popularizing of scientific results, we see a conflation of “there’s a pattern” with “this is why there’s a pattern.” Arguments often look like this, with a logical flaw right in the middle, at step two:
We know X.
X is because Y.
Therefore we need to do something about Y in order to address X.
We do know X, but we don’t know why X is true. Thus, line 2 is unfounded; it might be true, it might not be. We do not know, and no work has been done to suggest it either way. But careless journalists and scientists, in combination with a public that wants to have trust in its institutions and authorities, slide right by what we don’t know, and make claims of truth, in the name of science, that are unfounded. Once again, with the details from this particular story:
We know that police activity on Lopez is unusually high.
Police activity on Lopez is unusually high because of the housing shortage.
We therefore need to do something about the housing shortage in order to reduce police activity.
It is true that we know 1, but we do not know 2; thus 3 is pushing an agenda that is unwarranted by the available analysis.
Maybe it’s true. Maybe it’s not. When I say that a housing shortage may not be responsible for high levels of police activity, I am not saying that I think it is not responsible. I am also not saying that I don’t believe in housing shortages. Nor am I weighing in on whether such a shortage exists on Lopez right now. I am making precisely no claim in that realm whatsoever, moral or otherwise. I am just stating the bounds of what we actually know. That is the responsible, careful, scientific thing to do.
In the 21st century, though, being responsible, careful, and scientific can get a person branded an alt-right conspiracy theorist. It’s worth being called names for standing up to powerfully bad analysis. There is more powerfully bad analysis on the way, of that we can all be sure. It will be designed to confuse, divide, and scare. The better we all get at recognizing rhetorical tricks, such as claiming the mantle of science when all you’ve actually got is a nicely worded statement and a fancy degree, the better we can resist the scaremongering tactics to come.
[1] Eastsound is also home to Darvills, the bookstore which sells signed copies of A Hunter-Gatherer’s Guide to the 21st Century!
[2] Consilience is a term of art in the branch of evolutionary biology that seeks to discover the history of relationships between species. In phylogenetic systematics, scientists might use both molecular characters (e.g. DNA sequences) and morphological characters (e.g. shapes and positions of bones and soft tissues) in their analysis, but sometimes these different types of data disagree—they are in conflict. Dig deep, though, and expect consilience between your datasets, for—assuming that we are living in an objective universe with a single timeline—all true stories must reconcile.
[3] I’ve got a ridiculously simple spreadsheet which I used when I taught stats to undergrads. I don’t think that Excel files can be embedded in or uploaded to Substack, though.
[4] This is an explicit assumption, which is how they should always be. Obviously there will be exceptions—when special events bring large numbers of people to one island but not others, we should expect the number of police events to be higher there as well. Making all of your assumptions explicit means that your work can be checked, both by yourself, later, and by anyone else who wants to do so.