Absolutes are very comforting things : this is right, that is wrong. Once established, no further thought is required - indeed, an absolute fact can be used to refute anything that disagrees with it. This is extremely powerful, but also potentially very dangerous. What if you're just plain wrong about those facts ?
For some time now I've been trying to convince y'all that science is rarely - but not quite never - about these black-and-white absolute facts. Granted, it has a strong component of factual elements : I measured this result with this instrument using this method. Perhaps the measurements were wrong or the method was not appropriate, but that measurement was taken in that way even so. I even contend that it's possible to establish not merely that things happened but even to prove their underlying causes - though this does require a carefully stated (but by no means unusual or watered-down) definition of proof.
But the scientific method is truly a messy, complex thing. Working out the relations that explain all those disparate facts is seldom as clear as the measurements themselves. And so while I really do like the opening quote very much, I take strong issue with the final statement. That's all there is to it ? Is it really so simple to judge whether something is right or wrong ? Well, not exactly.
Falsifiability Is Nice, Though
[Image: At least it's undeniably easier to disprove than to prove something. Whether you can really do this with a single experiment is something we'll come back to later.]
It's obviously much better if it's possible to conduct some experiment which will either confirm the predictions of your theory or find some different result. Now, confirming its predictions doesn't automatically prove your theory. It's still possible that another model would do an equally good job of coming to the same answer by a completely different mechanism - and of course it's possible that new observations would reveal something that your theory didn't predict at all, in addition to what it got right. Don't take the latter case lightly - as the link explains, models can make extremely specific, accurate predictions but still be fundamentally wrong. I call this the "more than one way to skin a cat" principle.
So proving models to be correct is extremely difficult. Not impossible, for reasons I go into at length here - but in short you have to be careful about your definitions. "Proof" must happen within the assumption of an objective, measurable reality in which we're not being fundamentally deceived by our poor senses / Dave's farts / evil demons / Theresa May's witchy powers. Your theory must be carefully stated so that it's as specific as possible - otherwise a single error could be unfairly held to disprove it, when in fact it just needed a very small adjustment*. Properly framed though, the line between theory and fact can indeed become blurred with sufficient evidence.
* Immediately we see that "that's all there is to it" is looking decidedly shaky. Great care must be taken to ensure that the result really does disagree with the fundamental nature of the theory and not some more legitimately tweakable aspect.
But surely disproving a theory is much easier ? It's surely much more decisive when an observation disagrees with prediction, right ? And isn't it absolutely essential to a scientific theory that it should be at least possible to disprove a theory with enough effort ?
Here's Where Things Get Tricksy
Well for starters being falsifiable is clearly, at best, a necessary but not sufficient condition for a scientific theory. Astrology is falsifiable and is falsified continuously, but that doesn't make it scientific. Without wishing to spend ages on what science actually is, astrology clearly isn't it. Hell, it probably wouldn't even be scientific if it got its predictions right.
But is it even true that it's necessary for an idea to be testable to make it scientific ? If you can't even know if it disagrees with experiment, how can it possibly be scientific ? Doesn't that equate it to the intellectual level of Dave the magical farting fox ?
There are two aspects to this. Sometimes ideas are very difficult to test, and sometimes testing them is fundamentally impossible. Again the boundary between the two can be blurred : if your theory can be tested in principle but only using, say, energy levels comparable to the entire output of the Sun in a million billion squillion years, that is certainly practically impossible. It can't be done in your lifetime. Philosophically though, this is clearly different to the case where no-one will ever be able to test it. Dave's magical farts, your theory might say, might be so damn magical that they defy rational analysis and simply can't be used to make any kind of prediction at all.
Clearly we need some examples from the real world, to show just how messy this can really get.
Yet More Shades Of Grey In the World Of Science. Quelle Surprise.
Ensemble models - false but not false but still useful
Here's a nice one from the world of meteorology. Because the equations of fluid dynamics are incredibly difficult to solve, they have to be simplified. It's completely impractical to solve all of the necessary billions of equations if you wanted to do it perfectly, and it's completely impossible to know absolutely everything about the weather system at any given time. Both theory and observation have intrinsic errors that can't be avoided. So they have to be reduced and simplified - it's that or stop making weather forecasts altogether.
This means that instead of running a nice simple single computer simulation, meteorologists run many - each with slightly different equations and conditions based on observations. Predictions are based on what the majority of models say will happen, but of course sometimes only a few models get it right while most get it wrong. The thing is that this doesn't mean those models are flawed and can be thrown on the fire and spat on in disgust - they're still perfectly valid approximations and in other circumstances may actually give more accurate results than the others. So while both the overall prediction and individual models are falsifiable, that falsifiability doesn't even mean you can say they're fundamentally wrong. It's much more complicated than that. It would be better to think of them only as testable : they might be falsified in this one particular case, but not all.
[Image: Example of ensemble models for the path of a hurricane. Most agree very well up to a point : after that they predict radically different paths. Small differences in the models and initial conditions lead to big changes in the end result.]
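To make the ensemble idea a bit more concrete, here's a minimal sketch in Python. It is emphatically not a weather model : it swaps in the logistic map, a standard toy chaotic system, and the names logistic_run, ensemble and spread_at, along with every number in it, are my own assumptions for illustration. All it shows is that members starting a hair's breadth apart agree closely at first and then go their separate ways.

```python
def logistic_run(x0, r, steps):
    """Iterate the logistic map x -> r * x * (1 - x), a toy chaotic system."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def ensemble(base_x0=0.2, r=3.9, members=9, offset=1e-6, steps=60):
    """Run several members whose starting values differ by tiny offsets."""
    starts = [base_x0 + (i - members // 2) * offset for i in range(members)]
    return [logistic_run(x0, r, steps) for x0 in starts]


def spread_at(runs, step):
    """Largest disagreement between any two members at a given step."""
    values = [run[step] for run in runs]
    return max(values) - min(values)


# Hypothetical ensemble: same rule, nine almost identical starting points.
runs = ensemble()
early_spread = spread_at(runs, 5)
late_spread = max(spread_at(runs, s) for s in range(40, 61))
print(f"spread after 5 steps: {early_spread:.2e}")
print(f"largest spread in steps 40-60: {late_spread:.2f}")
```

None of the members here is badly built : they are the same rule fed inputs that differ by far less than any realistic measurement error, which is roughly why a member that misses this time isn't thereby shown to be fundamentally wrong.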
What if you extrapolate falsifiable phenomena to untestable scales ?
Astronomy is awash with examples of ideas that are both difficult and impossible to test. Predictions are often made as to what will happen billions of years in the future, when it's entirely possible the human race will be extinct. And they're also made about things happening billions of years in the past or beyond the observable Universe - things we'll never be able to test. Does this really make them unscientific ? After all, they're based on logic, observations, and testable phenomena on laboratory scales - or indeed on smaller but still astronomical scales and timescales where observations can verify them.
Measurable, useful comparisons do not require falsification
Astronomy also presents good examples of the ambiguity of what "testing" and "falsifiability" mean, further emphasising that they are not the same thing. First, we can't observe the behaviour of individual galactic systems on the necessary timescales. Second, observations generally have large errors in the measurements, because the systems we examine are so faint and distant. So we can test our predictions statistically, e.g. we can see if the population of galaxies in the distant past does what we predict it should have done - we can see how well it agrees with our models. But we can't truly falsify it by checking if individual galaxies behave as we think - there are usually just too many parameters and too many sources of error, so it's often possible to easily explain away a few weird outliers from the overall trend. We can often only determine which model does best rather than which ones are completely impossible.
[Image: At least the "weird outliers" in astronomy are often nice to look at.]
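As a rough illustration of "which model does best" as opposed to "which model is impossible", here's a small Python sketch using a chi-squared score. The data points, the error bars and the two straight-line models are all invented for the example, and treating a chi-squared below the number of data points as an acceptable fit is a deliberately crude rule of thumb rather than a proper statistical test.

```python
def chi_squared(observed, predicted, sigma):
    """Sum of squared residuals, each scaled by its measurement error."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))


def rank_models(x, y_obs, sigma, models):
    """Return (name, chi-squared) pairs sorted from best fit to worst."""
    scores = []
    for name, model in models.items():
        predicted = [model(t) for t in x]
        scores.append((name, chi_squared(y_obs, predicted, sigma)))
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical measurements with large error bars (all values invented).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [2.3, 3.6, 6.8, 7.5, 10.9]
sigma = [1.0] * len(x)

# Two hypothetical rival models.
models = {
    "linear": lambda t: 2.0 * t,
    "steeper": lambda t: 2.5 * t - 1.0,
}

ranked = rank_models(x, y_obs, sigma, models)
for name, score in ranked:
    verdict = "acceptable" if score < len(x) else "poor"
    print(f"{name}: chi-squared = {score:.2f} ({verdict})")
```

With error bars this generous, one model scores better than the other but neither is excluded, which is the situation described above : a ranking, not a falsification.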
Theories can be only partially testable
Just to muddy the waters even more, some theories consist of a mixture of testable and untestable aspects. Inflation predicts that the observable part of the Universe is just a small part of a much larger whole, the majority of which can never have any influence on us at all. It predicts some signatures we can search for, but this most fundamental, dramatic aspect is thoroughly untestable*. And it uses a mixture of complicated mathematical models and hard-nosed observations. So is it science ? Similarly dark matter makes some testable, falsifiable predictions about galaxy cluster interactions, but could rightly be described as unfalsifiable in terms of directly detecting the actual particle in a laboratory - one can always say, "we didn't find anything, so the particle must be harder to detect than that."
* That is, we can't test it directly by flying off to a region of the Universe the theory predicts is forever inaccessible, by definition. More on the standards required for testing/falsifiability later.
Proofs you can't check - who watches the watchers ?
Here's another example - a computer claims to have proved an obscure mathematical theorem but its proof is far too long for any human to ever read. By necessity, this proof must be based on logical deductions, but if it's too long to check then is it really a proof ? This isn't really all that novel either - throughout history, stupid people have stubbornly refused to accept the proofs that cleverer people have come up with. Does that mean that clever people aren't being scientific if they can't explain their ideas to the mentally deficient ? With science becoming increasingly complex and requiring increasing amounts of time to fully understand, this is a real problem. And if scientists don't even fully understand their results, well...
Wiggle room is not cheating, but it can be problematic
And here's another extremely common issue : elephants ! I mean, models which depend on values which can't be determined by observation or experiment. These "free parameters" can be adjusted to get whatever result you like. In severe cases, even when you do have parameters that let you make a testable prediction which is found to be wrong, you might still be able to tweak the parameters so that everything's tickety-boo once again. This is similar to the nature of both dark matter and its alternatives : you can't falsify every possible dark matter particle and you can't falsify every theory of modified gravity. Are these ideas unscientific ?
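The trouble with free parameters can be sketched in a few lines of Python. This is a toy under stated assumptions, not anyone's actual model : I've assumed a "theory" that is just a polynomial with as many adjustable coefficients as there are data points, built by Lagrange interpolation, and the observations and the interpolate helper are invented for the example.

```python
from fractions import Fraction


def interpolate(points):
    """Build the unique polynomial of degree len(points) - 1 through the points.

    Uses the Lagrange form with exact fractions, so the fit is exact.
    The x values must all be distinct.
    """
    pts = [(Fraction(x), Fraction(y)) for x, y in points]

    def model(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(pts):
            term = yi
            for j, (xj, _) in enumerate(pts):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return model


# Hypothetical observations (invented numbers).
observations = [(0, 1), (1, 3), (2, 2), (3, 5)]
model = interpolate(observations)

# The model makes a testable prediction at x = 4 ...
new_observation = (4, 0)
prediction = model(new_observation[0])
print("prediction:", prediction, "observed:", new_observation[1])

# ... and when that fails, one extra free parameter rescues it.
patched = interpolate(observations + [new_observation])
print("patched model at x = 4:", patched(4))
```

The patched model agrees with everything seen so far, just as the original did before the new observation came in, so agreement with the data used to tune it tells you very little.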
Doing better than Dave but not very much
What about aliens ? The existence of aliens isn't yet established observationally, but the prospect of their existence is based entirely on scientific findings. Then there's the notion of godlike aliens, chronically invoked as explanations for anything ordinary theories can't account for. Such aliens are consistent with established scientific findings, but in terms of testability they're scarcely better than Dave the magical farting fox. On the other hand, most of the other examples I've just given are clearly much more scientific than Dave the magical farting fox, despite their lack of falsifiability in some areas.
What if it's falsifiable but only far in the future ?
[Image: And this popular myth is itself likely false...]
Such control was trivial on the scale of apple trees, but absolutely impossible in Newton's day on the scale of planetary distances. It remained impossible for another 270 years, until Sputnik 1 became the first artificial satellite. It still isn't possible on planetary mass scales because we can't appreciably change the orbit of a planetary-mass body. And we can't do controlled tests on galactic distance scales either : like Newton, we can only interpret observations. Which brings us back to dark matter, with some people holding that there must be missing mass to explain the observations (assuming we know the theory is correct, it makes a prediction we can test to some limited extent) and others claiming that the theory of gravity breaks down on really large scales (assuming the theory has been falsified by observations). It's all rather messy, isn't it ?
There's more to science than numerical measurements
[Image: Science is really all about the kitties anyway.]
But We Can At Least Falsify Some Things, Right ?
Of course. But to return briefly to the notion that "if it disagrees with experiment, it's wrong", it's worth noting that even here things aren't always as clear as we might like. Models are often complex, not only because of their fundamental nature but because (as mentioned) they may have many different parameters. Now if, given the known values of the parameters, a model makes a prediction which is then found to be wrong, does that automatically mean the model is wrong ? No - it could be that those known parameter values were simply in error. This does in fact happen sometimes, on those rare occasions when theory proves superior to observational "facts". It's a sort of generalised case of the anthropic principle : accepting a theory as true, you can use it to predict observational values.
It could also be that the model itself needs adjusting. Grey areas rapidly emerge when multiple adjustments are needed : at what point do you say, "we've made so many adjustments that this model is unsalvageable, we should just chuck it right out" ? The notion of crystal spheres and epicycles comes to mind, with early astronomers inventing more and more elaborate, unwieldy theories until eventually they gave way to something very much simpler. So while you often can totally falsify many models, in other cases it's far less clear cut.
But let's not go nuts : it's worth repeating that you can completely falsify some models. If your model predicts the existence of a planet around a star that should be detected by some telescope and no planet is found, then your model is wrong. The key is not to get carried away. Maybe your observations are good enough to rule out the existence of any planet around the star, but often they won't be - they will just put some limit on how massive a planet might be there. That may or may not be sufficient to rule out your model, depending on the details. Your model as originally and precisely stated is wrong, but that doesn't necessarily mean that every single aspect of it is completely bunk.
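The difference between "no planet was found" and "no planet can be there" is simple enough to write down. Here's a minimal Python sketch, assuming the observations translate into a single upper limit on the mass of any undetected planet; the assess_model function and the numbers (in Jupiter masses) are hypothetical.

```python
def assess_model(predicted_mass, mass_upper_limit):
    """Compare a predicted planet mass with an observational upper limit.

    Both values must be in the same units. A planet lighter than the
    limit could still be hiding in the data, so the model survives.
    """
    if predicted_mass > mass_upper_limit:
        return "ruled out as stated"
    return "not ruled out"


# Hypothetical survey: nothing heavier than 0.5 Jupiter masses is present.
limit = 0.5
print(assess_model(2.0, limit))  # model predicting a 2 Jupiter-mass planet
print(assess_model(0.1, limit))  # model predicting a 0.1 Jupiter-mass planet
```

Note that the first verdict applies only to the model as precisely stated : whether a version predicting a lighter planet is a legitimate adjustment or a dodge is exactly the kind of judgement call discussed above.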
Conclusion : This Is All Deeply Unsatisfying
Indeed. To recap, falsifiability isn't an absolute necessity :
- Not all aspects of every theory are falsifiable or even testable, but that doesn't make them useless.
- Theories can consist of testable elements which can be extrapolated to scales where they cannot be tested.
- We often can't do controlled testing but are limited to purely observational interpretation, which is subject to some amount of unknown errors - so our falsification is far less rigorous.
- Sometimes when we can't falsify a theory, we can at least say if it makes better or worse predictions than its rivals.
- Theories which make no new predictions are still arguably better if they avoid philosophical conundrums or logical paradoxes that plague their rivals.
- Bizarre though it may seem, falsifying a model doesn't necessarily mean that it's been disproven.
- Models can often be saved - after seemingly being falsified - by honestly legitimate modifications; it's almost as rare to be able to declare a model truly dead as it is to say it's been proven.
- There are always grey areas rather than strict boundaries. You can't rigorously define what counts as a legitimate modification to a theory, nor is it sensible to set a time limit within which it should be possible to test your theory if it requires technological advancement to test it.
- How do you actually define falsifiability anyway ? What if a theorem is so complex that no-one else can even understand it ?
Falsifiability is, however, always a bonus. A theory never becomes worse by being falsifiable. But the demand for falsifiability, like so many other things, is highly beneficial in moderation but can be actually damaging if taken to extremes.
The Universe is, of course, under no compulsion to be testable by a bunch of hairless monkeys on an unremarkable rock floating through the cosmic void who think that digital watches are a pretty neat idea. Consequently, falsifiability is always nice to have if you can get it, but if you insist upon it in all circumstances, then you're hindering scientific advancement - not helping it. A theory that isn't falsifiable doesn't become uninteresting; not being able to "solve mysteries" (to use the journalistic vernacular) doesn't mean you can't ask increasingly interesting questions. Right and wrong answers are only a small component of science - for the most part, it's far more interesting than that.
Indeed, really extreme proponents of falsification often tend to be those of the anti-science ilk. Geology, astronomy and anything else which involves deep time, they say, are not really sciences because we can't actually prove anything - no-one left records from billions of years ago for us to check, and we can't wait around to see how galaxies evolve. In a very strict sense, the evolutionary history of life on Earth and the behaviour of stars over cosmic time really can't be falsified.
Such a way of thinking has many parallels with conspiracy theories. It's not that everyone is lying, exactly, it's just that they are demanding impossibly high standards from the evidence which can never be met. By demanding ludicrously high levels of confidence, by refusing to make even the most basic assumptions and give the data some rudimentary level of trust, in short by refusing to even entertain hypotheses for the sake of it, they prevent themselves from learning anything. And they rarely say why they have such confidence in their own senses, which is bizarre given the complexities and many, many demonstrable fallibilities of the human brain.
Science, in a very crude sense, requires a sort of leap of faith - just enough to let you play with the data for the sake of it, just enough to trust that extrapolations are not totally outlandish - but not in such a way that your preferred conclusion becomes sacred and inviolable. Newton applied laws testable in Cambridge to the scale of the Solar System, an act that could be described as one of audacious faith, but he surely wouldn't have defended his ideas if the evidence had gone against them*. While science can often have elements similar to that of faith, it's actually more of a sort of highly elaborate play.
* If the evidence had gone against Newton, he wouldn't have rushed to proclaim himself wrong - not (just) because he was a jerk, but because he'd want to be sure which way the evidence was really pointing overall. He'd have known that disproof is a far more subtle concept than a single observation disagreeing with a prediction.
This playful element is sometimes less obvious in the applied sciences like engineering, chemistry and medicine, where a very much higher degree of control and rigour is possible. There, falsification is not only necessary but largely unavoidable. But while it might be nice to hold falsification as a general overarching principle of science, it's a mistake to try and apply this to all sciences in the same way. The standards of falsification possible in engineering are simply impossible to meet in geology, archaeology, astronomy or quantum mechanics.
Yet while the latter disciplines cannot be distinguished from other pursuits by virtue of their falsifiability - which as we've seen is a thoroughly murky area - they are clearly of a different nature to the prospect of Dave the magical farting fox. So if we can't use falsifiability to set them apart, what should we use instead ? I prefer to abandon rigorous absolutes altogether, but there are at least some useful guidelines :
- Is the theory based on falsifiable components ? Does it at least make some falsifiable predictions, even if not all of them can be tested yet ? Can it be falsified on some scales even if it's impossible to test on others ?
- If you can't falsify that theory, does it at least make predictions which distinguish it from its rivals so that one can determine which one is more successful ?
- Does the theory have mathematical rigour even if its observational predictions are untestable ?
- If the theory offers no new predictions to distinguish it from existing models, does it at least do as well as those models ? And does it improve on any philosophical difficulties of the current ideas ?
Feynman, of course, understood all this very well. The opening quote is merely a simplification, a lie to children that's a useful introductory teaching aid rather than a fundamental truth. It's a great principle to aspire to, but the reality is much more subtle. So just to prove I'm not out to attack Feynman, let me give him the final word with something I think is much closer to the truth :
 







 



