Prying open cracks

Finally, it is time for the data criticism. Even though the real issues lie elsewhere, it is my position’s cornerstone—and will reveal a shameful side of science.

At this point in the story, things get technical, and that brings several challenges. Deciding which aspects are relevant and tying them into a comprehensible thread is difficult, especially when the relevant information is scattered across various publications (or even omitted outright). But there are advantages, too. The situation is well suited to outlining bigger, often obscure issues that underpin problems: what happens, for example, when scientific standards aren’t upheld, data scrutiny is lax, controls are not performed, peer reviews are rushed, or responsibilities are evaded.

It should be highlighted that these systemic issues are the real concern. Mistakes happen, and there has to be room for them, especially in research. As you push the boundaries of knowledge, you inevitably stumble at times. Nonetheless, defending the truth is science’s chief ambition, and corrections must be possible. A scientist who thinks themself infallible, who won’t admit mistakes, and refuses to amend false findings cannot be trusted.

So, while I focus on the details of one particular situation, I do so only out of necessity. It provides distinct results to analyse, allowing me to build a foundation for my criticism, from which larger topics can then be discussed. But make no mistake; the analysis will reveal a lot between the lines. Yes, it is difficult to prove obscure things (e.g. neglect or misconduct), but the publications concerned clearly document that serious issues were disregarded for extended periods.

The approach will also reveal just how laborious it is to actually scrutinise data, and how much expertise it takes merely to recognise problematic results (in modern science’s highly specialised fields). It will also demonstrate how easily false results spread despite academia’s (theoretically) strict guardrails, and, importantly, how futile corrective efforts are in the face of an utterly indifferent population.

As things get more technical, please do keep in mind that you don’t need to understand every detail. I provide this expansive criticism only as proof for my stance, for those few who are interested. The real issues are larger, and will become the focus some ways down the road. Let us now walk through the data, step by step, back in time, until we find the moment things turned sour.

The most recent publication aims to expand a rather modern measurement device’s scope of applicability. The so-called atomic force microscope (AFM) exploits simple principles to achieve remarkable resolution. In its simplest configuration, it functions like a record player, tracing a needle’s deflection as it is dragged across a surface. The signal is recorded through a laser beam reflected off the backside of the cantilever that holds the needle. Any deflection is thus amplified, thanks to the lever effect, enabling the resolution of individual atoms (far below the diffraction limit of light).
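
To get a feeling for the lever effect described above, here is a minimal sketch in Python. It assumes a simple end-loaded rectangular cantilever and small angles. The function name and the geometry (a 100 µm cantilever with the detector 5 cm away) are illustrative assumptions of mine, not values from any of the publications discussed.

```python
def optical_lever_gain(cantilever_length_m: float, detector_distance_m: float) -> float:
    """Ratio of laser-spot displacement at the detector to tip deflection.

    Small-angle approximation for an end-loaded beam: the slope at the free
    end is 3*z / (2*L), and the reflected beam turns by twice that angle,
    so the spot moves by roughly 3 * z * D / L.
    """
    return 3.0 * detector_distance_m / cantilever_length_m


# Hypothetical geometry, chosen only for illustration.
gain = optical_lever_gain(cantilever_length_m=100e-6, detector_distance_m=0.05)

# Spot displacement for an assumed 0.1 nm tip deflection.
spot_shift_m = gain * 0.1e-9

print(f"geometric gain: {gain:.0f}x")
print(f"spot shift for 0.1 nm deflection: {spot_shift_m * 1e9:.0f} nm")
```

With these assumed numbers the geometric gain comes out at roughly 1500, so a tenth of a nanometre at the tip moves the laser spot by about 150 nm. Real instruments depend on more than geometry (detector noise, for one), so this is an order-of-magnitude illustration only.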

Since its inception forty years ago, the AFM has been continuously advanced, enabling ever more sophisticated applications. Modern measurements often oscillate the needle slightly above the surface instead of dragging it across, a much gentler approach. Instead of being scratched all over, the sample is only poked when the cantilever is at its lowest point. Or, to understand the sample’s elasticity, the needle can be deliberately pushed into the surface.

Furthermore, the cantilever can be “functionalised”, meaning a molecule of interest is tethered to the tip (with a linker molecule). This opens a whole new avenue of investigation during the needle’s retraction: Should the molecule bind to the surface, it will hold back the cantilever. Consequently, the bond’s strength can be calculated (via so-called force spectroscopy).
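
How a force follows from a deflection can be sketched with Hooke’s law: the cantilever is treated as a spring, so the force is its spring constant times its deflection at the moment the bond ruptures. The function name and the numbers below are hypothetical and chosen only to show the arithmetic; actual analyses involve calibration steps that are not shown here.

```python
def unbinding_force_pn(spring_constant_n_per_m: float, deflection_nm: float) -> float:
    """Hooke's law, F = k * d, returned in piconewtons.

    N/m multiplied by nm gives nanonewtons; the factor 1e3 converts to pN.
    """
    return spring_constant_n_per_m * deflection_nm * 1e3


# Assumed example: a soft cantilever (0.02 N/m) held back by 2.5 nm
# at the moment the bond ruptures.
force = unbinding_force_pn(spring_constant_n_per_m=0.02, deflection_nm=2.5)
print(f"unbinding force: {force:.1f} pN")
```

In this assumed example the force is 50 pN. The point is merely that the reported force depends directly on how the deflection at rupture is read from the curve.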

But perhaps most impressively, all of that is possible in liquid—and thus with biological samples in a native environment (for example live cells in a buffer). Which is where my predecessors’ findings come in.

Supposedly, they successfully combined two approaches. They functionalised a cantilever with a molecule that interacts only with a particular counterpart that is scattered across the sample’s surface. They then recorded the sample’s topography, thereby revealing the counterpart’s distribution (the points at which the needle is held back). Lastly, they targeted those distinct locations for further measurement, et voilà: topography and force spectroscopy in one go. Not a small feat.

Unfortunately, the publication itself focuses on this combination of approaches, detailing only the protocol used, while claiming to have “validated the presented new mode” in two different projects. Scrutiny is not possible with so little information, but the comparability of the different experiments is clearly established, and the first cracks in their claims appear (for a summary, scroll down):

  1. For example, one project’s experimental setup is entirely different from its original reporting: whereas the lipid sample is now deposited directly onto the measurement surface (Köhler et al., 2019, p. 6), the protocol was previously much more elaborate (Fis, 2018, pp. 36-37), following a sophisticated and fragile step-by-step assembly of three finely tuned layers stacked on top of each other. This multi-layer approach was specifically designed to ensure the surface-embedded interaction partners are locked in place during measurement. Without it, they can move, and once they do, force spectroscopy is no longer possible (as it is unclear where on the surface to measure).
  2. Several things need consideration to ensure force spectroscopy is reliable in the first place, none of which are verified:
    – Any supposed interaction can only be trusted if the signal occurs within the known length of the linker molecule used. This is one of the approach’s key reliability markers, as the responsible supervisor themself repeatedly pointed out in the past, e.g. in 2002. Nonetheless, the excessive unbinding length is simply “attributed to the elasticity of the receptors and the lipid membrane” (Köhler et al., 2019, p. 3), which makes no sense. If the receptors deform so significantly or the membrane is lifted (thousands of times per measurement!), the positional accuracy the method requires cannot be assumed (among other issues that arise).
    – Similarly, the signal needs to have a specific shape that verifies the interaction indeed stems from the tethered molecule. The linker used for cantilever tethering is specifically designed to stretch in a particular fashion for that very reason, as the main authors themselves highlight in more detailed documents (which are yet to be analysed):
      Köhler (2016, p. 20) elaborates the linker’s “most crucial profit […] is the possibility to clearly distinguish between unspecific adhesion and specific binding events” thanks to the “characteristic non-linear profile of the unbinding event”.
      Likewise, Fis (2018, p. 19) argues the “unbinding event can be easily discriminated […] due to the parabolic shape of the force profile which arises from stretching of the PEG linker”.
      Their combined publication (Köhler et al., 2019), however, provides two different shapes. One does show the expected signal shape (Fig. 1c, bottom), the other does not (Fig. 4d).
    – Another vitally important factor to ensure interaction integrity is a form of specificity proof. Since the signal arises from two molecules coming together, it should disappear if one of them is hindered. To achieve this, so-called “blocks” are performed: a substance is added that binds (or inactivates) either of the two constituents. These blocks are the field’s fundamental control conditions, with the primary investigator describing it as “mandatory that the specificity of the receptor-ligand interaction be demonstrated by blocking experiments” (Hinterdorfer & Van Oijen, 2009, p. 416). The other authors highlight this importance in more elaborate publications, too (Köhler, 2016, p. 13 and Fis, 2018, p. 19). Nonetheless, only one trimmed image is presented for one experiment (Fig. 2c) and none for the other. Once we review the more detailed documents, it will become clear why this trimming of results is rather suspicious and that the blocks used are often inappropriate or omitted outright.
  3. The only two individual measurement curves (Fig. 1c, bottom and Fig. 4d) differ entirely from the results actually reported. One shows an unbinding force of roughly 5 pN, the other even less. Yet, after data analysis, these forces are 10- to 40-fold higher (Fig. 3a and all reported results in Köhler, 2016 and Fis, 2018).
  4. The only reported final result (Fig. 3a) is supposedly the interaction between ATP and UCP1, but in another paper it reflects ATP’s interaction with UCP3, a different protein (Macher et al., 2018, Fig. 3A).
  5. For the second included project, interaction only happens if the surface-embedded molecule is correctly oriented. Supposedly, this was verified via its height profile, but the only recording shown reveals interaction (darkened spots in Fig. 3b, top) with molecules of various heights (more and less bright spots in Fig. 3b, bottom). A more detailed discussion will follow later and will raise serious questions about this supposed identification process.
  6. That identical measurement (Fig. 3b) is also presented for two different experiments in Fis, 2018 (Fig. 5.2 and Fig. 5.16). As that publication’s experimental setup is supposedly different (see point 1 above), this recording is thus shown for three different conditions, leaving it unclear what it actually represents (and future discussion will reveal that it is, in fact, entirely untrustworthy for another reason). This will not be the last time we encounter this issue.
  7. Furthermore, it is noteworthy that even though “binding probabilities below 5%, due to the specific blocking” (Köhler et al., 2019, p. 11) are expected in control conditions, the two experimenters diverge in what they accept as signal in the first place. Köhler frequently dismisses binding probabilities up to 9.90% as unspecific—with values between 5% and 10% dismissed on 22 occasions (Köhler, 2016, pp. 168-169, 172). Meanwhile, Fis considers anything above 5% specific, including e.g. a 9.50% probability (2018, p. 61). There is more to say about this issue further down the line.
  8. Even though all experiments are performed with the exact same AFM (Köhler et al., 2019, p. 5; Fis, 2018, p. 37; and Köhler, 2016, p. 69), the exact same cantilever (Fis, 2018, p. 37 and Köhler, 2016, pp. 30, 86-87), on the exact same surface layer (a PLE membrane; Köhler et al., 2019, p. 6), and with the same linker lengths (Köhler et al., 2019, p. 4; Fis, 2018, p. 26; and Köhler, 2016, p. 20), the two projects yield clearly different cantilever oscillation amplitudes of 7.4 nm (Fis, 2018, p. 37) and 13.7-14.1 nm (Köhler, 2016, pp. 48, 57, 83). These values are reported only in the more detailed publications, even though the amplitude is described as “[t]he most important parameter that needs to be properly adjusted”, with reference to “[t]he fact that only a short range of oscillation amplitudes are appropriate” (Fis, 2018, p. 15), and even though “[t]he range of appropriate amplitudes for recognition imaging is […] sharply localized” (Köhler, 2016, p. 28). Köhler et al. (2019, p. 9) actually state that “[t]he proper chosen oscillation amplitude is the most prerequisite to achieve reliable recognition events” [sic], but still omit it.
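
The divergence described in point 7 can be made concrete with a small sketch. The 5% cut-off and the 9.50% and 9.90% values are the ones quoted above. The second cut-off of 9.9% is my own inference from the highest value Köhler dismisses, not a threshold either author states, and the function name is hypothetical.

```python
def is_specific(binding_probability_pct: float, cutoff_pct: float) -> bool:
    """Treat a binding probability as specific only if it exceeds the cut-off."""
    return binding_probability_pct > cutoff_pct


# Cut-off quoted in Köhler et al. (2019, p. 11) for blocked controls.
CUTOFF_STATED_PCT = 5.0

# Assumption for illustration: the highest value dismissed as unspecific
# in Köhler (2016), used here as an implied cut-off.
CUTOFF_IMPLIED_PCT = 9.9

# The 9.50% probability that Fis (2018, p. 61) treats as specific.
value_pct = 9.5

print("specific under stated 5% cut-off: ", is_specific(value_pct, CUTOFF_STATED_PCT))
print("specific under implied 9.9% cut-off:", is_specific(value_pct, CUTOFF_IMPLIED_PCT))
```

The same 9.50% measurement counts as a signal under one rule and as noise under the other, which is exactly why the cut-off needs to be stated and applied consistently.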

In short, the publication provides only scant information, and the missing details can be found only in other documents. Nonetheless, it raises a lot of questions. It is unclear whether the findings are based on a reliable signal (with multiple crucial indications to the contrary), several results are presented for diverging experiments in other publications, some claims directly contradict more elaborate reports of the same data, and what little detail is provided deviates significantly from final results.

And yet, we are only just scratching the surface.