The Unpublishables I: My Failures as a Scientist and Negative Controls
My biggest failures in research and the insider knowledge that they granted me: negative controls edition.
In December 2024, I received the letter confirming I'd completed grad school. For nearly five years, I worked on a class of RNA molecule called microRNA (or miRNA, for short) and asked whether certain microRNA candidates might be of therapeutic interest for vascular disease.
In every corner of science, papers tell only part of the story. The microRNA field is no exception. I discovered things on my own that reading papers had not adequately prepared me for, and I was dumbfounded by how my observations upended what I thought I knew. "How has no one noticed this?" I'd ask, and then I'd realise: someone probably has. They just haven't put this information out for the world to read. There is minimal incentive to publish such laboratory observations, which are lost to the demands of academic careerism.
It also turns out that failure preceded nearly every major milestone in my accumulation of knowledge, the type of knowledge that is not traditionally "publishable". Through these failures, I learned something profound: the fundamental disconnect between how science is presented and how it's actually practiced. I discovered that the literature makes scientific progress seem linear and methodical, when the reality is far messier, built on shaky methodological ground that few discuss openly.
That realization motivated me to write this piece, where I shall share my failures and how they relate to uncovering hidden, illegible, or tacit knowledge. Knowledge about proper controls, experimental design flaws, and the cumulative damage of methodological oversights that ripple through entire fields.
If you're a fellow biomedical researcher, maybe this helps. If you are simply a curious bystander, perhaps it prompts you to consider what useful, unpublished or "unpublishable" knowledge you hold about your field, and to share it, too.
The gap between theory and practice
A lot of people, myself included, have issues with microRNAs, and understandably so. These concerns are generally theoretical and conceptual: how microRNAs work, whether they could function effectively as therapeutics, and why. These are the kinds of questions one can answer after reading the literature. There are also other, more niche and insightful answers that one only encounters when working with microRNAs hands-on in the lab, or hears firsthand from someone who has and isn't shy about venting their frustrations.
In retrospect, we can identify publications that foretold issues later encountered in practical research. Yet this revelation changes little when most researchers have either missed these papers entirely, failed to connect their significance to ongoing work, or knowingly disregarded their warnings in pursuit of publication. Ultimately, those small laboratory insights will remain within lab notebooks and university cloud storage systems and eventually rot, dooming a non-trivial number of people to painfully rediscover it all on their own.
Those practical challenges, the ones you really figure out through grinding on the bench, are where most of the very interesting information about a field comes from. Not having this information is a setback, and its degree of accumulation is a marker of seniority, a status symbol. How do you really control for microRNA experiments? How do you measure their abundance? How much microRNA should you add to your system for a good over-expression? What are the things my hands learned on the bench working with microRNAs that you won’t find in the methods section of any paper?
I will publish essays answering all of those questions, which contain a lot of my failures in the lab, starting with what has been my first very large frustration in research: controlling microRNA over-expression experiments.
Quick review on microRNAs. What are they, and why are they interesting?
MicroRNAs are short RNA molecules that regulate the expression of genes. They do this by base-pairing with messenger RNAs (mRNAs) through a specific region called the "seed sequence", a stretch of just 6-8 nucleotides. This tiny sequence gives microRNAs a remarkable ability: a single microRNA can bind to many different mRNAs, giving it the power to regulate many genes at once. Multiple mechanisms of action have been described for microRNAs, some resulting in repression of gene expression, while others lead to its promotion. According to the most well-studied mechanism, microRNAs bind to mRNAs within a region termed the 3’ Untranslated Region (3’ UTR), and indirectly repress the translation of that mRNA into protein by either sequestering it or inducing its degradation.
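To make the seed idea concrete, here is a minimal Python sketch of perfect seed matching. It assumes a common convention in which the seed is taken as nucleotides 2-8 of the mature microRNA, and it treats a "site" as an exact reverse-complement match in the 3’ UTR. Both sequences are made up for illustration, and real target prediction considers far more than this (site context, conservation, pairing outside the seed).

```python
# Sketch only: exact seed matching, ignoring site context and pairing energetics.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """Sequence an mRNA must contain to pair perfectly with the seed.

    start/end are 1-based positions in the microRNA (assumed convention: 2-8).
    """
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    return reverse_complement(seed)


def find_seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based positions in the UTR where a perfect seed match begins."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1) if utr.startswith(site, i)]


# Hypothetical sequences, not real microRNAs or transcripts.
toy_mirna = "UACGGAUCCAGUUCGAAUGCUA"
toy_utr = "AAAGAUCCGUAAACCCGAUCCGUUU"

print(seed_site(toy_mirna))                 # GAUCCGU
print(find_seed_sites(toy_mirna, toy_utr))  # [3, 16]
```

Because the match is only seven nucleotides long, the same short motif can turn up in many unrelated 3’ UTRs, which is the multi-target property described above.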

What makes this interesting is that certain microRNAs don't just target random genes; they seem to suppress groups of genes involved in the same biological pathway or process. This is akin to flipping one switch and dimming an entire section of a building rather than just a single lightbulb. Conceptually, it isn't very different from, e.g., transcription factors, which can also coordinate expression of entire gene networks, or even cytokine-based therapeutics. Therefore, there isn’t much novelty here in terms of the multi-target approach. Still, exploring what potential lies in microRNA therapeutics is a valid endeavour, as part of diversifying our gene-regulatory therapeutic approaches. If we can identify a microRNA that naturally targets a set of genes contributing to a harmful process, say, inflammation in vascular disease, we might harness it to dampen that process at a large scale.
The following key aspects of microRNAs will be relevant to the discussion later on. MicroRNA precursors fold upon themselves to create hairpin-like structures, each containing two microRNAs that are subsequently loaded onto protein complexes after processing, at which point they're considered "mature" microRNAs. The protein complex these microRNAs are loaded into is known as the RNA-induced silencing complex (RISC), which functions as the mRNA targeting mechanism. The binding affinity between a microRNA and its mRNA target is determined by multiple factors, including characteristics within the microRNA’s own sequence and the sequence of the target mRNA near the binding site.

I remember first learning about microRNAs in undergrad. They seemed like elegant molecular switches. Little did I know how much more complicated the reality would be once I started working with them.
The state of the field: controls
Scientists use microRNA over-expression experiments to study the biological functions of specific microRNAs and identify their target genes by observing the resulting phenotypic changes and down-regulated gene expression. MicroRNAs are very often over-expressed using synthetic oligonucleotides that mimic the mature microRNA duplex, often called "microRNA mimics."
If you want to test the effect of any treatment on something measurable, you need a negative control, something to ensure that observed changes occur solely due to the treatment. You also need to adequately control for exposure to reagents used for delivering the treatment. A well-designed microRNA over-expression experiment requires three essential conditions (in addition to other positive and/or context-dependent controls): treatment with the microRNA of interest (to test its effect), treatment with a microRNA negative control (an inert sequence presumed to have no targets, a “non-targeting negative control”), and a "mock" control (just the transfection reagent without any microRNA-type molecules).
What does a microRNA negative control actually control for? An optimal microRNA negative control should recapitulate the exposure to the transfection reagent used, and occupy RISCs to a degree comparable to that observed in samples transfected with microRNA mimics. It should achieve this without eliciting transcriptional changes through interaction with target mRNAs. These negative controls are typically derived from different species or use "scrambled" sequences, depending on the manufacturer. The "mock" control, of course, simply recapitulates exposure to the transfection reagent in the absence of microRNA over-expression.
Using multiple different microRNA negative controls is not customary in the field, and I don't recall seeing this in any paper anywhere up to that point in time (without denying that it could exist). Horizon Discovery (ex-Dharmacon) microRNA mimics and negative controls are very widely used in the field. Curiosity led me to purchase two different microRNA negative controls from Horizon, the same company from which I purchased my microRNA mimics, to check whether different microRNA controls exert consistent effects on our system of interest.
And... there wasn’t.
I still remember the sense of disappointment as I stared at those flow cytometry results, where I had stained with EdU for proliferation and with a LIVE/DEAD dye for viability. The two supposedly "non-targeting" controls did not seem to affect viability, but were giving me very, very different rates of proliferation. I ran the experiment again, thinking I'd made some rookie mistake. Same result. The difference between the two controls was so dramatic that it completely changed the interpretation of my results. When compared to one control, the microRNA mimic showed almost no effect on proliferation, but when measured against the other, it appeared to substantially reduce proliferation rates.
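The arithmetic behind that flip is worth spelling out. The sketch below uses entirely made-up percentages of EdU-positive cells (not the measurements described above) to show how the same mimic result reads as "no effect" or "strong effect" depending on which negative control is the denominator. The condition names and the 20% disagreement threshold are arbitrary assumptions for illustration.

```python
from statistics import mean

# Hypothetical % EdU-positive cells, three replicates per condition.
# These numbers are invented for illustration only.
edu_positive = {
    "mock":       [40.0, 42.0, 41.0],
    "neg_ctrl_1": [39.0, 41.0, 40.0],
    "neg_ctrl_2": [24.0, 26.0, 25.0],
    "mimic":      [25.0, 27.0, 26.0],
}


def relative_to(control: str, treatment: str = "mimic") -> float:
    """Mean of the treatment divided by mean of the chosen control."""
    return mean(edu_positive[treatment]) / mean(edu_positive[control])


def controls_disagree(ctrl_a: str, ctrl_b: str, tolerance: float = 0.2) -> bool:
    """Flag negative controls whose means differ by more than `tolerance` (fractional)."""
    ratio = mean(edu_positive[ctrl_a]) / mean(edu_positive[ctrl_b])
    return abs(ratio - 1.0) > tolerance


for ctrl in ("neg_ctrl_1", "neg_ctrl_2"):
    print(f"mimic vs {ctrl}: {relative_to(ctrl):.2f}x")

print("controls disagree:", controls_disagree("neg_ctrl_1", "neg_ctrl_2"))
```

With these invented numbers the mimic sits at roughly 0.65x of one control and roughly 1.04x of the other, so the conclusion depends on the control before the mimic itself is even considered. Comparing the negative controls against each other and against the mock condition first is a cheap sanity check.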
It wasn't the discrepancy itself that was surprising. Assuming equal experimental conditions (such as dose, backbone structure, transfection protocol), the difference likely stemmed from differential off-target effects, which are unintended interactions with mRNAs that share partial sequence complementarity. Because microRNA negative controls mimic the structure of endogenous microRNA, they can potentially function as microRNAs themselves through binding via their seed region or through compensatory 3' pairing. This makes sense.
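As a rough illustration of why two "non-targeting" sequences can behave differently, the sketch below screens two hypothetical control sequences against a toy panel of 3' UTRs for 6-nucleotide seed matches (positions 2-7, an assumed convention). All sequences and gene names are invented; they are not any vendor's actual controls, and a real screen would use full transcriptomes and would also need to consider 3' compensatory pairing.

```python
# Sketch only: which toy 3' UTRs contain a seed match for each control sequence?
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match(small_rna: str, length: int = 6) -> str:
    """Reverse complement of the seed starting at position 2 (1-based)."""
    seed = small_rna.upper().replace("T", "U")[1:1 + length]
    return seed[::-1].translate(COMPLEMENT)


def predicted_hits(small_rna: str, utrs: dict[str, str]) -> set[str]:
    """Names of UTRs that contain at least one perfect seed match."""
    site = seed_match(small_rna)
    return {gene for gene, utr in utrs.items() if site in utr.upper().replace("T", "U")}


# Hypothetical negative control guide strands.
controls = {
    "neg_ctrl_A": "UUGCAUCGGAUACCGUAAGCUA",
    "neg_ctrl_B": "AGCUUACGGUCAAGUCCAUGGA",
}

# Hypothetical 3' UTR fragments.
toy_utrs = {
    "GENE_X": "CCAGAUGCAUUUGGA",
    "GENE_Y": "AAGUAAGCCCUUGAUGCAAA",
    "GENE_Z": "UUUUCCCCAAAAGGGG",
}

for name, seq in controls.items():
    print(name, sorted(predicted_hits(seq, toy_utrs)))
```

In this toy panel the two controls hit overlapping but different sets of genes, which is the kind of asymmetry that could produce diverging phenotypes between controls.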
It wasn’t even the fact that I stumbled upon a problem that would probably take a very long time to troubleshoot. Such things were bound to happen as they are core to the job.
The real blackpill was that I had never seen this reported before, or at least never seen groups commonly practicing the use of multiple microRNA non-targeting negative controls. It reduced my confidence in a field that suddenly seemed increasingly opportunistic. Were the reported microRNA over-expression effects truly significant, or were they downstream of comparisons to negative controls that themselves affected gene expression? Would some researchers strategically select controls to achieve significance? The foundations of the field suddenly seemed unstable.
One night, after a particularly frustrating day in the lab, I found myself scrutinizing the figures and methods sections of even more published microRNA papers. Although admittedly I did not read every microRNA paper that has been published, I did read a bunch, and none mentioned testing multiple controls in experiments using a mimic-based strategy for microRNA overexpression. I wondered how many other researchers had discovered this issue but hadn't published it anywhere.
Now assume that we were able to create the perfect sequence to serve as a negative microRNA control. Imagine that this sequence is guaranteed to be non-targeting with our mandate of heaven. There is no off-target microRNA-mRNA binding. Does this mean that this is a good negative control for controlling microRNA over-expression experiments?
Nope!
It has been shown that nonspecific effects of commercially sourced microRNA mimic duplexes could also in part be attributed to the length of the duplex. This indicates that microRNA non-targeting controls and microRNA mimic duplexes should be length-matched to minimise discrepancies in non-specific effects between conditions. In simpler words, you'd want to length-match each microRNA you over-express to its own negative control.
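In practice, length-matching is a bookkeeping step that can be sketched in a few lines. The example below pairs each mimic with the available negative controls of the same length and flags mimics that have none. The names are hypothetical, the sequences are placeholders where only length matters, and strand length is used here as a simple stand-in for duplex length, which is an assumption.

```python
def length_matched_controls(mimic_seq: str, controls: dict[str, str]) -> list[str]:
    """Names of controls whose length equals the mimic's length, sorted by name."""
    return sorted(name for name, seq in controls.items() if len(seq) == len(mimic_seq))


# Placeholder sequences: "N" repeated to a given length (only length matters here).
mimics = {
    "mimic_A": "N" * 22,
    "mimic_B": "N" * 21,
    "mimic_C": "N" * 24,
}
negative_controls = {
    "neg_ctrl_21nt": "N" * 21,
    "neg_ctrl_22nt": "N" * 22,
    "neg_ctrl_23nt": "N" * 23,
}

for name, seq in mimics.items():
    matches = length_matched_controls(seq, negative_controls)
    if matches:
        print(f"{name} ({len(seq)} nt): use {matches}")
    else:
        print(f"{name} ({len(seq)} nt): no length-matched control available")
```

A check like this makes it obvious at the ordering stage when a mimic has no suitable control, rather than after the experiment has been run.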
I failed here, but I also learned
Here is where I failed. I didn't manage to figure out a way to properly control these microRNA over-expression experiments in the end. Was this due to a lack of time? Wrong prioritization? Me not having done the required thoughtwork and reading when I needed to? Most probably a bit of everything, all of them little blunders contributing to the overarching failure. Ultimately, all you need to do in a PhD is have a thesis and a concrete body of work to support it, not solve every single issue that appears. Chasing each one would eventually create a never-ending chain.
But failure is a tremendous teacher. I learned about the fundamental disconnect between how science is presented versus how it's actually practiced.
I realised firsthand that the literature makes science sound cleaner than it is. A very large portion of the microRNA field has definitely been building on methodologically shaky ground. It became apparent that testing multiple negative controls is a must, and validating key findings through context-dependent complementary approaches should be a lot more common (more on that in Part III of this essay series, where I will talk about measuring microRNAs).
I also realised that the cumulative damage of not properly controlling these experiments extends far beyond my own research. It ripples through the field, affecting everything from basic understanding of microRNA function to the development of potential therapeutics. This is precisely the type of knowledge that remains hidden in lab notebooks and after-work discussions with colleagues. No one publishes papers titled "Why Our Standard Controls Are Inadequate" or "How I Wasted Six Months Before Realizing This Approach Was Doomed." Academic incentives simply don't reward such honesty.
I learned to be skeptical not only of subtle microRNA effects, but also of the use of another type of small RNA called small-interfering RNA (siRNA). The microRNA field is fairly niche, but siRNAs are extremely commonly used. They function very similarly to microRNAs, loading into the RISC and down-regulating target mRNAs by binding onto them with their seed sequence. The reasons I described above that could lead to discrepancies between different microRNA negative controls could also apply to siRNAs.
And this brings me back to where I started. The most valuable knowledge I accumulated during my PhD didn't come from successfully published experiments, but from these failures, from discovering what doesn't work and why. If this piece inspires at least one other person to share their failures and the unpublishable knowledge that they acquired from them, it will have succeeded.
Many such cases! My PhD work was on statistical network learning methods, applied to learning gene regulatory networks. There's a vast literature on the subject, but basically the way each paper is structured is that it includes synthetic data analysis, then real data analysis. For synthetic data analysis, you generate a synthetic network, then a dataset from that network, and see whether your method applied to the dataset gives you back your original network. This shows that your method "works". Then, for real data analysis, you take some real dataset of gene expression (but unknown ground truth), run your method, and then talk about how your network makes sense and provides new insight. But I noticed that the synthetic datasets never look at all like the real datasets; for example, the real datasets have much more poorly conditioned covariance matrices. Then I noticed that if I modified the synthetic experiment setup so the resulting data looked like the real datasets, none of the methods worked well at all. And somehow, no published paper ever mentioned this.
Many people do not realize how much knowledge is gained from failure. We tend to think of it as "trial and error", which has some validity, instead of learning what outcome comes from each test. I have not been in research (all related to soil dynamics) in decades, but I still remember the odd results we would get with our tests, how predictions from well-seasoned scientists were far off from the results, and how sometimes what we learned was more of an art than what could be put into well-defined scientific models.
Soil interactions can be highly variable, even when conditions are deemed 'uniform'.
I am not an expert in biology by any means, but I can imagine how tiny changes (controls) in the environment (including nutrition) can affect the responses from a being's genetics.