Friday, 3 March 2017

This Is Not The Crisis You're Looking For


There's no such thing as perfect research. Consequently there's no perfect way to review research either. Yet there seems to be no shortage of "peer review scandal" articles which, taken out of context, can give the erroneous impression that we're in the middle of some sort of crisis. Well, we are, but it's not a crisis of the peer review method - it's a crisis of funding.

In an ideal world all research would get funding. It's incredibly difficult to determine which is the best research to pursue, because even if you don't care about knowledge for knowledge's sake, you can't be certain which research will generate the coolest spin-offs. Alas, for the moment we live in a world of finite resources and so someone does have to choose what to fund and what not to fund.

The funding crisis in research manifests itself in some big, obvious ways. Unique, world-class facilities are being threatened with closure, massive cuts, or privatisation. The world's largest steerable telescope, the GBT, is now relying on funding to search for aliens. Aliens ! For goodness bloody sake. I'm not saying the search for aliens isn't worthwhile, but it's an all-or-nothing deal : you either find 'em, or you don't. Past approaches piggybacked the alien hunting on regular science projects, which was fine, but prioritising the search for aliens (which will take decades with no guarantee of success) over standard projects (which are 100% certain to detect something and increase knowledge, even if it wasn't what was expected) is depressing.

This is a consequence of poor marketing : exciting but very unlikely breakthroughs are hugely emphasised to the detriment of more ordinary but equally valuable research, which is slow, careful, and, while it may be interesting to a degree, can hardly be accused of being exciting. It's also a result of the long-enduring myth of the lone genius, which is subtly but fundamentally wrong. Yes, geniuses do make revolutionary breakthroughs from time to time... but no, they pretty much never do so entirely on their own without any influence from their peers and predecessors, i.e. devoid of any contact with the legions of ordinary, careful, methodical, "slow" researchers.

But I digress, as usual. The GBT, the ongoing saga of Arecibo and the cancellation of several fully armed and operational space telescopes are the most obvious examples of the funding shortage. Debacles in peer-reviewed publications are, I believe, a more subtle effect.

Science is a creative process, but it doesn't exist outside of real-world concerns like where the next pay cheque is coming from - or indeed where the next job is coming from. Even with the best will in the world, scientists have to eat, so they can't escape the pressures forced on them by funding. While we can and must campaign to increase funding, we also have to accept that dramatic changes are unlikely on a short timescale. So how can we ensure that good science gets done in this overly-competitive, depressingly business-like environment ?


The problem is not too many cooks, but a lot of spoiled broth

Scientists are a lot like chefs. They spend most of their time in their own kitchens (research institutes) producing delicious meals (research papers) but do sometimes go along to each other's restaurants to sample their rival's cuisine (review their papers).

With fierce competition for jobs, it's inevitable that employers resort to very simple methods to assess candidates : largely publication rate, hence the "publish or perish" mantra - and hence the obvious tactic of publishing lots of mediocre papers. This is not the way it should be, of course, because while you can't know for sure which research is valuable, you can certainly make some very good guesses. You can also know when mistakes have been made, and some publications are full of crap; not everything falls in the grey area of controversial research. That is, after all, why we have peer review in the first place.

Ideally employers should actively examine the publications of their prospective candidates, but that's not possible when there are a hundred people applying for one position and they each have ten publications. One other method they use is to look at citation rate - how often each paper has been cited by other researchers. But that's of limited benefit as well, because high-quality research can slip through the cracks while provocative bullshit often generates a furore, with most of the citations dismissing the research rather than praising it.

The problem is that the metric of publication rate is too simple. It makes sense to consider this at some level - do you want a junior researcher or someone with more experience ? But a publication is a publication. There's no way to judge by glancing at a C.V. whether that research was really top-grade or just plain mediocre. But what if there was ?


Let us know what we're eating !


Currently, scientists have a choice between publishing in a regular journal or in Nature or Science. That's largely the extent of the differentiation between the journals : really prestigious, or normal. The system is very easy to game - you just do some minor, incremental research and publish it. It's not necessarily even bad research, it's just not particularly interesting, but it boosts the publication rate just as much as a more careful or thought-provoking piece would. My proposed solution is to reform the publication system in quite a simple way that would make it harder to abuse.

We need these small, incremental papers, but we shouldn't pretend that all papers are created equal. Nor are all referee reports of equal rigour. We shouldn't try to suppress the weaker papers any more than we should insist on absurdly high levels of careful reviewing, because reviewers, being human, are subject to their own biases, and we don't want to risk chucking out a good idea because one individual got up on the wrong side of bed. What I'm proposing is that we label papers, giving employers a guideline by which to quickly assess performance based on something more than the sheer number of publications.

I say "label" rather than "grade" because this can be a complex non-linear system. It might, for instance, be useful to label papers according to content. Some papers consist of nothing more than an idea and simple calculations. This is undoubtedly useful to report to the community but it hardly compares to a comprehensive review or a series of relativistic magnetohydrodynamic simulations combined with years of observations. Other papers consist entirely of catalogues of observational data while others are purely numerical simulations. Which ones are the most valuable ? For science, all of them ! But for an employer, it depends what sort of person they're looking to hire. A wise employer will seek to have a diverse range of skill sets, from the uber-specialists to those with more broad-ranging experience.

What this would change is that your employer would no longer see from your C.V. that you have twenty papers and instantly think, "Wow !". They'd see at a glance that you have, say, fifteen simple idea-based papers, three observational papers and two based on simulations. They'd know not just how much research you were doing, but what sort of research it was. You'd still communicate your findings to the community - maybe you'd even publish more under this system - but any half-witted employer would see that a candidate with three papers all combining detailed observations and simulations was better than one with thirty incremental results.
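
To make the "at a glance" idea concrete, here's a minimal sketch (in Python, with entirely invented labels, titles and data structures - nothing here is a real system) of how a labelled publication list collapses into a summary an employer could actually read :

```python
# A toy illustration only : the Paper structure, the label names and the example
# titles are all invented for this sketch, not taken from any real system.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    label: str  # e.g. "idea", "observations", "simulations", "review"

def cv_summary(papers):
    """Count a researcher's papers by their content label."""
    return Counter(p.label for p in papers)

publications = (
    [Paper("A quick idea about gas stripping", "idea")] * 15
    + [Paper("A catalogue of dwarf galaxies", "observations")] * 3
    + [Paper("Simulations of cluster harassment", "simulations")] * 2
)

print(cv_summary(publications))
# Counter({'idea': 15, 'observations': 3, 'simulations': 2})
```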

When submitting a paper, the authors could suggest a possible label, but it would be up to the referee and the editor to decide whether to accept it. This wouldn't stop so-called "salami publishing", where people publish endless variations on a theme or re-analyses that use no novel techniques and find nothing new, but it would make such papers easier to identify. This might also cut down on the sheer volume of information we have to read. If authors are allowed - indeed encouraged - to recognise the interesting but not Earth-shattering nature of their results, they will a) publish fewer of the worst, most pointless "findings" of all and b) reduce the size of incremental papers. A lot of full-length papers under the current system would become more like letters - very short reports that get right to the point and don't have to re-iterate methods already explained in detail in some existing publication.

We'd have to call them something other than, "short, incremental, not very interesting articles" though. "Essays", maybe. I'll get back to the mechanics of how such a labelling system could be applied later on.


And let us know how well it was prepared, too


Papers could be labelled not only by content but also by review rigour - not to be confused with research quality, which is a different thing. Indeed this might be necessary under the new system : if more complex papers are to be seen as more valuable, they'll need more careful review. All levels of peer review are going to need some basic guidelines, which will require some thinking about what we want journal-based peer review to actually mean. Currently, reviewers are given a free hand to request whatever changes they like.

For instance, the lowest level of review (for an "essay" paper, maybe) might be a single referee checking that there are no internal inconsistencies, known problems with the methodology, factual errors etc. It would still be pretty rigorous, but the referee wouldn't be expected to verify every calculation, numerical value, or citation. At the highest level, there might be three or more independent reviewers going through everything with a fine-toothed comb - every calculation, value and citation. But even this should not mean - as it currently can - that they get to dispute interpretative statements (i.e. "in our opinion...") unless those statements are in flat contradiction to the facts. Some standard principles need to be clearly spelled out if referee quality is going to be even remotely homogeneous.
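
Purely as a sketch of what spelling out those principles might look like, the tiers could be written down as explicit checklists rather than left to each referee's personal taste. Everything below - the tier names, the number of referees, the checklist items - is an invented placeholder, not anyone's actual policy :

```python
# Hypothetical review-rigour tiers : every name, number and checklist item here
# is an invented placeholder, purely to show the expectations could be explicit.
REVIEW_TIERS = {
    "essay": {
        "referees": 1,
        "must_check": ["internal consistency", "known problems with the methodology",
                       "factual errors"],
        "not_required": ["every calculation", "every numerical value", "every citation"],
    },
    "standard": {
        "referees": 2,
        "must_check": ["internal consistency", "methodology", "factual errors",
                       "key calculations and citations"],
        "not_required": ["interpretative statements that don't contradict the facts"],
    },
    "full": {
        "referees": 3,
        "must_check": ["every calculation, numerical value and citation"],
        "not_required": ["disputing opinions that don't contradict the facts"],
    },
}

def referee_guidelines(tier):
    """Summarise, in plain English, what a referee at the given tier must do."""
    t = REVIEW_TIERS[tier]
    return (f"{tier} : {t['referees']} referee(s); must check " + "; ".join(t["must_check"])
            + ". Not required : " + "; ".join(t["not_required"]) + ".")

print(referee_guidelines("essay"))
```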

One of the other dangers of an overly-competitive environment is that legitimate scepticism ends up becoming hostility. When your career depends very strongly on your publications, not only are you going to be reluctant to point out your own mistakes, but you'll also have a vested interest in finding those of others. This is healthy in that it encourages reviewers to spot flaws and think deeply about the issues, but refereeing is not supposed to be the same as debunking. There ought to be a certain joy in considering new ideas, especially ones that contradict your own earlier ideas.

I'm not sure there's a review system which can incentivise this (more funding is really the best answer here), but certainly good oversight by the editor can prevent papers being shot down in flames because the referee was grumpy. One thing that ought to be a requirement at any level of review is that if you attack an idea, you should state explicitly what changes you want : do you insist that the authors remove an idea completely or merely want more emphasis on the uncertainties ? You should propose an alternative, not just say, "this is wrong".


Put the kitchen on display but allow the chefs to wear masks

Gosh, this is a useful analogy, isn't it ?
For all this to work, the review process would have to be completely transparent so that everyone could check whether the reviewers were adhering to the rules - currently the author-referee exchanges are seen only by the journal editors. Admittedly, exactly how much of the early versions of the paper should be made public is a complicated detail, but making the exchanges public would give the system strong accountability and help ensure that everyone was doing what they were supposed to be doing.

Reviewer identity is currently kept secret from the author (though not the editor) unless the reviewer chooses to reveal themselves, and this should remain the status quo. Anonymity helps protect reviewers if they worry that their support for a theory widely perceived as wrong would be detrimental to their own career, or if they have to work with the author in the future, or if they're simply a much more junior researcher and afraid to publicly criticise a more senior author. It also reduces the chance that the author and reviewer will be in cahoots, and stops the author tailoring their responses to appeal to that specific reviewer. Anonymity, rather than transparency, is in this case what encourages describing the facts rather than pandering to the reviewer's ego.

There's another important aspect to the visible-kitchen analogy that works quite well too : it helps researchers understand the exact method of their peers. And not only the process, but the results too. For instance, many simulation papers do not describe their precise initial conditions (even the most basic details like number of particles are sometimes missing), and if they do they're sometimes hidden in the main text - not in a clear, obvious table. And they still only show selected snapshots, not usually complete movies - if a picture paints a thousand words but uses a thousand times more memory, then a movie paints a million words and uses a million times more memory.

But nowadays we've got a million times more memory : showing the movies in the online material should be considered essential, not a nice bonus. I want to be able to interpret for myself what's happening, not rely exclusively on the author's interpretation. Rather than preparing a meal, it's more like trying to follow origami instructions : possible via book but far easier via video.

As far as possible, the precise conditions and process needed to replicate the result should be described. That's not always practical since raw data can be very large indeed, but such cases should now be the exception rather than the rule. The "essay" style papers I've described wouldn't be expected to go into such depth either, but they should reference "recipe book" papers where the method was described in detail, and their precise setups could always be included in the online material. So could raw observational data (where feasible) and analysis code, for that matter.


Everyone should try your signature dish


So far we've improved the way a C.V. can be used to see at a glance just what it is a researcher does and how thoroughly tested their investigations have been. We've also cut down on the amount that other researchers have to read without preventing anyone from publishing anything, and made the review process transparent so everyone can see if it was really up to the mark. We've essentially given everyone a menu : you can see at a glance exactly what sort of research a person does and to what general standard. Just like food, you won't know for sure until you try it for yourself - but this is obviously much better than if you don't have any kind of menu at all.

What we haven't quite addressed is replication : can another researcher, following your recipe, exactly reproduce your findings ? Now, the "essay" style papers needn't necessarily contain every single step of the process, but full articles should. The problem is that you've got to encourage reproduction or it won't happen : people will just keep eating the same meal from the same restaurant. After all, it's a time-consuming procedure, with a high chance of merely confirming the previous findings and learning nothing new. And if you don't confirm the findings, there's the more political concern that you might embarrass the original authors - potentially alienating a future collaborator.

One way to offset this would be to award replication studies an extra level of prestige : insist that these studies be subject to the highest level of review possible. Getting such a paper accepted would be a real challenge and recognised as such. So there would be a motivational balance of glory on the one hand, difficulty and low likelihood of new discoveries on the other. A successful replication study could also have a transformative effect on the original paper, changing it from a merely interesting result to one that deserves strong attention. That in turn encourages everyone to publish research which can be replicated, because a replication study which failed would be a pretty damning indictment. Couple that with the labelling system that identifies the originality and nature of the research and that would be, I think, a pretty powerful reform of the system.


Summary



So that's my proposed solution to the so-called crisis in peer review and replication studies. In order to stop people publishing mediocre papers, force papers to be labelled as such. That's a slightly cruel and cynical way to put it. A nicer and actually more accurate description is that we need to disseminate as many ideas as possible to our colleagues, and papers are still the best way to do that - but we have to recognise that not all papers require, or receive, the same level of effort to produce. Consequently they don't require the same level of review rigour either.

It's hard to predict exactly what the effects of this would be. I lean toward thinking that it would actually increase the publication rate rather than lessen it : everyone has lots of ideas, but we don't get to discuss them with the wider community as often as we might like. That sort of discussion isn't suited to the current publication system, which is heavily geared towards much more in-depth articles. Those types of papers would probably decrease in number, since no-one would now feel the need to make a mountain out of a molehill or slice the salami too thin. So ideas wouldn't be stifled or suppressed, but fewer tedious, long-winded papers - the ones only ever written to bolster publication rates - would get through.

How exactly would a paper be labelled ? There are several possible methods. One would be to have more and more specialist journals, which is already happening to some extent. For example the Journal of Negative Results specialises in - you've guessed it - results that were unexpectedly negative. Or Astronomy & Computing, which specialises in computational techniques and code in astrophysics rather than new results about how the Universe works. There are also occasional special editions of regular journals focusing on specific topics.

But perhaps new journals are overkill. Most regular journals already have a main journal plus a "letters" section which publishes much shorter, timely articles. Why not extend this further ? Instead of just MNRAS Letters, also have MNRAS Observational Catalogues, MNRAS Numerical Simulations, MNRAS Serendipitous Discoveries, MNRAS Data Mining, MNRAS Clickbait, MNRAS Essays, MNRAS Breakthroughs, MNRAS Replication Studies, MNRAS Things I Just Thought Up Off The Top Of My Head While I Was On The Toilet, MNRAS Things Some Bloke In A Pub Said Last Tuesday, etc.

Literally the only change this needs is a label in the bibliographic code of the article. Similarly, recognising that we're now in an almost purely online world, some sort of code could be attached to each article indicating its replication status : not applicable, attempted but failed, successfully reproduced etc. (just as ADS lets you see who cites any given article, along with other metadata). Then everyone knows, instantly, whether or not the research is obviously wrong - but only provided that the review rigour on the replication studies is of the highest possible level. Otherwise you just get huge numbers of people publishing crap replication studies that don't mean anything.
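
To show how little machinery this would actually need, here's a hypothetical sketch of an article record carrying the extra labels. The field names, status values and example bibcodes are all made up for illustration; this is not how ADS or any journal actually stores its metadata :

```python
# A hypothetical article record : a content label, a review tier and a replication
# status bolted onto the usual bibliographic code. All names, values and the
# example bibcodes here are invented for illustration.
from dataclasses import dataclass, field
from enum import Enum

class ReplicationStatus(Enum):
    NOT_APPLICABLE = "not applicable"
    NOT_ATTEMPTED = "not yet attempted"
    FAILED = "attempted but failed"
    REPRODUCED = "successfully reproduced"

@dataclass
class ArticleRecord:
    bibcode: str                 # the usual journal/year/volume/page identifier
    content_label: str           # e.g. "essay", "catalogue", "simulations"
    review_tier: str             # e.g. "essay", "standard", "full"
    replication: ReplicationStatus = ReplicationStatus.NOT_ATTEMPTED
    replication_refs: list = field(default_factory=list)  # bibcodes of any replication studies

record = ArticleRecord(bibcode="2017MNRAS.000.0000X",
                       content_label="simulations",
                       review_tier="full")
record.replication = ReplicationStatus.REPRODUCED
record.replication_refs.append("2018MNRAS.000.0000Y")
print(record.bibcode, record.content_label, record.replication.value)
```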

The hard part would be to agree some broad common standards, which would likely only come with practice. Nature retains its pre-eminence as much by reputation as by the actual quality of its publications (some would say more so). If other journals started divisions which only published potentially major discoveries subjected to a high level of reviewer scrutiny, the pre-eminence of one journal over another would be broken... but only so long as that was actually recognised by researchers. Hence the need for common standards, reviewer guidelines and a more transparent review process. If you can see that the Bulgarian Journal of Bovine Psychology is publishing results as important and as rigorously reviewed as Nature's, who would still favour one over the other ?

This is not the only possible approach. A more radical idea is that we largely abandon papers and move deeper into the purely online realms of blogs, forums and social media, making science communication far more fluid. I do not like this idea. A paper presents a permanent, standalone record of the state of the art at a given time, based on the available evidence - it can be checked years later relatively easily. A better approach would be for the existing major journals and arXiv to run forums, e.g. each paper automatically starts a discussion thread. This would be a vastly better way to find out what the community thinks of each paper than waiting for citations from other researchers, which usually takes months and is often limited to a single unhelpful line, "as shown by Smith et al. (2015)".

Of course, what labelling papers won't necessarily do is make the damn things any easier to read. But that's another story.
