part 1 - Frederic G. Hoppin, Jr. "How I Review an Original Scientific Article" BY Hoppin FG AM J RESP CRIT CARE MED 166 (8): 1019-1023 OCT 15 2002

Eugene Garfield garfield at CODEX.CIS.UPENN.EDU
Mon Apr 28 15:23:42 EDT 2003


To: Readers of SIG-Metrics List

This paper has been broken up into parts 1 and 2 as it was too long to
include in a single message to SIG-Metrics List members.


Part 1

Frederic G. Hoppin, Jr. "How I Review an Original Scientific Article" BY
Hoppin FG
AM J RESP CRIT CARE MED 166 (8): 1019-1023 OCT 15 2002

________________________________________________________________

Occasional Essay

How I Review an Original Scientific Article

Frederic G. Hoppin, Jr.
Departments of Medicine, Memorial Hospital of Rhode Island and Brown
University, Pawtucket, Rhode Island

Correspondence: Correspondence and requests for reprints should be addressed
to Frederic Hoppin, M.D., Division of Pulmonary and Critical Care Medicine,
Memorial Hospital of Rhode Island, 111 Brewster Street, Pawtucket, RI 02860.
E-mail: Frederic_Hoppin_Jr at Brown.edu

WHY WRITE THIS?

There has been substantial recent interest in the quality of the peer review
system in biomedical publication, with several International Congresses (1)
and a recent JAMA issue entirely devoted to the topic (2). The quality of
reviews of articles submitted for publication varies widely (1, 3-5). Black
and colleagues have suggested that their quality might be improved if
journals trained their reviewers (6). How do we currently learn the trade?
Some of us have learned by doing reviews, by fielding reviews of our own
submissions, and by comparing our own reviews with other reviews of the same
articles. When editorial consideration of a submission is completed, the
editorial offices generally forward all correspondence to the referees. I
always look at this correspondence because it often reveals new insights and
provides useful feedback on my review. Some, more fortunate, started by
drafting reviews for their seniors (with confidentiality strictly protected)
and then engaging in an intense tutorial over the science, the presentation,
and the review itself. Lock's comprehensive and scholarly review of
editorial peer review contains a very useful set of guidelines (4), and
there are other relevant publications (3, 5-10). But, to my knowledge, no
how-to-review paper has been published. I hope that some of the lessons I
have learned over the years as reviewer and onetime Associate Editor and the
practices that I follow might be helpful to the novice and might provide
affirmation and perhaps a pointer or two for the experienced reviewer.

WHAT DOES IT TAKE TO DO A GOOD REVIEW?

Motivation
Good reviewers, in my experience, have a resolute sense of responsibility to
their colleagues and a strong conviction that the archival literature, with
high standards set by peer review, is critically important to the progress
of science (6, 8). The best reviewers also appreciate the opportunity for
teaching and find reviewing a good paper as informative and exhilarating as
participating in an inspiring work-in-progress research seminar. The quality
of their reviews, furthermore, is importantly contagious.

Scientific Expertise
The challenge to the reviewer is to see what the authors themselves have not
seen. This is a daunting task. It requires scientific expertise of two main
sorts, (1) awareness of the literature, i.e., being right up to date, and,
more often a problem in my experience, knowing the old stuff and (2) mastery
of the relevant science, i.e., being able to apply and relate scientific
principles and findings to the new science.

Several different areas of expertise may be relevant for a given submission.
A paper that is sent to me, for example, may include elements of clinical
and applied science, general pulmonary physiology, basic lung and chest wall
mechanics, mathematical modeling, or stereology. Although my expertise is
uneven among these topics and a submission often requires significant
expertise in disciplines that I cannot cover responsibly, the Associate
Editor usually turns out to have selected reviewers to cover all main areas.

Helpful Attitude
Many reviews are not very helpful. Why not? A good review takes substantial
intellectual effort and time and is not immediately credited by the
reviewer's academic institution or peers (11). Indeed, authors' satisfaction
appears to be associated with acceptance for publication, not with the
quality of the review, at least for submissions to general medical journals
(12). Dissatisfied authors can see reviewers as being picky, hasty,
arbitrary, dogmatic, dismissive, superficial, wrong, judgmental, arrogant,
unfair, jealous, or self-serving. Such perceptions are quite predictable,
given the high stakes for the authors and the power and anonymity of the
reviewers. Occasionally such accusations are valid at some level.

Yet, an insightful and articulate review can substantially improve the
science and clarity of a submitted paper (8) and can advance the authors'
knowledge and ability to conduct and report science. The reviewer can be
fully as helpful as an involved laboratory colleague or a visiting
professor.

My approach is to be resolutely respectful. This does not mean watering down
the review; downplaying a concern; failing to demand justification,
explanation, and clarity; or avoiding a clear recommendation. It does mean
(even late at night, after a busy day, with a marginal manuscript) reading
with patience, objectivity, and openness to new ideas and approaches, and
reporting with complete clarity and without summarily closing off debate. It
also means being careful not to give rein to my competitive instincts.

Time
I frequently miss important insights on my first reading and then often have
to ruminate before I can bring a problem into full and articulate perspective.
time required varies widely. Complex or novel techniques, methods, or
analyses require much more time than standard ones. Significant deficiencies
of presentation cloud and disadvantage a discouraging number of otherwise
scientifically meritorious submissions (13), burdening the reviewer with
figuring out exactly what has been done, what has been concluded, how the
authors reached their conclusions, and what is missing. It has been asserted
that the quality of review increases with the time expended up to but not
beyond 3 hours (6), but for many of the papers that come to my desk, 3 hours
would not suffice for a careful and helpful review. This experience is
confirmed by many colleagues and is abundantly clear in the content and care
that I see in others' reviews. A complex, potentially important paper can
certainly take a full working day (9).

Senior reviewers, surprisingly, are reported to do a worse job than their
juniors (7). That the seniors also spend less time (7) may be the
explanation for the lesser quality! Another possibility is that the seniors
are more ready to cut a review short when they determine that a paper has
clear, serious, irreparable scientific deficiencies and believe there is no
need to detail all deficiencies in a sloppily written paper. Nonetheless,
reviewers should be warned that "time is of the essence" in this setting
means "spend it, don't hurry it, even if you are senior."

How an academician can find the time is a critical issue. Although there are
certainly many important intangible benefits to reviewing (e.g., broadening
one's scientific knowledge, enjoying the scientific interchange and debate,
fulfilling a sense of responsibility), the tangible benefits are limited to
the possibility that gaining the respect of the editors might lead to
invitations to participate in national societies, an editorial board, or a
study section. Furthermore, the job competes with activities that have
immediate rewards or accountability, e.g., teaching, preparing grant
applications, performing research, seeing patients. It would be very helpful
if the academy could be structured to reward this activity more directly
(11).

HOW DO I PROCEED?

Acceptance
I accept an invitation to review an article if the topic is of interest to
me, if it is within my expertise, and if I can commit the time. I consult
with the Associate Editor before accepting if it turns out that I have
already seen the article in a presubmission review or in review for another
journal. Before I start my review, I always obtain from the editorial office
any relevant "in press" article or "companion" paper that is currently under
review by others.

First Reading
I spend some time with the abstract to set myself up for the review, i.e.,
to decide what to look for in the experimental design, methods, results, and
bases for conclusions, and particularly to note what the authors think is
important in their work. I also take a moment, before being seduced by the
paper itself and distracted by its details, to pose a few broad questions,
for example, "Essentially a methods paper?" or "What's new here compared
with their earlier papers?" I list these preliminary questions on the front
page and usually add to, strike out, or revise that list as I work through
the text.

I then read the article closely, focusing primarily on understanding the
science. I stop wherever I do not fully understand the science from what is
written, where some aspect of the science is troubling, or where I believe
the authors may have failed to put their work into fair and full
perspective. I attempt to characterize each such problem in a preliminary
fashion. I do not look for specific errors, as from a checklist. The process
goes in the other direction, e.g., a question about the science occurs to
me, the answer does not, and my task then is to identify the specific error.
This last task is not always easy or immediate. I may have to check the
literature, consult a colleague, or do some hard thinking. Errors of
presentation may be more readily identified than are errors of the science,
but it is often unclear whether a problem arises from fuzzy presentation,
fuzzy thinking, or both (8).

What sorts of problems do I encounter? I will give some categories,
descriptions, and a few examples below. My intent, again, is not to provide
a checklist but rather to help the reader to characterize the problems he or
she encounters.

Problems with the science.
Many problems require careful analysis but in the end turn out to be
violations of logic or of common sense (e.g., contradiction, unwarranted
conclusion or attribution of causation, inappropriate extrapolation,
circular reasoning, pursuit of a trivial question) rather than violations of
abstruse principles. Two brief examples have to do with applications of
statistics. (1) More than once, I have seen a standard error that was
impressively and misleadingly narrow only because the authors had used a
large "n" of samples instead of the small "n" of animals from which the
samples were obtained. (2) More than once, I have seen a claim that
Treatment A differed from Treatment B, not because of a direct comparison
between the effects of the two treatments but because the effect of
Treatment A was statistically significant, whereas the effect of Treatment B
was not. Both examples are violations of common sense that became apparent
through close reading and not by direct recall of the relevant rules from
"Statistics 101."

Many problems arise from failure to apply available, specific knowledge. The
authors have not applied relevant basic scientific principles, have not
considered a likely methodological uncertainty, have failed to recognize a
confounding factor, or have not considered the appropriate statistical power
(14). For example, I have seen more than one study in which the authors
reported measurements that depended on chest wall configuration made at
total lung capacity, without specifying whether total lung capacity had been
maintained actively with an open glottis or passively with relaxation
against a closed glottis. I had to presume, until I heard otherwise, that
the authors, unaware of the substantial difference of configuration or of
its implications, had failed to control a potentially confounding variable.
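
Statistical power, in particular, is cheap to check by brute-force
simulation. Here is a minimal sketch; the effect size, group sizes, and the
plain two-sample t test are hypothetical choices of mine, with no connection
to any particular study:

    # Estimate power by simulation: repeat the experiment many times with a
    # known effect and count how often a two-sided t test reaches p < 0.05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def estimated_power(n_per_group, effect_in_sd, reps=5000, alpha=0.05):
        hits = 0
        for _ in range(reps):
            treated = rng.normal(effect_in_sd, 1.0, n_per_group)
            control = rng.normal(0.0, 1.0, n_per_group)
            hits += stats.ttest_ind(treated, control).pvalue < alpha
        return hits / reps

    print(estimated_power(8, 0.5))    # a 0.5 SD effect with n = 8: low power
    print(estimated_power(64, 0.5))   # roughly 64 per group for ~80% power

A "negative" result from the underpowered design says almost nothing, and a
reviewer who has run this kind of back-of-the-envelope check can ask the
authors whether their study could ever have detected the effect they discuss.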

Problems with the ethics.
I have not yet identified fraud in a study; inconsistent results have always
appeared to have more pedestrian explanations. And I have not uncovered
inappropriate treatment of human or animal subjects. Approval by an
Institutional Review Board does not absolve the reviewer. For this reason,
for example, I have often asked that authors specify their protocol for
ensuring that paralysis of an animal does not mask a lightening of the level
of anesthesia.

Problems with the presentation.
Often I can guess but am not sure of the authors' exact intent. Helpfulness
requires that I identify the problem with the authors' presentation, and
this requires that I know how to write. There are very readable,
comprehensive texts on this subject (9, 10). A brief survey here of the
kinds of problems that I encounter may provide a useful frame of reference
for the reviewer.

Redundancies, irrelevancies, and unnecessary excursions are relatively minor
sins but may impair communication by boring and distracting the reader.
Failures to define terms or to use words with precision are more serious
because they can mislead. Noncolloquialisms, common when English is not the
authors' native tongue, distract and can mislead. Jargon (by which I mean
any nonstandard word, idea, or even argument that has become so familiar to
the authors that they neglect to explain and delimit it) is a prevalent and
insidious problem. At its most benign, jargon annoys and fudges. This is
why, for example, nonstandard abbreviations are strictly limited by many
journals. Worse, jargon can mislead. For example, the phrase "inflection
point" is used in a number of current clinical ventilator studies to
designate the distinct upward deflection or "knee" on the inflation limb of
a lung pressure-volume curve. However, to the vast majority of scientists
and all lexicographers (so far), the term designates a very different point
on the curve, namely where it changes from concave to convex or vice versa.
Jargon can be as subtle as the use within a given article of two closely
related terms or phrases that the authors may or may not intend to be
exactly equivalent, for example, "pulmonary function" and "pulmonary
function tests." Very common terms can become jargon when they are not
carefully defined for the purposes of the paper; for example, functional
residual capacity can differ substantially depending on which of several
acceptable definitions is applied: (1) a mechanistic definition, the lung
volume where the sum of static lung and passive chest wall recoils is zero;
(2) a functional definition, the lung volume at the end of a relaxed,
prolonged expiration; or (3) another functional definition, the lung volume
at the end of a series of ongoing expirations under any one of a variety of
specified scenarios.
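
The "inflection point" confusion above is easy to see numerically. In the
sketch below (an invented sigmoidal pressure-volume curve, not data from any
study), the mathematical inflection point, where the second derivative
changes sign, lies at mid-volume, well above the lower "knee" that the
clinical literature often calls by the same name; the knee is approximated
here by the point of maximal curvature on the lower limb:

    # An invented sigmoidal P-V curve: the true inflection point (concavity
    # change) versus the lower "knee" (maximal curvature on the lower limb).
    import numpy as np

    P = np.linspace(0.0, 40.0, 2001)               # pressure, cm H2O
    V = 1.0 / (1.0 + np.exp(-(P - 20.0) / 4.0))    # fractional lung volume

    dV = np.gradient(V, P)
    d2V = np.gradient(dV, P)
    inflection = P[np.where(np.diff(np.sign(d2V)) != 0)[0][0]]   # ~20 cm H2O

    curvature = np.abs(d2V) / (1.0 + dV**2) ** 1.5
    knee = P[np.argmax(curvature[: len(P) // 2])]  # lower limb only: ~15 cm H2O
    print(inflection, knee)

A reviewer who keeps the standard mathematical meaning in mind can ask the
authors to define which point they intend.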

I am often confused by quite pedestrian errors. For example, without commas
to set it off, a dependent phrase may run on to the rest of the sentence,
and the reader is interrupted while searching for a contextual clue to the
syntax. This is particularly problematic in scientific writing, as it tends
to contain long series of nouns, e.g., "... hospital outpatient weight
control program standards ..." As another example, compare the statements
"It was concluded that ..." and "Our data, however, show that ..."-the
identity of the authors who reached the conclusion (the distinction may be
important) is clear in the active voice but equivocal in the passive voice.
Even spelling mistakes may not be benign: a computer spellchecker will never
reject an "ever" that should have been a "never," and a technical editor may
not follow the science well enough to catch the error.

Many articles are poorly focused. The thrust of a paragraph, for example,
should be clear at the beginning, e.g., a "topic sentence." As another example,
I often see a set of data strung out in the text of the RESULTS section in a
serial recitation of means, standard deviations, and "n." If the message to
be drawn from the data resides in comparisons within the set, this practice
burdens the text, is less accessible than a figure or table where the reader
can readily make the requisite comparisons, and commonly displaces an
explicit statement of what the authors want the reader to see. How much more
focused, concise, and informative it is to say simply, "[Variable A]
increases linearly with [Variable B] (see Figure 3 and Table 2)."

It is astonishing how often authors fail to develop their ideas
systematically, i.e., to lead the reader through their thinking. For
example, the reader needs to know the basis for the experimental design at
the outset. Yet I often see an idea that is important to the experimental
design postponed (perhaps in a misguided effort to avoid redundancy) until
the DISCUSSION, where it is fully developed. Both purposes can be readily
accomplished by identifying the idea in the INTRODUCTION, together with an
appropriate road sign, e.g., "as is developed in more detail in the
DISCUSSION ..." Even worse, an astonishing number of submissions fails to be
explicit about the logical structure of the study, for example by failing to
specify goals, hypotheses, testable predictions of the hypotheses, and
conclusions, perhaps under the illusion that the logical structure of the
study is so obvious as to "go without saying."

I do not keep a checklist of what must appear in a paper. Instead, I keep
asking the general question, "What is missing?" Some examples: Have the
authors acknowledged other reasonable hypotheses? For a given argument, have
they specified, examined, and assessed the impact of all reasonable
assumptions? Have they considered methodological limitations? Does the
DISCUSSION address all discrepancies or agreements between their results and
those of other workers? This is almost an attitude on my part.

Gross errors, such as percentages that do not add up to 100, have often
slipped by reviewers into the archival literature. I do look quickly at
every datum (in a viable paper), but I cannot take the time to check
calculations unless something looks way out of line. I regularly find
significantly misleading or inaccurate statements about specific citations:
sometimes I know the citation, sometimes I check it because the attribution
seems odd or is particularly critical to the science being presented.
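
The percentages check, incidentally, is the kind of one-line arithmetic
screen I mean; a minimal sketch with invented numbers:

    # Do reported percentages sum to 100 within per-entry rounding error?
    def percentages_ok(values, per_entry_tol=0.5):
        return abs(sum(values) - 100.0) <= per_entry_tol * len(values)

    print(percentages_ok([42.1, 31.7, 26.2]))   # True: sums to 100.0
    print(percentages_ok([42, 31, 29]))         # False: sums to 102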

Notations.
During this first reading, I make notations on the text, in the margins, or
on the backs of the opposite pages. These include broad and narrow,
substantive and trivial issues, citations I want to check, and individuals I
want to run something by. I pose questions even when I suspect that they may
be resolved later in the paper. I have learned to include enough detail in
these notations to successfully jog my memory. For example, a recent
notation reads, " (4) (control) for ? comparable V/P protocol," meaning that
I wanted to check Reference (4), which in effect supplied the control data
for the current study, the authors having failed to specify the exact
differences, if any, between the two studies, perhaps unaware that
differences in the volume-pressure protocol could be a major problem.

Finally, I return to the front page to list the main issues. This is an
ordered list, informed by the broad questions that I have already listed on
the front page, by the more substantive notations throughout the text, and
by the abstract, which I take as representing what the authors think is
important. I then put the manuscript aside for a day or so because important
insights and perspectives often occur to me while I am doing something else
and because returning to it enforces an initial "view from 40,000 ft."

Second Reading
On returning to the article, I review my front-page lists, my notations, and
relevant parts of the text. I then proceed to make judgments. Although I am
naturally uncomfortable with judgments, I know that they must be made and
that I have the requisite scientific background and experience in certain
areas. I describe some criteria below, not as a checklist, but as
illustrations and a framework for understanding and evaluating the various
problems that I encounter.

Criteria for judging the science.

It may be years before it becomes clear whether or not the conclusions of an
article are correct. Forecasting is risky, and if what I now suspect is
probably wrong turns out later to be right, it is important that it be
published now! Instead, I judge the integrity of the science, particularly
the quality of its reasoning and of its application of scientific principles
and knowledge.

I would also like to know if the article is important. Sometimes an article
appears to provide a convincing answer to a question of current interest.
The absence of such a connection, however, does not preclude ultimate
importance. The main reason is the prevalence of serendipity in scientific
progress. This was elegantly demonstrated by Comroe and Dripps (15). They
selected the 10 most important clinical advances in cardiovascular-pulmonary
medicine and surgery over the preceding 30 years. They identified and then
examined 529 articles that had important effects on the direction of
subsequent research and development, which in turn proved to be important
for one of these 10 clinical advances. An astonishing 41% of the articles
reported work "that, at the time it was done, had no relation whatever to
the disease that it later helped to prevent, diagnose, treat, or alleviate."
So I look instead for novelty of idea, conclusion, data, or methodology.
These criteria are relatively easy to apply. An article that is both new and
has scientific integrity has a shot at turning out to be important.

I avoid making a judgment on the basis of whether a particular study is
applied or basic. Applied studies may have the appeal of practical relevance,
and basic studies the appeal of broad relevance, but landmark studies have
been published over the full spectrum from applied to basic.

I do not consider politics or the reputation and academic status of the
authors. The referees' anonymity, incidentally, can help insulate the
Associate Editor in that regard. Hesitation to challenge weakness in
articles submitted by well-respected scientists and friends would serve them
and the journal poorly.

Criteria for judging the presentation.
I do not shy from identifying lack of clarity, precision, or completeness. I
simply assume that if I have difficulty after careful reading so will many
other readers. I avoid, however, judging a presentation on the basis of
style per se; although I might have made quite different choices, I am not
the author.

Recommendations.
My recommendation to the Associate Editor reflects (1) what I envision as
the ultimate outcome, i.e., acceptance or rejection, and (2) any steps that
I believe have to be taken before that decision is made. I have no simple
scale for weighing the merits of an article, but I can go through several
illustrative examples.

What do I recommend when an article formulates a relatively compelling
question, or puts forward an intriguing idea, but the science is weak? I
convey to the authors what might improve the science, and I describe the
pros and cons to the Associate Editor, who then has a difficult decision to
make. A somewhat mischievous perspective on this issue is given by Julius
Comroe, in one article of his delightful "Retrospectroscope" series in this
journal. He pointed out how briefly, informally, and even incidentally a
number of the truly great advances in science were first introduced (16). In
one example, he quoted the 267 words in which Korotkoff described and
explained the basis for the now ubiquitous clinical method of determining
blood pressure. Comroe concluded with the following fantasy:
"Dear Dr. Korotkoff:
Thank you for permitting us to read your interesting manuscript. We regret
that we cannot publish it in its present form. You may wish to resubmit it
after you have (1) compared data obtained by your method with that obtained
for different arm circumferences, (2) verified the accuracy of your method
against direct measurements of systolic and diastolic arterial blood
pressure in animals, and by the Riva-Rocci method in a large number of
subjects of different ages and (3) done statistical analysis of the data.
Sincerely, The Editors"
I doubt that I would have had the foresight to recommend publication.

What do I recommend when the article offers only a minor advance? Authors
seem to be in more of a hurry to publish than in years past, perhaps due to
(1) a larger cadre of competing investigators, (2) awareness that promotions
committees are better at counting papers than they are at evaluating them,
and (3) the felt need to establish a track record for funding. I often find
it helpful to look at other papers on the topic from the same laboratory,
which may show that the submission contributes to an orderly and productive
evolution of ideas or turn up a pattern of repetitive "churning" of data and
ideas. If this inquiry does not settle the matter, I recommend asking the
authors to
specify and defend exactly what is new in their submission; the burden,
really, is on them.

How does the adequacy of the presentation bear on my recommendation? Mostly
as an absolute threshold, namely that the reader must be able to make an
independent judgment about the strengths and weaknesses of the authors' data
and conclusions from what is presented (8).

How to balance high standards against the purpose of the archival
literature, which is to enable scientists to communicate? How to avoid being
a curmudgeon on the one hand and a soft touch on the other? The Associate
Editor brings his or her own calibration into evaluation of my review.
Nonetheless, I keep an eye on the severity and content of the other
reviewers' comments on the same articles and keep in mind that more than 70%
of submissions to AJRCCM, for example, are not accepted.

Often my recommendation reflects suspended judgment, pending a response from
the authors. I am particularly careful to give them the opportunity to
respond in the case of a potential fatal flaw; sometimes they can readily
clear up the issue, sometimes not. Once and only once, after prolonged
rumination, I concluded that the central reasoning in an article was
circular. This was put to the authors, who responded, "You are right.
Thanks. We withdraw the paper."

Continued ... Part #2


