Re: [AUDITORY] Registered reports

I want to highly praise Erick’s observation about psychophysics research that has been conducted without a “replication crisis” since the mid-19th century, and indeed Smith and Little’s paper is an excellent one. I also like Chuck Watson’s description of a lot of psychoacoustics as research investigating what listeners can do as opposed to what they do do (e.g., in Robinson and Watson in J.V. Tobias’s Foundations of Modern Auditory Theory, 1973). In addition to the points Erick made about small “n” research with within-subject designs, I want to add that in a large majority of these studies the individual listener data are shown, which makes inferential statistics almost pointless (i.e., you will not find any such statistics in the psychoacoustics field from the 1840s until about 1980, and I think some useful knowledge about hearing was imparted over those 140 years). What is also an irritant for someone who learned their statistics in the 1960s is how much of the solid treatment of statistics taught in those days has been lost. The classic texts of my time were one by Bill Hays and the other by Ben Winer. One should look at these texts and see what they have to say about type I and II error and level of significance. There is no mention of a magical 0.05 level, as such levels are entirely arbitrary. These authors, and many others at the time, stated that the p value should be provided and the decision as to what is statistically significant left to the “reader”. Within-subject designs might very well lead to different criteria for the level of statistical significance. If some of the ideas of “registered reports” are to be pushed, I really hope serious attention is paid to the variety of research that is done in our field so that an additional burden, among an ever increasing number of burdens, is not imposed on those for whom such “reports” would be almost useless.

William A. Yost, PhD

Research Professor

Spatial Hearing Facility

Speech and Hearing Science

Arizona State University

PO Box 870102

Tempe, AZ 85287

480-727-7148

William.yost@xxxxxxx

From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Frederick Gallun
Sent: Tuesday, June 12, 2018 11:46 AM
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Registered reports

I will add a comment on Les’ point about the unfamiliarity of replication crises and failures to publish null results in some of the areas of hearing science. This is relevant to the registered reports question because it is actually very important to note that psychophysics is not in a replication crisis and when a model prediction fails in a psychophysical laboratory, everyone is still interested in knowing about it. What then is the difference between psychophysics and other areas of psychology, other than what is being studied?

A compelling answer is made quite well by a recent paper (Smith, P.L. & Little, D.R. Psychon Bull Rev (2018) https://doi.org/10.3758/s13423-018-1451-8) on the power of small-n repeated measures designs. The authors argue that the replication crisis is not going to be solved by overpowering all of our experiments, as some have proposed. Instead, we should look to the methods of psychophysics in which the individual participant is the replication unit, theories are quantitative and make mathematical predictions, and the hypothesis testing is thus on much firmer ground.

So, what makes psychophysics so useful as a model, and why don’t we see failures of replication weakening our theories of auditory perception? Smith and Little might say that it is because 1) we work hard to find and use measurement instruments that appear to be monotonically related to the psychological entity that we are trying to understand (e.g., intensity perception or binaural sensitivity), 2) we spend a lot of time coming up with theories that can be formulated mathematically and thus the hypothesis to be tested takes the form of a mathematical prediction, and 3) these model predictions are directly expressed at the individual level. The last piece is extremely important, because it gives a level of control over error variance that is nearly impossible at the level of group effects. The Smith and Little article is not particularly surprising to those of us used to controlling variance by repeatedly testing our participants until they are well-practiced at the task and only then introducing variations in the tasks or stimuli that we expect to produce specific effects at the level of the individual participant.

This approach is not common in the areas of psychology suffering from the replication crisis. Consequently, the common suggestion has been to increase the number of participants rather than question the wisdom of using large-n designs with ordinal hypotheses based on theories that cannot be described mathematically and measurement instruments that are designed based more on convenience than on monotonic relationships to the putative psychological entity to be tested. As Smith and Little argue, this is an opportunity to change the field of scientific psychology in a very positive way, and the path is to increase sample size at the participant level, through repeated testing across multiple theoretically-connected conditions, rather than at the group level. As a psychophysicist who works with clinical populations (and an Editor and Reviewer of many clinical research manuscripts), I find this question very relevant, because those who work with patients are much more likely to come from a background of large-n designs, where experimental rigor is associated with assigning each participant to a single condition and comparing groups. In this case, it is obviously important to have as large a number of participants in each group as possible and to make each participant as similar to the others as possible. This often leads to enormous expenditures of time and effort in recruiting according to very strict inclusion criteria. For practical reasons, either the inclusion criteria or the sample size is an almost impossible barrier to achieving the designed experiment. The result is that, unless both money and time are in great supply, the study ends up being underpowered.

From this perspective, I see the registered report as a useful way to have the discussion about the most powerful methods before large amounts of time and resources have been devoted to the study, and I would encourage those with expertise in controlling error variance and experience in developing robust tools to do their best to bring this knowledge to the other areas of the field in as constructive a manner as possible. I would hope that the registered report could be a vehicle for this discussion.

Erick Gallun

Frederick (Erick) Gallun, PhD  

Research Investigator, VA RR&D National Center for Rehabilitative Auditory Research
Associate Professor, Oregon Health & Science University

Editor in Chief - Hearing, Journal of Speech, Language, and Hearing Research

http://www.ncrar.research.va.gov/AboutUs/Staff/Gallun.asp

On Tue, Jun 12, 2018 at 6:16 AM Les Bernstein <lbernstein@xxxxxxxx> wrote:

I agree with Ken and Roger. It's neither clear that the current system falls short nor that RRs would, effectively, solve any such problem. To the degree there is a problem, I fail to see how making RRs VOLUNTARY would serve as an effective remedy or, voluntary or not, serve to increase "standards of publication." If people wish to have the option, that sounds benign enough, save for the extra work required of reviewers.

As suggested by Matt, I tried to think of the "wasted hours spent by investigators who repeat the failed methods of their peers and predecessors, only because the outcomes of failed experiments were never published." Across the span of my career, for me and for those with whom I've worked, I can't identify that such wasted hours have been spent. As Ken notes, well-formed, well-motivated experiments employing sound methods should be (and are) published.

Likewise, re Matt's comments, I cannot recall substantial instances of scientists "who cling to theories based on initial publications of work that later fails replication, but where those failed replications never get published." Au contraire. I can think of quite a few cases in which essential replication failed, those findings were published, and the field was advanced. I don't believe that it is the case that many of us are clinging to theories that are invalid but for the publication of failed replications. Theories gain status via converging evidence.

It seems to me that what some are arguing for would, essentially, be an auditory version of The Journal of Negative Results (https://en.wikipedia.org/wiki/Journal_of_Negative_Results_in_Biomedicine).

Still, if some investigators wish to have the RR option and journals are willing to offer it, then, by all means, have at it. The proof of the pudding will be in the tasting.

Les

On 6/9/2018 5:13 AM, Roger Watt wrote:

3 points:

1. The issue of RR is tied up with the logic of null hypothesis testing. There are only two outcomes for null hypothesis testing: (i) a tentative conclusion that the null hypothesis should be regarded as inconsistent with the data and (ii) no conclusion about the null hypothesis can be reached from the data. Neither outcome refers to the alternative hypothesis, which is never tested. A nice idea in the literature is the counter-null. If I have a sample of 42 and an effect size of 0.2 (r-family), then my result is not significant: it is not inconsistent with a population effect size of 0. It is equally not inconsistent with the counter-null, a population effect size of ~0.4. It is less inconsistent with all population effect sizes in between the null and the counter-null. (NHST forces all these double negatives.)

2. The current system of publish when p<0.05 is easy to game, hence all the so-called questionable practices. Any new system, like RR, will in due course become easy to game. By a long shot, the easiest (invalid) way to get an inflated effect size and an inappropriately small p is to test more participants than needed and keep only the “best” ones. RR will not prevent that.

3. NHST assumes random sampling, which no-one achieves. The forms of sampling we use in reality are all possibly subject to issues of non-independence of participants which leads to Type I error rates (false positives) that are well above 5%.
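The counter-null arithmetic in point 1 can be checked with a short sketch. It is only an illustration under stated assumptions: the Fisher z approximation for the sampling distribution of r is a choice made here, not something specified in the message, and the function names are hypothetical.

```python
import math

def two_sided_p(r_obs, rho_0, n):
    """Approximate two-sided p-value for H0: rho = rho_0.

    Assumption: Fisher's z transform, with standard error 1/sqrt(n - 3).
    """
    z = (math.atanh(r_obs) - math.atanh(rho_0)) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def counter_null(r_obs, rho_0=0.0):
    """Effect size lying as far beyond r_obs as rho_0 lies below it.

    The distance is measured on the Fisher z scale, then mapped back to r.
    """
    return math.tanh(2 * math.atanh(r_obs) - math.atanh(rho_0))

n, r_obs = 42, 0.2                        # values from point 1
r_cn = counter_null(r_obs)                # counter-null effect size
p_null = two_sided_p(r_obs, 0.0, n)       # test against the null
p_cn = two_sided_p(r_obs, r_cn, n)        # test against the counter-null
p_mid = two_sided_p(r_obs, 0.3, n)        # a value in between the two

print(f"counter-null r = {r_cn:.3f}")
print(f"p vs null = {p_null:.3f}, p vs counter-null = {p_cn:.3f}")
print(f"p vs rho = 0.3: {p_mid:.3f}")
```

Under this approximation the data are exactly as consistent with the counter-null as with the null, and more consistent with any value in between.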

None of this is to argue against RR, just to observe that it doesn’t resolve many of the current problems. Any claim that it does is in itself a kind of Type I error, and Type I errors are very difficult to eradicate once accepted.

Roger Watt

Professor of Psychology

University of Stirling

From: AUDITORY - Research in Auditory Perception [mailto:AUDITORY@xxxxxxxxxxxxxxx] On Behalf Of Ken Grant
Sent: 09 June 2018 06:19
To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Re: Registered reports

Why aren’t these “failed” experiments published? What’s the definition of a failed experiment anyway?

I think that if the scientific question is well formed and well motivated AND the methods sound and appropriate for addressing the question, then whatever the result may be, this seems like a good experiment and one that should be published.

Ken W. Grant, PhD

Chief, Scientific and Clinical Studies

National Military Audiology and Speech-Pathology Center (NMASC)

Walter Reed National Military Medical Center

Bethesda, MD 20889

kenneth.w.grant.civ@xxxxxxxx

ken.w.grant@xxxxxxxxx

Office: 301-319-7043

Cell: 301-919-2957

On Jun 9, 2018, at 12:48 AM, Matthew Winn <mwinn2@xxxxxx> wrote:

The view that RRs will stifle progress is both true and false. While the increased load of advanced registration and rigidity in methods would, as Les points out, become burdensome for most of our basic work, there is another side to this. This is not a matter of morals (hiding a bad result, or fabricating a good result) or how to do our experiments. It’s a matter of the standards of *publication*, which you will notice was the scope of Tim’s original call to action. In general, we only ever read about experiments that came out well (and not the ones that didn’t). If there is a solution to that problem, then we should consider it, or at least acknowledge that some solution might be needed. This is partly the culture of scientific journals, and partly the culture of the institutions that employ us. There's no need to question anybody's integrity in order to appreciate some benefit of RRs.

Think for a moment about the number of wasted hours spent by investigators who repeat the failed methods of their peers and predecessors, only because the outcomes of failed experiments were never published. Or those of us who cling to theories based on initial publications of work that later fails replication, but where those failed replications never get published. THIS stifles progress as well. If results were to be reported whether or not they come out as planned, we’d have a much more complete picture of the evidence for and against the ideas. Julia's story also resonates with me; we've all reviewed papers where we've thought "if only the authors had sought input before running this labor-intensive study, the data would be so much more valuable."

The arguments against RRs in this thread appear in my mind to be arguments against *compulsory* RRs for *all* papers in *all* journals, which takes the discussion off course. I have not heard such radical calls. If you don’t want to do an RR, then don’t do it. But perhaps we can appreciate the goals of RR and see how those goals might be realized with practices that suit our own fields of work.

Matt

--------------------------------------------------------------

Matthew Winn, Au.D., Ph.D.
Assistant Professor
Dept. of Speech & Hearing Sciences
University of Washington

--
Leslie R. Bernstein, Ph.D. | Professor
Depts. of Neuroscience and Surgery (Otolaryngology)| UConn School of Medicine
263 Farmington Avenue, Farmington, CT 06030-3401
Office: 860.679.4622 | Fax: 860.679.2495
