Note to readers:
A condensed, peer-reviewed version of this article has been published in the Journal of Obsessive-Compulsive and Related Disorders. That shorter version underwent independent peer review and is limited to 1,000 words in accordance with journal requirements.

For those who prefer the concise, peer-reviewed version, a downloadable preprint is available here:

[Link to published article page]
[Link to preprint PDF]

The longer version below provides a fuller discussion of the issues addressed in the published article.


Clarifying Measurement and Construct-Level Inference in Myers and Abramowitz’s Review of the Inference-Based Approach 

Frederick Aardema

Introduction

Myers and Abramowitz (2025) present a detailed review of the inference-based approach (IBA) and the construct of inferential confusion (IC). The review documents a consistent empirical pattern: measures of inferential confusion are reliably associated with obsessive–compulsive symptom severity, and these associations frequently persist after accounting for general psychological distress and established obsessive belief domains. The authors also note that available IC instruments typically demonstrate strong internal consistency and a robust unidimensional factor structure. These observations are theoretically consequential, insofar as they bear on attempts to identify process-level contributors to obsessive–compulsive disorder (OCD) symptom expression.

At the same time, the review uses a set of measurement-level concerns to substantially qualify the evidentiary base supporting inferential confusion and, by implication, the IBA. In their discussion, Myers and Abramowitz (2025) argue that although IC measures exhibit “excellent reliability,” unresolved questions about construct breadth, self-report accessibility of inferential processes, and questionnaire development “contextualize much of the existing evidence for the IBA.” Measurement refinement and independent replication are clearly warranted. However, several of the review’s key inferences appear to conflate limitations of particular operationalizations with limitations of the underlying construct. The inferential step from instrument critique to construct-level doubt is not fully supported by the evidence as presented.

Instrument critique and construct validity

The review characterizes inferential confusion as “broad and nebulous” and raises the concern that existing instruments may not capture its “entirety.” Breadth alone, however, is not sufficient to undermine construct validity. Many established OCD constructs are similarly multifaceted and require multiple operationalizations without thereby being invalidated, a point long recognized in the construct-validation literature. Within classical construct validation, individual instruments are treated as fallible indicators embedded within a broader nomological network (Cronbach & Meehl, 1955), and limitations of any single operationalization do not, in themselves, invalidate the construct, particularly when convergent patterns are observed across distinct methods (Campbell & Fiske, 1959). Acknowledged limitations in existing instruments thus reflect constraints of current operationalizations rather than indeterminacy of the underlying construct.

Notably, Myers and Abramowitz (2025) themselves describe inferential confusion as a “genuine signal” whose effects are sometimes obscured by methodological noise. This framing supports a conservative interpretation: the field would benefit from improved measurement, clearer tests of process-level hypotheses, and stronger designs. It does not require skepticism regarding whether inferential confusion represents a coherent reasoning phenomenon relevant to obsessive doubt.

Scope of the review

The article is framed as a critical review of the inference-based approach. If the aim is to evaluate the IBA as a theoretical and clinical model, the scope of the review warrants clarification. Although positioned as a broad critique, several of the review’s central conclusions regarding construct validity are derived primarily from concerns about the wording, response formats, and endorsement properties of IC questionnaires.

This focus is informative, but it is not equivalent to a comprehensive evaluation of the IBA as a theoretical framework. Measurement critique is one component of construct evaluation; it does not exhaust it. Several elements central to the approach receive comparatively limited engagement, including treatment outcome and process studies of inference-based cognitive therapy (I-CBT), as well as empirical work integrating inferential confusion with fear-of-self constructs.

Importantly, the review explicitly limits its scope to the conceptual aspects of inferential confusion and thereby excludes intervention studies by design. While such a restriction can be defended as a matter of scope definition, it does not follow that intervention and process evidence are conceptually irrelevant. When inferential confusion is explicitly measured as a mechanism of change within longitudinal designs, outcome studies contribute to construct validation by situating the construct within a broader nomological network that includes temporal coherence, clinical responsiveness, and theoretically predicted patterns of change. Excluding such studies necessarily constrains the evidentiary frame within which conclusions about construct validity are drawn.

A similar narrowing occurs with respect to work integrating inferential confusion and fear-of-self constructs. Although this literature is cited as part of the broader correlational evidence base, its theoretical implications for evaluating inferential confusion as a core process within a larger inferential framework are not substantively developed. When considered alongside the exclusion of intervention research, this limited engagement yields an evidentiary base that is narrower than the broader inference-based literature would support. Under these conditions, conclusions regarding the viability or coherence of IBA risk extending beyond the scope of the evidence reviewed.

The review also suggests that the inferential confusion literature relies on a relatively limited number of clinical samples. This characterization warrants nuance. A substantial portion of the empirical work on inferential confusion and I-CBT includes diagnostically confirmed OCD samples and clinically severe groups, alongside nonclinical samples used for scale development and process testing. Moreover, the use of nonclinical and analogue samples has been explicitly defended by Abramowitz and colleagues (2014) under the continuity hypothesis, which conceptualizes obsessive–compulsive phenomena and their underlying cognitive processes as varying along a continuum rather than forming a categorical clinical divide. From this perspective, nonclinical samples are well suited for examining core processes and refining measurement, and evidence derived from such samples complements, rather than undermines, findings from clinical populations. When viewed within this broader methodological context, the evidentiary base appears more heterogeneous than the review’s framing might suggest.

Convergence across measures

A central element of the review’s critique concerns the development of the Inferential Confusion Questionnaire–Expanded Version (ICQ-EV; Aardema et al., 2010), particularly the removal of items with low endorsement in an OCD sample. The authors suggest that this procedure may have reshaped the instrument and rendered its associations tautological. This interpretation underweights convergent evidence across the broader development history of inferential confusion measures and elevates a single methodological decision to construct-level significance.

Earlier versions of the Inferential Confusion Questionnaire, developed using different item-generation and selection strategies, nonetheless show similar factor structures and associations with obsessive–compulsive symptoms. If the ICQ-EV’s findings were primarily artifacts of endorsement-based pruning, divergence across versions would be expected. That pattern is not evident. In the absence of such divergence, the claim that endorsement trimming produced spurious coherence remains inferential rather than demonstrated.

The review also acknowledges task-based measures such as the Dysfunctional Reasoning Processes Task (DRPT; Baraby, Wong, Radomsky, & Aardema, 2021; Baraby, Bourguignon, & Aardema, 2022), and reports moderate correlations between the DRPT and the ICQ-EV, describing these findings as modest evidence of convergent validity within a multitrait–multimethod framework. Yet cross-method convergence is precisely what should reduce concern that associations reflect idiosyncratic features of a single questionnaire. From a construct validation perspective, convergence across independently developed instruments employing different formats and response demands is a strength rather than a liability.

The suggestion that stronger cross-method convergence would be required for measures of the same construct rests on a particular expectation regarding effect size. An alternative interpretation is equally plausible. When distinct instruments converge across formats and response demands, this pattern more parsimoniously indicates that they are capturing a shared underlying configuration of reasoning rather than reflecting method-specific artifacts (Campbell & Fiske, 1959). For a construct defined as a recurrent pattern of misdirected relevance in reasoning, in which imagined possibilities override direct perceptual or contextual information, such convergence is consistent with functional coherence rather than conceptual vagueness.

Taken together, privileging a single procedural feature while discounting replication across versions, formats, and methods risks selective evidentiary weighting. Construct appraisal requires attention to the broader nomological network rather than isolated psychometric decisions (Meehl, 1990). When the total evidentiary pattern is considered, the inference that endorsement-based procedures undermine construct validity appears insufficiently grounded in the full body of available data.

Endorsement-based item screening

The critique of endorsement-based item screening warrants further qualification. Myers and Abramowitz (2025) note that item reduction based on endorsement can be appropriate, yet suggest that a more rigorous approach would rely on item–total correlations or factor loadings. This recommendation does not fully account for a basic statistical constraint: items with extreme floor effects yield restricted variance and unstable covariance structures, limiting the interpretability of correlational and factor-analytic indices. In such cases, endorsement screening is often a prerequisite for meaningful psychometric evaluation rather than an inferior substitute.
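The variance constraint described above can be made concrete with a minimal simulation. This is an illustrative sketch only: the items, sample size, and parameters are hypothetical and are not drawn from the ICQ-EV or any published dataset. Two simulated Likert items load equally on the same latent tendency, but the floor-effect item yields restricted variance and a markedly attenuated item-total correlation.

```python
# Illustrative sketch only: hypothetical items, not actual ICQ-EV data.
# Shows why a severe floor effect restricts item variance and attenuates
# the item-total correlation, even when the item's latent loading is intact.
import numpy as np

rng = np.random.default_rng(0)
n = 300
trait = rng.normal(size=n)  # hypothetical latent reasoning tendency

def likert_item(intercept, loading=1.0, noise=0.8):
    # 1-5 responses generated from the same latent trait
    raw = intercept + loading * trait + rng.normal(scale=noise, size=n)
    return np.clip(np.round(raw), 1, 5)

# six well-behaved items centered mid-scale
items = np.column_stack([likert_item(3.0) for _ in range(6)])
# one floor-effect item: identical loading, but rarely endorsed above 1
floor_item = likert_item(-0.5)

total = items.sum(axis=1)
typical = items[:, 0]

print("variance, typical item:", round(float(typical.var()), 2))
print("variance, floor item:  ", round(float(floor_item.var()), 2))
print("item-total r, typical: ", round(float(np.corrcoef(typical, total - typical)[0, 1]), 2))
print("item-total r, floor:   ", round(float(np.corrcoef(floor_item, total)[0, 1]), 2))
```

Under these assumptions, the floor item's correlational indices understate its relation to the latent process, so item-total correlations or factor loadings computed before endorsement screening would be uninterpretable for such items.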

Classical psychometric theory has long recognized the necessity of removing extremely low-endorsement items to avoid skewed distributions and unstable factor solutions (Nunnally & Bernstein, 1994; DeVellis, 2017). Importantly, items in the ICQ-EV were removed on the basis of frequency, not symptom correlation. To substantiate claims of tautology or construct distortion, item-level evidence would be required showing that retained items merely restate symptom content or that selection was guided by associations with OCD severity. No such evidence is presented.

Even if endorsement-based trimming narrowed the measure toward more common expressions of inferential confusion, no data are offered demonstrating that the core inferential process was altered. All psychometric refinement methods privilege items that function well within a given sample. Without item-level demonstration that removed items uniquely indexed core inferential mechanisms, the claim that endorsement screening compromised construct validity remains conjectural rather than established.

Phenomenology and construct definition

The review suggests that the ICQ-EV may reflect “OCD-specific phenomenology rather than inferential confusion per se.” This objection rests on a category error concerning how inferential confusion is defined within the inference-based approach. Inferential confusion is not conceptualized as a content-free cognitive operation that exists independently of experience. Rather, it is defined as a disturbance in reasoning as it is lived and enacted, specifically a pattern in which imagined possibilities acquire sufficient relevance to override direct perceptual and contextual information.

In this framework, phenomenological expression is not an incidental byproduct of the construct but its primary mode of manifestation. Inferential confusion is primarily identified through the subjective experience of doubt, plausibility, and conviction generated by inferential and imaginative reasoning. Accordingly, measuring phenomenological features of reasoning does not constitute contamination by symptom content; it is the means by which the construct is operationalized. Characterizations of inferential confusion as “nebulous” follow naturally if the construct is evaluated as a content-free cognitive operation rather than as a disturbance in lived reasoning.

The inference that endorsement-based item trimming necessarily reshaped the ICQ-EV toward OCD symptom phenomenology is therefore underdetermined. The same procedure could plausibly have increased sensitivity to the experiential features central to inferential confusion by removing items that were rare, weakly discriminating, or insufficiently anchored to lived reasoning patterns. In the absence of item-level evidence demonstrating that retained items primarily restate symptom content rather than index inferential processes, this concern remains theoretically possible but empirically unsubstantiated.

Generalized versus context-specific measurement

The review further argues that the generalized phrasing of the ICQ is incongruent with the inference-based approach’s emphasis on the selective deployment of inferential confusion within obsession-relevant contexts. At the same time, it raises concerns that task-based measures such as the DRPT (Baraby et al., 2021, 2022), which rely on specific scenarios, may fail to capture the idiosyncratic content of individual obsessions. These critiques pull in opposite directions: generality trades off with contextual specificity, and contextual realism trades off with universality. This tension is common in psychopathology research and does not signal conceptual inconsistency within the IBA.

The distinction between generalized and context-specific measurement is theoretically important, but it does not follow that generalized assessment is invalid or misaligned with selective deployment. Selectivity refers to when inferential confusion is activated in lived experience, not necessarily to how vulnerability must be assessed. Generalized instruments can index a recurrent propensity for inferential confusion to emerge in obsession-relevant reasoning, without implying that this reasoning process operates uniformly across all domains. In contrast, context-specific tasks are designed to capture the moment-to-moment enactment of this reasoning pattern under triggering conditions.

Importantly, responses to generally phrased ICQ items are unlikely to be context-free, particularly in clinical samples. Individuals typically anchor their responses to domains in which they experience recurrent distress or difficulty. As a result, even when items are phrased broadly, responses are implicitly contextualized by the situations that are most salient to the individual and within which inferential confusion most often arises. In this sense, generalized wording does not preclude domain specificity at the level of response, but rather allows inferential confusion to be flexibly mapped onto idiosyncratic patterns of obsessional reasoning.

Context-specific and task-based measures such as the DRPT therefore represent complementary measurement strategies rather than competing or conceptually inconsistent ones. Each involves unavoidable trade-offs. Highly specific measures may offer greater ecological precision but risk idiosyncrasy and limited generalizability across heterogeneous OCD presentations. Generalized measures sacrifice some contextual detail while enabling broader coverage, comparability across samples, and assessment of inferential confusion as a transdiagnostic reasoning vulnerability within OCD. This balance reflects a standard methodological tension, not a flaw in the underlying construct.

Response formats, awareness, and tasks

Concerns regarding agreement/disagreement response formats and self-report awareness raise legitimate issues for refinement but do not uniquely threaten inferential confusion. Agreement formats are routinely used to assess frequency or typicality of experiences even when the underlying construct is procedural rather than propositional. More broadly, limitations in self-report accessibility are well documented across psychopathology, particularly in conditions characterized by overvalued ideation or limited insight, including some presentations of obsessive–compulsive disorder and related disorders such as body dysmorphic disorder. In such cases, reduced insight would be expected to attenuate self-reported endorsement of inferential confusion rather than inflate it, without constituting evidence against the construct itself. Indeed, if introspective access were fundamentally compromised, one would expect random or inconsistent responding rather than the systematic patterns of association repeatedly observed between IC measures and obsessive–compulsive symptom severity. The presence of such patterned associations suggests that self-report, while imperfect, captures meaningful and nontrivial variance.

At the same time, available evidence indicates that many individuals with obsessive–compulsive disorder can recognize, at least descriptively, that their reasoning departs from direct perceptual or contextual evidence and relies on imagined possibilities. This level of awareness does not entail resolution of obsessional doubt. The distinction between descriptive awareness and metacognitive appraisal is central here: individuals may accurately report how they reason, and thereby allow self-report to index inferential patterns, while remaining unaware that this mode of inference renders their doubts epistemically irrelevant, a distinction emphasized within the inference-based approach. Descriptive awareness of one’s reasoning is therefore compatible with symptom persistence and does not undermine the use of self-report measures.

Where insight is more severely compromised, limitations of self-report are best understood as a general measurement challenge rather than a construct-specific flaw. Under such conditions, group-level comparisons are more likely to underestimate, rather than exaggerate, differences associated with inferential confusion. Triangulation using task-based measures, clinician ratings, and idiographic methods therefore represents a principled response to measurement constraints rather than a corrective for construct inadequacy. Such triangulation reflects standard practice in construct validation, where no single method is presumed definitive but convergent evidence across methods strengthens confidence in the underlying process.

Concerns that some ICQ-EV items are vague or difficult to rate dimensionally reflect the same psychometric tensions discussed elsewhere. Items that apply broadly or lack temporal anchoring are especially vulnerable to restricted variance, and screening procedures are commonly used to address this. Limitations noted for the DRPT likewise underscore the inevitability of methodological trade-offs. Such trade-offs are characteristic of construct measurement and do not, in themselves, imply conceptual incoherence.

More fundamentally, the review raises concerns about whether inferential processes are accessible to introspection at all, thereby questioning the extent to which inferential confusion can be validly measured using self-report instruments. OCD is characterized by variable levels of insight, both within and across individuals (e.g., Neziroglu et al., 1999). Such variability is itself well documented and does not preclude meaningful assessment at the group level. Recent work further indicates that insight-related reasoning processes can be meaningfully assessed via self-report despite this limitation. In particular, the Cognitive Obsessional Insight Scale (COGINS) demonstrates good internal consistency, test–retest reliability, convergent validity with clinician-rated measures of insight, and sensitivity to treatment-related change (Ouellet-Courtois et al., 2024). While such findings do not resolve broader measurement challenges, they provide proof of principle that reasoning- and insight-adjacent processes are accessible to self-report in OCD.

Consistent with this interpretation, inference-based cognitive therapy has demonstrated clinical efficacy in individuals with limited or poor insight (Visser et al., 2015). This finding indicates that the reasoning processes targeted by the model remain functionally engaged even when metacognitive awareness is reduced. If inferential processes were wholly inaccessible to awareness or measurement, systematic therapeutic engagement with these processes would be difficult to document. Taken together, these observations do not eliminate the need for continued refinement of inferential confusion measurement, but they temper the inference that introspective limitations may render the construct inaccessible or unmeasurable.

Incremental validity and theory

The review appropriately emphasizes inconsistent reporting of semi-partial correlations and incremental R². Improved reporting is clearly needed. At the same time, incremental variance is a limited test of theoretical validity. When predictors are theoretically related and empirically correlated, shared variance is expected and unique variance attenuated. This reflects well-known limitations of regression-based partitioning when constructs are conceptually proximate, rather than evidence of construct redundancy (Meehl, 1990).
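The attenuation of unique variance among conceptually proximate predictors can be illustrated with a brief simulation. This is a hypothetical sketch: the variable names stand in for constructs and the parameters do not reproduce any published analysis. Two predictors sharing an upstream source each show a substantial zero-order association with the outcome, yet the incremental R² of either over the other is small, without either predictor being redundant.

```python
# Hypothetical sketch: two correlated predictors of a shared outcome.
# Each has a large zero-order R^2, but small incremental R^2 over the other,
# illustrating why incremental variance is a limited test of theoretical validity.
import numpy as np

rng = np.random.default_rng(1)
n = 500
common = rng.normal(size=n)                  # shared upstream process (hypothetical)
ic = common + 0.5 * rng.normal(size=n)       # stand-in for an IC measure
beliefs = common + 0.5 * rng.normal(size=n)  # stand-in for obsessive beliefs
symptoms = common + rng.normal(size=n)       # stand-in for symptom severity

def r2(y, predictors):
    # ordinary least squares R^2 with an intercept
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_ic = r2(symptoms, [ic])
incremental = r2(symptoms, [beliefs, ic]) - r2(symptoms, [beliefs])
print("zero-order R^2, IC alone:          ", round(float(r2_ic), 2))
print("incremental R^2 of IC over beliefs:", round(float(incremental), 2))
```

Because both predictors are valid but overlapping indicators of the same upstream process, partitioning their shared variance away understates each predictor's theoretical relevance rather than demonstrating redundancy.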

From an inference-based perspective, high correlations between inferential confusion and obsessive belief domains are theory-consistent, as beliefs are conceptualized as downstream products of prior inferential processes. In hierarchical or generative relationships, cross-sectional regression is a blunt instrument for adjudicating theoretical primacy. More informative tests require temporal and experimental designs that assess whether shifts in reasoning precede changes in obsessional doubt.

The review further emphasizes the limited number of experimental studies bearing on inferential confusion and concludes that evidence supporting the inference-based approach as a causal model of obsessive–compulsive disorder is confined to a single study. While direct laboratory manipulation of inferential confusion within OCD samples remains sparse, this conclusion is best interpreted as applying to direct experimental manipulation in OCD specifically, rather than to the broader evidentiary landscape relevant to evaluating inferential confusion as a process construct. Treatment outcome and process studies do not provide direct experimental tests of causal mechanisms, nor do they substitute for designs that manipulate inferential confusion itself. However, when inferential confusion is explicitly measured as a mechanism of change within longitudinal designs, such studies contribute to construct validation by anchoring the construct in temporal coherence, clinical responsiveness, and theoretically predicted patterns of change. This body of evidence is not incorporated into the evidentiary frame used to evaluate construct validity.

In addition, randomized vignette and task-based paradigms that assign participants to experimentally varied inferential contexts, while assessing inferential confusion as a measured process variable, provide partially experimental tests of inferential reasoning mechanisms even when inferential confusion itself is not directly manipulated (e.g., Yang et al., 2021). Moreover, experimental work demonstrating inferential confusion mechanisms in related disorders, such as eating disorders (e.g., Ouellet-Courtois et al., 2021), provides convergent support for the broader plausibility of the inferential framework. Considered together, these lines of evidence suggest a broader empirical foundation than is reflected in a narrowly experimental evidentiary frame.

Independent replication, affiliation, and methodological inference

The review highlights that a substantial portion of the inferential confusion literature has been conducted by the model’s developers or by what the authors describe as “closely affiliated colleagues,” and emphasizes the importance of independent replication. The call for independent verification is appropriate and consistent with best practices in clinical science. However, the characterization of “closely affiliated colleagues” is left undefined and therefore difficult to evaluate as a methodological concern.

It is unclear whether this designation refers to shared institutional affiliation, co-authorship history, training lineage, theoretical orientation, or participation in a common research program. Without explicit criteria, such language risks functioning as a rhetorical qualifier rather than as a clearly specified indicator of potential bias. Author proximity, in itself, does not constitute evidence of methodological weakness, nor does author independence guarantee objectivity. Inference about bias requires attention to study design, analytic transparency, preregistration, and reproducibility rather than reliance on imprecise descriptors of affiliation.

Researcher allegiance is a well-recognized phenomenon in psychotherapy research, particularly in comparative treatment outcome studies (e.g., Luborsky et al., 1999; Munder et al., 2013). The standard response to such concerns has been to strengthen methodological safeguards, ensure transparency, and encourage independent replication. Allegiance, however, is typically treated as a potential moderator of outcomes rather than as a construct-level validity threat. Absent identifiable design flaws or analytic bias, evidentiary weight is determined by methodological rigor and reproducibility rather than by investigator investment alone.

Independent replication is a collective enterprise rather than a unilateral obligation of theory originators. The degree of laboratory independence in a research area depends not only on the activities of its developers, but also on the extent to which external groups elect to engage with, test, and extend the framework. The absence of widespread independent replication, while warranting encouragement, does not by itself constitute negative evidence regarding construct validity.

At the same time, growing engagement from additional research groups is both welcome and methodologically valuable. Independent testing strengthens any framework. It is worth noting, however, that such engagement enters a literature that spans several decades of empirical development. Replication and extension are cumulative enterprises; they refine and expand an existing evidentiary base rather than retroactively determining whether prior work counted as evidence. In this sense, scientific maturity is marked not by replacement of earlier contributions, but by their systematic replication and extension across settings and investigators. Progress in clinical science rarely proceeds through forced alignment with competing paradigms, but through cumulative evaluation within an expanding evidentiary network.

In addition, the empirical literature on inferential confusion and inference-based cognitive therapy is not confined to a single laboratory context or methodological tradition. Correlational studies examining inferential confusion, task-based investigations of reasoning processes, and treatment outcome studies of I-CBT have been conducted across diverse research settings, using heterogeneous samples, designs, and measurement approaches. Investigations have emerged from multiple research groups and international contexts, reflecting a degree of methodological dispersion that exceeds what broad references to investigator affiliation might suggest. While continued independent replication remains essential, the existing body of work already demonstrates greater institutional and methodological breadth than such characterizations imply.

More broadly, the recurring emphasis on author involvement risks conflating investigator origin with evidentiary status. In psychological science, theory development is often initiated and advanced by those most invested in articulating and testing it. Such involvement may warrant heightened attention to methodological rigor and replication, but it does not, in itself, constitute a validity threat. Scientific credibility does not attach to geography or institutional provenance. Evidentiary weight is determined by study design, transparency, reproducibility, and convergence across independent samples and methods, not by the biographical or national location of investigators. To treat author participation as a standing source of evidentiary suspicion, independent of methodological considerations, risks substituting sociological inference for empirical evaluation.

If evidentiary status is routinely qualified on the basis of investigator involvement, an unintended corollary is that a research base would appear more independent to the extent that its originators contribute less to it. Such a standard would not strengthen the evidentiary record; it would reduce the volume of relevant empirical tests available for evaluation. Concerns about potential allegiance effects are more productively addressed through transparent methodological safeguards, preregistration, analytic clarity, and replication across laboratories than through provenance-based qualification of findings.

Conclusion

Myers and Abramowitz (2025) provide a methodologically detailed review that emphasizes measurement rigor, replication, and evidentiary standards. Several of their key inferences, however, appear to extend beyond what the data warrant. Measurement-level concerns are treated as substantial qualifications of the construct itself; endorsement-based item screening is framed as a validity threat despite psychometric precedent; phenomenology is construed as contaminating rather than constitutive; scope-based exclusions are treated as neutral rather than inferentially constraining; and generalized wording is interpreted as a theoretical mismatch rather than as a measurement choice compatible with selective deployment of inferential processes.

Additional constraints inherent to contemporary OCD research, including the predominance of cross-sectional designs, shared-method variance in self-report paradigms, and broader challenges in formal reasoning research, are not unique to inferential confusion. These background conditions contextualize the evidentiary landscape across models and therefore do not, in themselves, justify construct-level skepticism. The present correspondence has focused on what appear to be the most consequential inferential steps in the review rather than attempting an exhaustive response to all subsidiary issues.

Taken together, a more conservative interpretation is that inferential confusion remains a coherent reasoning process implicated in obsessive doubt, supported by convergent evidence across measures. As with other process-level constructs in OCD research, continued conceptual refinement and improved operational precision are both expected and welcome, and do not, in themselves, constitute grounds for questioning construct validity. Measurement can be strengthened through multi-method assessment, including context-sensitive designs that complement generalized instruments, clearer reporting of incremental effects, and direct tests of inferential processes, without conflating limitations of current operationalizations with the viability of the underlying construct.

References

Aardema, F., Moulding, R., Radomsky, A. S., Doron, G., Allamby, J., & Souki, E. (2013). Fear of self and obsessionality: Development and validation of the Fear of Self Questionnaire. Journal of Obsessive-Compulsive and Related Disorders, 2, 306–315.

Aardema, F., Wu, K. D., Careau, Y., O’Connor, K., Julien, D., & Dennie, S. (2010). The expanded version of the Inferential Confusion Questionnaire: Further development and validation in clinical and non-clinical samples. Journal of Psychopathology and Behavioral Assessment, 32, 448–462.

Abramowitz, J. S., Fabricant, L. E., Taylor, S., Deacon, B. J., McKay, D., & Storch, E. A. (2014). The relevance of analogue studies for understanding obsessions and compulsions. Clinical Psychology Review, 34, 206–217.

Baraby, L.-P., Bourguignon, L., & Aardema, F. (2022). The relevance of dysfunctional reasoning to OCD and its treatment: Further evidence for inferential confusion utilizing a new task-based measure. Journal of Behavior Therapy and Experimental Psychiatry, 101728.

Baraby, L.-P., Wong, S. F., Radomsky, A. S., & Aardema, F. (2021). Dysfunctional reasoning processes and their relationship with feared self-perceptions and obsessive-compulsive symptoms: An experimental investigation with a new task-based measure of inferential confusion. Journal of Obsessive-Compulsive and Related Disorders, 28, 100593.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin, 56, 81–105.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.

DeVellis, R. F. (2017). Scale development: Theory and applications (4th ed.). Thousand Oaks, CA: Sage Publications.

Luborsky, L., Diguer, L., Seligman, D. A., Rosenthal, R., Krause, E. D., Johnson, S., Halperin, G., Bishop, M., Berman, J. S., & Schweizer, E. (1999). The researcher’s own therapy allegiances: A “wild card” in comparisons of treatment efficacy. Clinical Psychology: Science and Practice, 6(1), 95–106.

Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66, 195–244.

Munder, T., Brütsch, O., Leonhart, R., Gerger, H., & Barth, J. (2013). Researcher allegiance in psychotherapy outcome research: An overview of reviews. Clinical Psychology Review, 33(4), 501–511.

Myers, N. S., & Abramowitz, J. S. (2025). Unpacking inferential confusion: A critical review of the inference-based approach to obsessive-compulsive disorder. Journal of Obsessive-Compulsive and Related Disorders, 47, 100983.

Neziroglu, F., McKay, D., Yaryura-Tobias, J. A., Stevens, K. P., & Todaro, J. (1999). The Overvalued Ideas Scale: Development, reliability, and validity in obsessive–compulsive disorder. Behaviour Research and Therapy, 37(9), 881–902.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw-Hill.

Ouellet-Courtois, C., Aardema, F., & O’Connor, K. (2021). Reality check: An experimental manipulation of inferential confusion in eating disorders. Journal of Behavior Therapy and Experimental Psychiatry, 70, 101614.

Ouellet-Courtois, C., Audet, J.-S., & Aardema, F. (2024). The Cognitive Obsessional Insight Scale (COGINS): A new measure of cognitive insight in obsessive–compulsive and related disorders. Journal of Cognitive Psychotherapy: An International Quarterly, 38, 133–156.

Visser, H. A., van Megen, H. J., van Oppen, P., Eikelenboom, M., Hoogendorn, A. W., Kaarsemaker, M., & van Balkom, A. J. (2015). Inference-based approach versus cognitive behavioral therapy in the treatment of obsessive–compulsive disorder with poor insight: A 24-session randomized controlled trial. Psychotherapy and Psychosomatics, 84, 284–293.

Yang, J. H., Moulding, R., Wynton, S. K. A., Jaeger, T., & Anglim, J. (2021). The role of feared self and inferential confusion in obsessive compulsive symptoms. Journal of Obsessive-Compulsive and Related Disorders, 28, 100607.