Patient Reported Outcomes Measures in Orthopaedic Surgery
A Primer

J. Banks Deal Jr., MD

Patient Reported Outcomes Measures (PROMs) are standardized instruments used to compare the results of orthopaedic treatments and conditions. They use patients’ own responses to record meaningful information not captured by clinician-reported objective data (e.g., union or infection rate, range of motion, estimated blood loss). As the sophistication of medical research has increased, so has the importance of identifying subtle differences in outcomes. This emphasis led to an explosion of PROMs, with thousands of different instruments reported in the literature. The dizzying array of outcomes led inevitably to consolidation as authors, journals, and associations attempted to “speak the same language.” This article describes the rationale for development of PROMs, explains the measures used to assess the PROMs themselves, highlights some legacy PROMs commonly encountered in the literature, and finally touches on the solution promulgated by the current U.S. military health care establishment: the computerized Patient-Reported Outcomes Measurement Information System (PROMIS).

Development of PROMs
The development of modern PROMs has yielded three broad classes of instruments: generalized, regional, and disease-specific. Generalized PROMs attempt to measure a patient’s overall well-being (Short Form 36, Visual Analog Scale). Regional PROMs address specific anatomic locations (Disabilities of the Arm, Shoulder and Hand, DASH). Disease-specific PROMs are meant for use with a single diagnosis (Boston Carpal Tunnel Questionnaire, BCTQ).

Classically, PROMs were developed with either a clinimetric or a psychometric design philosophy. Clinimetric PROM development addressed multiple patient factors (heterogeneous scales) and relied heavily on the judgment of both the subject and the design team. Psychometric PROM development employed statistical analysis from behavioral psychology and generally aimed to measure a single factor (homogeneous scales).1 Some argue that the two strategies have merged, and one may find references to the “psychometric” or “clinimetric” qualities of studies. In general use, psychometric is the preferred term for the statistical qualities of PROMs.

The COSMIN study (COnsensus-based Standards for the selection of health Measurement INstruments) established consensus-based definitions for psychometric terminology.2 It also proposed a checklist that may be used to evaluate new or unfamiliar PROMs. To summarize briefly, reliability refers to freedom from measurement error. It can be examined under several conditions: repeated administration over time (test-retest), measurement by different persons (interrater), and measurement by the same person at different times (intrarater). Validity is the degree to which a PROM measures the construct (e.g., hand function) that it purports to measure. Responsiveness is the ability of the PROM to detect change over time. Interpretability is the degree to which one can assign qualitative meaning to an instrument’s quantitative score.
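To make test-retest reliability concrete, the following is a minimal sketch of one statistic often used to quantify it, the intraclass correlation coefficient. The choice of the two-way random-effects, absolute-agreement, single-measurement form, commonly written ICC(2,1), is an assumption made here for illustration and is not prescribed by the text above. The function name and the sample scores are hypothetical.

```python
def icc_2_1(scores):
    """Return ICC(2,1) for a table of repeated scores.

    scores: list of rows, one row per patient; each row holds that
    patient's score at each administration (or from each rater).
    Assumption: two-way random effects, absolute agreement, single measure.
    """
    n = len(scores)            # number of patients
    k = len(scores[0])         # number of administrations or raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients and 2 complete columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("scores have no variance; ICC is undefined")
    return (ms_rows - ms_err) / denom


# Hypothetical example: three patients, each scored twice.
# Every retest score is exactly 1 point higher, so agreement is not perfect.
retest = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2_1(retest), 3))
```

Because this form measures absolute agreement, a systematic shift between administrations lowers the coefficient even when patients keep the same rank order. A consistency-type coefficient would ignore that shift.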

Floor and ceiling (F/C) effects describe the limits of an instrument’s ability to discriminate between responses at the ends of its scale. Responders who receive identical scores because they are at the bottom (floor) or top (ceiling) of the scale are indistinguishable, even though they may be faring quite differently clinically. One classic example of a ceiling effect is the AOFAS Foot and Ankle Function Index, in which the most difficult activity assessed is walking up a flight of stairs.3 Typically, F/C effects are reported as the percentage of all responders who report the lowest or highest possible score, with F/C effects below 15% considered adequate for an outcomes measure.4
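The calculation described above can be sketched in a few lines. This is an illustrative example only: the function name, the 0 to 100 scale, and the sample scores are hypothetical, and the 15% cutoff is simply the threshold quoted in the text.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold_pct=15.0):
    """Report the share of responders at each end of the scale.

    scores: list of total scores, one per responder.
    min_score, max_score: lowest and highest scores the instrument allows.
    threshold_pct: effects below this percentage are treated as adequate.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_adequate": floor_pct < threshold_pct,
        "ceiling_adequate": ceiling_pct < threshold_pct,
    }


# Hypothetical cohort of 10 responders on a 0-100 scale.
cohort = [100, 100, 95, 80, 100, 60, 0, 75, 90, 100]
result = floor_ceiling_effects(cohort, min_score=0, max_score=100)
print(result)
```

In this made-up cohort, 4 of 10 responders score the maximum (40%), which would indicate a ceiling effect, while 1 of 10 scores the minimum (10%), which falls under the 15% threshold.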

Important Legacy Measures
Orthopaedic surgeons must frequently interpret generalized, regional, and disease-specific PROMs. The following represent a selection of the most commonly employed legacy (i.e., pre-PROMIS) measures in the orthopaedic literature.5-13

The proliferation of PROMs prompted a reaction from clinicians and researchers. The regimen of choosing one or more PROMs for each patient, ensuring that patients completed each instrument periodically, and maintaining intelligible records of all of these instruments was burdensome for all parties involved. In response, the U.S. National Institutes of Health funded multiple groups to develop a reliable, computerized system that could overcome these issues. PROMIS, the Patient-Reported Outcomes Measurement Information System, was the result of these efforts. Its design takes advantage of Item Response Theory and Computerized Adaptive Testing to evaluate patient outcomes dynamically, resulting in more efficient questioning without a loss of psychometric properties. It has compared favorably with many legacy measures in direct comparison studies. Furthermore, rather than reporting an arbitrary raw number, PROMIS delivers outcomes as a T-score, allowing for rapid interpretation of results.14 PROMIS is still being improved periodically (for example, the Upper Extremity subdomain is now on version 2.0), so one should note the version when interpreting results. Several of the comparison studies are listed below in Table 7.
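To show why a T-score allows rapid interpretation, here is a minimal sketch of the T-score metric. It assumes the usual convention that 50 is the mean and 10 is the standard deviation of the reference population. Which reference population applies depends on the specific instrument and is not specified here. The function names are hypothetical, and the sketch does not reproduce how PROMIS estimates a patient's standing from item responses.

```python
REFERENCE_MEAN_T = 50.0  # assumed mean of the reference population
REFERENCE_SD_T = 10.0    # assumed standard deviation of the reference population


def t_score_from_z(z):
    """Convert a standardized score (in SD units) to the T-score metric."""
    return REFERENCE_MEAN_T + REFERENCE_SD_T * z


def sds_from_reference(t_score):
    """Return how many SDs a T-score lies above (+) or below (-) the mean."""
    return (t_score - REFERENCE_MEAN_T) / REFERENCE_SD_T


# A hypothetical patient 1.5 SD below the reference mean.
print(t_score_from_z(-1.5))       # 35.0
# A hypothetical T-score of 60 lies one SD above the reference mean.
print(sds_from_reference(60.0))   # 1.0
```

Whether a higher T-score is better or worse depends on the domain being measured (for example, function versus pain), so the direction should be checked for each instrument.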

The future development of PROMs will emphasize measures with excellent psychometric properties and a low burden on the patient, clinician, and researcher. Standardization of instruments across fields will help surgeons digest the literature more easily. In the near term, dual reporting of legacy and PROMIS-style measures will be required.

Works Cited
1. Marx RG, Bombardier C, Hogg-Johnson S, Wright JG. Clinimetric and Psychometric Strategies for Development of a Health Measurement Scale. J Clin Epidemiol. 1999;52(2):105-111.
2. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, de Vet HCW. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63:737-745.
3. Ibrahim T, Beiri A, Azzabi M, Best AJ, Taylor GJ, Menon DK. Reliability and Validity of the Subjective Component of the American Orthopaedic Foot and Ankle Society Clinical Rating Scales. Journal of Foot & Ankle Surgery. 2007;46(2):65-74.
4. Gulledge CM, Lizzio VA, Smith G, Guo E, Makhni EC. What Are the Floor and Ceiling Effects of Patient-Reported Outcomes Measurement Information System Computer Adaptive Test Domains in Orthopaedic Patients? A Systematic Review. Arthroscopy. 2019;1(1):1-12.
5. Hoang-Kim A, Pegreffi F, Moroni A, Ladd A. Measuring wrist and hand function: Common scales and checklists. Injury. 2011;42:253-258.
6. Slobogean GP, Slobogean BL. Measuring shoulder injury function: Common scales and checklists. Injury. 2011;42:248-252.
7. Schoenfeld AJ, Bono CM. Measuring spine fracture outcomes: Common scales and checklists. Injury. 2011;42:265-270.
8. Ahmad MA, Xypnitos FN, Giannoudis PV. Measuring hip outcomes: Common scales and checklists. Injury. 2011;42:259-264.
9. Collins NJ, Prinsen CAC, Christensen R, Bartels EM, Terwee CB, Roos EM. Knee Injury and Osteoarthritis Outcome Score (KOOS): systematic review and meta-analysis of measurement properties. Osteoarthritis and Cartilage. 2016;24(8):1317-1329.
10. Peer MA, Lane J. The Knee Injury and Osteoarthritis Outcome Score (KOOS): A Review of Its Psychometric Properties in People Undergoing Total Knee Arthroplasty. JOSPT. 2013;43(1):20-28.
11. Farrugia P, Goldstein C, Petrisor BA. Measuring foot and ankle injury outcomes: Common scales and checklists. Injury. 2011;42:276-280.
12. Kroenke K, Krebs EE, Turk D, et al. Core Outcome Measures for Chronic Musculoskeletal Pain Research: Recommendations from a Veterans Health Administration Work Group. Pain Medicine. 2018;20(8):1500-1508.
13. Smith BW, Dalen J, Wiggins K, Tooley E, Christopher P, Bernard J. The brief resilience scale: assessing the ability to bounce back. Int J Behav Med. 2008;15(3):194-200.
14. Brodke DJ, Saltzman CL, Brodke DS. PROMIS for Orthopaedic Outcomes Measurement. JAAOS. 2016;24:744-749.
15. Fidai MS, Saltzman BM, Meta F, Lizzio VA, Stephens JP, Bozic KJ, Makhni EC. Patient-Reported Outcomes Measurement Information System and Legacy Patient-Reported Outcomes Measures in the Field of Orthopaedics: A Systematic Review. Arthroscopy. 2017;34(2):605-614.
16. Bernstein DN, Houck JR, Mahmood B, Hammert WC. Minimally Clinically Important Differences for PROMIS Physical Function, Upper Extremity, and Pain Interference in Carpal Tunnel Release Using Region- and Condition-Specific PROM Tools. J Hand Surg Am. 2019;44(8):635-640.
17. Makhni EC, Meldau JE, Blanchett J, Borowsky P, Stephens J, Muh S, Moutzouros V. Correlation of PROMIS Physical Function, Pain Interference, and Depression in Pediatric and Adolescent Patients in the Ambulatory Sports Medicine Clinic. OJSM. 2019;7(6):1-6.
18. Gausden EB, Levack AE, Sin DN, Nwachukwu BU, Fabricant PD, Nellestein AM, Wellman DS, Lorich DG. Validating the Patient Reported Outcomes Measurement Information System (PROMIS) Computerized Adaptive Tests for Upper Extremity Fracture Care. JSES. 2018;27:1191-1197.
19. Bernstein DN, Houck JR, Hammert WC. A Comparison of PROMIS UE Versus PF: Correlation to PROMIS PI and Depression, Ceiling and Floor Effects, and Time to Completion. J Hand Surg Am. 2019;44(10):901.e1-e7.
20. Tyser AR, Hung M, Bounsanga J, Voss MW, Kazmers NH. Evaluation of Version 2.0 of the PROMIS Upper Extremity Computer Adaptive Test in Nonshoulder Upper Extremity Patients. J Hand Surg Am. 2019;44(4):267-276.
21. Mahmood B, Chongshu C, Qiu X, Messing S, Hammert WC. Comparison of the Michigan Hand Outcomes Questionnaire, Boston Carpal Tunnel Questionnaire, and PROMIS Instruments in Carpal Tunnel Syndrome. J Hand Surg Am. 2019;44(5):366-373.