A Call to Establish Common Metrics for Consumer-reported Health Status Measurement
Item response theory (IRT) and computerized adaptive testing (CAT) have great potential for establishing common metrics and more efficient methods for assessing well-defined domains of health outcomes. However, their full benefits will only be achieved if researchers share their results in a new way.
With these goals in mind, Science Teams and staff of QualityMetric Incorporated and the Health Assessment Lab (HAL), led by John E. Ware, Jr., PhD, have called for the health outcomes community to collaborate in establishing common metrics for consumer-reported health outcomes measures by:
To initiate this endeavor, QualityMetric and HAL are posting the item parameters, normed to the US general population, for the 10 items of the Physical Functioning scale (PF-10). Documenting these parameters enables other researchers to link their own item calibrations to the norm-based calibrations for these widely used items.
QualityMetric and HAL scientists also invite suggestions (email@example.com) on how best to achieve these objectives, as well as recommendations of individual and institutional collaborators.
The potential of IRT and CAT is well documented in a recently published series of articles reporting studies of widely used measures of headache impact and of new computerized dynamic health assessment (DYNHA®) software. These studies demonstrate that 90% reductions in respondent burden are possible while maintaining the standards of validity and precision required for patient screening and outcomes monitoring. The series of nine articles was published in a special issue of Quality of Life Research (December 2003).
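The burden reductions reported in these studies come from adaptive item selection: after each response the score is re-estimated and only the most informative remaining item is asked. Below is a minimal sketch of one common selection rule, maximum Fisher information under a partial credit model with a common slope a. It is an illustration under that assumption, not the rule implemented in DYNHA, and the item bank, item ids, and function names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Set


def item_information(theta: float, steps: Sequence[float], a: float = 1.0) -> float:
    """Fisher information of one partial-credit item with common slope a:
    a**2 times the variance of the item score at theta."""
    # Category x has log-numerator a * (x * theta - (beta_1 + ... + beta_x)).
    logits = [0.0]
    for beta in steps:
        logits.append(logits[-1] + a * (theta - beta))
    top = max(logits)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return a * a * var


def select_next_item(theta: float, item_bank: Dict[str, Sequence[float]],
                     administered: Set[str], a: float = 1.0) -> Optional[str]:
    """Return the id of the unused item with maximum information at theta,
    or None when every item has already been administered."""
    best_id: Optional[str] = None
    best_info = -1.0
    for item_id, steps in item_bank.items():
        if item_id in administered:
            continue
        info = item_information(theta, steps, a)
        if info > best_info:
            best_id, best_info = item_id, info
    return best_id
```

In a full CAT loop, the score would be re-estimated after each answer (for example, by weighted likelihood) and testing would stop once a target precision or a maximum number of items is reached.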
Item Response Theory Item Parameters
The following table presents item response theory (IRT) item parameters for the SF-36 Physical Functioning (PF) items (Ware, Jr., Snow, Kosinski, & Gandek, 1993). The parameters are provided so that other researchers can compare their results and link their item calibrations to the standard calibrations used at QualityMetric Incorporated, thereby establishing a common metric. We chose the PF scale because it is so widely used.
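One common way to carry out such linking, shown here only as an illustration (the text does not say which linking method QualityMetric recommends), is mean-sigma linking on the items shared by both calibrations: find constants A and B so that A times the local item locations plus B has the same mean and standard deviation as the reference locations, then apply the same transformation, theta* = A * theta + B, to local scores. If both calibrations use a common-slope model in the same units, A should come out close to 1 and the transformation is essentially a shift. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def mean_sigma_constants(local: Sequence[float],
                         reference: Sequence[float]) -> Tuple[float, float]:
    """Constants (A, B) such that A * local + B matches reference in mean and SD.

    `local` and `reference` hold the location (step) parameters of the same
    common items, in the same order, from the two calibrations."""
    if len(local) != len(reference) or len(local) < 2:
        raise ValueError("need at least two paired common-item parameters")
    A = pstdev(reference) / pstdev(local)
    B = mean(reference) - A * mean(local)
    return A, B


def to_reference_metric(values: Sequence[float], A: float, B: float) -> List[float]:
    """Place local item locations or person scores on the reference metric."""
    return [A * v + B for v in values]
```

Mean-sigma is used here because it needs only the common-item locations; characteristic-curve methods are a common, more robust alternative when full response functions are available.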
The PF item parameters were estimated from US general population data from the 1998 National Survey of Functional Health Status (NSFHS). The data were drawn from the sampling frame maintained by National Family Opinion (NFO) Research. The NFO panel includes 550,000 households representative of the non-institutionalized adult U.S. population and is matched to U.S. Census data on geographic region, market size, age, income (SES), and household size. Panel households are balanced demographically across the four Census regions and nine Census divisions. The 1998 NSFHS sample consisted of 14,906 individuals, one from each of 14,906 households, who were randomly assigned to self-administer either the standard (4-week recall) or the acute (1-week recall) form of the SF-36 Health Survey, Version 2 (Ware, Jr., Kosinski, & Dewey, 2000). PF item parameters were estimated in the subsample that completed the standard form.
The PF item parameters are based on the following IRT model:
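Assuming the standard parameterization of the partial credit model with a single slope a shared by all items (the step-parameter notation beta_ik is ours, not taken from the published tables), the probability that a respondent with PF score theta answers item i, which has categories 0 to m_i, in category x can be written as follows. For the three-category PF items, m_i = 2, and an empty sum (x = 0 or h = 0) is taken as 0.

```latex
P(X_i = x \mid \theta) =
  \frac{\exp\left[a\left(x\theta - \sum_{k=1}^{x} \beta_{ik}\right)\right]}
       {\sum_{h=0}^{m_i} \exp\left[a\left(h\theta - \sum_{k=1}^{h} \beta_{ik}\right)\right]},
  \qquad x = 0, 1, \ldots, m_i
```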
This model is a partial credit model (Masters, 1982; Masters & Wright, 1997); because all items share the same slope, it belongs to the Rasch family of IRT models. The common item slope a is used to rescale the item parameters so that the US general adult population has a mean PF IRT score of 0 and a standard deviation of 1. When results are presented, this metric is transformed to a mean of 50 and a standard deviation of 10 (multiply by 10 and add 50).
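The category probabilities and the 50/10 reporting transformation can be sketched in Python as follows. This is a minimal illustration assuming the standard partial credit parameterization with a common slope a and step parameters beta_1, ..., beta_m for each item; the function names and the step values in the usage note are hypothetical, not the published PF parameters.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, steps: Sequence[float], a: float = 1.0) -> List[float]:
    """P(X = 0..m | theta) for one item under a partial credit model with
    common slope `a` and step parameters `steps` = (beta_1, ..., beta_m)."""
    # Category x has log-numerator a * (x * theta - (beta_1 + ... + beta_x)),
    # built up one step at a time; category 0 has log-numerator 0.
    logits = [0.0]
    for beta in steps:
        logits.append(logits[-1] + a * (theta - beta))
    # Subtract the largest value before exponentiating to avoid overflow.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, steps: Sequence[float], a: float = 1.0) -> float:
    """Expected item score E[X | theta] under the same model."""
    return sum(x * p for x, p in enumerate(pcm_probabilities(theta, steps, a)))


def to_norm_based(theta: float) -> float:
    """Map the mean-0 / SD-1 metric to the mean-50 / SD-10 reporting metric."""
    return 10.0 * theta + 50.0
```

For example, `pcm_probabilities(0.0, [-1.0, 1.0])` gives a symmetric distribution over three categories with an expected score of 1, and `to_norm_based(-1.0)` returns 40.0, one standard deviation below the US general population mean.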
The item parameters were estimated using conditional maximum likelihood estimation with the OPLM software (Verhelst, Glas, & Verstralen, 1995). The rescaling of item parameters was based on IRT scores estimated by weighted maximum likelihood (Warm, 1989).
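Weighted likelihood estimation of a person's score can be sketched for this model as follows. Because a partial credit model with a common slope a is an exponential-family model in theta, Warm's estimating equation reduces to a * [sum_i (x_i - E_i) + sum_i K3_i / (2 * sum_i V_i)] = 0, where E_i, V_i, and K3_i are the mean, variance, and third central moment of the score on item i at theta. The sketch solves this equation by bisection; the parameterization, function names, and bracketing interval are our assumptions, not details of the OPLM-based procedure.

```python
import math
from typing import Sequence, Tuple


def _score_moments(theta: float, steps: Sequence[float], a: float) -> Tuple[float, float, float]:
    """Mean, variance and third central moment of one item's score at theta
    (partial credit model, common slope a). Helper name is hypothetical."""
    logits = [0.0]
    for beta in steps:
        logits.append(logits[-1] + a * (theta - beta))
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    third = sum((x - mean) ** 3 * p for x, p in enumerate(probs))
    return mean, var, third


def wle_theta(responses: Sequence[int], item_steps: Sequence[Sequence[float]],
              a: float = 1.0, lo: float = -10.0, hi: float = 10.0,
              tol: float = 1e-10) -> float:
    """Warm's weighted likelihood estimate of theta, found by bisection."""

    def estimating_fn(theta: float) -> float:
        resid = var_sum = third_sum = 0.0
        for x, steps in zip(responses, item_steps):
            mean, var, third = _score_moments(theta, steps, a)
            resid += x - mean
            var_sum += var
            third_sum += third
        # Score a*sum(x - E) plus Warm's correction a*sum(K3)/(2*sum(V));
        # the common positive factor a is dropped because it does not move the root.
        return resid + third_sum / (2.0 * var_sum)

    f_lo, f_hi = estimating_fn(lo), estimating_fn(hi)
    if f_lo * f_hi > 0:
        raise ValueError("WLE root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = estimating_fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a single dichotomous item with beta = 0 and a = 1, a response of 1 gives theta = ln 3 (about 1.10), a finite estimate where ordinary maximum likelihood would diverge.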
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-173.
Masters, G. N., & Wright, B. D. (1997). The partial credit model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 101-122). Berlin: Springer.
Verhelst, N. D., Glas, C. A. W., & Verstralen, H. H. F. M. (1995). OPLM - One-Parameter Logistic Model [Computer software]. Arnhem: CITO.
Ware, J. E., Jr., Snow, K. K., Kosinski, M., & Gandek, B. (1993). SF-36 health survey: Manual and interpretation guide. Boston: The Health Institute, New England Medical Center.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450.
SF-36® is a registered trademark of the Medical Outcomes Trust