
The SF-12®: An Even Shorter Health Survey

 Version 2.0

Is it possible to construct a single-page scannable health survey that is useful in monitoring outcomes in general and specific populations? This challenge came from many large-scale outcomes monitoring efforts that omitted functional health surveys because of severe constraints on the amount of data that could be collected.

The discovery that SF-36® physical and mental component summary scales (referred to as PCS-36 and MCS-36, respectively) capture about 85% of the reliable variance in the eight-scale SF-36® health profile provided a new strategy for meeting this challenge. If two outcome measures are satisfactory for many purposes, a survey with fewer questionnaire items could be constructed to estimate these outcomes. Predictive studies supported this strategy. Twelve SF-36® items and improved scoring algorithms reproduced at least 90% of the variance in PCS-36 and MCS-36 in both general and patient populations, and reproduced the profile of eight SF-36® health concepts sufficiently for large sample studies. The reproductions of PCS-36 and MCS-36 proved to be accurate enough to warrant use of published norms for SF-36® summary measures in interpreting SF-12® summary measures.

We labeled the new 12-item short form the SF-12® Health Survey and published both standard (4-week) and acute (1-week) recall versions for self-administration as well as scripts for personal interviews. Because the SF-12® is a subset of the SF-36®, translations and adaptations of the SF-36® currently being evaluated in over 40 countries are also yielding translations of the SF-12®. The SF-12® physical and mental health summary measures are referred to as PCS-12 and MCS-12, respectively.

There has been much progress in evaluating and documenting the SF-12® since its development began at The Health Institute in the spring of 1994. A journal article summarizing preliminary tests of reliability and validity completed peer review and was published in Medical Care in early 1996. The Medical Outcomes Trust's Scientific Advisory Committee has completed its own peer review of the SF-12® and has approved its distribution.

Numerous investigators and health care delivery organizations have adopted the SF-12®, including the National Committee for Quality Assurance (NCQA), which chose the SF-12® for its Annual Member Health Care Survey, and the Pacific Business Group on Health, which will be one of the first to use it in monitoring outcomes.

Although the shorter SF-12® form improves efficiency and lowers cost for both profiles and summary scales, the SF-12 has some limitations, which are discussed in the first article and user's manual. Briefly, SF-12 reproduces the eight-scale profile with fewer levels than SF-36® scales and yields less precise scores, as would be expected for single-item and two-item scales. For large group studies, these differences are not as important, because confidence intervals for group averages in health scores are largely determined by sample size.
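
The point about sample size can be illustrated with a rough calculation. The sketch below assumes a score standard deviation of 10 points (an illustrative value, not a figure taken from the SF-12 documentation) and a normal approximation for the group mean; the function name is hypothetical.

```python
import math


def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a group mean."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * sd / math.sqrt(n)


ASSUMED_SD = 10.0  # assumption for illustration only, not a published value

# The interval narrows with the square root of the sample size.
for n in (25, 100, 400, 1600):
    print(f"n={n:5d}  half-width = {ci_half_width(ASSUMED_SD, n):.2f} points")
```

Under these assumptions, quadrupling the sample size halves the width of the interval, which is why somewhat less precise individual scores matter less in large group studies.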

The choice between the SF-12® and the SF-36® is a choice between more and less practical survey tools and between less and more information about health status and outcomes. Time will tell how to best judge those tradeoffs. The SF-12® Health Survey is most likely to prove to be a satisfactory alternative to the SF-36® when samples are large and the objective is to monitor overall physical and mental health outcomes.

Ware JE, Kosinski M, and Keller SD. A 12-Item Short-Form Health Survey: Construction of scales and preliminary tests of reliability and validity. Medical Care, 1996;34(3):220-233.

SF-12v2™ Health Survey
(Version 2.0)

Changes in the SF-12v2™ are a product of more than 10 years of experience with findings reported in thousands of publications about the SF-36 and SF-12 Health Surveys, and follow exactly those previously recommended and adopted for the SF-36v2 survey (see Ware, Kosinski, and Dewey, 2000).

Improvements in v2

Relative to the original SF-12v1, improvements in the SF-12v2 survey include:

  • improvements in instructions and questionnaire items to shorten and simplify the wording and make it more familiar and less ambiguous;
  • an improved layout for questions and answers in the self-administered forms that makes it easier to read and complete, and that reduces missing responses;
  • greater comparability with translations and cultural adaptations widely used in the U.S. and in other countries;
  • five-level response choices in place of dichotomous response choices for four items in the two role functioning scales; and,
  • five-level (in place of six-level) response categories to simplify items in the Mental Health (MH) and Vitality (VT) scales.
These and other improvements are briefly explained below.

Layout


All responses to questions in SF-12v2 are printed in a horizontal (left-to-right) format, rather than with the mixture of horizontal and vertical formats that was used for response categories in v1. Mixed formats of response choices were a source of confusion among respondents and were linked to missing and inconsistent responses, particularly among the elderly. Other improvements in layout include more consistent use of indenting, numbering or instructions, deletion of item labels, and an improved format for boxes that are checked by respondents.

Type-size and Bolding

A larger type size has been adopted throughout the v2 form. Only instructions, as opposed to response choices, are printed in bold typeface to simplify the “look and feel” of v2. These and other refinements were adopted on the basis of lessons learned from health care surveys and from surveys in other fields.

Wording Changes

Evidence from numerous focus group studies, formal cognitive tests, and empirical studies in more than a dozen countries supports the improvements in item wording and changes in some terms used to identify health concepts in v2. These improvements are expected to make the English-language SF-12 easier to understand and administer, as well as more objective. These wording changes also make v2 more comparable with translations of the SF-12. Because most of the improvements in item wording were developed during the process of translating and adapting the SF-36 and the SF-12 for use in other countries pursuant to the IQOLA Project, v2 is sometimes referred to as the “international version”.

Five-Choice Response Scales

There is considerable empirical evidence that the SF-12v2 five-choice response categories substantially improve the two SF-12 role functioning scales. The five-level v2 response categories adopted for the Role Physical (RP) and Role Emotional (RE) scales extend the range measured and greatly increase score precision without increasing respondent burden. Specifically, for the RP and RE scales, v2 achieves a:

  • four-fold increase in the number of levels defined;
  • more than five-fold increase in the range measured;
  • substantially smaller SD; and,
  • substantial reduction in the percentages of respondents who score at the ceiling and floor.
The elimination of one of the six response choices (“a good bit of the time”) from the MH and VT items was based on results from studies using the Thurstone Method of Equal-Appearing Intervals (Thurstone, 1929). Specifically, this response category was not consistently ordered in between the “Most of the time” and “Some of the time” response categories, as hypothesized, in studies of translations of v1 of the SF-36 Health Survey (Keller, Ware, Gandek et al., 1998). Eliminating the “A good bit of the time” response category simplified the format of the form with little or no loss of information. Subsequent studies using IRT also support this decision.

Scoring and Norms

Another major advantage of the SF-12v2 form over the original v1 form is the provision for estimating the eight-domain profile of scales, in addition to the PCS-12 and MCS-12 summary measures. As documented in this manual, norm-based scoring (NBS) algorithms have been developed for all eight scales using the same standardization (mean=50, SD=10 in the 1998 general U.S. population) that has made scales and summary measures easier to interpret for the SF-36 Health Survey. The on-line scoring software, now available for SF-12v2, also incorporates QualityMetric Incorporated’s MDE algorithms, which reduce the bias in estimates of missing responses and make it possible to compute scale and summary scores for many respondents who would otherwise have been lost due to missing data.
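
A standardization to mean=50 and SD=10 is, in general, a linear transformation of a standardized (z) score. The sketch below shows only that general form. The reference mean and SD are made-up placeholders, not the published 1998 norms, and the function name is hypothetical; the actual scoring algorithms are those documented in the manual.

```python
def norm_based_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Rescale a raw scale score so the reference population has mean 50, SD 10."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    z = (raw - ref_mean) / ref_sd  # distance from the reference mean in SD units
    return 50.0 + 10.0 * z


# Placeholder reference values for illustration only (NOT published norms).
PLACEHOLDER_MEAN = 80.0
PLACEHOLDER_SD = 20.0

print(norm_based_score(80.0, PLACEHOLDER_MEAN, PLACEHOLDER_SD))  # 50.0
print(norm_based_score(60.0, PLACEHOLDER_MEAN, PLACEHOLDER_SD))  # 40.0
```

On this metric, a score of 40 reads directly as one standard deviation below the reference population mean, whichever scale it comes from.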

Comparability of Results

While adopting improvements in the SF-12, we also sought to create a link so that v1 and v2 scores would be directly comparable. With this goal in mind, cross-sectional and longitudinal norms for the general population were estimated for both v1 and v2 versions using NBS for all eight scales and for the PCS-12 and MCS-12 summary measures.

To assure that results (e.g., group means) for summary measures based on v1 and v2 forms can be directly compared, conversion formulas have been developed using data from administrations of both forms in general population surveys. These formulas allow users to take advantage of the improvements achieved with v2 while retaining the option of comparing results with those published for the original v1 form. NBS algorithms and data from the 1998 norming studies make these comparisons possible.

Acute (1-week recall) Form

The standard 4-week recall period was adopted for the SF-36 and SF-12 Health Surveys to maintain comparability with the long-form Medical Outcomes Study (MOS) measures from which they were derived. The 4-week recall period was adopted for the MOS long-form measures because it was thought that the previous four weeks would capture a more representative and reproducible sample of recent health, not unduly affected by daily or momentary fluctuations (Fowler, 1984; Stewart and Ware, 1992). However, there are many instances in which a 4-week recall period is not appropriate, particularly in studies that require relatively short intervals between follow-up assessments because changes in health status occur more rapidly.

The acute form of the SF-12 was designed for applications in which health status would be measured weekly or biweekly. It was created by changing the recall period for six SF-12 scales (Role Physical, Bodily Pain, Vitality, Social Functioning, Role Emotional, and Mental Health) from “the past four weeks” to “the past week”. For example, the question, “During the past four weeks, how much of the time has your physical health or emotional problems interfered with your social activities (like visiting friends, relatives, etc.)?” was changed to “During the past week, how much of the time has your physical health or emotional problems interfered with your social activities (like visiting friends, relatives, etc.)?”. Two SF-12 scales, Physical Functioning and General Health, do not have a recall period, so are identical across acute and standard forms.

The rationale behind the use of the 1-week recall period for the SF-12 was that shorter recall periods would be more sensitive than longer recall periods to recent changes in health status. However, support for this hypothesis and the magnitude of any increase in sensitivity with a shorter recall period had not been previously demonstrated. A study was conducted to compare responses to SF-36 items with 1-week and 4-week recall periods in a sample of patients with asthma (Keller, Bayliss, Ware et al., 1997). Because the SF-12 is a subset of items from the SF-36, results may be interpreted in context. This study found that recall period had:

  • no impact in tests of the assumptions underlying the construction and scoring of the SF-36 scales;
  • no impact on the structure of the physical and mental components underlying the SF-36 standard form;
  • an effect on mean scale scores, in particular, the RP, RE and SF scales; and,
  • an impact on the relationship between changes in scale scores and changes in disease state.
The results of the study support the use of the same algorithms for scoring the standard and acute forms. In addition, the finding that the physical and mental health components underlying the standard form replicated with the acute form assured that the acute SF-36 scales have the same factor content and interpretation regardless of whether respondents are asked to consider the previous week or the previous month.

The study found that answers to questions with a 1-week recall period tended to be more responsive to recent changes in disease state as defined by several clinical criteria of asthma. Changes in SF-36 scale scores with the 1-week recall period were generally more highly related to 1-week changes in asthma severity, as hypothesized; however, these differences need to be investigated further. This study was not designed specifically to compare the responsiveness of 1-week and 4-week recall periods. Further, the clinical criteria used to define changes in the disease state had a 1-week recall period, coinciding with the recall period of the acute form. To study the effect of recall period on responsiveness, both 1-week and 4-week intervals between data collections for scales and clinical variables need to be compared.

Lastly, the results of the study showed higher mean scores for the acute version scales compared to the standard form scales. The higher mean scores from the acute form may have been due to a lower prevalence of negative events during the shorter recall period. As with the SF-36, the potential difference in mean scores by form has implications for norm-based interpretation of SF-12 scale scores. For this reason, we developed distinct NBS algorithms for the acute and standard forms.


SF-36® is a registered trademark of the Medical Outcomes Trust
