Pay-for-Performance: Is It Quality or Cost That Matters?

April 4, 2006

Insurance, Medicare, Pay-for-performance, P4P

Within the past year, the concept of "Pay for Performance" (P4P) has evolved from a nascent idea with only a few pilot programs to a driving force with a life of its own. The Centers for Medicare and Medicaid Services (CMS) and most major insurers are testing the application of P4P measures in relation to both quality and cost control. But P4P initiatives raise several concerns. On what criteria should P4P be based? Do we adhere to certain standards of care and practice, and if so, who sets the standards? Will physicians be required to cover the costs of implementing P4P programs? Is it, in actuality, a system that robs Peter to pay Paul, after first deducting a service charge?

Initially, P4P was conceived as a method of rewarding high-quality medical care. As an executive director of an independent practice association (IPA), I was introduced to the concept early, under a different rubric. The major HMO in my region had implemented a program in which our IPA was rewarded for meeting certain threshold criteria, such as mammography rates and eye examinations for diabetics. The incentive money went directly to the IPA, which, in turn, distributed bonuses to the participating physicians. However, these bonuses, doled out 9 months after the end of the fiscal year, were based on attendance at meetings and other nonclinical criteria. Such delayed rewards for activities unrelated to the stated goals failed to meet even minimal standards for behavioral modification among clinicians.

Since the early days of experimenting with reward systems for physicians, there have been growing pressures to rein in health care expenditures. Indeed, the news media have become jam-packed with reports about runaway costs. Politicians, for their part, have no greater expertise in this area but have created all kinds of quick fixes in an attempt to respond to the concerns of their constituents.
Managed care, having been criticized both for its collective failure to control costs and for its often inappropriate utilization of services, has responded with attempts to combine P4P with a focus on cost control, if not subvert it.

How did this all happen so fast? It's worth looking at what led employers, insurance companies, and federal policymakers to conclude that quality is lacking and overutilization is rampant. When I looked into the issue to try to understand market forces affecting my IPA, it was surprisingly easy to find the information.

STUDYING COST AND QUALITY

In a series of articles published in Health Affairs in October 2004 [1-3], a group led by John E. Wennberg, MD, MPH, director of the Center for the Evaluative Clinical Sciences at Dartmouth College in Hanover, NH, described research into utilization patterns among Medicare recipients who received care at major academic centers. In comparing high-quality institutions, Wennberg's team was able to document a greater than 60% difference in total costs. For example, patients with chronic conditions consumed twice as many hospital days at Boston University Medical Center as at Yale-New Haven Medical Center. The number of hospital days and physician visits among Medicare recipients at New York University Medical Center was 3 times that among Medicare recipients receiving care at Stanford University Medical Center for the same conditions. Medicare recipients at Mount Sinai Hospital in New York City spent twice as many days in the hospital in their last 6 months of life as their counterparts at the Mayo Clinic. These differences led those paying for care to consider that up to 60% of resource utilization is unnecessary.

Because physicians prefer to focus on quality, these measures are challenging to apply. The National Committee for Quality Assurance (NCQA) was formed to develop rating measures for managed care performance. The same measures are now being used to evaluate care at the level of the individual physician.
The NCQA has made its survey results, describing several quality indicators, available on its Web site. For example, its physician Heart/Stroke Recognition Program gives "partial credit" to physicians whose patients maintain low-density lipoprotein (LDL) levels that do not exceed 129 mg/dL. In 2004, the NCQA found that after a cardiac event, only 49% of patients had LDL levels under 100 mg/dL, and in 20% of the patients surveyed, the LDL level was not even checked [4]. Although LDL levels were examined as a measure for heart disease prevention, a similar quality parameter could be readily adapted to patients treated for stroke and eventually used as a tool to study quality care in neurology. However, whether these and other criteria will provide a fair assessment of the care neurologists deliver remains a big question.

In another article published in Health Affairs in April 2004 [5], Katherine Baicker, PhD, and Amitabh Chandra, PhD, both currently adjunct assistant professors of community and family medicine at Dartmouth, suggested that there is an inverse correlation between the cost of care and quality. Regression analysis, based on Health Plan Employer Data and Information Set (HEDIS) criteria and on a state-by-state comparison, indicated that higher cost was associated with poorer performance.

The Florida Healthcare Coalition conducted an extensive assessment of care delivered to patients with chronic conditions with the intent of identifying variations in practice patterns. Among its conclusions was that the best predictor of the pattern of care delivered was the year a physician graduated from medical school [6]. The implication was that after training, physicians stop learning. Furthermore, a study commissioned by America's Health Insurance Plans concluded that 30% of health care costs were an effect of poor-quality care [7].

COST SHIFTING

Employers in large companies worry that health insurance costs are rising uncontrollably.
Benefit managers are well aware of the published studies on the subject and are privy to many measures of quality and utilization. Charles M. Cutler, MD, MS, Aetna's national medical director for quality, commented in an interview that employers have implemented quality assurance programs that provide both a reduction in costs and improved quality of their products or services. Payors are demanding that the same techniques be implemented in the delivery of health care to their employees. Cutler frequently hears from employers that "we are not getting our money's worth." The employer response to the high and uncontrolled costs of health care has been to shift the burden to employees through higher insurance contributions, the use of health care savings accounts, high-deductible plans, or even refusing to provide health insurance altogether. Payors are developing plans to respond to employers' demands. A few that attempt to reward quality and "efficiency" (lower costs) are on the market, with many more expected to become available in the coming months.

At the federal level, the cost of Medicare Part B (which pays for physician and other outpatient services) is controlled under a cap called the sustainable growth rate (SGR). The SGR has been in place for more than a decade. Part B is the only part of Medicare spending that is under control, kept in check by a very conservative inflation rate along with a provision for growth resulting from new Medicare beneficiaries and new technology. The SGR essentially dictates a greater than 20% decline in Medicare reimbursements for physician services over the next 4 years.

Many experts have proposed that P4P should be instituted as a method of cost control if the SGR is to be repealed. Legislators recognize that this huge cutback in Medicare payments will only lead to a widespread lack of access for patients, but they also have to deal with the unfettered growth in the entitlement program at a time of a record-breaking deficit.
To date, P4P has not been included as a temporary fix, but it appears likely that it will be implemented by Medicare over the next several years.

P4P already has been implemented in some programs designed to address the perceived problems in health care delivery. General Electric's Bridges to Excellence provides a bonus to physicians meeting specific threshold criteria and even publishes the names of higher performing physicians. The program uses new money with the goal of a return on investment. It insists that quality medical care is also cost-effective and that by encouraging good practices, the overall cost of care can be reduced. The strategy of infusing new money into the effort is unique. Most other programs can be expected to use, at best, a budget-neutral approach with redistribution of funds. Some payors, including the CMS, anticipate savings from P4P. Indeed, the Web site of America's Health Insurance Plans touts the cost savings of P4P.

The CMS has several demonstration projects in which 25% of the Medicare payments is withheld and not paid until 1 year after the end of the 3-year project, and then only if certain thresholds are met. This delay, up to 4 years between the desired behavior and the reward, may diminish the project's effectiveness, a circumstance that I witnessed in my role at the IPA. As physicians, we expect test results and patient responses in hours to days, and the gratification we receive from caring for patients often takes place in the same time frame. A reward received months later, never mind years later, is not likely to modify how the average office-based physician delivers care.

Integrated Health Care, an insurance coalition in California that is 4 years into a P4P program, uses standard HEDIS criteria as the basis for measuring quality. It heavily emphasizes patient "experience" and information technology and does not pay its reward until well into the next year.
The concept of using "patient experience" may not necessarily be bad in and of itself, but it conveys the impression that we are providing entertainment rather than medical care. The difference is that entertainment, good and bad, is very well rewarded in this country, but poor quality care as entertainment is not punished by our legal system.

Reward criteria, as applied to medical performance, are constructed in various ways. Most often a threshold is used: if you meet the desired level, you get the bonus. The problem is that low-level performers have such a long way to go that they do not often try. In the tiered approach, which is an alternative method, the top 2 quartiles may get a bonus whereas the bottom 2 may be penalized. But if you do not know where you fall during the period of measurement, you may think that you are doing well while other clinicians are performing substantially better. Once again, this discourages low-level performers from even trying to improve how they deliver care. Finally, the incentive may be based on improvement. No matter where you start, the incentive is to improve. This last approach is the best strategy to improve care, but it requires greater effort to implement.

There are, of course, incentives other than financial rewards. These include reduction of administrative burden; receipt of new hardware, software, or other performance-enhancing tools; and recognition for high-performing practices through publication in lists related to best practices. The last incentive is the most flawed. Directors of hospitals, the quality performance measures of which are already subject to publication, argue that quality performance measures are full of statistical inaccuracies.
When such measures are applied to individual physicians, using less data, they are even less likely to reflect true quality than hospital quality performance reports are.

PITFALLS

The P4P initiatives present a number of potential pitfalls.

Self-promotion and bias. Some quality care efforts are viewed as tainted by self-interest. If, for example, a pharmaceutical company that produced a statin suggested that a lower LDL level be the new criterion, or if a cardiologist suggested that echocardiography be performed on every patient in whom a myocardial infarction occurred, would their recommendations be based on best practices or would they be motivated by self-interest? Methods of independent assessment are necessary to guard against blatantly opportunistic measures of performance.

Evidence-based versus consensus criteria. Evidence-based criteria are usually well founded, but those based on consensus must be regarded with less confidence, since, by definition, consensus represents opinion in the absence of medical proof. Reasonable physicians may apply different criteria when the evidence is absent or conflicting, yet in these cases, it isn't reasonable to make a determination of best practices.

Current limitations of data capture. Information technology, in particular an electronic medical record (EMR), may be required to capture the complexity of this information. Most small offices do not have the available resources, including capital and expertise, to introduce an EMR. Until such time, they are hamstrung by inadequate measurement and reporting tools. An alternative source of quality information is claims data, but claims data ignore contributory patient negligence, such as forgetting an appointment or refusing appropriate monitoring or intervention.

Statistical limitations. Will the sample be large enough to be statistically valid? If you have 1 patient with a chronic condition, you could either be 0% or 100% in compliance.
Although this is an extreme example, it points out the limitations of quality measures for many conditions, even if good, evidence-based care is defined. For many specialists, the practice may not include sufficient numbers of patients with a designated condition for quality measures to be meaningful.

Criteria measures and selection. While measures for some chronic conditions may be relatively easy to develop, what about all of medicine? The Association of Health Center and Practice Administrators' databank has more than 40 measures for diabetes, but how do we address patients who have multiple coexisting conditions? It is not unusual to see patients with comorbidities such as hypertension, hyperlipidemia, diabetes, and vascular disease. Quality measures could potentially be in conflict, and it is unlikely that the metrics will be sufficiently sophisticated to capture the complexity of caring for such patients.

The decision-making process. Finally, who will decide? Nonphysicians often look at medical evidence and think that high-quality care is a simple matter of integrating evidence-based best practices. They cannot comprehend the complexity of coexisting medical problems and the reality of caring for patients who come from a variety of social situations and who represent a diversity of personalities with unique personal insights. Physicians must not only be at the table; they must dominate the decision-making process.

As physicians, what should we demand?
I suggest the following: (1) standardize: the metrics should be uniform across payors and others; (2) simplify: the metrics and process to capture the data should be simple; (3) science: the criteria should be based on science, not opinion or consensus; (4) accountability: the physician should only be held accountable for the aspects of care that he or she controls; (5) link: there should be a clear and timely link between incentives and the actions of an individual physician, not one delayed by months or years.

P4P is coming whether we like it or not. As Cutler of Aetna stated, "The age of accountability is here." The combination of perceived rampant poor-quality care and resistance to the escalation of health care spending is driving the concept forward. It has the potential to reward high-quality care. On the other hand, quality may morph into merely cost-efficient care. P4P may only reward low-cost providers. It is incumbent on all of us to monitor the evolution of P4P and to participate in the design of metrics.

BRUCE SIGSBEE, MD, is a neurologist in Rockport, ME, and is a member of several American Academy of Neurology committees, including the Medical Economics and Management Committee. The author wishes to acknowledge the contributions of Orly Avitzur, MD, MBA, and Kenneth A. Vatz, MD, for their assistance in the preparation of this article.

REFERENCES

1. Wennberg JE, Fisher ES, Stukel TA, Sharp SM. Use of Medicare claims data to monitor provider-specific performance among patients with severe chronic illness. Health Aff (Millwood). 2004;suppl Web Exclusive:VAR5-18.
2. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ. Variations in the longitudinal efficiency of academic medical centers. Health Aff (Millwood). 2004;suppl Web Exclusive:VAR19-32.
3. Baicker K, Chandra A, Skinner JS, Wennberg JE. Who you are and where you live: how race and geography affect the treatment of Medicare beneficiaries. Health Aff (Millwood). 2004;suppl Web Exclusive:VAR33-44.
4. National Committee for Quality Assurance. State of Health Care Quality 2005 Industry Trends and Analysis. Accessed March 16, 2006.
5. Baicker K, Chandra A. Medicare spending, the physician workforce, and beneficiaries' quality of care. Health Aff (Millwood). 2004;suppl Web Exclusive:W184-197.
6. Florida Health Care Commission. Outpatient Quality Initiative. Accessed March 16, 2006.
7. Price Waterhouse Coopers. The Factors Fueling Rising Healthcare Costs 2006. Accessed March 13, 2006.

---

Thought Leaders Speak Out on P4P

Orly Avitzur, MD, MBA, who wrote about tiered programs in last month's issue of Applied Neurology, spoke to the experts about P4P. Here's what they had to say:

Neurosurgeon James R. Bean, MD, is concerned about selecting criteria that are relevant and measurable using administrative records. Bean, who is treasurer of the American Association of Neurological Surgeons, believes that the best measures (actual outcomes measures) are generally impractical for P4P programs because they are not even recorded. He said, "Process measures are a proxy for quality and can be measured, but are not necessarily relevant to actual outcomes."

He noted that use of perioperative antibiotics and thromboembolism prevention were selected by the Centers for Medicare and Medicaid Services for use in surgery for the first round of P4P. "The problem," he explained, "is that these criteria are only peripherally relevant to the actual reasons for the surgery, the conduct of the procedure, or the outcome of the case; they only measure routine use of complication-preventive steps."

He pointed out, "The patient may have bled to death, but the measures would indicate a high-quality surgery if antibiotics had been administered and TED hose used." He is also worried that such programs may give neurosurgeons the incentive to avoid patients who have or are at risk for complications, depriving those who need surgery most of care.
He said, "It is difficult to rate risk, so surgeons who treat the sickest patients will appear to be the lowest quality doctors and will be punished for their efforts."

National Coordinator for Health Information Technology David J. Brailer, MD, PhD, is committed to expediting adoption of electronic health records. "There is no question that P4P and health [information technology] move together. They are both ultimately focused on improving the ability to deliver high-quality care." Nevertheless, he voiced 2 reservations. He believes that current P4P programs do not provide a sufficient financial incentive to physicians. He said, "It is more of a declaration of direction, but as an economic stimulus for health [information technology] adoption, it is weak."

Furthermore, he noted that programs that involve small practices are very much needed, adding that it is easy for larger practices to institute P4P measures. "I am worried about the 5- or 10-person practice that may not know how to reengineer workflow accordingly," he said.