Methodology
The 2009 Report Card, which was only partially based on the 2006 Report Card, was developed using a Modified Delphi Technique. The process comprised the steps described below.
1) Assembling the Report Card Task Force and Work Groups:
The American College of Emergency Physicians (ACEP) assembled a Report Card Task Force (RCTF) in October 2006 to oversee the development of the 2009 Report Card. The selection of the Task Force members was based on recommendations made by ACEP staff and leaders in September 2006 about ACEP members who had expertise in subject areas or specific issues directly related to the Report Card. Research experience and geographic location (to ensure most regions across the country were represented) were also important, but to a lesser degree than topical expertise. Based on these criteria, Task Force members were chosen and appointed by the ACEP president. The RCTF was charged with:
- Selection and oversight of a contractor to conduct the data collection, analysis, writing, and design of the Report Card,
- Providing expert advice and guidance on the selection and definition of indicators that accurately reflect the subject-matter categories being considered,
- Providing guidance on weighting the indicators and creating grades, and
- Carefully reviewing all drafts.
One of the initial tasks of the RCTF was to identify and confirm critical topic areas. Although substantial groundwork had been previously laid through the efforts related to the 2006 Report Card, it was necessary to confirm the specific content areas, consider other potential content areas, and eventually develop quantitative weights for each topic category.
At the Task Force’s May 2007 meeting, the full RCTF ultimately decided to keep the four categories used in the 2006 Report Card (Access to Emergency Care, Quality and Patient Safety Environment, Public Health and Injury Prevention, and Medical Liability Environment) and to add a fifth category, Disaster Preparedness. At that point, members were asked to volunteer to chair work groups on each of these five topical areas based on their particular areas of expertise. One of these work group leaders, Dr. William Jermyn of the Disaster Preparedness group, passed away in the spring of 2008 and was replaced based on the recommendation of the Task Force Chair.
In order to accomplish the tasks described above, the full RCTF met in person four times and by conference call six times between its inception in October 2006 and the completion of the Report Card. In addition, there was frequent and timely communication via telephone and e-mail among the RCTF, the work groups, and the contractor during this period.
For many of the deliberations described in the following sections, the contractor worked directly with the work group leaders when making decisions specific to their subject areas (e.g., adding or removing an indicator, weighting the individual indicators). When necessary, the contractor and/or work group leader would consult with the Task Force Chair or the full RCTF. These exchanges between the contractor and the work group leader typically took place via e-mail or phone. For more complicated decisions, such as grading methodology, the Task Force discussed the issue either during in-person meetings or by telephone until everyone in the group generally agreed to one method. In some cases, the Task Force was polled via e-mail about a specific issue or concern (e.g., headings for the state narrative sections) and the majority ruled. Some specifics around the timing and frequency of communication are included in the sections to follow.
2) Selecting Specific Indicators
Each work group leader was responsible for proposing a list of potential indicators in their category based on subject matter knowledge and potential availability of data – some on the list were repeats from 2006 and others were thoughtful additions. The selected contractor, Altarum Institute, contributed background research on the feasibility of measuring each indicator consistently on the state level, and conducted conference calls with each work group leader to discuss their findings. Based on this information, the work groups reconsidered the indicators, modified definitions when necessary, and proposed new indicators when data were unavailable. The draft sets of indicators were then presented to the full RCTF at an in-person meeting in October 2007 and discussed further during two conference calls in the winter of 2007-2008. Changes suggested by the RCTF were discussed further with the leaders of each of the work groups via follow-up telephone calls or e-mails.
The selection of indicators ultimately depended not only on their relative importance as determined by the RCTF, in consultation with ACEP Sections, Committees, and topic area experts, but also on the availability of data. To be included, the data for an indicator needed to be: 1) relevant, 2) reliable, 3) valid, 4) consistent across the states, and 5) current (collected within the past 3 years).
Some indicators, such as many for the Public Health and Injury Prevention and the Quality and Patient Safety sections, lent themselves to the collection of consistent, publicly available data. However, similar statistics were not available for Disaster Preparedness indicators. Therefore, a survey, based primarily on data elements proposed by the Disaster Preparedness subcommittee, was developed to collect data on selected indicators directly from state officials. The survey was sent to the Assistant Secretary for Preparedness and Response (ASPR) coordinators in each state, the District of Columbia, and Puerto Rico with the understanding that they would be able to answer some key questions about disaster preparedness, and also identify other appropriate state officials who could answer the remaining questions that pertained to other categories. The survey included 29 questions, but not every survey question was included in the Report Card as an indicator, primarily because of incomplete responses. Additional data were collected in cooperation with the State and Territorial Injury Prevention Directors Association (STIPDA). STIPDA surveyed injury prevention directors in each state and the District of Columbia on a variety of injury prevention issues. Working with the RCTF, STIPDA agreed to include survey questions related to funding for injury prevention programs, the results of which were included as indicators in the Public Health and Injury Prevention category. The results of the STIPDA survey were scheduled for public release in October 2008.
Overall, the RCTF developed and/or defined 116 indicators.
3) Assigning Weights
The full RCTF met in person in October 2007 and, based on a voting process, determined the weight to be attributed to each of the five broad categories.
The weights were meant to reflect the extent to which the categories influence or impact the current state of emergency medicine. Moreover, the addition of the new Disaster Preparedness category required that the weights of the other four topic areas be reevaluated. After careful consideration, the importance of the categories was distributed as follows:
- Public Health and Injury Prevention – 15 percent
- Medical Liability Environment – 20 percent
- Access to Emergency Care – 30 percent
- Quality and Patient Safety – 20 percent
- Disaster Preparedness – 15 percent
Once these category weights were determined and the set of indicators was finalized, the work groups addressed the relative importance of the individual indicators. Each of the five broad categories was divided into sub-categories; weights were assigned to each sub-category and then the total weight in each sub-category was distributed among the indicators within that sub-category. Again, these decisions were made during conference calls between the work group leaders and Altarum Institute. The draft weights assigned by the work groups were then presented to the full RCTF during two conference calls in the winter of 2007-2008. Once finalized, these individual weights were used to score and grade the states within each category. If the work group leaders needed to be consulted beyond these two planned conference calls, the contractor typically communicated with them via e-mail and telephone.
In addition to approving the indicator sets and weights proposed by the work groups, the RCTF was also responsible for deciding how to score and grade the states. At the in-person meeting in May 2008, Altarum Institute presented several grading methodologies to the full RCTF. After much deliberation, the Task Force came to agreement on the methodology to employ for the Report Card. This methodology is described in the following sections.
4) Comparing and Scoring States
The indicator weights added up to a total of 100 points for each of the categories. The percentage of available points scored by each state was calculated by comparing the states on each indicator, assigning them a fraction of the indicator’s weight, and summing these values. The scoring convention used depended largely on which of the three types of data element an indicator belonged to: binary, categorical, or continuous. This scoring convention is described below:
- For the continuous indicators, the states were ranked against each other. Fractional ranks were used to apportion each state a fraction of the indicator’s weight. For example, for an indicator weighted as 5 percent of the category, a state that ranked 25th out of 51 (50 states plus the District of Columbia) would receive 2.45 points (25/51 x 5). In the case of a tie, each state was assigned the highest rank among the tied states. In other words, if the 24th, 25th, and 26th states in the ordered list were tied, each would be assigned a rank of 26 and allotted an identical number of points.
- For categorical indicators, states were not ranked against one another but rather assigned a fraction of the indicator’s weight equal to the share of possible points they scored. For example, for an indicator worth 5 percent of the category, a state that scored 8 out of a possible 8 points would receive 5 points (8/8 x 5).
- For binary responses, the state received either the full weight or none of the weight.
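The three scoring conventions above can be sketched in a few lines of Python. This is an illustrative sketch only, not the contractor's actual implementation: the function names are hypothetical, and it assumes that a larger value is better for every continuous indicator (an indicator where lower is better would need its values negated first, a detail the text does not specify).

```python
def fractional_ranks(values):
    """Rank states from 1 (lowest value) to n (highest value).

    Tied states all receive the highest rank in their tied group, matching
    the example in the text (states tied at 24th to 26th are all ranked 26).
    Assumption: a larger value is better.
    """
    ranks = {}
    for state, value in values.items():
        # Counting every state with a value <= this one gives the top
        # position of the tied group.
        ranks[state] = sum(1 for other in values.values() if other <= value)
    return ranks


def score_continuous(values, weight):
    """Apportion the indicator weight by fractional rank (rank / n * weight)."""
    n = len(values)
    return {state: rank / n * weight
            for state, rank in fractional_ranks(values).items()}


def score_categorical(points, max_points, weight):
    """Award the share of the weight equal to points scored / points possible."""
    return points / max_points * weight


def score_binary(present, weight):
    """Award the full weight or none of it."""
    return weight if present else 0.0
```

With 51 jurisdictions and an indicator weighted at 5 points, a state ranked 25th receives 25/51 x 5 points under `score_continuous`, in line with the worked example in the text.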
In addition, missing data were handled in one of two ways depending on the data source. For data that were collected from publicly available sources (including data collected from the STIPDA survey), missing data did not count against the states. The RCTF believed that penalizing states would place too great an emphasis on missing data that may have been the result of inadequate data collection efforts (not the fault of the states), and would inappropriately skew the ranks and grades. For this reason, not all states had the same denominator. If a state was missing data from a publicly available data source on a particular indicator, the weight for that indicator was excluded from the state’s denominator (the total possible points it could score). For example, if a state was missing data on an indicator worth 5 percent of the category, then its denominator would be 95, not 100.
On the other hand, missing data on data elements that were collected from the ACEP state survey did count against the states in most cases. If a state health official did not provide a response to a survey question that was answered by the vast majority of other state respondents (after multiple requests by e-mail and telephone), the weight for that indicator was still included in the state’s denominator and the numerator was equivalent to a zero. The RCTF felt that responses to or data for these questions should be available, tracked, or known by the state or a state health official. For this reason, the Task Force felt that such responses should be counted against the non-responding state. However, if a state specifically told us that they were not allowed to release particular data items because of state law, we did not count this against them.
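The two missing-data rules can be illustrated with a short Python sketch. The record layout (`weight`, `points`, `source`, `legally_barred`) is hypothetical and introduced only for illustration. The sketch also assumes that data withheld because of state law were dropped from the denominator in the same way as missing public data; the text says only that such cases were not counted against the state.

```python
def percent_of_points(indicators):
    """Return the percentage of available points a state scored in a category.

    Each indicator is a dict with hypothetical keys:
      weight          - points available for the indicator
      points          - points scored, or None if the data are missing
      source          - "public" or "survey" (the ACEP state survey)
      legally_barred  - True if state law prevented release of the data
    """
    numerator = 0.0
    denominator = 0.0
    for ind in indicators:
        if ind["points"] is None:
            if ind["source"] == "public" or ind.get("legally_barred", False):
                # Missing public data (and, by assumption, legally barred
                # data) are excluded from the denominator entirely.
                continue
            # A survey non-response keeps its weight in the denominator
            # and contributes zero to the numerator.
            denominator += ind["weight"]
            continue
        numerator += ind["points"]
        denominator += ind["weight"]
    return 100.0 * numerator / denominator
```

For a state missing a 5-point public indicator, the denominator becomes 95, as in the example above; the same gap on a survey indicator leaves the denominator at 100.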
Once the percentage of points scored was calculated for each of the states, these values were ranked. The state rankings for each of the categories can be found on the state pages adjacent to their grades.
5) Assigning Grades Using a Modified Curve
State level grades
This section describes the methodology used to calculate category-specific and overall grades at the state level. The basis for all calculations related to category-specific grades is the percentage of points that states scored for each category. Again, the denominators vary somewhat by state because of missing data.
Category-specific
Overall, the category-specific grades were based on the number of standard deviations each state’s score fell from the maximum value. The cut points selected for the number of standard deviations from the maximum are fairly straightforward. We considered the number of required grade levels (13 in total: A+, A, A-, B+, B, B-, C+, C, C-, D+, D, D-, F) and decided that, given the spread of the data, using increments of 0.25 standard deviations was the best approach. In the final version of the Report Card, the ‘A+’ and ‘A’ categories were collapsed into one group and presented as a straight ‘A.’ The RCTF wished to keep the convention of grading similar to that used in high schools and universities, many of which do not award an ‘A+’ and treat a 4.0 grade point average as an A. Moreover, the RCTF felt that presenting a state grade as an ‘A+’ would give the impression that there is little room for improvement when, in fact, that was rarely the case.
Below is a step-by-step description of how the category-specific grades were calculated.
Step #1 – Using the percentage of points scored by each state in a category, the maximum value and the standard deviation (about the mean) were calculated.
Step #2 – The letter grades, including pluses and minuses, were calculated based on the number of standard deviations that each state’s score fell from the maximum value. For example:
Range for an A: within 0.5 standard deviations from the maximum (because A+ and A were collapsed into one group)
Range for an A-: between 0.5 and 0.75 standard deviations from the maximum
Range for a B+: between 0.75 and 1.0 standard deviations from the maximum
Ultimately this methodology was selected not only because it is straightforward and easy to explain, but because it is a method that can easily be repeated in subsequent versions of the Report Card.
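A minimal Python sketch of this modified curve follows. Only the ranges for A, A-, and B+ are stated in the text; the remaining cut points are extrapolated here in 0.25 standard deviation steps (so F begins at 3.0 standard deviations below the maximum), which is an assumption. The use of the population standard deviation and the treatment of scores that fall exactly on a boundary are also assumptions, and the function names are hypothetical.

```python
import statistics

# Grades in descending order, with A+ already collapsed into A.
GRADES = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]


def letter_grade(score, max_score, sd):
    """Map a score to a letter by its distance from the maximum, in SDs."""
    if sd == 0:
        return "A"  # every state scored the same; no spread to grade on
    distance = (max_score - score) / sd
    if distance < 0.5:
        return "A"  # covers the collapsed A+ and A bands
    # Each further 0.25 SD below the maximum drops one grade level
    # (extrapolated beyond the three ranges given in the text).
    index = 1 + int((distance - 0.5) // 0.25)
    return GRADES[min(index, len(GRADES) - 1)]


def grade_category(scores):
    """Grade every state in one category from its percent of points scored."""
    values = list(scores.values())
    max_score = max(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    return {state: letter_grade(value, max_score, sd)
            for state, value in scores.items()}
```

Because the cut points are anchored to the best-performing state rather than to a fixed scale, the top state in each category always receives an A under this scheme.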
Overall
The method employed to calculate overall state grades for the Report Card is similar to the method used by most high schools and universities to assign students an overall grade (and rank): by averaging across their subject area grade point averages. For the Report Card, the category grades were multiplied by their relative weights (contribution to the overall grade), as previously described, and then summed. Essentially, each state’s overall grade is a weighted average of its category grades.
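The weighted average can be sketched as follows. The category weights are those listed in the section on assigning weights; the grade-point values are the conventional 4.0 scale and are an assumption, since the text does not give the numeric value attached to each letter. The sketch stops at the weighted grade-point figure because the text does not state how that figure was converted back to a letter.

```python
# Category weights as listed in the Report Card methodology.
CATEGORY_WEIGHTS = {
    "Access to Emergency Care": 0.30,
    "Quality and Patient Safety": 0.20,
    "Medical Liability Environment": 0.20,
    "Public Health and Injury Prevention": 0.15,
    "Disaster Preparedness": 0.15,
}

# Assumption: a conventional 4.0 grade-point scale (not given in the text).
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7, "C+": 2.3,
    "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "D-": 0.7, "F": 0.0,
}


def overall_grade_points(category_grades):
    """Weighted average of a state's category grades, in grade points."""
    return sum(CATEGORY_WEIGHTS[category] * GRADE_POINTS[grade]
               for category, grade in category_grades.items())
```

Under these assumptions, a state with straight A grades scores 4.0, and a poor grade in Access to Emergency Care lowers the overall figure twice as much as the same grade in Disaster Preparedness.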
National level grades
Category-specific
The national level grades are based on population-weighted averages for each of the categories. The steps taken to determine national level grades are described below:
Step #1 – Multiply each state’s percent of points scored for each category by the percentage of the U.S. population that resides in the state.
Step #2 – Calculate the national average (average population-weighted percentage of points scored) for each category.
Step #3 – Calculate how many standard deviations this national average fell from the maximum state value in each category, and determine to which letter grade that corresponds based on the methodology described above for the states.
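The three steps above can be sketched in Python as follows. This is a sketch under stated assumptions: the national average is taken to be the sum of the population-weighted state scores (the usual weighted mean), the standard deviation is the population standard deviation of the state scores, and the function names are hypothetical. The resulting distance would then be mapped to a letter using the same cut points as for the states.

```python
import statistics


def national_average(state_scores, state_populations):
    """Population-weighted average of the states' percent of points scored."""
    total_population = sum(state_populations[s] for s in state_scores)
    # Steps 1 and 2: weight each state's score by its share of the
    # population, then sum the weighted values.
    return sum(state_scores[s] * state_populations[s] / total_population
               for s in state_scores)


def national_distance_from_max(state_scores, state_populations):
    """Step 3: standard deviations between the national average and the top state."""
    values = list(state_scores.values())
    sd = statistics.pstdev(values)  # assumption: population SD of state scores
    average = national_average(state_scores, state_populations)
    return (max(values) - average) / sd
```

Weighting by population means that large states pull the national grade toward their own scores, so the national grade can differ noticeably from a simple average of the state grades.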
Overall
The overall grade for the nation was calculated using the same methodology described above for the overall state grades. The overall grade for the nation is a weighted average of the nation’s category-specific grades.