Quality Improvement Charts

An implementation of statistical process control charts for R

Jacob Anhøj

2017-12-16


The qicharts2 package contains two main functions for analysis and visualisation of data for continuous quality improvement: qic() and paretochart().

qic() provides a simple interface for constructing statistical process control (SPC) charts for time series data.

paretochart() constructs Pareto charts from categorical variables.

Statistical Process Control is not about statistics, it is not about “process-hyphen-control”, and it is not about conformance to specifications. […] It is about the continual improvement of processes and outcomes. And it is, first and foremost, a way of thinking with some tools attached.

Donald J. Wheeler. Understanding Variation, 2. ed., p 152

This vignette will teach you the easy part of SPC, the tools, as implemented by qicharts2 for R. I strongly recommend that you also study the way of thinking, for example by reading Wheeler’s excellent book.

Installation

Install stable version from CRAN:

install.packages('qicharts2')

Or install development version from GitHub:

devtools::install_github('anhoej/qicharts2', build_vignettes = TRUE)

Your first run and control charts

Load qicharts2 and generate some random numbers to play with:

# Load qicharts2
library(qicharts2)

# Lock random number generator to make examples reproducible.
set.seed(19)

# Generate 24 random numbers from a normal distribution.
y <- rnorm(24)

Make a run chart:

qic(y)

Make an I control chart:

qic(y, chart = 'i')

Run charts and control charts are point-and-line graphs showing measures or counts over time. A horizontal line represents the process centre expressed by either the median (run charts) or the mean (control charts). In addition, control charts visualise the limits of the natural variation inherent in the process. Because of the way these limits are computed, they are often referred to as 3-sigma limits (see Appendix 1). Other terms for 3-sigma limits are control limits and natural process limits.
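For the I chart, the 3-sigma limits are conventionally estimated from the average moving range of successive data points rather than from the sample standard deviation. Here is a minimal sketch of the idea, using the standard 2.66 constant (qic() may differ in details, for example by screening extreme moving ranges):

# Sketch: centre line and 3-sigma limits for an I chart, estimated
# from the average moving range. Illustration only, not qic() internals.
cl  <- mean(y)             # centre line = process mean
amr <- mean(abs(diff(y)))  # average moving range
c(lcl = cl - 2.66 * amr,   # lower 3-sigma limit
  cl  = cl,
  ucl = cl + 2.66 * amr)   # upper 3-sigma limit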

Understanding variation

SPC is the application of statistical thinking and statistical tools to continuous quality improvement. Central to SPC is the understanding of process variation over time.

The purpose of analysing process data is to know when a change occurs so that we can take appropriate action. However, numbers may change even if the process stays the same (and vice versa). So how do we distinguish changes in numbers that represent change of the underlying process from those that are essentially noise?

Walter A. Shewhart, who founded SPC, described two types of variation: chance cause variation and assignable cause variation. Today, the terms common cause and special cause variation are commonly used.

Common cause variation is the random variation inherent in the process. It is caused by phenomena that are always present within the system and makes the process predictable (within limits).

Special cause variation is non-random variation caused by phenomena that are not normally present in the system. It makes the process unpredictable.

A process will be said to be predictable when, through the use of past experience, we can describe, at least within limits, how the process will behave in the future.

Donald J. Wheeler. Understanding Variation, 2. ed., p 153

The overall purpose of SPC charts is to tell the two types of variation apart.

Testing for special cause variation

As expected from random numbers, the previous run and control charts exhibit common cause variation. Special cause variation, if present, may be detected by statistical tests that identify patterns in data that are unlikely to be the result of pure chance.

Shewhart’s 3-sigma rule

Shewhart devised one test to identify special cause variation, the 3-sigma rule, which signals if one or more data points fall outside the 3-sigma limits. The 3-sigma rule is effective in detecting larger (> 2SD) shifts in data.

# Simulate a special cause to y in the form of a large, transient shift of 4 standard deviations.
y[22] <- 4

qic(y, chart = 'i')

However, minor to moderate shifts may go unnoticed by the 3-sigma rule for long periods of time.

# Simulate a special cause to y in the form of a moderate, persistent shift of 2 standard deviations.
y[13:24] <- rnorm(12, mean = 2)

qic(y, chart = 'i')

Therefore, many additional tests have been proposed to increase the sensitivity to non-random variation.

It is important to realise that while it may be tempting to apply more tests in order to increase the sensitivity of the chart, more tests will also increase the risk of false signals, thereby reducing the specificity of the chart. The choice of which and how many tests to apply is therefore critical.

The Western Electric rules

The best known tests for non-random variation are probably the Western Electric rules described in the Statistical Quality Control Handbook (Western Electric Company 1956). The WE rules consist of four simple tests that can be applied to control charts by visual inspection and are based on the identification of unusual patterns in the distribution of data points relative to the control and centre lines.

  1. One or more points outside of the 3-sigma limits (Shewhart’s original 3-sigma rule).

  2. Two out of three successive points beyond a 2-sigma limit (two thirds of the distance between the centre line and the control limit).

  3. Four out of five successive points beyond a 1-sigma limit.

  4. A run of eight successive points on one side of the centre line.

The WE rules have proven their worth for half a century. One thing to notice is that the WE rules are most effective with control charts that have between 20 and 30 data points. With fewer data points, they lose sensitivity (more false negatives), and with more data points they lose specificity (more false positives).
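As an illustration, WE rule 4 is straightforward to check with a few lines of base R. This is only a sketch for a vector y and a given centre line, not how qic() implements its tests:

# Sketch: WE rule 4 - eight or more successive points on one side
# of the centre line. Points exactly on the centre break the run here.
we_rule4 <- function(y, centre = mean(y)) {
  runs <- rle(sign(y - centre))              # lengths of runs above/below centre
  any(runs$lengths[runs$values != 0] >= 8)   # TRUE if any run has 8+ points
}
we_rule4(y)
#> [1] TRUE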

The Anhoej rules

The least known tests for non-random variation are probably the Anhoej rules proposed and validated by me in two recent publications (Anhøj 2014, Anhøj 2015). The Anhoej rules are the core of qicharts2 and consist of two tests that are based solely on the distribution of data points in relation to the centre line:

  1. Unusually long runs: A run is one or more consecutive data points on the same side of the centre line. Data points that fall on the centre line neither break nor contribute to the run. The upper 95% prediction limit for the longest run is approximately log2(n) + 3 rounded to the nearest whole number, where n is the number of useful data points.

  2. Unusually few crossings: A crossing is when two consecutive data points are on opposite sides of the centre line. In a random process, the number of crossings follows a binomial distribution, and the lower 95% prediction limit is the 5th percentile of b(n − 1, 0.5).

Critical values for longest runs and number of crossings for 10 – 100 data points are tabulated in Appendix 2.
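Since these limits come from simple formulas, they can also be computed directly. Here is a minimal sketch of the runs analysis, ignoring details that qic() handles, such as missing values and chart parts:

# Sketch: the runs analysis behind the Anhoej rules.
# n = number of useful data points (non-missing, not on the centre line).
runs_analysis <- function(y, centre = median(y)) {
  d <- sign(y - centre)
  d <- d[d != 0]                                 # drop points on the centre line
  n <- length(d)
  c(longest.run     = max(rle(d)$lengths),       # longest run on one side
    longest.run.max = round(log2(n) + 3),        # upper 95% prediction limit
    n.crossings     = sum(diff(d) != 0),         # number of crossings
    n.crossings.min = qbinom(0.05, n - 1, 0.5))  # lower 95% prediction limit
}
runs_analysis(y)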

Apart from being comparable in sensitivity and specificity to WE rules 2-4 with 20-30 data points, the Anhøj rules have some advantages: the critical values adapt dynamically to the number of available data points, and since the rules depend only on the position of data points relative to the centre line, they apply equally to run charts and control charts.

In the previous control chart, a shift of 2 standard deviations was introduced halfway through the process. While the shift is too small to be picked up by the 3-sigma rule, both Anhoej rules detect the signal. The longest run includes 13 data points (upper limit = 8), and the number of crossings is 4 (lower limit = 8). Note that the shift is also picked up by WE rule 4.

Non-random variation found by the Anhoej rules is visualised by qic() using a red and dashed centre line.

The parameters of the analysis are available using the summary() function on a qic object. See ?summary.qic for details.

# Save qic object to o
o <- qic(y, chart = 'i')

# Print summary data frame
summary(o)
#>   facet1 facet2 part n.obs n.useful longest.run longest.run.max
#> 1      1      1    1    24       24          13               8
#>   n.crossings n.crossings.min runs.signal      aLCL       CL     aUCL
#> 1           4               8           1 -2.114884 1.057091 4.229067
#>   sigma.signal
#> 1            0

In summary, many additional tests for special cause variation have been proposed since Shewhart invented the 3-sigma rule, and many tests are currently in use. The more tests that are applied to the analysis, the higher the risk of false signals. qicharts2 takes a conservative approach by applying only three tests: the 3-sigma rule and the two Anhoej rules. That means that a signal from qic() most likely represents a significant change in the underlying process.

Choosing the right control chart

By default, qic() produces run charts. To make a control chart, the chart argument must be specified.

While there is only one type of run chart, there are many types of control charts. Each type uses an assumed probability model to compute the 3-sigma limits.

qic() includes eleven control charts.

Chart   Description                                                Assumed distribution
------  ---------------------------------------------------------  --------------------
Charts for measures
I       Individual measurements                                    Gaussian
MR      Moving ranges of individual measurements                   Gaussian
Xbar    Sample means                                               Gaussian
S       Sample standard deviations                                 Gaussian
T       Time between rare events                                   Exponential
Charts for counts
C       Defect counts                                              Poisson
U       Defect rates                                               Poisson
U’      U chart using Laney’s correction for large sample sizes    Mixed
P       Proportion of defectives                                   Binomial
P’      P chart using Laney’s correction for large sample sizes    Mixed
G       Opportunities between rare events                          Geometric

I recommend always beginning any analysis with the “assumption free” run chart. If the Anhøj rules find non-random variation, it makes no sense to compute 3-sigma limits based on assumptions regarding the location and dispersion of data, because these figures are meaningless when one or more shifts have already been identified.

Over the years, I have developed an increasing affection for the much-neglected run chart: a time plot of your process data with the median drawn in as a reference (yes, the median – not the average). It is “filter No. 1” for any process data and answers the question: “Did this process have at least one shift during this time period?” (This is generally signaled by a clump of eight consecutive data points either all above or below the median.)

If it did, then it makes no sense to do a control chart at this time because the overall average of all these data doesn’t exist. (Sort of like: If I put my right foot in a bucket of boiling water and my left foot in a bucket of ice water, on average, I’m pretty comfortable.)

Davis Balestracci. Quality Digest 2014

Case 1: Patient harm

The gtt dataset contains data on patient harm during hospitalisation. Harm is defined as any unintended physical injury resulting from treatment or care during hospitalisation and not a result of the disease itself. The Global Trigger Tool is a methodology for detection of adverse events in hospital patients using a retrospective record review approach. Each month a random sample of 20 patient records is reviewed and the number of adverse events and patient days found in the sample are counted. See ?gtt for details.

head(gtt)
#>   admission_id admission_dte discharge_dte      month days harms
#> 1            1    2010-01-02    2010-01-03 2010-01-01    2     0
#> 2            2    2009-12-24    2010-01-06 2010-01-01   14     2
#> 3            3    2009-12-28    2010-01-07 2010-01-01   11     0
#> 4            4    2010-01-05    2010-01-08 2010-01-01    4     0
#> 5            5    2010-01-05    2010-01-08 2010-01-01    4     0
#> 6            6    2010-01-05    2010-01-11 2010-01-01    7     0
#>                E                F    G    H    I
#> 1           <NA>             <NA> <NA> <NA> <NA>
#> 2 Pressure ulcer Gastrointestinal <NA> <NA> <NA>
#> 3           <NA>             <NA> <NA> <NA> <NA>
#> 4           <NA>             <NA> <NA> <NA> <NA>
#> 5           <NA>             <NA> <NA> <NA> <NA>
#> 6           <NA>             <NA> <NA> <NA> <NA>

Run chart of adverse events per 1000 patient days

qic(month, harms, days,
    data     = gtt,
    multiply = 1000,
    title    = 'Patient harm',
    ylab     = 'Adverse events per 1000 patient days',
    xlab     = 'Month')

Since only random variation is present in the run chart, a U control chart for harm rate is appropriate for computing the limits of the common cause variation.

U chart of adverse events per 1000 patient days

qic(month, harms, days,
    data     = gtt,
    chart    = 'u',
    multiply = 1000,
    title    = 'Patient harm (U chart)',
    ylab     = 'Adverse events per 1000 patient days',
    xlab     = 'Month')

Because the sample sizes (number of patient days) are not the same each month, the 3-sigma limits vary. Large samples give narrow limits and vice versa. Varying limits are also seen in P and Xbar charts.
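The varying limits follow directly from the Poisson model: for a subgroup with sample size n, the limits are the overall rate plus or minus three standard errors, ū ± 3√(ū/n), so the limits narrow as the denominator grows. A minimal sketch of this calculation on the monthly totals (illustration only; qic() also handles the multiplier and the plotting):

# Sketch: U chart limits with varying denominators (Poisson model).
x <- tapply(gtt$harms, gtt$month, sum)  # harms per month
n <- tapply(gtt$days,  gtt$month, sum)  # patient days per month
ubar <- sum(x) / sum(n)                 # overall rate = centre line
head(data.frame(cl  = ubar,
                lcl = pmax(0, ubar - 3 * sqrt(ubar / n)),  # wide when n is small
                ucl = ubar + 3 * sqrt(ubar / n)))          # narrow when n is large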

Run and P control charts of percent harmed patients

To plot the percentage of harmed patients (patients with 1 or more harms) we need to add a variable and to aggregate data by month.

suppressPackageStartupMessages(library(dplyr))

gtt_by_month <- gtt %>%
  mutate(harmed = harms > 0) %>% 
  group_by(month) %>% 
  summarise(harms    = sum(harms),
            days     = sum(days),
            n.harmed = sum(harmed),
            n        = n())

head(gtt_by_month)
#> # A tibble: 6 x 5
#>        month harms  days n.harmed     n
#>       <date> <dbl> <dbl>    <int> <int>
#> 1 2010-01-01     5   113        3    20
#> 2 2010-02-01     7   132        6    20
#> 3 2010-03-01     5   121        4    20
#> 4 2010-04-01     7   134        5    20
#> 5 2010-05-01     7   116        7    20
#> 6 2010-06-01     5   131        5    20
qic(month, n.harmed,
    n         = n,
    data      = gtt_by_month,
    y.percent = TRUE,
    title     = 'Harmed patients',
    ylab      = NULL,
    xlab      = 'Month')

Again, since the run chart finds random variation, a control chart may be applied. For proportion (or percent) harmed patients we use a P chart.

qic(month, n.harmed, n,
    data      = gtt_by_month,
    chart     = 'p',
    y.percent = TRUE,
    title     = 'Harmed patients (P chart)',
    ylab      = NULL,
    xlab      = 'Month')

Since both the harm rate and the proportion of harmed patients exhibit common cause variation only, we conclude that the risk of adverse events during hospitalisation is predictable and that we in the future should expect around 60 adverse events per 1000 patient days harming around 33% of patients.

Actually, we often do not need to aggregate data in advance. With a little creativity we can make qic do the hard work. The previous plot could have been produced with this code. Note the use of the original gtt dataset and the number of harmed and total number of patients created on the fly.

qic(month, 
    harms > 0,        # TRUE if patient had one or more harms
    days > 0,         # Always TRUE 
    data      = gtt,
    chart     = 'p',
    y.percent = TRUE,
    title     = 'Harmed patients (P chart)',
    ylab      = NULL,
    xlab      = 'Month')

Pareto analysis of adverse event types and severity

The Pareto chart, named after Vilfredo Pareto, was invented by Joseph M. Juran as a tool to identify the most important causes of a problem.

Types and severity of adverse events in gtt are organised in an untidy wide format. Since paretochart() needs a vector of categorical data we need to rearrange harm severity and harm category into separate columns.

suppressPackageStartupMessages(library(tidyr))

gtt_ae_types <- gtt %>%
  gather(severity, category, E:I) %>% 
  filter(complete.cases(.))

head(gtt_ae_types)
#>   admission_id admission_dte discharge_dte      month days harms severity
#> 1            2    2009-12-24    2010-01-06 2010-01-01   14     2        E
#> 2           11    2010-01-08    2010-01-18 2010-01-01   11     2        E
#> 3           12    2010-01-11    2010-01-19 2010-01-01    9     1        E
#> 4           29    2010-01-27    2010-02-10 2010-02-01   15     1        E
#> 5           41    2010-02-24    2010-03-01 2010-03-01    6     1        E
#> 6           45    2010-02-22    2010-03-06 2010-03-01   13     1        E
#>           category
#> 1   Pressure ulcer
#> 2        Infection
#> 3   Pressure ulcer
#> 4   Pressure ulcer
#> 5 Gastrointestinal
#> 6 Gastrointestinal
paretochart(gtt_ae_types$category,
            title = 'Pareto chart of harm category')

The bars show the number of events in each category, and the curve shows the cumulative percentage over categories. Almost 80% of harms come from three categories: gastrointestinal, infection, and procedure.
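For reporting, the numbers behind the chart are easy to compute by hand. A minimal base R sketch of the counts and cumulative percentages:

# Sketch: counts and cumulative percentages behind the Pareto chart.
tab <- sort(table(gtt_ae_types$category), decreasing = TRUE)
data.frame(n           = as.vector(tab),  # count per category
           cum.percent = round(cumsum(as.vector(tab)) / sum(tab) * 100, 1),
           row.names   = names(tab))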

paretochart(gtt_ae_types$severity,
            title = 'Pareto chart of harm severity')

Almost all harms are temporary (E-F).

Case 2: Clostridium difficile infections

The cdi dataset contains data on hospital acquired Clostridium difficile infections (CDI) before and after an intervention to reduce the risk of CDI. See ?cdi for details.

head(cdi)
#>        month  n     days period notes
#> 1 2012-11-01 17 14768.42    pre  <NA>
#> 2 2012-12-01 12 14460.33    pre  <NA>
#> 3 2013-01-01 27 16176.92    pre  <NA>
#> 4 2013-02-01 20 14226.50    pre  <NA>
#> 5 2013-03-01 20 14816.25    pre  <NA>
#> 6 2013-04-01 18 14335.29    pre  <NA>

Run charts for number of infections

We begin by plotting a run chart of the number of CDIs per month.

qic(month, n,
    notes    = notes,
    data     = cdi,
    decimals = 0,
    title    = 'Hospital acquired Clostridium difficile infections',
    ylab     = 'Count',
    xlab     = 'Month')

A downward shift in the number of CDIs is clearly visible to the naked eye and is supported by the runs analysis, which renders the centre line as a red, dashed line. The shift seems to begin around the time of the intervention.

Even though it is not necessary in this case, we can strengthen the analysis by using the median of the before-intervention period to test for non-random variation in the after period.

qic(month, n,
    data        = cdi,
    decimals    = 0,
    freeze      = 24,
    part.labels = c('Baseline', 'Intervention'),
    title       = 'Hospital acquired Clostridium difficile infections',
    ylab        = 'Count',
    xlab        = 'Month')

The median number of CDIs per month in the before period is 19. An unusually long run of 15 data points below the median confirms that the CDI rate has been reduced.

When a shift in the desired direction is the result of a deliberate change to the process, we may split the chart to compare the numbers before and after the intervention.

qic(month, n,
    data     = cdi,
    decimals = 0,
    part     = 24,
    title    = 'Hospital acquired Clostridium difficile infections',
    ylab     = 'Count',
    xlab     = 'Month')

C chart for number of infections

Since both periods show only random variation, a control chart may be applied to test for larger transient shifts in data and to establish the natural limits of the current process. The correct chart in this case is the C chart for number of infections.

qic(month, n,
    data  = cdi,
    chart = 'c',
    part  = 24,
    title = 'Hospital acquired Clostridium difficile infections (C chart)',
    ylab  = 'Count',
    xlab  = 'Month')

The split C chart shows that the CDI rate has dropped from an average of 19 to 7 per month. Furthermore, the 3-sigma limits tell that the current process is predictable and that we in the future should expect between 0 and 15 CDIs per month.

U chart for infection rate

Until now we have assumed that the area of opportunity is constant, i.e. the number of patients or patient days has not changed significantly over time. When the area of opportunity varies, a U chart (U for unequal area of opportunity) is appropriate.

qic(month, n, days,
    data     = cdi,
    chart    = 'u',
    part     = 24,
    multiply = 10000, 
    title    = 'Hospital acquired Clostridium difficile infections (U chart)',
    ylab     = 'Cases per 10,000 risk days',
    xlab     = 'Month')

In fact, U charts can always be used instead of C charts. When the area of opportunity is constant the two charts will reach the same conclusion. But the raw numbers in a C chart may be easier to communicate than counts per something.
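One way to see this equivalence: a U chart with a constant denominator of 1 is, in effect, a C chart, because ū ± 3√(ū/1) reduces to c̄ ± 3√c̄. A sketch (the title is mine):

qic(month, n, rep(1, length(n)),
    data  = cdi,
    chart = 'u',
    part  = 24,
    title = 'CDIs as a U chart with denominator 1 (equals the C chart)',
    ylab  = 'Count',
    xlab  = 'Month')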

It is very important to recognise that while run and control charts can identify non-random variation, they cannot identify its causes. The analysis only tells that the CDI rate was reduced after an intervention. The causal relationship between the intervention and the shift must be established using other methods.

When should you recompute the limits?

When splitting a chart as in the example above you make a deliberate decision based on your understanding of the process. Some software packages offer automated recalculation of limits whenever a shift is detected. I find this approach very wrong. Limits are the voice of the process and recomputing limits without understanding the context, which computers don’t, is like telling someone to shut up, when they are really trying to tell you something important.

Wheeler suggests that limits may be recomputed when the answers to all four of these questions are “yes” (Wheeler 2010, p. 224):

  1. Do the data display a distinctly different kind of behaviour than in the past?

  2. Is the reason for this change in behaviour known?

  3. Is the new behaviour desirable?

  4. Is it intended and expected that the new behaviour will continue?

If the answer to any of these questions is “no”, you should rather seek to find the assignable cause of the shift and, if need be, eliminate it.

Thus, when a deliberate change has been made to the process and the chart finds a shift attributable to this change, it may be time to recompute the limits.

Case 3: Coronary artery bypass graft

The cabg dataset contains data on individual coronary artery bypass operations. See ?cabg for details.

head(cabg)
#>         date    age gender los death readmission
#> 1 2011-07-01 72.534   Male   8 FALSE        TRUE
#> 2 2011-07-01 59.225   Male   9 FALSE       FALSE
#> 3 2011-07-04 66.841   Male  15 FALSE       FALSE
#> 4 2011-07-04 70.337   Male   9 FALSE       FALSE
#> 5 2011-07-04 60.816   Male   6 FALSE        TRUE
#> 6 2011-07-05 57.027   Male   9 FALSE       FALSE

I and MR charts for age of individual patients

In this and the following cases we skip the obligatory first step of using a run chart.

First, we will make a control chart of the age of the last 100 patients. Since we are plotting individual measurements, we use an I chart.

qic(age,
    data      = tail(cabg, 100), 
    chart     = 'i',
    show.grid = TRUE,
    title     = 'Age of the last 100 patients (I chart)',
    ylab      = 'Years',
    xlab      = 'Patient #')