- Installation
- Your first run and control charts
- Understanding variation
- Testing for special cause variation
- Choosing the right control chart
- Case 1: Patient harm
- Case 2: Clostridium difficile infections
- Case 3: Coronary artery bypass graft
- Case 4: Faceting hospital infections by hospital and infection
- Case 5: Prime charts for count data with very large sample sizes
- Conclusion
- References
- Appendix 1: Calculating 3-sigma limits
- Appendix 2: Critical values for longest run and number of crossings

The `qicharts2`

package contains two main functions for analysis and visualisation of data for continuous quality improvement: `qic()`

and `paretochart()`

.

`qic()`

provides a simple interface for constructing statistical process control (SPC) charts for time series data.

`paretochart()`

constructs Pareto charts from categorical variables.

Statistical Process Control is not about statistics, it is not about “process-hyphen-control”, and it is not about conformance to specifications. […] It is about the continual improvement of processes and outcomes. And it is, first and foremost,

a way of thinkingwith some tools attached.Donald J. Wheeler. Understanding Variation, 2. ed., p 152

This vignette will teach you the easy part of SPC, *the tools*, as implemented by `qicharts2`

for R. I strongly recommend that you also study *the way of thinking*, for example by reading Wheeler’s excellent book.

Install stable version from CRAN:

`install.packages('qicharts2')`

Or install development version from GitHub:

`devtools::install_github('anhoej/qicharts2', build_vignettes = TRUE)`

Load `qicharts2`

and generate some random numbers to play with:

```
# Load qicharts2
library(qicharts2)
# Lock random number generator to make examples reproducible.
set.seed(19)
# Generate 24 random numbers from a normal distribution.
y <- rnorm(24)
```

Make a run chart:

`qic(y)`

Make an I control chart:

`qic(y, chart = 'i')`

Run charts and control charts are point-and-line graphs showing measures or counts over time. A horizontal line represents the process centre expressed by either the median (run charts) or the mean (control charts). In addition, control charts visualise the limits of the natural variation inherent in the process. Because of the way these limits are computed, they are often referred to as 3-sigma limits (see Appendix 1). Other terms for 3-sigma limits are control limits and natural process limits.

SPC is the application of statistical thinking and statistical tools to continuous quality improvement. Central to SPC is the understanding of process variation over time.

The purpose of analysing process data is to know when a change occurs so that we can take appropriate action. However, numbers may change even if the process stays the same (and vice versa). So how do we distinguish changes in numbers that represent change of the underlying process from those that are essentially noise?

Walther A. Shewhart, who founded SPC, described two types of variation, *chance cause variation* and *assignable cause variation*. Today, the terms *common cause* and *special cause* variation are commonly used.

**Common cause variation**

is also called random variation or noise,

is present in all processes,

is caused by phenomena that are always present within the system,

makes the process predictable within limits.

**Special cause variation**

is also called non-random variation or signal,

is present in some processes,

is caused by phenomena that are not normally present in the system,

makes the process unpredictable.

A process will be said to be predictabe when, through the use of past experience, we can describe, at least within limits, how the process will behave in the future.

Donald J. Wheeler. Understanding Variation, 2. ed., p 153

The overall purpose of SPC charts is to tell the two types of variation apart.

As expected from random numbers, the previous run and control charts exhibit common cause variation. Special cause variation, if present, may be found by statistical tests to identify patterns in data that are unlikely to be the result of pure chance.

Shewhart devised one test to identify special cause variation, the 3-sigma rule, which signals if one or more data points fall outside the 3-sigma limits. The 3-sigma rule is effective in detecting larger (> 2SD) shifts in data.

```
# Simulate a special cause to y in the form of a large, transient shift of 4 standard deviations.
y[22] <- 4
qic(y, chart = 'i')
```

However, minor to moderate shifts may go unnoticed by the 3-sigma rule for long periods of time.

```
# Simulate a special cause to y in the form of a moderate, persistent shift of 2 standard deviations.
y[13:24] <- rnorm(12, mean = 2)
qic(y, chart = 'i')
```

Therefore, many additional tests have been proposed to increase the sensitivity to non-random variation.

It is important to realise that while it may be tempting to apply more tests in order to increase the sensitivity of the chart, more test will increase the risk of false signals thereby reducing the specificity of the chart. The choice of which and how many tests to apply is therefore critical.

The Best known tests for non-random variation are probably the Western Electric Rules described in the Statistical Quality Control Handbook (Western Electric Company 1956). The WE rules consist of four simple tests that can be applied to control charts by visual inspection and are based on the identification of unusual patterns in the distribution of data points relative to the control and centre lines.

One or more points outside of the 3-sigma limits (Shewhart’s original 3-sigma rule).

Two out of three successive points beyond a 2-sigma limit (two thirds of the distance between the centre line and the control line).

Four out of five successive points beyond a 1-sigma limit.

A run of eight successive points on one side of the centre line.

The WE rules have proven their worth during half a century. One thing to notice is that the WE rules are most effective with control charts that have between 20 and 30 data points. With fewer data points, they loose sensitivity (more false negatives), and with more data points they loose specificity (more false positives).

The least known tests for non-random variation are probably the Anhoej rules proposed and validated by me in two recent publications (Anhøj 2014, Anhøj 2015). The Anhoej rules are the core of `qicharts2`

and consist of two tests that are based solely on the distribution of data points in relation to the centre line:

**Unusually long runs**: A run is one or more consecutive data points on the same side of the centre line. Data points that fall on the centre line do neither break nor contribute to the run. The upper 95% prediction limit for longest run is approximately \(log_2(n)+3\) (rounded to the nearest integer), where \(n\) is the number of useful data points. For example, in a run chart with 24 data points a run of*more*than`round(log2(24) + 3)`

= 8 would suggest a shift in the process.**Unusually few crossings**: A crossing is when two consecutive data points are on opposite sides of the centre line. In a random process, the number of crossings has a binomial distribution, \(b(n-1,0.5)\). The lower 5% prediction limit for number of crossings is found using`qbinom(p = 0.05, size = n - 1, prob = 0.5)`

. Thus, in a run chart with 24 useful data points,*fewer*than`qbinom(0.05, 24 - 1, 0.5)`

= 8 crossings would suggest that the process is shifting.

Critical values for longest runs and number of crossings for 10 – 100 data points are tabulated in Appendix 2.

Apart from being comparable in sensitivity and specificity to WE rules 2-4 with 20-30 data points, the Anhøj rules have some advantages:

They do not depend on sigma limits and are useful as stand-alone rules with run charts.

They adapt dynamically to the number of available data points and can be applied to charts with as few as 10 and up to indefinitely many data points without loosing sensitivity and specificity.

In the previous control chart, a shift of 2 standard deviations was introduced halfway through the process. While the shift is too small to be picked up by the 3-sigma rule, both Anhoej rules detect the signal. The longest run include 13 data points (upper limit = 8), and the number of crossings is 4 (lower limit = 8). Note that the shift is also picked up by WE rule 4.

Non-random variation found by the Anhoej rules is visualised by `qic()`

using a red and dashed centre line.

The parameters of the analysis are available using the `summary()`

function on a qic object. See `?summary.qic`

for details.

```
# Save qic object to o
o <- qic(y, chart = 'i')
# Print summary data frame
summary(o)
#> facet1 facet2 part n.obs n.useful longest.run longest.run.max
#> 1 1 1 1 24 24 13 8
#> n.crossings n.crossings.min runs.signal aLCL CL aUCL
#> 1 4 8 1 -2.114884 1.057091 4.229067
#> sigma.signal
#> 1 0
```

In summary, many additional test for special cause variation have been proposed since Shewhart invented the 3-sigma rule, and many tests are currently being used. The more tests that are applied to the analysis the higher the risk of false signals. `qicharts2`

takes a conservative approach by applying only three tests, the 3-sigma rule and the two Anhoej rules. That means that a signal from `qic()`

most likely represents a significant change in the underlying process.

By default, `qic()`

produces run charts. To make a control chart, the `chart`

argument must be specified.

While there is only one type of run charts, there are many types of control charts. Each type use an assumed probability model to compute the 3-sigma limits.

`qic()`

includes eleven control charts.

Chart | Description | Assumed distribution |
---|---|---|

Charts for measures |
||

I | Individual measurements | Gaussian |

MR | Moving ranges of individual measurements | Gaussian |

Xbar | Sample means | Gaussian |

S | Sample standard deviations | Gaussian |

T | Time between rare events | Exponential |

Charts for counts |
||

C | Defect counts | Poisson |

U | Defect rates | Poisson |

U’ | U chart using Laney’s correction for large sample sizes | Mixed |

P | Proportion of defectives | Binomial |

P’ | P chart using Laney’s correction for large sample sizes | Mixed |

G | Opportunities between rare events | Geometric |

I recommend to always begin any analysis with the “assumption free” run chart. If the Anhøj rules find non-random variation, it makes no sense to compute 3-sigma limits based on assumptions regarding the location and dispersion of data because these figures are meaningless when one or more shifts have already been identified.

Over the years, I have developed an increasing affection for the much-neglected run chart: a time plot of your process data with the median drawn in as a reference (yes, the median – not the average). It is “filter No. 1” for any process data and answers the question: “Did this process have at least one shift during this time period?” (This is generally signaled by a clump of eight consecutive data points either all above or below the median.)

If it did, then it makes no sense to do a control chart at this time because the overall average of all these data doesn’t exist. (Sort of like: If I put my right foot in a bucket of boiling water and my left foot in a bucket of ice water, on average, I’m pretty comfortable.)

Davis Balestracci. Quality Digest 2014

The gtt dataset contains data on patient harm during hospitalisation. Harm is defined as any unintended physical injury resulting from treatment or care during hospitalisation and not a result of the disease itself. The Global Trigger Tool is a methodology for detection of adverse events in hospital patients using a retrospective record review approach. Each month a random sample of 20 patient records is reviewed and the number of adverse events and patient days found in the sample are counted. See `?gtt`

for details.

```
head(gtt)
#> admission_id admission_dte discharge_dte month days harms
#> 1 1 2010-01-02 2010-01-03 2010-01-01 2 0
#> 2 2 2009-12-24 2010-01-06 2010-01-01 14 2
#> 3 3 2009-12-28 2010-01-07 2010-01-01 11 0
#> 4 4 2010-01-05 2010-01-08 2010-01-01 4 0
#> 5 5 2010-01-05 2010-01-08 2010-01-01 4 0
#> 6 6 2010-01-05 2010-01-11 2010-01-01 7 0
#> E F G H I
#> 1 <NA> <NA> <NA> <NA> <NA>
#> 2 Pressure ulcer Gastrointestinal <NA> <NA> <NA>
#> 3 <NA> <NA> <NA> <NA> <NA>
#> 4 <NA> <NA> <NA> <NA> <NA>
#> 5 <NA> <NA> <NA> <NA> <NA>
#> 6 <NA> <NA> <NA> <NA> <NA>
```

```
qic(month, harms, days,
data = gtt,
multiply = 1000,
title = 'Patient harm',
ylab = 'Adverse events per 1000 patient days',
xlab = 'Month')
```

Since only random variation is present in the run chart, a U control chart for harm rate is appropriate for computing the limits of the common cause variation.

```
qic(month, harms, days,
data = gtt,
chart = 'u',
multiply = 1000,
title = 'Patient harm (U chart)',
ylab = 'Adverse events per 1000 patient days',
xlab = 'Month')
```

Because the sample sizes (number of patient days) are not the same each month, the 3-sigma limits vary. Large samples give narrow limits and vice versa. Varying limits are also seen in P and Xbar charts.

To plot the percentage of harmed patients (patients with 1 or more harms) we need to add a variable and to aggregate data by month.

```
suppressPackageStartupMessages(library(dplyr))
gtt_by_month <- gtt %>%
mutate(harmed = harms > 0) %>%
group_by(month) %>%
summarise(harms = sum(harms),
days = sum(days),
n.harmed = sum(harmed),
n = n())
head(gtt_by_month)
#> # A tibble: 6 x 5
#> month harms days n.harmed n
#> <date> <dbl> <dbl> <int> <int>
#> 1 2010-01-01 5. 113. 3 20
#> 2 2010-02-01 7. 132. 6 20
#> 3 2010-03-01 5. 121. 4 20
#> 4 2010-04-01 7. 134. 5 20
#> 5 2010-05-01 7. 116. 7 20
#> 6 2010-06-01 5. 131. 5 20
```

```
qic(month, n.harmed,
n = n,
data = gtt_by_month,
y.percent = TRUE,
title = 'Harmed patients',
ylab = NULL,
xlab = 'Month')
```

Again, since the run chart finds random variation, a control chart may be applied. For proportion (or percent) harmed patients we use a P chart.

```
qic(month, n.harmed, n,
data = gtt_by_month,
chart = 'p',
y.percent = TRUE,
title = 'Harmed patients (P chart)',
ylab = NULL,
xlab = 'Month')
```

Since both the harm rate and the proportion of harmed patients exhibit common cause variation only, we conclude that the risk of adverse events during hospitalisation is predictable and that we in the future should expect around 60 adverse events per 1000 patient days harming around 33% of patients.

Actually, we often do not need to aggregate data in advance. With a little creativity we can make `qic`

do the hard work. The previous plot could have been produced with this code. Note the use of the original gtt dataset and the number of harmed and total number of patients created on the fly.

```
qic(month,
harms > 0, # TRUE if patient had one or more harms
days > 0, # Always TRUE
data = gtt,
chart = 'p',
y.percent = TRUE,
title = 'Harmed patients (P chart)',
ylab = NULL,
xlab = 'Month')
```

The Pareto chart, named after Vilfred Pareto, was invented by Joseph M. Juran as a tool to identify the most important causes of a problem.

Types and severity of adverse events in gtt are organised in an untidy wide format. Since `paretochart()`

needs a vector of categorical data we need to rearrange harm severity and harm category into separate columns.

```
suppressPackageStartupMessages(library(tidyr))
gtt_ae_types <- gtt %>%
gather(severity, category, E:I) %>%
filter(complete.cases(.))
head(gtt_ae_types)
#> admission_id admission_dte discharge_dte month days harms severity
#> 1 2 2009-12-24 2010-01-06 2010-01-01 14 2 E
#> 2 11 2010-01-08 2010-01-18 2010-01-01 11 2 E
#> 3 12 2010-01-11 2010-01-19 2010-01-01 9 1 E
#> 4 29 2010-01-27 2010-02-10 2010-02-01 15 1 E
#> 5 41 2010-02-24 2010-03-01 2010-03-01 6 1 E
#> 6 45 2010-02-22 2010-03-06 2010-03-01 13 1 E
#> category
#> 1 Pressure ulcer
#> 2 Infection
#> 3 Pressure ulcer
#> 4 Pressure ulcer
#> 5 Gastrointestinal
#> 6 Gastrointestinal
```

```
paretochart(gtt_ae_types$category,
title = 'Pareto chart of harm category')
```

The bars show the number in each category, and the curve shows the cumulated percentage over categories. Almost 80% of harms come from 3 categories: gastrointestinal, infection, and procedure.

```
paretochart(gtt_ae_types$severity,
title = 'Pareto chart of harm severity')
```

Almost all harms are temporary (E-F).

The cdi dataset contains data on hospital acquired Clostridium difficile infections (CDI) before and after an intervention to reduce the risk of CDI. See `?cdi`

for details.

```
head(cdi)
#> month n days period notes
#> 1 2012-11-01 17 14768.42 pre <NA>
#> 2 2012-12-01 12 14460.33 pre <NA>
#> 3 2013-01-01 27 16176.92 pre <NA>
#> 4 2013-02-01 20 14226.50 pre <NA>
#> 5 2013-03-01 20 14816.25 pre <NA>
#> 6 2013-04-01 18 14335.29 pre <NA>
```

We begin by plotting a run chart of the number of CDIs per month.

```
qic(month, n,
notes = notes,
data = cdi,
decimals = 0,
title = 'Hospital acquired Clostridium difficile infections',
ylab = 'Count',
xlab = 'Month')
```

A downward shift in the number of CDIs is clearly visible to the naked eye and is supported by the runs analysis resulting in the centre line being dashed red. The shift seem to begin around the time of the intervention.

Even though it is not necessary in this case, we can strengthen the analysis by using the median of the before-intervention period to test for non-random variation in the after period.

```
qic(month, n,
data = cdi,
decimals = 0,
freeze = 24,
part.labels = c('Baseline', 'Intervention'),
title = 'Hospital acquired Clostridium difficile infections',
ylab = 'Count',
xlab = 'Month')
```

The median number of CDIs per month in the before period is 19. An unusually long run of 15 data points below the median proves that the CDI rate is reduced.

When a shift in the desired direction is the result of a deliberate change to the process, we may split the chart to compare the numbers before and after the intervention.

```
qic(month, n,
data = cdi,
decimals = 0,
part = 24,
title = 'Hospital acquired Clostridium difficile infections',
ylab = 'Count',
xlab = 'Month')
```

Since both periods show only random variation, a control chart may be applied to test for larger transient shifts in data and to establish the natural limits of the current process. The correct chart in this case is the C chart for number of infections.

```
qic(month, n,
data = cdi,
chart = 'c',
part = 24,
title = 'Hospital acquired Clostridium difficile infections (C chart)',
ylab = 'Count',
xlab = 'Month')
```

The split C chart shows that the CDI rate has dropped from an average of 19 to 7 per month. Furthermore, the 3-sigma limits tell that the current process is predictable and that we in the future should expect between 0 and 15 CDIs per month.

Until now we have assumed that the area of opportunity is constant, i.e. the number of patients or patient days have not changed significantly over time. When the area of opportunity varies, a U chart (U for *unequal* area of opportunity) is appropriate.

```
qic(month, n, days,
data = cdi,
chart = 'u',
part = 24,
multiply = 10000,
title = 'Hospital acquired Clostridium difficile infections (U chart)',
ylab = 'Cases per 10,000 risk days',
xlab = 'Month')
```

In fact, U charts can always be used instead of C charts. When the area of opportunity is constant the two charts will reach the same conclusion. But the raw numbers in a C chart may be easier to communicate than counts per something.

It is very important to recognise that while run and control charts can identify non-random variation, they cannot identify its causes. The analysis only tells that the CDI rate was reduced after an intervention. The causal relationship between the intervention and the shift must be established using other methods.

When splitting a chart as in the example above you make a deliberate decision based on your understanding of the process. Some software packages offer automated recalculation of limits whenever a shift is detected. I find this approach very wrong. Limits are the voice of the process and recomputing limits without understanding the context, which computers don’t, is like telling someone to shut up, when they are really trying to tell you something important.

Wheeler suggests that limits *may* be recomputed when the answers to all four of these question is “yes” (Wheeler 2010, p. 224):

Do the data display a distinctly different kind of behaviour than in the past?

Is the reason for this change in behaviour known?

Is the new process behaviour desirable?

Is it intended and expected that the new behaviour will continue?

If the answer to any of these question is “no” you should rather seek to find the assignable cause of the shift and, if need be, eliminate it.

Thus, when a deliberate change has been made to the process and the chart finds a shift attributable to this change, it may be time to recompute the limits.

The cabg dataset contains data on individual coronary artery bypass operations. See `?cabg`

for details.

```
head(cabg)
#> date age gender los death readmission
#> 1 2011-07-01 72.534 Male 8 FALSE TRUE
#> 2 2011-07-01 59.225 Male 9 FALSE FALSE
#> 3 2011-07-04 66.841 Male 15 FALSE FALSE
#> 4 2011-07-04 70.337 Male 9 FALSE FALSE
#> 5 2011-07-04 60.816 Male 6 FALSE TRUE
#> 6 2011-07-05 57.027 Male 9 FALSE FALSE
```

In this and the following cases we skip the obligatory first step of using a run chart.

First, we will make a control chart of the age of the last 100 patients. Since we are plotting individual measurements, we use an I chart.

```
qic(age,
data = tail(cabg, 100),
chart = 'i',
show.grid = TRUE,
title = 'Age of the last 100 patients (I chart)',
ylab = 'Years',
xlab = 'Patient #')
```