Example: Coffee Taste Test¶
Problem Description¶
A coffee taste test was conducted at the Stat-Ease office to improve the taste of the coffee [1]. This example uses a simplified version of that experiment.
We will look at 5 input factors:
- Amount of Coffee (2.5 to 4.0 oz.)
- Grind size (8-10mm)
- Brew time (3.5 to 4.5 minutes)
- Grind Type (burr vs blade)
- Coffee beans (light vs dark)
With one output, or response, variable:
- Average overall liking (1-9)
The liking is an average of the scores of a panel of 5 office coffee drinkers.
Design Choice¶
A full factorial, that is, running all combinations of lows and highs, would take 25 = 32 taste tests. We want to add 8 center point runs to check for curvature, bringing the total number of runs up to 36. We can only do 3 per day, so as not to over-caffienate our testers, and can only do the tests on days when all 5 testers are in the office. That means the test will probably take a month or so.
We can calculate the power of a full factorial using dexpy, assuming a signal to noise ratio of 2:
import dexpy.factorial
import dexpy.power
import pandas as pd
import numpy as np
coffee_design = dexpy.factorial.build_factorial(5, 2**5)
center_points = [
[0, 0, 0, -1, -1],
[0, 0, 0, -1, 1],
[0, 0, 0, 1, -1],
[0, 0, 0, 1, 1]
]
coffee_design = coffee_design.append(pd.DataFrame(center_points * 2, columns=coffee_design.columns))
coffee_design.index = np.arange(0, len(coffee_design))
sn = 2.0
alpha = 0.05
factorial_power = dexpy.power.f_power(model, coffee_design, sn, alpha)
factorial_power.pop(0) # remove intercept
factorial_power = ['{0:.2f}%'.format(i*100) for i in factorial_power] # convert to %
Term | Power |
---|---|
amount | 99.98% |
grind_size | 99.98% |
brew_time | 99.98% |
grind_type | 99.99% |
beans | 99.99% |
This means we have a 99.97% chance of detecting a change of 2 taste rating, assuming a standard deviation of 1 taste rating for the experiment. This is high enough that we decide to run a fraction instead, and get the experiment done more quickly. We can create a 25-1 fractional factorial, which will have 16 runs, along with the 4 center points for a total of 20. As you can see the power is still quite good.
coffee_design = dexpy.factorial.build_factorial(5, 2**(5-1))
coffee_design.columns = ['amount', 'grind_size', 'brew_time', 'grind_type', 'beans']
center_points = [
[0, 0, 0, -1, -1],
[0, 0, 0, -1, 1],
[0, 0, 0, 1, -1],
[0, 0, 0, 1, 1]
]
coffee_design = coffee_design.append(pd.DataFrame(center_points * 2, columns=coffee_design.columns))
coffee_design.index = np.arange(0, len(coffee_design))
model = ' + '.join(coffee_design.columns)
sn = 2.0
alpha = 0.05
factorial_power = dexpy.power.f_power(model, coffee_design, sn, alpha)
factorial_power.pop(0) # remove intercept
factorial_power = ['{0:.2f}%'.format(i*100) for i in factorial_power] # convert to %
Term | Power |
---|---|
amount | 96.55% |
grind_size | 96.55% |
brew_time | 96.55% |
grind_type | 99.61% |
beans | 99.61% |
We can also check the power for the interaction model:
twofi_model = "(" + '+'.join(coffee_design.columns) + ")**2"
sn = 2.0
alpha = 0.05
factorial_power = dexpy.power.f_power(twofi_model, coffee_design, sn, alpha)
factorial_power.pop(0)
factorial_power = ['{0:.2f}%'.format(i*100) for i in factorial_power] # convert to %
Term | Power |
---|---|
amount | 93.67% |
grind_size | 93.67% |
brew_time | 93.67% |
grind_type | 98.91% |
beans | 98.91% |
amount:grind_size | 93.67% |
amount:brew_time | 93.67% |
amount:grind_type | 93.67% |
amount:beans | 93.67% |
grind_size:brew_time | 93.67% |
grind_size:grind_type | 93.67% |
grind_size:beans | 93.67% |
brew_time:grind_type | 93.67% |
brew_time:beans | 93.67% |
grind_type:beans | 98.91% |
Run the Experiment¶
We can build the 25-1 design using build_factorial, then appending the 8 center point runs. We actually already did this to evaluate the power, but here is the code again.
coffee_design = dexpy.factorial.build_factorial(5, 2**(5-1))
coffee_design.columns = ['amount', 'grind_size', 'brew_time', 'grind_type', 'beans']
center_points = [
[0, 0, 0, -1, -1],
[0, 0, 0, -1, 1],
[0, 0, 0, 1, -1],
[0, 0, 0, 1, 1]
]
coffee_design = coffee_design.append(pd.DataFrame(center_points * 2, columns=coffee_design.columns))
coffee_design.index = np.arange(0, len(coffee_design))
It is convenient to print out the design in actual values, rather than the coded -1 and +1 values, for when we make the coffee.
actual_lows = { 'amount' : 2.5, 'grind_size' : 8, 'brew_time': 3.5,
'grind_type': 'burr', 'beans': 'light' }
actual_highs = { 'amount' : 4, 'grind_size' : 10, 'brew_time': 4.5,
'grind_type': 'blade', 'beans': 'dark' }
actual_design = dexpy.design.coded_to_actual(coffee_design, actual_lows, actual_highs)
run | amount | grind_size | brew_time | grind_type | beans |
---|---|---|---|---|---|
0 | 2.5 | 8 | 3.5 | burr | dark |
1 | 2.5 | 8 | 3.5 | blade | light |
2 | 2.5 | 8 | 4.5 | burr | light |
3 | 2.5 | 8 | 4.5 | blade | dark |
4 | 2.5 | 10 | 3.5 | burr | light |
5 | 2.5 | 10 | 3.5 | blade | dark |
6 | 2.5 | 10 | 4.5 | burr | dark |
7 | 2.5 | 10 | 4.5 | blade | light |
8 | 4 | 8 | 3.5 | burr | light |
9 | 4 | 8 | 3.5 | blade | dark |
10 | 4 | 8 | 4.5 | burr | dark |
11 | 4 | 8 | 4.5 | blade | light |
12 | 4 | 10 | 3.5 | burr | dark |
13 | 4 | 10 | 3.5 | blade | light |
14 | 4 | 10 | 4.5 | burr | light |
15 | 4 | 10 | 4.5 | blade | dark |
16 | 3.25 | 9 | 4 | burr | light |
17 | 3.25 | 9 | 4 | burr | dark |
18 | 3.25 | 9 | 4 | blade | light |
19 | 3.25 | 9 | 4 | blade | dark |
20 | 3.25 | 9 | 4 | burr | light |
21 | 3.25 | 9 | 4 | burr | dark |
22 | 3.25 | 9 | 4 | blade | light |
23 | 3.25 | 9 | 4 | blade | dark |
All that is left is to drink 24 pots of coffee and record the results. Note that, while the tables in this example are in a sorted order, the actual experiment was run in random order. This is done to reduce the possibility of incidental variables influencing the results. For example, if the temperature in the office for the first 8 runs was cold, the testers may have rated the taste higher. Hot coffee being more pleasing in a cold environment. If the first 8 runs were the only runs where amount was at its low setting, as it is in the sorted table above, we would confound the low amount effect with the effect of the cold office, and incorrectly conclude that a lower amount of coffee is better.
run | hank | joe | neal | mike | martin | mean |
---|---|---|---|---|---|---|
0 | 4 | 4 | 4 | 5 | 5 | 4.4 |
1 | 3 | 3 | 3 | 3 | 1 | 2.6 |
2 | 2 | 3 | 3 | 2 | 2 | 2.4 |
3 | 9 | 9 | 9 | 8 | 8 | 8.6 |
4 | 1 | 3 | 1 | 2 | 1 | 1.6 |
5 | 2 | 6 | 1 | 2 | 3 | 2.8 |
6 | 7 | 6 | 6 | 8 | 9 | 7.2 |
7 | 1 | 4 | 2 | 5 | 5 | 3.4 |
8 | 5 | 7 | 8 | 6 | 8 | 6.8 |
9 | 2 | 6 | 3 | 5 | 1 | 3.4 |
10 | 2 | 5 | 4 | 5 | 3 | 3.8 |
11 | 9 | 9 | 9 | 9 | 9 | 9 |
12 | 8 | 3 | 4 | 4 | 7 | 5.2 |
13 | 4 | 6 | 2 | 4 | 2 | 3.6 |
14 | 9 | 8 | 8 | 8 | 8 | 8.2 |
15 | 7 | 6 | 5 | 8 | 9 | 7 |
16 | 7 | 6 | 4 | 5 | 5 | 5.4 |
17 | 7 | 7 | 7 | 7 | 6 | 6.8 |
18 | 7 | 2 | 2 | 4 | 3 | 3.6 |
19 | 6 | 6 | 4 | 5 | 6 | 5.4 |
20 | 7 | 4 | 4 | 3 | 6 | 4.8 |
21 | 6 | 7 | 5 | 7 | 6 | 6.2 |
22 | 5 | 4 | 3 | 6 | 4 | 4.4 |
23 | 7 | 6 | 4 | 5 | 7 | 5.8 |
We’ll store the mean for later as another column in the DataFrame.
coffee_design['taste_rating'] = [
4.4, 2.6, 2.4, 8.6, 1.6, 2.8, 7.2, 3.4,
6.8, 3.4, 3.8, 9.0, 5.2, 3.6, 8.2, 7.0,
5.4, 6.8, 3.6, 5.4, 4.8, 6.2, 4.4, 5.8
]
Fit a Model¶
The statsmodels package has an OLS fitting routine that takes a patsy formula:
from statsmodels.formula.api import ols
lm = ols("taste_rating ~" + twofi_model, data=coffee_design).fit()
print(lm.summary2())
term | Coef. | Std.Err. | t | P>|t| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 5.1 | 0.1434 | 35.5718 | 0 | 4.7694 | 5.4306 |
amount | 0.875 | 0.1756 | 4.9831 | 0.0011 | 0.4701 | 1.2799 |
grind_size | -0.125 | 0.1756 | -0.7119 | 0.4968 | -0.5299 | 0.2799 |
brew_time | 1.2 | 0.1756 | 6.8339 | 0.0001 | 0.7951 | 1.6049 |
grind_type | -0.1333 | 0.1434 | -0.93 | 0.3796 | -0.4639 | 0.1973 |
beans | 0.45 | 0.1434 | 3.1387 | 0.0138 | 0.1194 | 0.7806 |
amount:grind_size | 0.25 | 0.1756 | 1.4237 | 0.1923 | -0.1549 | 0.6549 |
amount:brew_time | -0.075 | 0.1756 | -0.4271 | 0.6806 | -0.4799 | 0.3299 |
amount:grind_type | -0.175 | 0.1756 | -0.9966 | 0.3481 | -0.5799 | 0.2299 |
amount:beans | -1.325 | 0.1756 | -7.5458 | 0.0001 | -1.7299 | -0.9201 |
grind_size:brew_time | 0.375 | 0.1756 | 2.1356 | 0.0652 | -0.0299 | 0.7799 |
grind_size:grind_type | -0.725 | 0.1756 | -4.1288 | 0.0033 | -1.1299 | -0.3201 |
grind_size:beans | 0.375 | 0.1756 | 2.1356 | 0.0652 | -0.0299 | 0.7799 |
brew_time:grind_type | 0.75 | 0.1756 | 4.2712 | 0.0027 | 0.3451 | 1.1549 |
brew_time:beans | 0.15 | 0.1756 | 0.8542 | 0.4178 | -0.2549 | 0.5549 |
grind_type:beans | 0.0833 | 0.1434 | 0.5812 | 0.5771 | -0.2473 | 0.4139 |
We can reduce this model by keeping only terms that have a p-value below 0.05 (bolded in the table above):
# this gets the pvalues dataframe from the RegressionResults
# and slices out the rows with p < 0.05
pvalues = lm.pvalues[1:]
reduced_model = '+'.join(pvalues.loc[pvalues < 0.05].index)
# you could also just specify it by hand:
# reduced_model = "amount + brew_time + beans + amount:beans + " \
# "grind_size:grind_type + brew_time:grind_type"
lm = ols("taste_rating ~" + reduced_model, data=coffee_design).fit()
print(lm.summary2())
Coef. | Std.Err. | t | P>|t| | [0.025 | 0.975] | |
---|---|---|---|---|---|---|
Intercept | 5.1 | 0.1659 | 30.7405 | 0 | 4.75 | 5.45 |
amount | 0.875 | 0.2032 | 4.3063 | 0.0005 | 0.4463 | 1.3037 |
brew_time | 1.2 | 0.2032 | 5.9058 | 0 | 0.7713 | 1.6287 |
beans | 0.45 | 0.1659 | 2.7124 | 0.0148 | 0.1 | 0.8 |
amount:beans | -1.325 | 0.2032 | -6.5209 | 0 | -1.7537 | -0.8963 |
grind_size:grind_type | -0.725 | 0.2032 | -3.5681 | 0.0024 | -1.1537 | -0.2963 |
brew_time:grind_type | 0.75 | 0.2032 | 3.6911 | 0.0018 | 0.3213 | 1.1787 |
[1] | http://www.statease.com/publications/newsletter/stat-teaser-09-16#article1 |