A Simple Example
Probably the easiest way to begin understanding factorial designs is by looking at an example. Imagine an educational program for which we would like to compare a variety of program variations to see which works best. For instance, we would like to vary the amount of time the children receive instruction, with one group getting 1 hour of instruction per week and another getting 4 hours per week. And we'd like to vary the setting, with one group getting the instruction in-class (probably pulled off into a corner of the classroom) and the other group being pulled out of the classroom for instruction in another room. We could think about having four separate groups to do this, but when we are varying the amount of time in instruction, what setting would we use: in-class or pull-out? And when we are studying setting, what amount of instruction time would we use: 1 hour, 4 hours, or something else?
With factorial designs, we don't have to compromise when answering these questions. We can have it both ways if we cross each of our two time-in-instruction conditions with each of our two settings. Let's begin by defining some terms. In factorial designs, a factor is a major independent variable. In this example we have two factors: time in instruction and setting. A level is a subdivision of a factor. In this example, time in instruction has two levels and setting has two levels. Sometimes we depict a factorial design with a numbering notation. In this example, we can say that we have a 2 x 2 (spoken "two-by-two") factorial design. In this notation, the count of numbers tells you how many factors there are, and the value of each number tells you how many levels that factor has. If I said I had a 3 x 4 factorial design, you would know that I had 2 factors and that one factor had 3 levels while the other had 4. The order of the numbers makes no difference; we could just as easily call this a 4 x 3 factorial design. The number of different treatment groups in any factorial design can be determined by multiplying through the number notation. For instance, in our example we have 2 x 2 = 4 groups. In our notational example, we would need 3 x 4 = 12 groups.
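The counting rule above can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the original text: the function names `n_groups` and `treatment_groups` are hypothetical, and the level labels are simply those of the running example.

```python
from itertools import product
from math import prod


def n_groups(levels):
    """Number of treatment groups in a full factorial: the product of the level counts."""
    return prod(levels)


def treatment_groups(factors):
    """List every combination of levels, one dict per treatment group.

    `factors` maps each factor name to the list of its levels.
    """
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


# The 2 x 2 example: time in instruction crossed with setting.
design = {
    "time": ["1 hour/week", "4 hours/week"],
    "setting": ["in-class", "pull-out"],
}
for group in treatment_groups(design):
    print(group)

print(n_groups([2, 2]))  # 4
print(n_groups([3, 4]))  # 12
print(n_groups([4, 3]))  # 12: the order of the numbers makes no difference
```

Crossing the factors with `itertools.product` guarantees that every level of one factor is paired with every level of the other, which is exactly what makes the design "fully crossed".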
We can also depict a factorial design in design notation. Because of the treatment level combinations, it is useful to use subscripts on the treatment (X) symbol. We can see in the figure that there are four groups, one for each combination of levels of factors. It is also immediately apparent that the groups were randomly assigned and that this is a posttest-only design.
Now, let's look at a variety of different results we might get from this simple 2 x 2 factorial design. Each of the following figures describes a different possible outcome, and each outcome is shown in table form (the 2 x 2 table with the row and column averages) and in graphic form (with each factor taking a turn on the horizontal axis). You should convince yourself that the information in the tables agrees with the information in both of the graphs. You should also convince yourself that the pair of graphs in each figure shows the exact same information graphed in two different ways. The lines shown in the graphs are technically not necessary; they are a visual aid that lets you easily track where the averages for a single level go across levels of another factor. Keep in mind that the values shown in the tables and graphs are group averages on the outcome variable of interest. In this example, the outcome might be a test of achievement in the subject being taught. We will assume that scores on this test range from 1 to 10, with higher values indicating greater achievement. You should carefully study the outcomes in each figure in order to understand the differences between these cases.
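The row and column averages in each table are simple means of the cells. A minimal sketch, assuming equal numbers of students in every group (so each marginal average is an unweighted mean of cell averages); the function name and the cell values are hypothetical and are not taken from the figures:

```python
def marginal_means(table):
    """Return (row averages, column averages, grand mean) for a table of cell means.

    Assumes every group has the same size, so each marginal average is an
    unweighted mean of the cell averages in that row or column.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[r][c] for r in range(n_rows)) / n_rows for c in range(n_cols)]
    grand_mean = sum(row_means) / n_rows
    return row_means, col_means, grand_mean


# Hypothetical cell means: rows are time (1 hr, 4 hrs), columns are setting (in-class, pull-out).
table = [
    [4.0, 6.0],
    [6.0, 8.0],
]
rows, cols, grand = marginal_means(table)
print(rows)   # [5.0, 7.0]
print(cols)   # [5.0, 7.0]
print(grand)  # 6.0
```

Graphing the same table with the other factor on the horizontal axis amounts to transposing it: the row and column averages swap roles, but the cell values are unchanged.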
The Null Outcome
Let's begin by looking at the "null" case. The null case is a situation where the treatments have no effect. This figure assumes that even if we didn't give the training we could expect that students would score a 5 on average on the outcome test. You can see in this hypothetical case that all four groups score an average of 5 and therefore the row and column averages must be 5. You can't see the lines for both levels in the graphs because one line falls right on top of the other.
The Main Effects
A main effect is an outcome that is a consistent difference between levels of a factor. For instance, we would say there's a main effect for setting if we find a statistically significant difference between the averages for the in-class and pull-out groups, at all levels of time in instruction. The first figure depicts a main effect of time: for all settings, the 4 hour/week condition worked better than the 1 hour/week one. It is also possible to have a main effect for setting (and none for time).
In the second main effect graph we see that in-class training was better than pull-out training for all amounts of time.
Finally, it is possible to have a main effect on both variables simultaneously as depicted in the third main effect figure. In this instance 4 hours/week always works better than 1 hour/week and in-class setting always works better than pull-out.
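Each main effect can be read off as the difference between the two marginal averages of that factor. The sketch below assumes equal group sizes and uses hypothetical cell means chosen only to mimic the three main-effect patterns just described; the function name is illustrative.

```python
def main_effects_2x2(table):
    """Main effects in a 2 x 2 table of cell means (equal group sizes assumed).

    Rows are the levels of time (1 hr, 4 hrs) and columns the levels of
    setting (in-class, pull-out). Each effect is the second level's marginal
    average minus the first level's.
    """
    (a, b), (c, d) = table
    time_effect = (c + d) / 2 - (a + b) / 2     # 4 hrs average minus 1 hr average
    setting_effect = (b + d) / 2 - (a + c) / 2  # pull-out average minus in-class average
    return time_effect, setting_effect


# Hypothetical tables for the three main-effect patterns.
time_only = [[5, 5], [7, 7]]     # 4 hrs/week better in both settings
setting_only = [[7, 5], [7, 5]]  # in-class better at both amounts of time
both = [[7, 5], [9, 7]]          # both main effects at once

print(main_effects_2x2(time_only))     # (2.0, 0.0)
print(main_effects_2x2(setting_only))  # (0.0, -2.0)
print(main_effects_2x2(both))          # (2.0, -2.0)
```

A negative setting effect here means the pull-out groups averaged lower, that is, in-class instruction did better.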
Even if we could only look at main effects, factorial designs would be useful. But, because of the way we combine levels in factorial designs, they also enable us to examine the interaction effects that exist between factors. An interaction effect exists when differences on one factor depend on the level you are on another factor. It's important to recognize that an interaction is between factors, not levels. We wouldn't say there's an interaction between 4 hours/week and in-class treatment. Instead, we would say that there's an interaction between time and setting, and then we would go on to describe the specific levels involved.
How do you know if there is an interaction in a factorial design? There are three ways you can determine that there's an interaction. First, when you run the statistical analysis, the statistical table will report on all main effects and interactions. Second, you know there's an interaction when you can't talk about the effect of one factor without mentioning the other factor. If you can say at the end of our study that time in instruction makes a difference, then you know that you have a main effect and not an interaction (because you did not have to mention the setting factor when describing the results for time). On the other hand, when you have an interaction it is impossible to describe your results accurately without mentioning both factors. Finally, you can always spot an interaction in the graphs of group means: whenever there are lines that are not parallel, there is an interaction present! If you check out the main effect graphs above, you will notice that all of the lines within a graph are parallel. In contrast, for all of the interaction graphs, you will see that the lines are not parallel.
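The parallel-lines check can be expressed as a single number, the interaction contrast: the effect of setting at 4 hours/week minus its effect at 1 hour/week. A minimal sketch, assuming equal group sizes, with hypothetical cell means for the patterns discussed here and in the next paragraph (the function name is illustrative):

```python
def interaction_2x2(table):
    """Interaction contrast for a 2 x 2 table of cell means.

    Rows are time (1 hr, 4 hrs), columns are setting (in-class, pull-out).
    The value is (setting effect at 4 hrs) - (setting effect at 1 hr).
    Zero means the lines in the graph of group means are parallel.
    """
    (a, b), (c, d) = table
    return (d - c) - (b - a)


main_effects_only = [[7, 5], [9, 7]]  # parallel lines
one_cell_better = [[5, 5], [7, 5]]    # 4 hrs + in-class beats the other three
crossover = [[5, 7], [7, 5]]          # pull-out better at 1 hr, in-class better at 4 hrs

print(interaction_2x2(main_effects_only))  # 0
print(interaction_2x2(one_cell_better))    # -2
print(interaction_2x2(crossover))          # -4
```

In the crossover table every row and column averages 6, so neither factor shows a main effect even though the interaction is large. With real data the contrast is computed from sample means and is rarely exactly zero, which is why the statistical table is the first of the three checks.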
In the first interaction effect graph, we see that one combination of levels (4 hours/week and in-class setting) does better than the other three. In the second interaction we have a more complex "cross-over" interaction. Here, at 1 hour/week the pull-out group does better than the in-class group, while at 4 hours/week the reverse is true. Furthermore, both of these combinations of levels do equally well.
Factorial design has several important features. First, it has great flexibility for exploring or enhancing the signal (treatment) in our studies. Whenever we are interested in examining treatment variations, factorial designs should be strong candidates as the designs of choice. Second, factorial designs are efficient. Instead of conducting a series of independent studies we are effectively able to combine these studies into one. Finally, factorial designs are the only effective way to examine interaction effects.
So far, we have only looked at a very simple 2 x 2 factorial design structure. You may want to look at some factorial design variations to get a deeper understanding of how they work. You may also want to examine how we approach the statistical analysis of factorial experimental designs.
Copyright ©2006, William M.K. Trochim, All Rights Reserved
Last Revised: 10/20/2006
In statistics, a full factorial experiment is an experiment whose design consists of two or more factors, each with discrete possible values or "levels", and whose experimental units take on all possible combinations of these levels across all such factors. A full factorial design may also be called a fully crossed design. Such an experiment allows the investigator to study the effect of each factor on the response variable, as well as the effects of interactions between factors on the response variable.
For the vast majority of factorial experiments, each factor has only two levels. For example, with two factors each taking two levels, a factorial experiment would have four treatment combinations in total, and is usually called a 2×2 factorial design.
If the number of combinations in a full factorial design is too high to be logistically feasible, a fractional factorial design may be done, in which some of the possible combinations (usually at least half) are omitted.
Factorial designs were used in the 19th century by John Bennet Lawes and Joseph Henry Gilbert of the Rothamsted Experimental Station.
Ronald Fisher argued in 1926 that "complex" designs (such as factorial designs) were more efficient than studying one factor at a time.
"No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or, ideally, one question, at a time. The writer is convinced that this view is wholly mistaken."
Nature, he suggests, will best respond to "a logical and carefully thought out questionnaire". A factorial design allows the effect of several factors and even interactions between them to be determined with the same number of trials as are necessary to determine any one of the effects by itself with the same degree of accuracy.
Frank Yates made significant contributions, particularly in the analysis of designs, through what is now known as the Yates analysis.
The term "factorial" may not have been used in print before 1935, when Fisher used it in his book The Design of Experiments.
Advantages of factorial experiments
Many experiments examine the effect of only a single factor or variable. Compared to such one-factor-at-a-time (OFAT) experiments, factorial experiments offer several advantages:
- Factorial designs are more efficient than OFAT experiments. They provide more information at similar or lower cost. They can find optimal conditions faster than OFAT experiments.
- Factorial designs allow additional factors to be examined at no additional cost.
- When the effect of one factor is different for different levels of another factor, it cannot be detected by an OFAT experiment design. Factorial designs are required to detect such interactions. Use of OFAT when interactions are present can lead to serious misunderstanding of how the response changes with the factors.
- Factorial designs allow the effects of a factor to be estimated at several levels of the other factors, yielding conclusions that are valid over a range of experimental conditions.
Example of advantages of factorial experiments
In his book, "Improving Almost Anything", the famous statistician George Box gives many examples of the benefits of factorial experiments. Here is one. Engineers at the bearing manufacturer SKF wanted to know if changing to a less expensive "cage" design would affect bearing life. The engineers asked Christer Hellstrand, a statistician, for help in designing the experiment.
Box reports the following. "The results were assessed by an accelerated life test. … The runs were expensive because they needed to be made on an actual production line and the experimenters were planning to make four runs with the standard cage and four with the modified cage. Christer asked if there were other factors they would like to test. They said there were, but that making added runs would exceed their budget. Christer showed them how they could test two additional factors "for free" – without increasing the number of runs and without reducing the accuracy of their estimate of the cage effect. In this arrangement, called a 2^3 factorial design, each of the three factors would be run at two levels and all the eight possible combinations included. The various combinations can conveniently be shown as the vertices of a cube ... " "In each case, the standard condition is indicated by a minus sign and the modified condition by a plus sign. The factors changed were heat treatment, outer ring osculation, and cage design. The numbers show the relative lengths of lives of the bearings. If you look at [the cube plot], you can see that the choice of cage design did not make a lot of difference. … But, if you average the pairs of numbers for cage design, you get the [table below], which shows what the two other factors did. … It led to the extraordinary discovery that, in this particular application, the life of a bearing can be increased fivefold if the two factor(s) outer ring osculation and inner ring heat treatments are increased together."
Table: relative bearing life averaged over cage design, by outer ring osculation (− / +) and heat treatment (− / +).
"Remembering that bearings like this one have been made for decades, it is at first surprising that it could take so long to discover so important an improvement. A likely explanation is that, because most engineers have, until recently, employed only one factor at a time experimentation, interaction effects have been missed."
The simplest factorial experiment contains two levels for each of two factors. Suppose an engineer wishes to study the total power used by each of two different motors, A and B, running at each of two different speeds, 2000 or 3000 RPM. The factorial experiment would consist of four experimental units: motor A at 2000 RPM, motor B at 2000 RPM, motor A at 3000 RPM, and motor B at 3000 RPM. Each combination of a single level selected from every factor is present once.
This experiment is an example of a 2^2 (or 2×2) factorial experiment, so named because it considers two levels (the base) for each of two factors (the power or superscript), or #levels^#factors, producing 2^2 = 4 factorial points.
Designs can involve many independent variables. As a further example, the effects of three input variables can be evaluated in eight experimental conditions shown as the corners of a cube.
This can be conducted with or without replication, depending on its intended purpose and available resources. It will provide the effects of the three independent variables on the dependent variable and possible interactions.
The notation used to denote factorial experiments conveys a lot of information. When a design is denoted a 2^3 factorial, this identifies the number of factors (3), how many levels each factor has (2), and how many experimental conditions there are in the design (2^3 = 8). Similarly, a 2^5 design has five factors, each with two levels, and 2^5 = 32 experimental conditions. Factorial experiments can involve factors with different numbers of levels. A 2^4 3 design has five factors, four with two levels and one with three levels, and has 2^4 × 3 = 16 × 3 = 48 experimental conditions.
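Reading the notation can be mechanized. In this sketch the design is written as a hypothetical mapping from a number of levels to how many factors have that many levels, so `{2: 4, 3: 1}` stands for the 2^4 3 design; the function (also hypothetical) returns the number of factors and the number of experimental conditions.

```python
def describe_design(notation):
    """Return (number of factors, number of experimental conditions).

    `notation` maps a level count to the number of factors with that many
    levels: {2: 3} is a 2^3 design, {2: 4, 3: 1} is a 2^4 3 design.
    """
    n_factors = sum(notation.values())
    n_conditions = 1
    for levels, count in notation.items():
        n_conditions *= levels ** count  # each group of factors contributes levels^count
    return n_factors, n_conditions


print(describe_design({2: 3}))        # (3, 8)
print(describe_design({2: 5}))        # (5, 32)
print(describe_design({2: 4, 3: 1}))  # (5, 48)
```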
To save space, the points in a two-level factorial experiment are often abbreviated with strings of plus and minus signs. The strings have as many symbols as factors, and their values dictate the level of each factor: conventionally, − for the first (or low) level, and + for the second (or high) level. The points in this experiment can thus be represented as −−, +−, −+, and ++.
The factorial points can also be abbreviated by (1), a, b, and ab, where the presence of a letter indicates that the specified factor is at its high (or second) level and the absence of a letter indicates that the specified factor is at its low (or first) level (for example, "a" indicates that factor A is on its high setting, while all other factors are at their low (or first) setting). (1) is used to indicate that all factors are at their lowest (or first) values.
For more than two factors, a 2^k factorial experiment can usually be recursively designed from a 2^(k−1) factorial experiment by replicating the 2^(k−1) experiment, assigning the first replicate to the first (or low) level of the new factor, and the second replicate to the second (or high) level. This framework can be generalized to, e.g., designing three replicates for three-level factors.
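The recursive construction, together with both abbreviations described above, can be sketched as follows. The function names are hypothetical; levels are coded −1 (low) and +1 (high), and the code prints ASCII '-' and '+'.

```python
def two_level_design(k):
    """Build a 2^k design in standard order by recursive doubling.

    Each run is a tuple of levels coded -1 (low) and +1 (high), one per factor.
    At each step the current design is copied twice: the first copy gets the
    new factor at its low level, the second copy at its high level.
    """
    runs = [()]
    for _ in range(k):
        runs = [run + (-1,) for run in runs] + [run + (1,) for run in runs]
    return runs


def sign_string(run):
    """Plus/minus abbreviation of a run, e.g. (-1, 1) -> '-+'."""
    return "".join("+" if level > 0 else "-" for level in run)


def label(run):
    """Letter abbreviation: lowercase letters of factors at their high level, '(1)' if none."""
    name = "".join(chr(ord("a") + i) for i, level in enumerate(run) if level > 0)
    return name or "(1)"


design = two_level_design(2)
print([sign_string(run) for run in design])  # ['--', '+-', '-+', '++']
print([label(run) for run in design])        # ['(1)', 'a', 'b', 'ab']
print([label(run) for run in two_level_design(3)])
# ['(1)', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
```

Because each new factor is appended as the slowest-changing column, the first factor alternates fastest, which yields the familiar standard (Yates) order (1), a, b, ab, c, ….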
A factorial experiment allows for estimation of experimental error in two ways. The experiment can be replicated, or the sparsity-of-effects principle can often be exploited. Replication is more common for small experiments and is a very reliable way of assessing experimental error. When the number of factors is large (typically more than about 5 factors, but this does vary by application), replication of the design can become operationally difficult. In these cases, it is common to run only a single replicate of the design and to assume that factor interactions of more than a certain order (say, between three or more factors) are negligible. Under this assumption, estimates of such high-order interactions are estimates of quantities that are exactly zero, and so they serve as estimates of experimental error.
When there are many factors, many experimental runs will be necessary, even without replication. For example, experimenting with 10 factors at two levels each produces 2^10 = 1024 combinations. At some point this becomes infeasible due to high cost or insufficient resources. In this case, fractional factorial designs may be used.
As with any statistical experiment, the experimental runs in a factorial experiment should be randomized to reduce the impact that bias could have on the experimental results. In practice, this can be a large operational challenge.
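Generating a randomized run order takes only a few lines. A minimal sketch using Python's standard `random` module and the four runs of the motor example; the function name is hypothetical, and the seed only makes the shuffled order reproducible.

```python
import random


def randomized_run_order(runs, seed=None):
    """Return a copy of `runs` in random order; the design itself is unchanged.

    Passing a seed makes the order reproducible, so the run sheet can be
    regenerated later.
    """
    order = list(runs)
    random.Random(seed).shuffle(order)
    return order


# The motor example: every combination of motor and speed, listed systematically.
runs = [("A", 2000), ("B", 2000), ("A", 3000), ("B", 3000)]
for run_number, (motor, rpm) in enumerate(randomized_run_order(runs, seed=1), start=1):
    print(run_number, motor, rpm)
```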
Factorial experiments can be used when there are more than two levels of each factor. However, the number of experimental runs required for three-level (or more) factorial designs will be considerably greater than for their two-level counterparts. Factorial designs are therefore less attractive if a researcher wishes to consider more than two levels.
A factorial experiment can be analyzed using ANOVA or regression analysis. It is relatively easy to estimate the main effect for a factor. To compute the main effect of a factor "A", subtract the average response of all experimental runs for which A was at its low (or first) level from the average response of all experimental runs for which A was at its high (or second) level.
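The main-effect rule extends to interactions by multiplying sign columns. The following is a minimal sketch of that contrast calculation (a direct computation, not the tabular Yates algorithm), using a hypothetical 2^2 data set; the function name is illustrative.

```python
from itertools import combinations, product
from math import prod
from string import ascii_uppercase


def estimate_effects(runs, y):
    """Estimate every main effect and interaction in a two-level factorial.

    `runs` holds one tuple of -1/+1 levels per run and `y` the responses.
    For each term, a run's sign is the product of its levels for the factors
    in that term; the effect is the mean response at sign +1 minus the mean
    at sign -1. For a single factor this is the main-effect rule.
    """
    k = len(runs[0])
    effects = {}
    for order in range(1, k + 1):
        for term in combinations(range(k), order):
            signs = [prod(run[i] for i in term) for run in runs]
            high = [yi for s, yi in zip(signs, y) if s > 0]
            low = [yi for s, yi in zip(signs, y) if s < 0]
            name = ":".join(ascii_uppercase[i] for i in term)
            effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects


# 2^2 design in standard order: (1), a, b, ab.
runs = [tuple(reversed(p)) for p in product((-1, 1), repeat=2)]
y = [20, 30, 24, 42]  # hypothetical responses, one per run

effects = estimate_effects(runs, y)
print(effects)  # {'A': 14.0, 'B': 8.0, 'A:B': 4.0}

# With -1/+1 coding, each regression coefficient is half the effect.
coefficients = {"intercept": sum(y) / len(y)}
coefficients.update({name: e / 2 for name, e in effects.items()})
print(coefficients)  # {'intercept': 29.0, 'A': 7.0, 'B': 4.0, 'A:B': 2.0}
print(len(coefficients) == len(y))  # True: one run per coefficient, no df left for error
```

The last line shows, in miniature, the situation in the 2^4 example below: a full model with an intercept, all main effects, and all interactions has exactly as many coefficients as an unreplicated design has runs.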
Other useful exploratory analysis tools for factorial experiments include main effects plots, interaction plots, Pareto plots, and a normal probability plot of the estimated effects.
When the factors are continuous, two-level factorial designs assume that the effects are linear. If a quadratic effect is expected for a factor, a more complicated experiment should be used, such as a central composite design. Optimization of factors that could have quadratic effects is the primary goal of response surface methodology.
Montgomery gives the following example of the analysis of a factorial experiment:
An engineer would like to increase the filtration rate (output) of a process to produce a chemical, and to reduce the amount of formaldehyde used in the process. Previous attempts to reduce the formaldehyde have lowered the filtration rate. The current filtration rate is 75 gallons per hour. Four factors are considered: temperature (A), pressure (B), formaldehyde concentration (C), and stirring rate (D). Each of the four factors will be tested at two levels.
From here on, the minus (−) and plus (+) signs indicate whether a factor is run at its low or high level, respectively.
Plot of the main effects showing the filtration rates for the low (−) and high (+) settings for each factor.
Plot of the interaction effects showing the mean filtration rate at each of the four possible combinations of levels for a given pair of factors.
The non-parallel lines in the A:C interaction plot indicate that the effect of factor A depends on the level of factor C. A similar result holds for the A:D interaction. The graphs indicate that factor B has little effect on filtration rate. The analysis of variance (ANOVA) including all 4 factors and all possible interaction terms between them yields the coefficient estimates shown in the table below.
Because there are 16 observations and 16 coefficients (intercept, main effects, and interactions), the model fits the data exactly and no degrees of freedom remain for estimating error, so p-values cannot be calculated for this model. The coefficient values and the graphs suggest that the important factors are A, C, and D, and the interaction terms A:C and A:D.
The coefficients for A, C, and D are all positive in the ANOVA, which would suggest running the process with all three variables set to the high value. However, the main effect of each variable is the average over the levels of the other variables. The A:C interaction plot above shows that the effect of factor A depends on the level of factor C, and vice versa. Factor A (temperature) has very little effect on filtration rate when factor C is at the + level. But factor A has a large effect on filtration rate when factor C (formaldehyde) is at the − level. The combination of A at the + level and C at the − level gives the highest filtration rate. This observation indicates how one-factor-at-a-time analyses can miss important interactions. Only by varying both factors A and C at the same time could the engineer discover that the effect of factor A depends on the level of factor C.
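The gap between a main effect and the effect at a fixed level of another factor can be made concrete. A minimal sketch with two factors labelled A and C and hypothetical responses, chosen only to reproduce the qualitative pattern described above (they are not Montgomery's data); the function name is illustrative.

```python
def effect(runs, y, factor, given=None, given_level=None):
    """Mean response at the high level of `factor` minus the mean at its low level.

    If `given` is set, only runs where factor `given` is at `given_level`
    are used, which gives the effect of `factor` at that level of `given`.
    """
    rows = [(run, yi) for run, yi in zip(runs, y)
            if given is None or run[given] == given_level]
    high = [yi for run, yi in rows if run[factor] > 0]
    low = [yi for run, yi in rows if run[factor] < 0]
    return sum(high) / len(high) - sum(low) / len(low)


A, C = 0, 1  # positions of the two factors in each run tuple
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
y = [50, 80, 70, 72]  # hypothetical responses, not Montgomery's data

print(effect(runs, y, A))                           # 16.0: main effect of A, averaged over C
print(effect(runs, y, A, given=C, given_level=-1))  # 30.0: effect of A with C at -
print(effect(runs, y, A, given=C, given_level=1))   # 2.0: effect of A with C at +
```

The main effect of 16 is just the average of 30 and 2, so it describes neither situation well; that is the sense in which a main effect "averages over" the other factors.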
The best filtration rate is seen when A and D are at the high level, and C is at the low level. This result also satisfies the objective of reducing formaldehyde (factor C). Because B does not appear to be important, it can be dropped from the model. Performing the ANOVA using factors A, C, and D, and the interaction terms A:C and A:D, gives the result shown in the following table, in which all the terms are significant (p-value < 0.05).
Table: coefficients of the reduced model (columns: Coefficient, Estimate, Standard error, t value, p-value).
Pareto plot showing the relative magnitude of the factor coefficients.
Cube plot for the ANOVA using factors A, C, and D, and the interaction terms A:C and A:D. The plot aids in visualizing the result and shows that the best combination is A+, D+, and C−.
- Box, G. E. P.; Hunter, W. G.; Hunter, J. S. (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd ed.). Wiley. ISBN 0-471-71813-0.
- Yates, Frank; Mather, Kenneth (1963). "Ronald Aylmer Fisher". Biographical Memoirs of Fellows of the Royal Society. 9: 91–120. doi:10.1098/rsbm.1963.0006.
- Fisher, Ronald (1926). "The Arrangement of Field Experiments". Journal of the Ministry of Agriculture of Great Britain. 33: 503–513.
- Montgomery, Douglas C. (2013). Design and Analysis of Experiments (8th ed.). Wiley.
- Oehlert, Gary (2000). A First Course in Design and Analysis of Experiments (Revised ed.). W. H. Freeman.
- Box, George (2006). Improving Almost Anything (Revised ed.). John Wiley & Sons.
- Hellstrand, C.; Oosterhoorn, A. D.; Sherwin, D. J.; Gerson, M. (24 February 1989). "The Necessity of Modern Quality Improvement and Some Experience with its Implementation in the Manufacture of Rolling Bearings [and Discussion]". Philosophical Transactions of the Royal Society A. 327 (1596): 529–537. doi:10.1098/rsta.1989.0008.
- Penn State University College of Health and Human Development. "Introduction to Factorial Experimental Designs".
- Cohen, J. (1968). "Multiple regression as a general data-analytic system". Psychological Bulletin. 70 (6): 426–443. doi:10.1037/h0026714.