
Experimental Design – Types, Methods, Guide


Experimental design is a structured approach used to conduct scientific experiments. It enables researchers to explore cause-and-effect relationships by controlling variables and testing hypotheses. This guide explores the types of experimental designs, common methods, and best practices for planning and conducting experiments.

Experimental Design

Experimental design refers to the process of planning a study to test a hypothesis, where variables are manipulated to observe their effects on outcomes. By carefully controlling conditions, researchers can determine whether specific factors cause changes in a dependent variable.

Key Characteristics of Experimental Design:

  • Manipulation of Variables: The researcher intentionally changes one or more independent variables.
  • Control of Extraneous Factors: Other variables are kept constant to avoid interference.
  • Randomization: Subjects are often randomly assigned to groups to reduce bias.
  • Replication: Repeating the experiment or having multiple subjects helps verify results.

Purpose of Experimental Design

The primary purpose of experimental design is to establish causal relationships by controlling for extraneous factors and reducing bias. Experimental designs help:

  • Test Hypotheses: Determine if there is a significant effect of independent variables on dependent variables.
  • Control Confounding Variables: Minimize the impact of variables that could distort results.
  • Generate Reproducible Results: Provide a structured approach that allows other researchers to replicate findings.

Types of Experimental Designs

Experimental designs can vary based on the number of variables, the assignment of participants, and the purpose of the experiment. Here are some common types:

1. Pre-Experimental Designs

These designs are exploratory and lack random assignment, often used when strict control is not feasible. They provide initial insights but are less rigorous in establishing causality.

  • Example: A training program is provided, and participants’ knowledge is tested afterward, without a pretest.
  • Example: A group is tested on reading skills, receives instruction, and is tested again to measure improvement.

2. True Experimental Designs

True experiments involve random assignment of participants to control or experimental groups, providing high levels of control over variables.

  • Example: A new drug’s efficacy is tested with patients randomly assigned to receive the drug or a placebo.
  • Example: Two groups are observed after one group receives a treatment, and the other receives no intervention.

3. Quasi-Experimental Designs

Quasi-experiments lack random assignment but still aim to determine causality by comparing groups or time periods. They are often used when randomization isn’t possible, such as in natural or field experiments.

  • Example: Schools receive different curriculums, and students’ test scores are compared before and after implementation.
  • Example: Traffic accident rates are recorded for a city before and after a new speed limit is enforced.

4. Factorial Designs

Factorial designs test the effects of multiple independent variables simultaneously. This design is useful for studying the interactions between variables.

  • Example: Studying how caffeine (variable 1) and sleep deprivation (variable 2) affect memory performance.
  • Example: An experiment studying the impact of age, gender, and education level on technology usage.
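To see how a factorial experiment like the caffeine and sleep-deprivation example might be analyzed, here is a minimal sketch in Python. The data are simulated with invented effect sizes purely for illustration; the analysis fits a two-way ANOVA with an interaction term using statsmodels.

```python
# Minimal sketch: simulate and analyze a 2x2 factorial design
# (caffeine x sleep deprivation). All numbers are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
rows = []
for caffeine in (0, 1):
    for sleep_deprived in (0, 1):
        # Hypothetical cell means: caffeine helps, sleep deprivation hurts,
        # and caffeine helps less when sleep deprived (an interaction).
        mean = 50 + 5 * caffeine - 8 * sleep_deprived - 3 * caffeine * sleep_deprived
        for score in rng.normal(mean, 6, size=20):
            rows.append({"caffeine": caffeine,
                         "sleep_deprived": sleep_deprived,
                         "memory": score})

df = pd.DataFrame(rows)

# Two-way ANOVA: tests both main effects and their interaction.
model = smf.ols("memory ~ C(caffeine) * C(sleep_deprived)", data=df).fit()
print(anova_lm(model, typ=2))
```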

5. Repeated Measures Design

In repeated measures designs, the same participants are exposed to different conditions or treatments. This design is valuable for studying changes within subjects over time.

  • Example: Measuring reaction time in participants before, during, and after caffeine consumption.
  • Example: Testing two medications, with each participant receiving both but in a different sequence.

Methods for Implementing Experimental Designs

1. Randomization (see the sketch after this list)

  • Purpose: Ensures each participant has an equal chance of being assigned to any group, reducing selection bias.
  • Method: Use random number generators or assignment software to allocate participants randomly.

2. Blinding

  • Purpose: Prevents participants or researchers from knowing which group (experimental or control) participants belong to, reducing bias.
  • Method: Implement single-blind (participants unaware) or double-blind (both participants and researchers unaware) procedures.

3. Control Groups

  • Purpose: Provides a baseline for comparison, showing what would happen without the intervention.
  • Method: Include a group that does not receive the treatment but otherwise undergoes the same conditions.

4. Counterbalancing

  • Purpose: Controls for order effects in repeated measures designs by varying the order of treatments.
  • Method: Assign different sequences to participants, ensuring that each condition appears equally often across orders.

5. Replication

  • Purpose: Ensures reliability by repeating the experiment or including multiple participants within groups.
  • Method: Increase sample size or repeat studies with different samples or in different settings.
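As a minimal sketch of the randomization method above, the snippet below allocates a set of hypothetical participant IDs to treatment and control groups with equal probability using Python's standard random module; a seed is used only so the allocation can be reproduced.

```python
# Minimal sketch: equal-chance random assignment to two groups.
# Participant IDs are hypothetical placeholders.
import random

def randomize(participants, seed=None):
    """Return a dict mapping each participant to 'treatment' or 'control'."""
    rng = random.Random(seed)      # seeding only makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {p: ("treatment" if i < half else "control")
            for i, p in enumerate(shuffled)}

allocation = randomize([f"P{i:02d}" for i in range(1, 21)], seed=42)
print(allocation)
```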

Steps to Conduct an Experimental Design

1. Formulate a hypothesis: Clearly state what you intend to discover or prove through the experiment. A strong hypothesis guides the experiment’s design and variable selection.

2. Identify the variables:

  • Independent Variable (IV): The factor manipulated by the researcher (e.g., amount of sleep).
  • Dependent Variable (DV): The outcome measured (e.g., reaction time).
  • Control Variables: Factors kept constant to prevent interference with results (e.g., time of day for testing).

3. Select a design: Choose a design type that aligns with your research question, hypothesis, and available resources. For example, an RCT for a medical study or a factorial design for complex interactions.

4. Assign participants: Randomly assign participants to experimental or control groups. Ensure control groups are similar to experimental groups in all respects except for the treatment received.

5. Reduce bias: Randomize the assignment and, if possible, apply blinding to minimize potential bias.

6. Conduct the experiment: Follow a consistent procedure for each group, collecting data systematically. Record observations and manage any unexpected events or variables that may arise.

7. Analyze the data: Use appropriate statistical methods to test for significant differences between groups, such as t-tests, ANOVA, or regression analysis (see the sketch after this list).

8. Interpret the results: Determine whether the results support your hypothesis and analyze any trends, patterns, or unexpected findings. Discuss possible limitations and implications of your results.
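For the analysis step, a two-sample t-test is one of the simplest options. The sketch below uses scipy.stats on invented reaction-time values to show the general pattern; in a real study the test would be chosen in the analysis plan before data collection.

```python
# Minimal sketch of the analysis step: compare two groups with an independent t-test.
# The reaction-time values (ms) are invented for illustration.
from scipy import stats

control = [312, 298, 305, 327, 310, 299, 315, 308]      # no intervention
treatment = [285, 290, 279, 301, 288, 276, 293, 284]    # after intervention

t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the observed difference is unlikely under chance alone;
# interpretation still depends on design quality and effect size.
```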

Examples of Experimental Design in Research

  • Medicine: Testing a new drug’s effectiveness through a randomized controlled trial, where one group receives the drug and another receives a placebo.
  • Psychology: Studying the effect of sleep deprivation on memory using a within-subject design, where participants are tested with different sleep conditions.
  • Education: Comparing teaching methods in a quasi-experimental design by measuring students’ performance before and after implementing a new curriculum.
  • Marketing: Using a factorial design to examine the effects of advertisement type and frequency on consumer purchase behavior.
  • Environmental Science: Testing the impact of a pollution reduction policy through a time series design, recording pollution levels before and after implementation.

Experimental design is fundamental to conducting rigorous and reliable research, offering a systematic approach to exploring causal relationships. With various types of designs and methods, researchers can choose the most appropriate setup to answer their research questions effectively. By applying best practices, controlling variables, and selecting suitable statistical methods, experimental design supports meaningful insights across scientific, medical, and social research fields.





Experimental Design in Statistics w/ 11 Examples!


A proper experimental design is a critical skill in statistics.


Without proper controls and safeguards, unintended consequences can ruin our study and lead to wrong conclusions.

So let’s dive in to see what this is all about!

What’s the difference between an observational study and an experimental study?

An observational study is one in which investigators merely measure variables of interest without influencing the subjects.

And an experiment is a study in which investigators administer some form of treatment to one or more groups.

In other words, an observation is hands-off, whereas an experiment is hands-on.

So what’s the purpose of an experiment?

To establish causation (i.e., cause and effect).

All this means is that we wish to determine the effect an independent explanatory variable has on a dependent response variable.

The explanatory variable explains a response, much like a child who falls, skins their knee, and starts to cry. The child is crying in response to falling and skinning their knee. So the explanatory variable is the fall, and the response variable is crying.

[Figure: Explanatory vs. Response Variable in Everyday Life]

Let’s look at another example. Suppose a medical journal describes two studies in which subjects who had a seizure were randomly assigned to two different treatments:

  • No treatment.
  • A high dose of vitamin C.

The subjects were observed for a year, and the number of seizures for each subject was recorded. Identify the explanatory variable (independent variable), the response variable (dependent variable), and the experimental units.

The explanatory variable is whether the subject received either no treatment or a high dose of vitamin C. The response variable is whether the subject had a seizure during the time of the study. The experimental units in this study are the subjects who recently had a seizure.

Okay, so using the example above, notice that one of the groups did not receive treatment. This group is called a control group and acts as a baseline to see how a new treatment differs from receiving no treatment. Typically, the control group is given something called a placebo, a substance designed to resemble medicine but containing no active drug component. A placebo is a dummy treatment and should not have a physical effect on a person.

Before we talk about the characteristics of a well-designed experiment, we need to discuss some things to look out for:

  • Confounding
  • Lurking variables

Confounding happens when two explanatory variables are both associated with the response variable and with each other, making it impossible for the investigator to separate their individual effects on the response.

A lurking variable is usually unobserved at the time of the study and influences the association between the two variables of interest. In essence, a lurking variable is a third variable that is not measured in the study but may change the response variable.

For example, consider a study that reported a relationship between smoking and health: 1,430 women were asked whether they smoked. Ten years later, a follow-up survey observed whether each woman was still alive or deceased. The researchers studied the possible link between whether a woman smoked and whether she survived the 10-year study period. They reported that:

  • 21% of the smokers died
  • 32% of the nonsmokers died

So, is smoking beneficial to your health, or is there something that could explain how this happened?

Older women are less likely to be smokers, and older women are more likely to die. Because age is a variable that influences the explanatory and response variable, it is considered a confounding variable.

But does smoking cause death?

Notice that the lurking variable, age, can also be a contributing factor. While there is a correlation between smoking and mortality, and also a correlation between smoking and age, we aren’t 100% sure that smoking itself is the cause of the mortality rate in these women.

[Figure: Lurking, Confounding, Correlation, and Causation Diagram]

Now, something important to point out is that a lurking variable is one that is not measured in the study but could still influence the results. Using the example above, another possible lurking variable is:

  • Stress level.

Variables like this were not measured in the study but could influence smoking habits as well as mortality rates.

What is important to note about the difference between confounding and lurking variables is that a confounding variable is measured in a study, while a lurking variable is not.

Additionally, correlation does not imply causation!

Alright, so now it’s time to talk about blinding: single-blind, double-blind experiments, as well as the placebo effect.

A single-blind experiment is when the subjects are unaware of which treatment they are receiving, but the investigator measuring the responses knows what treatments are going to which subject. In other words, the researcher knows which individual gets the placebo and which ones receive the experimental treatment. One major pitfall for this type of design is that the researcher may consciously or unconsciously influence the subject since they know who is receiving treatment and who isn’t.

A double-blind experiment is when both the subjects and investigator do not know who receives the placebo and who receives the treatment. A double-blind model is considered the best model for clinical trials as it eliminates the possibility of bias on the part of the researcher and the possibility of producing a placebo effect from the subject.

The placebo effect is when a subject has an effect or response to a fake treatment because they “believe” that the result should occur, as noted by Yale. For example, a person struggling with insomnia takes a placebo (sugar pill) but instantly falls asleep because they believe they are receiving a sleep aid like Ambien or Lunesta.

[Figure: Placebo Effect, Real-Life Example]

So, what are the three primary requirements for a well-designed experiment?

  • Control
  • Randomization
  • Replication

In a controlled experiment, the researchers, or investigators, decide which subjects are assigned to a control group and which subjects are assigned to a treatment group. In doing so, we ensure that the control and treatment groups are as similar as possible, and limit possible confounding influences such as lurking variables. A replicated experiment, repeated on many different subjects, helps reduce the influence of chance variation on the results. And randomization means we randomly assign subjects into control and treatment groups.

When subjects are divided into control groups and treatment groups randomly, we can use probability to predict the differences we expect to observe. If the differences between the two groups are higher than what we would expect to see naturally (by chance), we say that the results are statistically significant.

For example, if it is surmised that a new medicine reduces the duration of illness from 72 hours to 71 hours, this would not be considered statistically significant: a one-hour difference is not substantial enough to support that the observed effect was due to something other than normal random variation.
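One way to make “higher than what we would expect by chance” concrete is a permutation test: shuffle the group labels many times and see how often a difference at least as large as the observed one appears. The sketch below uses invented illness durations; the numbers are placeholders, not real data.

```python
# Minimal sketch: judge statistical significance with a permutation test.
# Illness durations (hours) are invented for illustration.
import random

control = [74, 72, 71, 75, 73, 70, 76, 72]
treatment = [70, 69, 71, 68, 72, 67, 70, 69]

observed = sum(control) / len(control) - sum(treatment) / len(treatment)

pooled = control + treatment
rng = random.Random(0)
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)                      # reshuffle the group labels
    fake_control = pooled[:len(control)]
    fake_treatment = pooled[len(control):]
    diff = (sum(fake_control) / len(fake_control)
            - sum(fake_treatment) / len(fake_treatment))
    if diff >= observed:
        count += 1

print(f"observed difference: {observed:.2f} h, permutation p ~ {count / n_perm:.4f}")
```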

Now there are two major types of designs:

  • Completely-Randomized Design (CRD)
  • Block Design

A completely randomized design is the process of assigning subjects to control and treatment groups using probability, as seen in the flow diagram below.

[Figure: Completely Randomized Design Example]

A block design is a research method that places subjects into groups (blocks) of similar experimental units or conditions, like age or gender, and then assigns subjects to control and treatment groups using probability, as shown below.

[Figure: Randomized Block Design Example]
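As a minimal sketch of how a block design might be set up in practice, the snippet below blocks hypothetical subjects by age group and then randomizes to treatment or control within each block; the subject records and the choice of blocking factor are assumptions for illustration.

```python
# Minimal sketch: randomized block design. Block on a nuisance factor
# (here, age group), then randomize within each block. Records are hypothetical.
import random
from collections import defaultdict

subjects = [
    {"id": "S01", "age_group": "young"}, {"id": "S02", "age_group": "young"},
    {"id": "S03", "age_group": "young"}, {"id": "S04", "age_group": "young"},
    {"id": "S05", "age_group": "older"}, {"id": "S06", "age_group": "older"},
    {"id": "S07", "age_group": "older"}, {"id": "S08", "age_group": "older"},
]

rng = random.Random(7)
blocks = defaultdict(list)
for s in subjects:
    blocks[s["age_group"]].append(s["id"])

assignment = {}
for block_ids in blocks.values():
    rng.shuffle(block_ids)                  # randomize only within the block
    half = len(block_ids) // 2
    for i, subject_id in enumerate(block_ids):
        assignment[subject_id] = "treatment" if i < half else "control"

print(assignment)
```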

Additionally, a useful and particular case of a blocking strategy is something called a matched-pair design. This is when subjects or observations are paired to control for lurking variables.

For example, imagine we want to study if walking daily improved blood pressure. If the blood pressure for five subjects is measured at the beginning of the study and then again after participating in a walking program for one month, then the observations would be considered dependent samples because the same five subjects are used in the before and after observations; thus, a matched-pair design.
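Because the same five subjects provide both the before and after measurements, a paired analysis is appropriate. The sketch below runs a paired t-test with scipy on invented blood-pressure readings; the values are placeholders.

```python
# Minimal sketch: analyze a matched-pair (before/after) design with a paired t-test.
# Systolic blood pressure values (mmHg) are invented for illustration.
from scipy import stats

before = [142, 138, 150, 145, 139]   # same five subjects
after = [136, 135, 144, 141, 137]    # after one month of daily walking

t_stat, p_value = stats.ttest_rel(before, after)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
# Pairing controls for stable between-subject differences, a simple form of blocking.
```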

Please note that our video lesson will not focus on quasi-experiments. A quasi-experimental design lacks random assignment; the independent variable is still manipulated before the dependent variable is measured, but without randomization the results may suffer from confounding. For the sake of our lesson, and all future lessons, we will be using research methods where random sampling and experimental designs are used.

Together we will learn how to identify explanatory variables (independent variable) and response variables (dependent variables), understand and define confounding and lurking variables, see the effects of single-blind and double-blind experiments, and design randomized and block experiments.

Experimental Designs – Lesson & Examples (Video)

1 hr 06 min

  • Introduction to Video: Experiments
  • 00:00:29 – Observational Study vs Experimental Study and Response and Explanatory Variables (Examples #1-4)
  • 00:09:15 – Identify the response and explanatory variables and the experimental units and treatment (Examples #5-6)
  • 00:14:47 – Introduction of lurking variables and confounding with ice cream and homicide example
  • 00:18:57 – Lurking variables, Confounding, Placebo Effect, Single Blind and Double Blind Experiments (Example #7)
  • 00:27:20 – What was the placebo effect and was the experiment single or double blind? (Example #8)
  • 00:30:36 – Characteristics of a well designed and constructed experiment that is statistically significant
  • 00:35:08 – Overview of Complete Randomized Design, Block Design and Matched Pair Design
  • 00:44:23 – Design and experiment using complete randomized design or a block design (Examples #9-10)
  • 00:56:09 – Identify the response and explanatory variables, experimental units, lurking variables, and design an experiment to test a new drug (Example #11)
  • Practice Problems with Step-by-Step Solutions
  • Chapter Tests with Video Solutions



Understanding Statistics and Experimental Design

How to Not Lie with Statistics

  • Open Access
  • © 2019


  • Michael H. Herzog, Brain Mind Institute, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  • Gregory Francis, Dept. of Psychological Sciences, Purdue University, West Lafayette, USA
  • Aaron Clarke, Psychology Department, Bilkent University, Ankara, Turkey

  • Short and mathematically as simple as possible
  • Provides a full account of the most commonly used statistical tests
  • Makes the key statistical concepts and reasoning readily accessible
  • Teaches the reader the meta-statistical principles
  • Offers a completely new way of judging the quality of scientific studies in science and daily life

Part of the book series: Learning Materials in Biosciences (LMB)




Keywords: experimental design, statistics in life sciences, concepts of statistics, correlations, meta-statistics, reproduction, hypothesis testing, simple probabilities, questionable research practices

Table of contents (12 chapters)

  • Front Matter
  • The Essentials of Statistics
  • Basic Probability Theory
  • Experimental Design and the Basics of Statistics: Signal Detection Theory (SDT)
  • The Core Concept of Statistics
  • Variations on the t-Test
  • The Multiple Testing Problem
  • Experimental Design: Model Fits, Power, and Complex Designs
  • Correlation
  • Meta-analysis and the Science Crisis
  • Meta-analysis
  • Understanding Replication
  • Magnitude of Excess Success
  • Suggested Improvements and Challenges

“Readers with little or no background in statistics will appreciate how these fundamental concepts are so well illustrated in this book to establish the solid foundation of probability and statistics.” (David Han, Mathematical Reviews, April, 2020)


Bibliographic Information

Book Title: Understanding Statistics and Experimental Design

Book Subtitle: How to Not Lie with Statistics

Authors: Michael H. Herzog, Gregory Francis, Aaron Clarke

Series Title: Learning Materials in Biosciences

DOI: https://doi.org/10.1007/978-3-030-03499-3

Publisher: Springer Cham

eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)

Copyright Information: The Editor(s) (if applicable) and The Author(s) 2019

Softcover ISBN: 978-3-030-03498-6, published 22 August 2019

eBook ISBN: 978-3-030-03499-3, published 13 August 2019

Series ISSN: 2509-6125

Series E-ISSN: 2509-6133

Edition Number: 1

Number of Pages: XI, 142

Number of Illustrations: 6 b/w illustrations, 29 illustrations in colour

Topics: Molecular Medicine, Biostatistics, Science Education, Statistics for Life Sciences, Medicine, Health Sciences, Psychology Research, Behavioral Sciences


Statistical Design and Analysis of Biological Experiments

Chapter 1 Principles of Experimental Design

1.1 Introduction

The validity of conclusions drawn from a statistical analysis crucially hinges on the manner in which the data are acquired, and even the most sophisticated analysis will not rescue a flawed experiment. Planning an experiment and thinking about the details of data acquisition is so important for a successful analysis that R. A. Fisher—who single-handedly invented many of the experimental design techniques we are about to discuss—famously wrote

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ( Fisher 1938 )

(Statistical) design of experiments provides the principles and methods for planning experiments and tailoring the data acquisition to an intended analysis. Design and analysis of an experiment are best considered as two aspects of the same enterprise: the goals of the analysis strongly inform an appropriate design, and the implemented design determines the possible analyses.

The primary aim of designing experiments is to ensure that valid statistical and scientific conclusions can be drawn that withstand the scrutiny of a determined skeptic. Good experimental design also considers that resources are used efficiently, and that estimates are sufficiently precise and hypothesis tests adequately powered. It protects our conclusions by excluding alternative interpretations or rendering them implausible. Three main pillars of experimental design are randomization , replication , and blocking , and we will flesh out their effects on the subsequent analysis as well as their implementation in an experimental design.

An experimental design is always tailored towards predefined (primary) analyses and an efficient analysis and unambiguous interpretation of the experimental data is often straightforward from a good design. This does not prevent us from doing additional analyses of interesting observations after the data are acquired, but these analyses can be subjected to more severe criticisms and conclusions are more tentative.

In this chapter, we provide the wider context for using experiments in a larger research enterprise and informally introduce the main statistical ideas of experimental design. We use a comparison of two samples as our main example to study how design choices affect an analysis, but postpone a formal quantitative analysis to the next chapters.

1.2 A Cautionary Tale

For illustrating some of the issues arising in the interplay of experimental design and analysis, we consider a simple example. We are interested in comparing the enzyme levels measured in processed blood samples from laboratory mice, when the sample processing is done either with a kit from a vendor A, or a kit from a competitor B. For this, we take 20 mice and randomly select 10 of them for sample preparation with kit A, while the blood samples of the remaining 10 mice are prepared with kit B. The experiment is illustrated in Figure 1.1 A and the resulting data are given in Table 1.1 .

One option for comparing the two kits is to look at the difference in average enzyme levels, and we find an average level of 10.32 for vendor A and 10.66 for vendor B. We would like to interpret their difference of -0.34 as the difference due to the two preparation kits and conclude whether the two kits give equal results or if measurements based on one kit are systematically different from those based on the other kit.

Such interpretation, however, is only valid if the two groups of mice and their measurements are identical in all aspects except the sample preparation kit. If we use one strain of mice for kit A and another strain for kit B, any difference might also be attributed to inherent differences between the strains. Similarly, if the measurements using kit B were conducted much later than those using kit A, any observed difference might be attributed to changes in, e.g., mice selected, batches of chemicals used, device calibration, or any number of other influences. None of these competing explanations for an observed difference can be excluded from the given data alone, but good experimental design allows us to render them (almost) arbitrarily implausible.

A second aspect for our analysis is the inherent uncertainty in our calculated difference: if we repeat the experiment, the observed difference will change each time, and this will be more pronounced for a smaller number of mice, among others. If we do not use a sufficient number of mice in our experiment, the uncertainty associated with the observed difference might be too large, such that random fluctuations become a plausible explanation for the observed difference. Systematic differences between the two kits, of practically relevant magnitude in either direction, might then be compatible with the data, and we can draw no reliable conclusions from our experiment.

In each case, the statistical analysis—no matter how clever—was doomed before the experiment was even started, while simple ideas from statistical design of experiments would have provided correct and robust results with interpretable conclusions.

1.3 The Language of Experimental Design

By an experiment we understand an investigation where the researcher has full control over selecting and altering the experimental conditions of interest, and we only consider investigations of this type. The selected experimental conditions are called treatments . An experiment is comparative if the responses to several treatments are to be compared or contrasted. The experimental units are the smallest subdivision of the experimental material to which a treatment can be assigned. All experimental units given the same treatment constitute a treatment group . Especially in biology, we often compare treatments to a control group to which some standard experimental conditions are applied; a typical example is using a placebo for the control group, and different drugs for the other treatment groups.

The values observed are called responses and are measured on the response units ; these are often identical to the experimental units but need not be. Multiple experimental units are sometimes combined into groupings or blocks , such as mice grouped by litter, or samples grouped by batches of chemicals used for their preparation. More generally, we call any grouping of the experimental material (even with group size one) a unit .

In our example, we selected the mice, used a single sample per mouse, deliberately chose the two specific vendors, and had full control over which kit to assign to which mouse. In other words, the two kits are the treatments and the mice are the experimental units. We took the measured enzyme level of a single sample from a mouse as our response, and samples are therefore the response units. The resulting experiment is comparative, because we contrast the enzyme levels between the two treatment groups.

Figure 1.1: Three designs to determine the difference between two preparation kits A and B based on four mice. A: One sample per mouse. Comparison between averages of samples with same kit. B: Two samples per mouse treated with the same kit. Comparison between averages of mice with same kit requires averaging responses for each mouse first. C: Two samples per mouse each treated with different kit. Comparison between two samples of each mouse, with differences averaged.

In this example, we can coalesce experimental and response units, because we have a single response per mouse and cannot distinguish a sample from a mouse in the analysis, as illustrated in Figure 1.1 A for four mice. Responses from mice with the same kit are averaged, and the kit difference is the difference between these two averages.

By contrast, if we take two samples per mouse and use the same kit for both samples, then the mice are still the experimental units, but each mouse now groups the two response units associated with it. Now, responses from the same mouse are first averaged, and these averages are used to calculate the difference between kits; even though eight measurements are available, this difference is still based on only four mice (Figure 1.1 B).

If we take two samples per mouse, but apply each kit to one of the two samples, then the samples are both the experimental and response units, while the mice are blocks that group the samples. Now, we calculate the difference between kits for each mouse, and then average these differences (Figure 1.1 C).
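To make the difference between the analyses for designs B and C concrete, the following sketch uses invented enzyme levels for four mice with two samples each: for design B the two samples of a mouse are averaged before comparing kits, while for design C the within-mouse differences are averaged. The numbers are illustrative only and do not correspond to Table 1.1.

```python
# Minimal sketch contrasting the analysis paths for designs B and C of Figure 1.1.
# Enzyme levels are invented: four mice, two samples per mouse.
import statistics

# Design B: both samples of a mouse are prepared with the same kit.
design_b = {"m1": ("A", [10.1, 10.3]), "m2": ("A", [10.5, 10.4]),
            "m3": ("B", [10.8, 10.7]), "m4": ("B", [10.6, 10.9])}
per_mouse = {m: statistics.mean(vals) for m, (_, vals) in design_b.items()}
mean_a = statistics.mean(per_mouse[m] for m, (kit, _) in design_b.items() if kit == "A")
mean_b = statistics.mean(per_mouse[m] for m, (kit, _) in design_b.items() if kit == "B")
print("design B estimate (B - A):", round(mean_b - mean_a, 3))   # still based on 4 mice

# Design C: each mouse receives one sample per kit, so mice act as blocks.
design_c = {"m1": {"A": 10.1, "B": 10.4}, "m2": {"A": 10.5, "B": 10.8},
            "m3": {"A": 10.2, "B": 10.6}, "m4": {"A": 10.3, "B": 10.5}}
within_mouse_diffs = [s["B"] - s["A"] for s in design_c.values()]
print("design C estimate (B - A):", round(statistics.mean(within_mouse_diffs), 3))
```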

If we only use one kit and determine the average enzyme level, then this investigation is still an experiment, but is not comparative.

To summarize, the design of an experiment determines the logical structure of the experiment ; it consists of (i) a set of treatments (the two kits); (ii) a specification of the experimental units (animals, cell lines, samples) (the mice in Figure 1.1 A,B and the samples in Figure 1.1 C); (iii) a procedure for assigning treatments to units; and (iv) a specification of the response units and the quantity to be measured as a response (the samples and associated enzyme levels).

1.4 Experiment Validity

Before we embark on the more technical aspects of experimental design, we discuss three components for evaluating an experiment’s validity: construct validity , internal validity , and external validity . These criteria are well-established in areas such as educational and psychological research, and have more recently been discussed for animal research ( Würbel 2017 ) where experiments are increasingly scrutinized for their scientific rationale and their design and intended analyses.

1.4.1 Construct Validity

Construct validity concerns the choice of the experimental system for answering our research question. Is the system even capable of providing a relevant answer to the question?

Studying the mechanisms of a particular disease, for example, might require careful choice of an appropriate animal model that shows a disease phenotype and is accessible to experimental interventions. If the animal model is a proxy for drug development for humans, biological mechanisms must be sufficiently similar between animal and human physiologies.

Another important aspect of the construct is the quantity that we intend to measure (the measurand ), and its relation to the quantity or property we are interested in. For example, we might measure the concentration of the same chemical compound once in a blood sample and once in a highly purified sample, and these constitute two different measurands, whose values might not be comparable. Often, the quantity of interest (e.g., liver function) is not directly measurable (or even quantifiable) and we measure a biomarker instead. For example, pre-clinical and clinical investigations may use concentrations of proteins or counts of specific cell types from blood samples, such as the CD4+ cell count used as a biomarker for immune system function.

1.4.2 Internal Validity

The internal validity of an experiment concerns the soundness of the scientific rationale, statistical properties such as precision of estimates, and the measures taken against risk of bias. It refers to the validity of claims within the context of the experiment. Statistical design of experiments plays a prominent role in ensuring internal validity, and we briefly discuss the main ideas before providing the technical details and an application to our example in the subsequent sections.

Scientific Rationale and Research Question

The scientific rationale of a study is (usually) not immediately a statistical question. Translating a scientific question into a quantitative comparison amenable to statistical analysis is no small task and often requires careful consideration. It is a substantial, if non-statistical, benefit of using experimental design that we are forced to formulate a precise-enough research question and decide on the main analyses required for answering it before we conduct the experiment. For example, the question “is there a difference between placebo and drug?” is insufficiently precise for planning a statistical analysis and for determining an adequate experimental design. What exactly is the drug treatment? What should the drug’s concentration be and how is it administered? How do we make sure that the placebo group is comparable to the drug group in all other aspects? What do we measure and what do we mean by “difference”? A shift in average response, a fold-change, or a change in response before and after treatment?

The scientific rationale also enters the choice of a potential control group to which we compare responses. The quote

The deep, fundamental question in statistical analysis is ‘Compared to what?’ ( Tufte 1997 )

highlights the importance of this choice.

There are almost never enough resources to answer all relevant scientific questions. We therefore define a few questions of highest interest, and the main purpose of the experiment is answering these questions in the primary analysis . This intended analysis drives the experimental design to ensure relevant estimates can be calculated and have sufficient precision, and tests are adequately powered. This does not preclude us from conducting additional secondary analyses and exploratory analyses , but we are not willing to enlarge the experiment to ensure that strong conclusions can also be drawn from these analyses.

Risk of Bias

Experimental bias is a systematic difference in response between experimental units in addition to the difference caused by the treatments. The experimental units in the different groups are then not equal in all aspects other than the treatment applied to them. We saw several examples in Section 1.2 .

Minimizing the risk of bias is crucial for internal validity and we look at some common measures to eliminate or reduce different types of bias in Section 1.5 .

Precision and Effect Size

Another aspect of internal validity is the precision of estimates and the expected effect sizes. Is the experimental setup, in principle, able to detect a difference of relevant magnitude? Experimental design offers several methods for answering this question based on the expected heterogeneity of samples, the measurement error, and other sources of variation: power analysis is a technique for determining the number of samples required to reliably detect a relevant effect size and provide estimates of sufficient precision. More samples yield more precision and more power, but we have to be careful that replication is done at the right level: simply measuring a biological sample multiple times as in Figure 1.1 B yields more measured values, but is pseudo-replication for analyses. Replication should also ensure that the statistical uncertainties of estimates can be gauged from the data of the experiment itself, without additional untestable assumptions. Finally, the technique of blocking , shown in Figure 1.1 C, can remove a substantial proportion of the variation and thereby increase power and precision if we find a way to apply it.
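A minimal power-analysis sketch, assuming a two-group comparison analyzed with a t-test: given an assumed standardized effect size, significance level, and target power, statsmodels can solve for the required group size. The effect size of 0.5 standard deviations is an assumption for illustration, not a value taken from the text.

```python
# Minimal sketch: sample size for a two-sample t-test via power analysis.
# The assumed effect size (0.5 SD) is illustrative only.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"required sample size per group: {n_per_group:.1f}")   # roughly 64 per group
```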

1.4.3 External Validity

The external validity of an experiment concerns its replicability and the generalizability of inferences. An experiment is replicable if its results can be confirmed by an independent new experiment, preferably by a different lab and researcher. Experimental conditions in the replicate experiment usually differ from the original experiment, which provides evidence that the observed effects are robust to such changes. A much weaker condition on an experiment is reproducibility , the property that an independent researcher draws equivalent conclusions based on the data from this particular experiment, using the same analysis techniques. Reproducibility requires publishing the raw data, details on the experimental protocol, and a description of the statistical analyses, preferably with accompanying source code. Many scientific journals subscribe to reporting guidelines to ensure reproducibility and these are also helpful for planning an experiment.

A main threat to replicability and generalizability are too tightly controlled experimental conditions, when inferences only hold for a specific lab under the very specific conditions of the original experiment. Introducing systematic heterogeneity and using multi-center studies effectively broadens the experimental conditions and therefore the inferences for which internal validity is available.

For systematic heterogeneity , experimental conditions are systematically altered in addition to the treatments, and treatment differences estimated for each condition. For example, we might split the experimental material into several batches and use a different day of analysis, sample preparation, batch of buffer, measurement device, and lab technician for each batch. A more general inference is then possible if effect size, effect direction, and precision are comparable between the batches, indicating that the treatment differences are stable over the different conditions.

In multi-center experiments , the same experiment is conducted in several different labs and the results compared and merged. Multi-center approaches are very common in clinical trials and often necessary to reach the required number of patient enrollments.

Generalizability of randomized controlled trials in medicine and animal studies can suffer from overly restrictive eligibility criteria. In clinical trials, patients are often included or excluded based on co-medications and co-morbidities, and the resulting sample of eligible patients might no longer be representative of the patient population. For example, Travers et al. ( 2007 ) used the eligibility criteria of 17 random controlled trials of asthma treatments and found that out of 749 patients, only a median of 6% (45 patients) would be eligible for an asthma-related randomized controlled trial. This puts a question mark on the relevance of the trials’ findings for asthma patients in general.

1.5 Reducing the Risk of Bias

1.5.1 Randomization of Treatment Allocation

If systematic differences other than the treatment exist between our treatment groups, then the effect of the treatment is confounded with these other differences and our estimates of treatment effects might be biased.

We remove such unwanted systematic differences from our treatment comparisons by randomizing the allocation of treatments to experimental units. In a completely randomized design , each experimental unit has the same chance of being subjected to any of the treatments, and any differences between the experimental units other than the treatments are distributed over the treatment groups. Importantly, randomization is the only method that also protects our experiment against unknown sources of bias: we do not need to know all or even any of the potential differences and yet their impact is eliminated from the treatment comparisons by random treatment allocation.

Randomization has two effects: (i) differences unrelated to treatment become part of the ‘statistical noise’ rendering the treatment groups more similar; and (ii) the systematic differences are thereby eliminated as sources of bias from the treatment comparison.

Randomization transforms systematic variation into random variation.

In our example, a proper randomization would select 10 out of our 20 mice fully at random, such that the probability of any one mouse being picked is 1/20. These ten mice are then assigned to kit A, and the remaining mice to kit B. This allocation is entirely independent of the treatments and of any properties of the mice.

To ensure random treatment allocation, some kind of random process needs to be employed. This can be as simple as shuffling a pack of 10 red and 10 black cards or using a software-based random number generator. Randomization is slightly more difficult if the number of experimental units is not known at the start of the experiment, such as when patients are recruited for an ongoing clinical trial (sometimes called rolling recruitment ), and we want to have reasonable balance between the treatment groups at each stage of the trial.
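As a minimal sketch of such a software-based random allocation, the snippet below draws 10 of 20 mouse IDs at random for kit A and assigns the rest to kit B; the fixed seed is only there to make the allocation reproducible and auditable.

```python
# Minimal sketch: completely randomized allocation of 20 mice, 10 per kit.
import numpy as np

rng = np.random.default_rng(seed=2024)   # seed only for reproducibility
mice = np.arange(1, 21)                  # mouse IDs 1..20
kit_a = rng.choice(mice, size=10, replace=False)
kit_b = np.setdiff1d(mice, kit_a)
print("kit A:", sorted(kit_a.tolist()))
print("kit B:", kit_b.tolist())
```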

Seemingly random assignments “by hand” are usually no less complicated than fully random assignments, but are always inferior. If surprising results ensue from the experiment, such assignments are subject to unanswerable criticism and suspicion of unwanted bias. Even worse are systematic allocations; they can only remove bias from known causes, and immediately raise red flags under the slightest scrutiny.

The Problem of Undesired Assignments

Even with a fully random treatment allocation procedure, we might end up with an undesirable allocation. For our example, the treatment group of kit A might—just by chance—contain mice that are all bigger or more active than those in the other treatment group. Statistical orthodoxy recommends using the design nevertheless, because only full randomization guarantees valid estimates of residual variance and unbiased estimates of effects. This argument, however, concerns the long-run properties of the procedure and seems of little help in this specific situation. Why should we care if the randomization yields correct estimates under replication of the experiment, if the particular experiment is jeopardized?

Another solution is to create a list of all possible allocations that we would accept and randomly choose one of these allocations for our experiment. The analysis should then reflect this restriction in the possible randomizations, which often renders this approach difficult to implement.

The most pragmatic method is to reject highly undesirable designs and compute a new randomization ( Cox 1958 ) . Undesirable allocations are unlikely to arise for large sample sizes, and we might accept a small bias in estimation for small sample sizes, when uncertainty in the estimated treatment effect is already high. In this approach, whenever we reject a particular outcome, we must also be willing to reject the outcome if we permute the treatment level labels. If we reject eight big and two small mice for kit A, then we must also reject two big and eight small mice. We must also be transparent and report a rejected allocation, so that critics may come to their own conclusions about potential biases and their remedies.

1.5.2 Blinding

Bias in treatment comparisons is also introduced if treatment allocation is random, but responses cannot be measured entirely objectively, or if knowledge of the assigned treatment affects the response. In clinical trials, for example, patients might react differently when they know to be on a placebo treatment, an effect known as cognitive bias . In animal experiments, caretakers might report more abnormal behavior for animals on a more severe treatment. Cognitive bias can be eliminated by concealing the treatment allocation from technicians or participants of a clinical trial, a technique called single-blinding .

If response measures are partially based on professional judgement (such as a clinical scale), patient or physician might unconsciously report lower scores for a placebo treatment, a phenomenon known as observer bias . Its removal requires double blinding , where treatment allocations are additionally concealed from the experimentalist.

Blinding requires randomized treatment allocation to begin with and substantial effort might be needed to implement it. Drug companies, for example, have to go to great lengths to ensure that a placebo looks, tastes, and feels similar enough to the actual drug. Additionally, blinding is often done by coding the treatment conditions and samples, and effect sizes and statistical significance are calculated before the code is revealed.

In clinical trials, double-blinding creates a conflict of interest. The attending physicians do not know which patient received which treatment, and thus accumulation of side-effects cannot be linked to any treatment. For this reason, clinical trials have a data monitoring committee not involved in the final analysis, that performs intermediate analyses of efficacy and safety at predefined intervals. If severe problems are detected, the committee might recommend altering or aborting the trial. The same might happen if one treatment already shows overwhelming evidence of superiority, such that it becomes unethical to withhold this treatment from the other patients.

1.5.3 Analysis Plan and Registration

An often overlooked source of bias has been termed the researcher degrees of freedom or garden of forking paths in the data analysis. For any set of data, there are many different options for its analysis: some results might be considered outliers and discarded, assumptions are made on error distributions and appropriate test statistics, different covariates might be included in a regression model. Often, multiple hypotheses are investigated and tested, and analyses are done separately on various (overlapping) subgroups. Hypotheses formed after looking at the data require additional care in their interpretation; almost never will p-values for these ad hoc or post hoc hypotheses be statistically justifiable. Many different measured response variables invite fishing expeditions, where patterns in the data are sought without an underlying hypothesis. Only reporting those sub-analyses that gave ‘interesting’ findings invariably leads to biased conclusions and is called cherry-picking or p-hacking (or much less flattering names).

The statistical analysis is always part of a larger scientific argument and we should consider the necessary computations in relation to building our scientific argument about the interpretation of the data. In addition to the statistical calculations, this interpretation requires substantial subject-matter knowledge and includes (many) non-statistical arguments. Two quotes highlight that experiment and analysis are a means to an end and not the end in itself.

There is a boundary in data interpretation beyond which formulas and quantitative decision procedures do not go, where judgment and style enter. ( Abelson 1995 )
Often, perfectly reasonable people come to perfectly reasonable decisions or conclusions based on nonstatistical evidence. Statistical analysis is a tool with which we support reasoning. It is not a goal in itself. ( Bailar III 1981 )

There is often a grey area between exploiting researcher degrees of freedom to arrive at a desired conclusion, and creative yet informed analyses of data. One way to navigate this area is to distinguish between exploratory studies and confirmatory studies . The former have no clearly stated scientific question, but are used to generate interesting hypotheses by identifying potential associations or effects that are then further investigated. Conclusions from these studies are very tentative and must be reported honestly as such. In contrast, standards are much higher for confirmatory studies, which investigate a specific predefined scientific question. Analysis plans and pre-registration of an experiment are accepted means for demonstrating lack of bias due to researcher degrees of freedom, and separating primary from secondary analyses allows emphasizing the main goals of the study.

Analysis Plan

The analysis plan is written before conducting the experiment and details the measurands and estimands, the hypotheses to be tested together with a power and sample size calculation, a discussion of relevant effect sizes, detection and handling of outliers and missing data, as well as steps for data normalization such as transformations and baseline corrections. If a regression model is required, its factors and covariates are outlined. Particularly in biology, handling measurements below the limit of quantification and saturation effects require careful consideration.

In the context of clinical trials, the problem of estimands has become a recent focus of attention. An estimand is the target of a statistical estimation procedure, for example the true average difference in enzyme levels between the two preparation kits. A main problem in many studies are post-randomization events that can change the estimand, even if the estimation procedure remains the same. For example, if kit B fails to produce usable samples for measurement in five out of ten cases because the enzyme level was too low, while kit A could handle these enzyme levels perfectly fine, then this might severely exaggerate the observed difference between the two kits. Similar problems arise in drug trials, when some patients stop taking one of the drugs due to side-effects or other complications.

Registration

Registration of experiments is an even more severe measure used in conjunction with an analysis plan and is becoming standard in clinical trials. Here, information about the trial, including the analysis plan, procedure to recruit patients, and stopping criteria, are registered in a public database. Publications based on the trial then refer to this registration, such that reviewers and readers can compare what the researchers intended to do and what they actually did. Similar portals for pre-clinical and translational research are also available.

1.6 Notes and Summary

The problem of measurements and measurands is further discussed for statistics in Hand ( 1996 ) and specifically for biological experiments in Coxon, Longstaff, and Burns ( 2019 ) . A general review of methods for handling missing data is Dong and Peng ( 2013 ) . The different roles of randomization are emphasized in Cox ( 2009 ) .

Two well-known reporting guidelines are the ARRIVE guidelines for animal research ( Kilkenny et al. 2010 ) and the CONSORT guidelines for clinical trials ( Moher et al. 2010 ) . Guidelines describing the minimal information required for reproducing experimental results have been developed for many types of experimental techniques, including microarrays (MIAME), RNA sequencing (MINSEQE), metabolomics (MSI) and proteomics (MIAPE) experiments; the FAIRSHARE initiative provides a more comprehensive collection ( Sansone et al. 2019 ) .

The problems of experimental design in animal experiments and particularly translation research are discussed in Couzin-Frankel ( 2013 ) . Multi-center studies are now considered for these investigations, and using a second laboratory already increases reproducibility substantially ( Richter et al. 2010 ; Richter 2017 ; Voelkl et al. 2018 ; Karp 2018 ) and allows standardizing the treatment effects ( Kafkafi et al. 2017 ) . First attempts are reported of using designs similar to clinical trials ( Llovera and Liesz 2016 ) . Exploratory-confirmatory research and external validity for animal studies is discussed in Kimmelman, Mogil, and Dirnagl ( 2014 ) and Pound and Ritskes-Hoitinga ( 2018 ) . Further information on pilot studies is found in Moore et al. ( 2011 ) , Sim ( 2019 ) , and Thabane et al. ( 2010 ) .

The deliberate use of statistical analyses and their interpretation for supporting a larger argument was called statistics as principled argument ( Abelson 1995 ) . Employing useless statistical analysis without reference to the actual scientific question is surrogate science ( Gigerenzer and Marewski 2014 ) and adaptive thinking is integral to meaningful statistical analysis ( Gigerenzer 2002 ) .

In an experiment, the investigator has full control over the experimental conditions applied to the experiment material. The experimental design gives the logical structure of an experiment: the units describing the organization of the experimental material, the treatments and their allocation to units, and the response. Statistical design of experiments includes techniques to ensure internal validity of an experiment, and methods to make inference from experimental data efficient.
