Statistical Analysis in BE Studies: How Power and Sample Size Determine Success
Mar 18, 2026
When a generic drug company wants to bring a new product to market, they don’t need to run full clinical trials like the original brand. Instead, they must prove bioequivalence - that their version behaves in the body just like the brand-name drug. This is done through a bioequivalence (BE) study, usually a small, tightly controlled crossover trial where volunteers take both the test and reference drugs. But here’s the catch: if the study is underpowered or has the wrong sample size, it fails. And failing a BE study isn’t just a delay - it’s a costly, months-long setback that can kill a product launch.
So how do you know how many people you need? It’s not guesswork. It’s math. And that math has to meet strict regulatory standards from the FDA and EMA. The goal? To be 80% to 90% sure you’ll correctly conclude two drugs are bioequivalent - if they really are. Miss that target, and you risk a Type II error: failing to demonstrate equivalence even though the drugs truly are equivalent. That’s a failure no company can afford.
Why Power and Sample Size Are Non-Negotiable
Power in a BE study is the probability that your test will correctly detect bioequivalence when it exists. A power of 80% means you have an 80% chance of passing the study if the drugs are truly equivalent. The FDA and EMA both require at least 80% power. Many sponsors aim for 90%, especially for drugs with narrow therapeutic indexes - where small differences can mean serious safety risks.
Sample size is the direct result of that power calculation. Too few subjects? You might miss a real difference and fail the study. Too many? You’re wasting money, time, and exposing more people than necessary to drug dosing. The sweet spot is precise, and it depends on three things: variability, expected ratio, and equivalence limits.
The Three Pillars of Sample Size Calculation
Every BE study hinges on these three inputs:
- Within-subject coefficient of variation (CV%) - This measures how much a person’s own response to the drug varies across doses. For example, if a volunteer’s AUC after taking Drug A is 120 one day and 180 another, that’s high variability. CV% for most drugs ranges from 10% to 35%. But for highly variable drugs (like clopidogrel or certain proton pump inhibitors), it can hit 40% or higher. Higher CV% means you need more subjects. A 30% CV might require 52 people, while a 20% CV only needs 26.
- Geometric mean ratio (GMR) - This is the expected ratio of test to reference drug exposure. Most generic drugs aim for a GMR close to 1.00 (meaning identical exposure). But assuming 1.00 is dangerous. If the real ratio is 0.95, and you plan for 1.00, your required sample size jumps by 32%. Always use conservative estimates based on pilot data, not literature.
- Equivalence margins - The legal boundaries for bioequivalence. For most drugs, the acceptance range for the geometric mean ratio is 80% to 125% (limits that are symmetric on the log scale). That means the test drug’s exposure can be as low as 80% or as high as 125% of the reference and still be considered equivalent. Some drugs, especially for Cmax, allow wider ranges under EMA rules (75-133%), which can cut sample size by 15-20%.
These aren’t optional. If you plug in the wrong numbers, your whole study is flawed. The FDA found that 63% of submissions using literature-based CV% underestimated true variability by 5-8 percentage points. That’s a recipe for failure.
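To see how these three inputs interact, here is a minimal sketch of the widely used z-based approximation for a 2x2 crossover TOST design. The function name and rounding convention are illustrative, and exact methods iterate on the t-distribution and return slightly larger numbers, which is one reason validating with dedicated software is mandatory before finalizing a protocol.

```python
from statistics import NormalDist
import math

def be_sample_size(cv, gmr, power=0.80, alpha=0.05):
    """Approximate total N for a 2x2 crossover BE study sized with
    the two one-sided tests (TOST) procedure.

    This is the classic z-based approximation; exact methods iterate
    on the t-distribution and give slightly larger N, so treat the
    result as a lower bound and validate with dedicated software.
    """
    z = NormalDist().inv_cdf
    # Within-subject variance on the log scale, derived from CV%.
    sigma_w2 = math.log(1.0 + cv ** 2)
    # Distance from the assumed true log-ratio to the nearer limit
    # of the standard 80-125% acceptance range.
    delta = math.log(1.25) - abs(math.log(gmr))
    if delta <= 0:
        raise ValueError("assumed GMR lies on or outside the limits")
    beta = 1.0 - power
    # With GMR = 1 both limits constrain power equally, so beta/2 applies.
    z_beta = z(1 - beta / 2) if gmr == 1.0 else z(1 - beta)
    n = 2 * (z(1 - alpha) + z_beta) ** 2 * sigma_w2 / delta ** 2
    # Round up to an even number so the two sequences stay balanced.
    return math.ceil(n / 2) * 2
```

For a CV% of 20% and a GMR of 0.95 at 80% power, this approximation returns 18 subjects; exact t-based software returns a few more. The point is the sensitivity: bump the CV% to 30% and the result more than doubles, to 38.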
How Variability Changes Everything
Let’s say you’re studying a drug with a CV% of 20%, an expected GMR of 0.95, and 80% power. You’d need 26 subjects. Now increase the CV% to 30%. Suddenly, you need 52. Double the variability? Double the sample size. That’s why pilot studies matter. If you don’t have real data, you’re guessing - and guessing in BE studies is expensive.
Highly variable drugs (CV > 30%) are a special case. The FDA allows reference-scaled average bioequivalence (RSABE) for these. Instead of fixed 80-125% limits, the range expands based on how variable the reference drug is. For example, if the reference drug has a CV of 40%, the implied equivalence limits stretch to roughly 71-141%. This reduces sample size from 120+ down to 24-48. But RSABE isn’t automatic - you need regulatory approval upfront, and the math gets more complex.
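The EMA’s version of reference scaling - average bioequivalence with expanding limits (ABEL) - has a simple closed form that is easy to sketch. The FDA’s RSABE uses a differently formulated scaled criterion, so take this as an illustration of the widening idea rather than the FDA’s exact math:

```python
import math

def abel_limits(cv_wr):
    """Expanded Cmax acceptance limits under the EMA's ABEL approach.

    For CVwR above 30%, the limits widen as exp(+/- 0.760 * s_wR),
    where s_wR is the reference drug's within-subject SD on the log
    scale; the expansion is capped at CVwR = 50%, i.e., at roughly
    69.84-143.19%. Below 30% CVwR the clamp reproduces the standard
    80-125% range (to within rounding).
    """
    cv = min(max(cv_wr, 0.30), 0.50)
    s_wr = math.sqrt(math.log(1.0 + cv ** 2))
    upper = math.exp(0.760 * s_wr)
    return 1.0 / upper, upper
```

At a reference CV of 40% this gives roughly 74.6-134.0%, the published ABEL value. Widening the acceptance range is what lets the sample size fall so sharply for highly variable drugs.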
Dropouts and Other Real-World Messiness
Real studies aren’t perfect. People drop out. They miss doses. They get sick. So you don’t just calculate the bare minimum. You add a buffer. Industry best practice? Add 10-15% extra subjects to account for dropouts. If your calculation says 26, plan for 30. If it says 52, plan for 60. Skip this step, and your final power might drop from 80% to 70% - and you’ll fail.
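The buffer itself is trivial arithmetic, but rounding up to an even total (so the two crossover sequences stay balanced) is easy to forget. A small helper makes the habit explicit; the 15% default is an assumption taken from the rule of thumb above:

```python
import math

def inflate_for_dropouts(n, buffer=0.15):
    """Inflate a calculated sample size by a dropout buffer and round
    up to the next even number, so both crossover sequences can be
    assigned equal numbers of subjects."""
    return math.ceil(n * (1.0 + buffer) / 2) * 2
```

So inflate_for_dropouts(26) returns 30 and inflate_for_dropouts(52) returns 60, matching the figures above.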
Also, most BE studies look at two endpoints: Cmax and AUC. You can’t just power for one. If you only power for AUC, and Cmax is more variable, you might pass AUC but fail Cmax. The American Statistical Association found only 45% of sponsors calculate joint power for both. That’s a blind spot. Always plan for the worst-case parameter.
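A simple, conservative way to plan for two endpoints is to size each with its own CV% and take the larger number. The sketch below does exactly that, reusing the z-based TOST approximation (so the same caveats apply: exact t-based software gives slightly larger N). Note that passing each endpoint marginally at 80% still does not guarantee 80% joint power unless Cmax and AUC are strongly correlated, which is why some sponsors size to 90% per endpoint or verify joint power by simulation.

```python
from statistics import NormalDist
import math

def tost_n(cv, gmr=0.95, power=0.80, alpha=0.05):
    """Compact z-approximation of total N for a 2x2 crossover TOST
    (assumes GMR != 1; exact t-based methods return slightly more)."""
    z = NormalDist().inv_cdf
    sigma_w2 = math.log(1.0 + cv ** 2)
    delta = math.log(1.25) - abs(math.log(gmr))
    n = 2 * (z(1 - alpha) + z(power)) ** 2 * sigma_w2 / delta ** 2
    return math.ceil(n / 2) * 2

def n_for_both_endpoints(cv_auc, cv_cmax, **kwargs):
    """Size each endpoint with its own CV% and power for the worse one."""
    return max(tost_n(cv_auc, **kwargs), tost_n(cv_cmax, **kwargs))
```

If AUC has a CV of 20% but Cmax sits at 30%, the study must be sized for the Cmax number, roughly twice as many subjects.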
Tools of the Trade
No one does this by hand anymore. Specialized software does the heavy lifting:
- PASS - Industry standard for BE studies. Handles RSABE, crossover designs, and joint power calculations.
- nQuery - Popular among CROs. Easy interface, regulatory templates built-in.
- FARTSSIE - Free tool developed by regulatory scientists. Great for small labs.
- ClinCalc - Online calculator. Good for quick estimates, but always validate with full software.
Don’t use generic sample size tools meant for superiority trials. BE is different. The math is based on log-normal distributions and confidence intervals, not p-values. Using the wrong tool is like using a ruler to measure weight.
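The underlying decision rule is worth seeing once: analyze on the log scale, form a 90% confidence interval for the geometric mean ratio, and pass only if the whole interval sits inside the acceptance limits. The sketch below uses the normal critical value 1.645 for illustration; a real analysis takes the critical value from the t-distribution with the ANOVA residual degrees of freedom.

```python
import math

def gmr_ci90(mean_log_diff, se_log_diff, crit=1.645):
    """90% CI for the test/reference geometric mean ratio, built from
    the estimated treatment difference on the log scale. The normal
    critical value 1.645 stands in for the proper t quantile."""
    lo = math.exp(mean_log_diff - crit * se_log_diff)
    hi = math.exp(mean_log_diff + crit * se_log_diff)
    return lo, hi

def passes_be(ci, limits=(0.80, 1.25)):
    """TOST expressed as interval inclusion: the entire 90% CI must
    fall inside the acceptance limits; no p-value is reported."""
    return limits[0] <= ci[0] and ci[1] <= limits[1]
```

With an observed GMR of 0.95 and a small log-scale standard error, the interval stays inside 80-125% and the study passes; inflate the standard error (that is, the variability) and the very same point estimate fails.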
Regulatory Red Flags
The FDA’s 2022 Bioequivalence Review Template lists exact documentation requirements:
- Software name and version used
- All input parameters with justification
- Dropout adjustment
- Joint power for Cmax and AUC
- Source of CV% (pilot data, not literature)
18% of statistical deficiencies in 2021 submissions came from incomplete documentation. That’s not a technical error - it’s a procedural one. And it’s preventable.
Also, sequence effects in crossover designs are often ignored. If subjects absorb the drug differently depending on whether they got test or reference first, and you don’t account for that in your model, your power drops. The EMA rejected 29% of BE studies in 2022 for this reason alone.
The Cost of Getting It Wrong
A failed BE study can cost $2 million to $5 million. And it takes 6-12 months to restart. The FDA reported that 22% of Complete Response Letters cited inadequate sample size or power. Dr. Donald Schuirmann, a leading BE expert, calls underpowered studies “one of the most common statistical failures in generic drug development.”
Worse, optimistic assumptions are deadly. Dr. Laszlo Endrenyi found that 37% of BE study failures in oncology generics between 2015 and 2020 came from using literature CV% instead of pilot data. That’s not ignorance - it’s negligence.
What’s Next? The Future of BE Power Analysis
The field is evolving. The FDA’s 2023 draft guidance introduces adaptive designs, where sample size can be adjusted mid-study based on interim data. This could cut costs and speed up approvals. Model-informed bioequivalence - using pharmacokinetic modeling instead of traditional metrics - is also emerging. Early results show it can reduce sample size by 30-50% for complex drugs like inhalers or injectables. But as of 2023, only 5% of submissions use it. Regulatory uncertainty keeps most sponsors in the traditional lane.
Still, the core hasn’t changed. You still need to know your CV%, your GMR, and your margins. You still need to power for both endpoints. You still need to document everything. The tools may get smarter, but the math stays the same.
Bottom Line
Power and sample size aren’t just statistical chores. They’re the foundation of your entire BE study. Get them right, and you clear the biggest hurdle to market. Get them wrong, and you waste years and millions. There’s no shortcut. No magic number. Just careful planning, real data, and strict adherence to regulatory standards.
If you’re planning a BE study, start with pilot data. Don’t trust literature. Calculate joint power. Add 15% for dropouts. Use the right software. Document everything. And remember: in bioequivalence, the smallest mistake can have the biggest cost.
What is the standard power level for a BE study?
Regulatory agencies like the FDA and EMA require a minimum power of 80%. Many sponsors aim for 90%, especially for drugs with narrow therapeutic indexes or when submitting globally. Power below 80% is generally not accepted.
Why is CV% so important in sample size calculations?
CV% (coefficient of variation) measures how much a person’s response to the drug varies across doses. Higher CV% means more variability, which requires a larger sample size to detect equivalence. For example, a drug with 30% CV needs twice as many subjects as one with 20% CV under the same conditions. Using inaccurate CV% estimates is the leading cause of BE study failures.
Can I use CV% from published literature to plan my study?
It’s risky. The FDA found that literature-based CV% estimates underestimate true variability by 5-8 percentage points in 63% of cases. Always use pilot data from your own formulation or a closely related product. Relying on literature has caused 37% of BE study failures in oncology generics between 2015 and 2020.
What is RSABE and when is it used?
RSABE stands for Reference-Scaled Average Bioequivalence. It’s a method used for highly variable drugs (CV > 30%) where fixed 80-125% equivalence limits become impractical. RSABE widens the acceptance range based on the reference drug’s variability, allowing smaller sample sizes - often cutting them from 100+ to 24-48 subjects. It requires regulatory approval before the study begins.
Do I need to calculate power for both Cmax and AUC?
Yes. A BE study must demonstrate equivalence for both Cmax (peak concentration) and AUC (total exposure). Most drugs have different variability for each. If you only power for the less variable parameter, you risk failing the other. Only 45% of sponsors currently calculate joint power - but regulatory agencies expect it.
How much should I increase my sample size for dropouts?
Add 10-15% to your calculated sample size to account for dropouts. If your calculation says 26 subjects, plan for 30. If it says 52, plan for 60. Failing to adjust for dropouts can reduce your final power by 5-10%, turning a pass into a fail.
What happens if my BE study fails due to sample size?
A failure due to inadequate power or sample size typically results in a Complete Response Letter from the FDA or EMA. You’ll need to redesign the study, collect new data, and resubmit - costing $2 million to $5 million and delaying launch by 6-12 months. It’s one of the most preventable - and costly - mistakes in generic drug development.