Statistical Analysis in BE Studies: How Power and Sample Size Determine Success
Mar 18, 2026
When a generic drug company wants to bring a new product to market, they don’t need to run full clinical trials like the original brand. Instead, they must prove bioequivalence - that their version behaves in the body just like the brand-name drug. This is done through a bioequivalence (BE) study, usually a small, tightly controlled crossover trial where volunteers take both the test and reference drugs. But here’s the catch: if the study is underpowered or has the wrong sample size, it fails. And failing a BE study isn’t just a delay - it’s a costly, months-long setback that can kill a product launch.
So how do you know how many people you need? It’s not guesswork. It’s math. And that math has to meet strict regulatory standards from the FDA and EMA. The goal? To be 80% to 90% sure you’ll correctly conclude two drugs are bioequivalent - if they really are. Miss that target, and you risk a Type II error: failing to demonstrate equivalence even though the drugs truly are equivalent. That’s a failure no company can afford.
Why Power and Sample Size Are Non-Negotiable
Power in a BE study is the probability that your test will correctly detect bioequivalence when it exists. A power of 80% means you have an 80% chance of passing the study if the drugs are truly equivalent. The FDA and EMA both require at least 80% power. Many sponsors aim for 90%, especially for drugs with narrow therapeutic indexes - where small differences can mean serious safety risks.
Sample size is the direct result of that power calculation. Too few subjects? You might miss a real difference and fail the study. Too many? You’re wasting money, time, and exposing more people than necessary to drug dosing. The sweet spot is precise, and it depends on three things: variability, expected ratio, and equivalence limits.
The Three Pillars of Sample Size Calculation
Every BE study hinges on these three inputs:
- Within-subject coefficient of variation (CV%) - This measures how much a person’s own response to the drug varies across doses. For example, if a volunteer’s AUC after taking Drug A is 120 one day and 180 another, that’s high variability. CV% for most drugs ranges from 10% to 35%. But for highly variable drugs (like clopidogrel or certain proton pump inhibitors), it can hit 40% or higher. Higher CV% means you need more subjects. A 30% CV might require 52 people, while a 20% CV only needs 26.
- Geometric mean ratio (GMR) - This is the expected ratio of test to reference drug exposure. Most generic drugs aim for a GMR close to 1.00 (meaning identical exposure). But assuming 1.00 is dangerous. If the real ratio is 0.95, and you plan for 1.00, your required sample size jumps by 32%. Always use conservative estimates based on pilot data, not literature.
- Equivalence margins - The legal boundaries for bioequivalence. For most drugs, the acceptance range for the geometric mean ratio is 80% to 125% (limits that are symmetric on the log scale). That means the test drug’s exposure can be as low as 80% or as high as 125% of the reference and still be considered equivalent. Some drugs, especially for Cmax, allow wider ranges under EMA rules (75-133%), which can cut sample size by 15-20%.
These aren’t optional. If you plug in the wrong numbers, your whole study is flawed. The FDA found that 63% of submissions using literature-based CV% underestimated true variability by 5-8 percentage points. That’s a recipe for failure.
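To see how these three inputs interact, here is a minimal sketch of the widely used z-based approximation for a 2x2 crossover TOST design. The function name and rounding convention are illustrative, and exact methods iterate on the t-distribution and return slightly larger numbers, which is one reason validating with dedicated software is mandatory before finalizing a protocol.

```python
from statistics import NormalDist
import math

def be_sample_size(cv, gmr, power=0.80, alpha=0.05):
    """Approximate total N for a 2x2 crossover BE study sized with
    the two one-sided tests (TOST) procedure.

    This is the classic z-based approximation; exact methods iterate
    on the t-distribution and give slightly larger N, so treat the
    result as a lower bound and validate with dedicated software.
    """
    z = NormalDist().inv_cdf
    # Within-subject variance on the log scale, derived from CV%.
    sigma_w2 = math.log(1.0 + cv ** 2)
    # Distance from the assumed true log-ratio to the nearer limit
    # of the standard 80-125% acceptance range.
    delta = math.log(1.25) - abs(math.log(gmr))
    if delta <= 0:
        raise ValueError("assumed GMR lies on or outside the limits")
    beta = 1.0 - power
    # With GMR = 1 both limits constrain power equally, so beta/2 applies.
    z_beta = z(1 - beta / 2) if gmr == 1.0 else z(1 - beta)
    n = 2 * (z(1 - alpha) + z_beta) ** 2 * sigma_w2 / delta ** 2
    # Round up to an even number so the two sequences stay balanced.
    return math.ceil(n / 2) * 2
```

For a CV% of 20% and a GMR of 0.95 at 80% power, this approximation returns 18 subjects; exact t-based software returns a few more. The point is the sensitivity: bump the CV% to 30% and the result more than doubles, to 38.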
How Variability Changes Everything
Let’s say you’re studying a drug with a CV% of 20%, an expected GMR of 0.95, and 80% power. You’d need 26 subjects. Now increase the CV% to 30%. Suddenly, you need 52. Double the variability? Double the sample size. That’s why pilot studies matter. If you don’t have real data, you’re guessing - and guessing in BE studies is expensive.
Highly variable drugs (CV > 30%) are a special case. The FDA allows reference-scaled average bioequivalence (RSABE) for these. Instead of fixed 80-125% limits, the range expands based on how variable the reference drug is. For example, if the reference drug has a CV of 40%, the implied equivalence limits stretch to roughly 71-141%. This reduces sample size from 120+ down to 24-48. But RSABE isn’t automatic - you need regulatory approval upfront, and the math gets more complex.
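The EMA’s version of reference scaling - average bioequivalence with expanding limits (ABEL) - has a simple closed form that is easy to sketch. The FDA’s RSABE uses a differently formulated scaled criterion, so take this as an illustration of the widening idea rather than the FDA’s exact math:

```python
import math

def abel_limits(cv_wr):
    """Expanded Cmax acceptance limits under the EMA's ABEL approach.

    For CVwR above 30%, the limits widen as exp(+/- 0.760 * s_wR),
    where s_wR is the reference drug's within-subject SD on the log
    scale; the expansion is capped at CVwR = 50%, i.e., at roughly
    69.84-143.19%. Below 30% CVwR the clamp reproduces the standard
    80-125% range (to within rounding).
    """
    cv = min(max(cv_wr, 0.30), 0.50)
    s_wr = math.sqrt(math.log(1.0 + cv ** 2))
    upper = math.exp(0.760 * s_wr)
    return 1.0 / upper, upper
```

At a reference CV of 40% this gives roughly 74.6-134.0%, the published ABEL value. Widening the acceptance range is what lets the sample size fall so sharply for highly variable drugs.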
Dropouts and Other Real-World Messiness
Real studies aren’t perfect. People drop out. They miss doses. They get sick. So you don’t just calculate the bare minimum. You add a buffer. Industry best practice? Add 10-15% extra subjects to account for dropouts. If your calculation says 26, plan for 30. If it says 52, plan for 60. Skip this step, and your final power might drop from 80% to 70% - and you’ll fail.
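The buffer itself is trivial arithmetic, but rounding up to an even total (so the two crossover sequences stay balanced) is easy to forget. A small helper makes the habit explicit; the 15% default is an assumption taken from the rule of thumb above:

```python
import math

def inflate_for_dropouts(n, buffer=0.15):
    """Inflate a calculated sample size by a dropout buffer and round
    up to the next even number, so both crossover sequences can be
    assigned equal numbers of subjects."""
    return math.ceil(n * (1.0 + buffer) / 2) * 2
```

So inflate_for_dropouts(26) returns 30 and inflate_for_dropouts(52) returns 60, matching the figures above.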
Also, most BE studies look at two endpoints: Cmax and AUC. You can’t just power for one. If you only power for AUC, and Cmax is more variable, you might pass AUC but fail Cmax. The American Statistical Association found only 45% of sponsors calculate joint power for both. That’s a blind spot. Always plan for the worst-case parameter.
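A simple, conservative way to plan for two endpoints is to size each with its own CV% and take the larger number. The sketch below does exactly that, reusing the z-based TOST approximation (so the same caveats apply: exact t-based software gives slightly larger N). Note that passing each endpoint marginally at 80% still does not guarantee 80% joint power unless Cmax and AUC are strongly correlated, which is why some sponsors size to 90% per endpoint or verify joint power by simulation.

```python
from statistics import NormalDist
import math

def tost_n(cv, gmr=0.95, power=0.80, alpha=0.05):
    """Compact z-approximation of total N for a 2x2 crossover TOST
    (assumes GMR != 1; exact t-based methods return slightly more)."""
    z = NormalDist().inv_cdf
    sigma_w2 = math.log(1.0 + cv ** 2)
    delta = math.log(1.25) - abs(math.log(gmr))
    n = 2 * (z(1 - alpha) + z(power)) ** 2 * sigma_w2 / delta ** 2
    return math.ceil(n / 2) * 2

def n_for_both_endpoints(cv_auc, cv_cmax, **kwargs):
    """Size each endpoint with its own CV% and power for the worse one."""
    return max(tost_n(cv_auc, **kwargs), tost_n(cv_cmax, **kwargs))
```

If AUC has a CV of 20% but Cmax sits at 30%, the study must be sized for the Cmax number, roughly twice as many subjects.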
Tools of the Trade
No one does this by hand anymore. Specialized software does the heavy lifting:
- PASS - Industry standard for BE studies. Handles RSABE, crossover designs, and joint power calculations.
- nQuery - Popular among CROs. Easy interface, regulatory templates built-in.
- FARTSSIE - Free tool developed by regulatory scientists. Great for small labs.
- ClinCalc - Online calculator. Good for quick estimates, but always validate with full software.
Don’t use generic sample size tools meant for superiority trials. BE is different. The math is based on log-normal distributions and confidence intervals, not p-values. Using the wrong tool is like using a ruler to measure weight.
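The underlying decision rule is worth seeing once: analyze on the log scale, form a 90% confidence interval for the geometric mean ratio, and pass only if the whole interval sits inside the acceptance limits. The sketch below uses the normal critical value 1.645 for illustration; a real analysis takes the critical value from the t-distribution with the ANOVA residual degrees of freedom.

```python
import math

def gmr_ci90(mean_log_diff, se_log_diff, crit=1.645):
    """90% CI for the test/reference geometric mean ratio, built from
    the estimated treatment difference on the log scale. The normal
    critical value 1.645 stands in for the proper t quantile."""
    lo = math.exp(mean_log_diff - crit * se_log_diff)
    hi = math.exp(mean_log_diff + crit * se_log_diff)
    return lo, hi

def passes_be(ci, limits=(0.80, 1.25)):
    """TOST expressed as interval inclusion: the entire 90% CI must
    fall inside the acceptance limits; no p-value is reported."""
    return limits[0] <= ci[0] and ci[1] <= limits[1]
```

With an observed GMR of 0.95 and a small log-scale standard error, the interval stays inside 80-125% and the study passes; inflate the standard error (that is, the variability) and the very same point estimate fails.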
Regulatory Red Flags
The FDA’s 2022 Bioequivalence Review Template lists exact documentation requirements:
- Software name and version used
- All input parameters with justification
- Dropout adjustment
- Joint power for Cmax and AUC
- Source of CV% (pilot data, not literature)
18% of statistical deficiencies in 2021 submissions came from incomplete documentation. That’s not a technical error - it’s a procedural one. And it’s preventable.
Also, sequence effects in crossover designs are often ignored. If subjects absorb the drug differently depending on whether they got test or reference first, and you don’t account for that in your model, your power drops. The EMA rejected 29% of BE studies in 2022 for this reason alone.
The Cost of Getting It Wrong
A failed BE study can cost $2 million to $5 million. And it takes 6-12 months to restart. The FDA reported that 22% of Complete Response Letters cited inadequate sample size or power. Dr. Donald Schuirmann, a leading BE expert, calls underpowered studies “one of the most common statistical failures in generic drug development.”
Worse, optimistic assumptions are deadly. Dr. Laszlo Endrenyi found that 37% of BE study failures in oncology generics between 2015 and 2020 came from using literature CV% instead of pilot data. That’s not ignorance - it’s negligence.
What’s Next? The Future of BE Power Analysis
The field is evolving. The FDA’s 2023 draft guidance introduces adaptive designs, where sample size can be adjusted mid-study based on interim data. This could cut costs and speed up approvals. Model-informed bioequivalence - using pharmacokinetic modeling instead of traditional metrics - is also emerging. Early results show it can reduce sample size by 30-50% for complex drugs like inhalers or injectables. But as of 2023, only 5% of submissions use it. Regulatory uncertainty keeps most sponsors in the traditional lane.
Still, the core hasn’t changed. You still need to know your CV%, your GMR, and your margins. You still need to power for both endpoints. You still need to document everything. The tools may get smarter, but the math stays the same.
Bottom Line
Power and sample size aren’t just statistical chores. They’re the foundation of your entire BE study. Get them right, and you clear the biggest hurdle to market. Get them wrong, and you waste years and millions. There’s no shortcut. No magic number. Just careful planning, real data, and strict adherence to regulatory standards.
If you’re planning a BE study, start with pilot data. Don’t trust literature. Calculate joint power. Add 15% for dropouts. Use the right software. Document everything. And remember: in bioequivalence, the smallest mistake can have the biggest cost.
What is the standard power level for a BE study?
Regulatory agencies like the FDA and EMA require a minimum power of 80%. Many sponsors aim for 90%, especially for drugs with narrow therapeutic indexes or when submitting globally. Power below 80% is generally not accepted.
Why is CV% so important in sample size calculations?
CV% (coefficient of variation) measures how much a person’s response to the drug varies across doses. Higher CV% means more variability, which requires a larger sample size to detect equivalence. For example, a drug with 30% CV needs twice as many subjects as one with 20% CV under the same conditions. Using inaccurate CV% estimates is the leading cause of BE study failures.
Can I use CV% from published literature to plan my study?
It’s risky. The FDA found that literature-based CV% estimates underestimate true variability by 5-8 percentage points in 63% of cases. Always use pilot data from your own formulation or a closely related product. Relying on literature has caused 37% of BE study failures in oncology generics between 2015 and 2020.
What is RSABE and when is it used?
RSABE stands for Reference-Scaled Average Bioequivalence. It’s a method used for highly variable drugs (CV > 30%) where fixed 80-125% equivalence limits become impractical. RSABE widens the acceptance range based on the reference drug’s variability, allowing smaller sample sizes - often cutting them from 100+ to 24-48 subjects. It requires regulatory approval before the study begins.
Do I need to calculate power for both Cmax and AUC?
Yes. A BE study must demonstrate equivalence for both Cmax (peak concentration) and AUC (total exposure). Most drugs have different variability for each. If you only power for the less variable parameter, you risk failing the other. Only 45% of sponsors currently calculate joint power - but regulatory agencies expect it.
How much should I increase my sample size for dropouts?
Add 10-15% to your calculated sample size to account for dropouts. If your calculation says 26 subjects, plan for 30. If it says 52, plan for 60. Failing to adjust for dropouts can reduce your final power by 5-10%, turning a pass into a fail.
What happens if my BE study fails due to sample size?
A failure due to inadequate power or sample size typically results in a Complete Response Letter from the FDA or EMA. You’ll need to redesign the study, collect new data, and resubmit - costing $2 million to $5 million and delaying launch by 6-12 months. It’s one of the most preventable - and costly - mistakes in generic drug development.